scrapy / scrapy

Scrapy, a fast high-level web crawling & scraping framework for Python.
https://scrapy.org
BSD 3-Clause "New" or "Revised" License

Issue with running a scrapy spider from a script. #2473

Closed tituskex closed 2 years ago

tituskex commented 7 years ago

Hi, I'm trying to run scrapy from a script like this:

import scrapy
from scrapy.crawler import CrawlerProcess
from scrapy.loader import ItemLoader  # import was missing in the original snippet
from properties.items import PropertiesItem  # project-specific item; module path assumed

class MySpider(scrapy.Spider):
    name = "basic"
    allowed_domains = ["web"]
    start_urls = ['http://www.example.com']

    def parse(self, response):
        l = ItemLoader(item=PropertiesItem(), response=response)
        l.add_xpath('title', '//h1[1]/text()')
        return l.load_item()

process = CrawlerProcess({
    'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)'
})

process.crawl(MySpider)
process.start()

However, when I run this script I get the following error:

File "/Library/Python/2.7/site-packages/Twisted-16.7.0rc1-py2.7-macosx-10.11-
intel.egg/twisted/internet/_sslverify.py", line 38, in <module>
TLSVersion.TLSv1_1: SSL.OP_NO_TLSv1_1,
AttributeError: 'module' object has no attribute 'OP_NO_TLSv1_1'

Does anyone know how to fix this? Thanks in advance.

IAlwaysBeCoding commented 7 years ago

I would try downgrading your Twisted version from Twisted==16.7.0rc1 to Twisted==16.4.1. I got some weird errors on the downloader side too when I ran my Scrapy spiders with the same version you are running.


2017-01-02 14:25:00 [scrapy] ERROR: Error downloading <GET http://www.citysearch.com/profile/645344264/jackson_ms/wright_patrick_b_md_patrick_b_wright_md.html>
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/twisted/internet/defer.py", line 1297, in _inlineCallbacks
    result = result.throwExceptionIntoGenerator(g)
  File "/usr/local/lib/python2.7/dist-packages/twisted/python/failure.py", line 393, in throwExceptionIntoGenerator
    return g.throw(self.type, self.value, self.tb)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/core/downloader/middleware.py", line 43, in process_request
    defer.returnValue((yield download_func(request=request,spider=spider)))
  File "/usr/local/lib/python2.7/dist-packages/scrapy/utils/defer.py", line 45, in mustbe_deferred
    result = f(*args, **kw)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/core/downloader/handlers/__init__.py", line 65, in download_request
    return handler.download_request(request, spider)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/core/downloader/handlers/http11.py", line 60, in download_request
    return agent.download_request(request)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/core/downloader/handlers/http11.py", line 285, in download_request
    method, to_bytes(url, encoding='ascii'), headers, bodyproducer)
  File "/usr/local/lib/python2.7/dist-packages/twisted/web/client.py", line 1631, in request
    parsedURI.originForm)
  File "/usr/local/lib/python2.7/dist-packages/twisted/web/client.py", line 1408, in _requestWithEndpoint
    d = self._pool.getConnection(key, endpoint)
  File "/usr/local/lib/python2.7/dist-packages/twisted/web/client.py", line 1294, in getConnection
    return self._newConnection(key, endpoint)
  File "/usr/local/lib/python2.7/dist-packages/twisted/web/client.py", line 1306, in _newConnection
    return endpoint.connect(factory)
  File "/usr/local/lib/python2.7/dist-packages/twisted/internet/endpoints.py", line 779, in connect
    EndpointReceiver, self._hostText, portNumber=self._port
  File "/usr/local/lib/python2.7/dist-packages/twisted/internet/_resolver.py", line 174, in resolveHostName
    onAddress = self._simpleResolver.getHostByName(hostName)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/resolver.py", line 21, in getHostByName
    d = super(CachingThreadedResolver, self).getHostByName(name, timeout)
  File "/usr/local/lib/python2.7/dist-packages/twisted/internet/base.py", line 276, in getHostByName
    timeoutDelay = sum(timeout)
TypeError: 'float' object is not iterable

After downgrading to the version I had (Twisted==16.4.1), things went back to working great again.

Command: pip install Twisted==16.4.1 (add sudo if you need it).

IAlwaysBeCoding commented 7 years ago

#2479 is related to this one as well.

redapple commented 7 years ago

@tituskex, did you manage to make it work? Did downgrading Twisted work?

pembeci commented 7 years ago

Downgrading Twisted worked for me too.

kmike commented 7 years ago

@pembeci what is your Scrapy version?

pembeci commented 7 years ago

@kmike The latest from pip install: 1.3.2. I'm running on an old machine that hasn't been upgraded for a while (Ubuntu 12.04 LTS, 32-bit), so maybe that's why I needed to downgrade Twisted.

kmike commented 7 years ago

@pembeci what was the exception? Hm, maybe it is caused by Twisted 17+ dropping pyOpenSSL < 0.16 support.

rmax commented 7 years ago

@pembeci I would recommend using (mini)conda to get the latest releases without having to upgrade system libraries on old systems.

wzpan commented 7 years ago

+1. Same problem with scrapy (1.3.2) and twisted (17.1.0).

  File "/Library/Python/2.7/site-packages/twisted/protocols/tls.py", line 63, in <module>
    from twisted.internet._sslverify import _setAcceptableProtocols
  File "/Library/Python/2.7/site-packages/twisted/internet/_sslverify.py", line 38, in <module>
    TLSVersion.TLSv1_1: SSL.OP_NO_TLSv1_1,
AttributeError: 'module' object has no attribute 'OP_NO_TLSv1_1'

kmike commented 7 years ago

@wzpan what is your pyOpenSSL version?

Twisted dropped support for pyOpenSSL < 16.0.0 in the Twisted 16.4.0 release (see http://twistedmatrix.com/trac/ticket/8441); in practice it kept working for a while, but they recently removed some of the supporting code as well. Is upgrading an option? You can check your pyOpenSSL version by running python -c 'import OpenSSL; print(OpenSSL.version.__version__)'
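
As an aside, the version gate involved here is easy to sketch with the standard library alone. The helper names below are made up for illustration, and the minimum of 16.0.0 is the one quoted above:

```python
def version_tuple(version):
    """Turn a dotted version string like '0.13.1' into a comparable tuple.

    Deliberately naive for this sketch; it would choke on suffixes such
    as release candidates ('16.7.0rc1').
    """
    return tuple(int(part) for part in version.split(".")[:3])

def pyopenssl_too_old(installed, minimum="16.0.0"):
    """True if the installed pyOpenSSL predates what Twisted 16.4+ needs."""
    return version_tuple(installed) < version_tuple(minimum)

print(pyopenssl_too_old("0.13.1"))   # -> True: triggers the AttributeError
print(pyopenssl_too_old("16.2.0"))   # -> False: new enough
```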

wzpan commented 7 years ago

@kmike awesome! 👍 My pyOpenSSL version is 0.13.1. After upgrading it to 16.2.0, scrapy works like a charm!

noprom commented 7 years ago

I run into this problem, too. Here is my stacktrace:

➜  ~ scrapy shell 'http://jbk.39.net/bw_t1/'
Traceback (most recent call last):
  File "/usr/local/bin/scrapy", line 7, in <module>
    from scrapy.cmdline import execute
  File "/Library/Python/2.7/site-packages/scrapy/cmdline.py", line 9, in <module>
    from scrapy.crawler import CrawlerProcess
  File "/Library/Python/2.7/site-packages/scrapy/crawler.py", line 7, in <module>
    from twisted.internet import reactor, defer
  File "/Library/Python/2.7/site-packages/twisted/internet/reactor.py", line 38, in <module>
    from twisted.internet import default
  File "/Library/Python/2.7/site-packages/twisted/internet/default.py", line 56, in <module>
    install = _getInstallFunction(platform)
  File "/Library/Python/2.7/site-packages/twisted/internet/default.py", line 50, in _getInstallFunction
    from twisted.internet.selectreactor import install
  File "/Library/Python/2.7/site-packages/twisted/internet/selectreactor.py", line 18, in <module>
    from twisted.internet import posixbase
  File "/Library/Python/2.7/site-packages/twisted/internet/posixbase.py", line 18, in <module>
    from twisted.internet import error, udp, tcp
  File "/Library/Python/2.7/site-packages/twisted/internet/tcp.py", line 28, in <module>
    from twisted.internet._newtls import (
  File "/Library/Python/2.7/site-packages/twisted/internet/_newtls.py", line 21, in <module>
    from twisted.protocols.tls import TLSMemoryBIOFactory, TLSMemoryBIOProtocol
  File "/Library/Python/2.7/site-packages/twisted/protocols/tls.py", line 63, in <module>
    from twisted.internet._sslverify import _setAcceptableProtocols
  File "/Library/Python/2.7/site-packages/twisted/internet/_sslverify.py", line 38, in <module>
    TLSVersion.TLSv1_1: SSL.OP_NO_TLSv1_1,
AttributeError: 'module' object has no attribute 'OP_NO_TLSv1_1'

Version:

➜  ~ python --version
Python 2.7.10
➜  ~ pip list | grep Scrapy
DEPRECATION: The default format will switch to columns in the future. You can use --format=(legacy|columns) (or define a format=(legacy|columns) in your pip.conf under the [list] section) to disable this warning.
Scrapy (1.2.1)

Any help would be appreciated.

wzpan commented 7 years ago

@noprom Try doing these:

pip install --upgrade scrapy
pip install --upgrade twisted
pip install --upgrade pyopenssl

noprom commented 7 years ago

@wzpan But another problem occurs:

➜  OS scrapy shell 'http://jbk.39.net/bw_t1/'
2017-03-02 20:31:05 [scrapy.utils.log] INFO: Scrapy 1.3.2 started (bot: scrapybot)
2017-03-02 20:31:05 [scrapy.utils.log] INFO: Overridden settings: {'LOGSTATS_INTERVAL': 0, 'DUPEFILTER_CLASS': 'scrapy.dupefilters.BaseDupeFilter'}
2017-03-02 20:31:05 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.corestats.CoreStats']
2017-03-02 20:31:05 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
 'scrapy.downloadermiddlewares.retry.RetryMiddleware',
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2017-03-02 20:31:05 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
 'scrapy.spidermiddlewares.referer.RefererMiddleware',
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2017-03-02 20:31:05 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2017-03-02 20:31:05 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6023
2017-03-02 20:31:05 [scrapy.core.engine] INFO: Spider opened
2017-03-02 20:31:05 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET http://jbk.39.net/bw_t1/> (failed 1 times): [<twisted.python.failure.Failure twisted.internet.error.ConnectionDone: Connection was closed cleanly.>]
2017-03-02 20:31:05 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET http://jbk.39.net/bw_t1/> (failed 2 times): [<twisted.python.failure.Failure twisted.internet.error.ConnectionDone: Connection was closed cleanly.>]
2017-03-02 20:31:05 [scrapy.downloadermiddlewares.retry] DEBUG: Gave up retrying <GET http://jbk.39.net/bw_t1/> (failed 3 times): [<twisted.python.failure.Failure twisted.internet.error.ConnectionDone: Connection was closed cleanly.>]
Traceback (most recent call last):
  File "/usr/local/bin/scrapy", line 11, in <module>
    sys.exit(execute())
  File "/Library/Python/2.7/site-packages/scrapy/cmdline.py", line 142, in execute
    _run_print_help(parser, _run_command, cmd, args, opts)
  File "/Library/Python/2.7/site-packages/scrapy/cmdline.py", line 88, in _run_print_help
    func(*a, **kw)
  File "/Library/Python/2.7/site-packages/scrapy/cmdline.py", line 149, in _run_command
    cmd.run(args, opts)
  File "/Library/Python/2.7/site-packages/scrapy/commands/shell.py", line 73, in run
    shell.start(url=url, redirect=not opts.no_redirect)
  File "/Library/Python/2.7/site-packages/scrapy/shell.py", line 48, in start
    self.fetch(url, spider, redirect=redirect)
  File "/Library/Python/2.7/site-packages/scrapy/shell.py", line 115, in fetch
    reactor, self._schedule, request, spider)
  File "/Library/Python/2.7/site-packages/twisted/internet/threads.py", line 122, in blockingCallFromThread
    result.raiseException()
  File "<string>", line 2, in raiseException
twisted.web._newclient.ResponseNeverReceived: [<twisted.python.failure.Failure twisted.internet.error.ConnectionDone: Connection was closed cleanly.>]

It seems that there's a problem with twisted.

rmax commented 7 years ago

@noprom The site does not complete the response when you use the default user agent (or the one you are using).

$ scrapy shell 'http://jbk.39.net/bw_t1/' --set USER_AGENT=Mozilla --loglevel INFO
2017-03-02 09:38:49 [scrapy.utils.log] INFO: Scrapy 1.3.2 started (bot: scrapybot)
2017-03-02 09:38:49 [scrapy.utils.log] INFO: Overridden settings: {'LOGSTATS_INTERVAL': 0, 'DUPEFILTER_CLASS': 'scrapy.dupefilters.BaseDupeFilter', 'USER_AGENT': 'Mozilla', 'LOG_LEVEL': 'INFO'}
2017-03-02 09:38:49 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.memusage.MemoryUsage',
 'scrapy.extensions.corestats.CoreStats']
2017-03-02 09:38:49 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
 'scrapy.downloadermiddlewares.retry.RetryMiddleware',
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2017-03-02 09:38:49 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
 'scrapy.spidermiddlewares.referer.RefererMiddleware',
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2017-03-02 09:38:49 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2017-03-02 09:38:49 [scrapy.core.engine] INFO: Spider opened
2017-03-02 09:38:50 [traitlets] WARNING: Config option `pager` not recognized by `InteractiveShellEmbed`.
[s] Available Scrapy objects:
[s]   scrapy     scrapy module (contains scrapy.Request, scrapy.Selector, etc)
[s]   crawler    <scrapy.crawler.Crawler object at 0x109100d68>
[s]   item       {}
[s]   request    <GET http://jbk.39.net/bw_t1/>
[s]   response   <200 http://jbk.39.net/bw_t1/>
[s]   settings   <scrapy.settings.Settings object at 0x109100eb8>
[s]   spider     <DefaultSpider 'default' at 0x10bf23dd8>
[s] Useful shortcuts:
[s]   fetch(url[, redirect=True]) Fetch URL and update local objects (by default, redirects are followed)
[s]   fetch(req)                  Fetch a scrapy.Request and update local objects
[s]   shelp()           Shell help (print this help)
[s]   view(response)    View response in a browser
In [1]: response.body[:100]
b'\r\n<!doctype html>\r\n<html>\r\n<head>\r\n    <meta http-equiv="Content-Type" content="text/html; charset=g'

noprom commented 7 years ago

@rolando Cool! Thanks a lot. 😄

noprom commented 7 years ago

@wzpan Thanks, you solved my problem.

rapliandras commented 7 years ago

pip install Twisted==16.4.1

also solved mine, but to be honest the backwards incompatibility is a shame; the Twisted folks should really get this fixed.

eegilbert commented 7 years ago

I couldn't even run scrapy by itself without the SSL error until I downgraded Twisted from 17 to 16.4.1, per @rapliandras.

redapple commented 7 years ago

For the record, we've released "packaging fix" versions that prevent Twisted>=17 from getting installed, because the 1.0.x, 1.1.x and 1.2.x branches only support Twisted<=16.6.

The master branch (and the recent v1.3.3) is compatible with Twisted 17+.

redapple commented 7 years ago

So it seems that the latest Twisted does require pyOpenSSL>=0.16, but only provided you add the [tls] extra, as in pip install twisted[tls]. Twisted 15.5 required pyOpenSSL>=0.13, but Twisted 16.6 requires pyOpenSSL>=0.16. I think Scrapy should add the [tls] extra to its requirements, even if it shows a warning for Twisted<15 (the extra did not exist then). It should not prevent Scrapy from getting installed.

kmike commented 7 years ago

@redapple I hadn't realized it is just a warning, not an error. If adding [tls] still allows Twisted to install, then +1 to adding it.

kmike commented 7 years ago

It seems that pip < 6.1.0 raises an error instead of showing a warning if an extra requirement is unknown; see https://github.com/pypa/pip/pull/2142. I'm not sure what happens if Twisted < 15.0 is already installed, the user has pip < 6.1.0 (e.g. pip 1.5 is still popular), and then runs pip install scrapy. Does it work?

redapple commented 7 years ago

Good point, @kmike. It does not work if one asks for Twisted<15:

$ pip install --upgrade 'pip<6.1.0'
$ pip install 'twisted<15'
$ pip install --upgrade 'twisted[tls]<15'
Successfully installed twisted-14.0.2
$ pip install --upgrade 'twisted[tls]<15'
You are using pip version 6.0.8, however version 9.0.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
Requirement already up-to-date: twisted[tls]<15 in /home/paul/.virtualenvs/piptests/lib/python2.7/site-packages
  Exception:
  Traceback (most recent call last):
    File "/home/paul/.virtualenvs/piptests/local/lib/python2.7/site-packages/pip/basecommand.py", line 232, in main
      status = self.run(options, args)
    File "/home/paul/.virtualenvs/piptests/local/lib/python2.7/site-packages/pip/commands/install.py", line 339, in run
      requirement_set.prepare_files(finder)
    File "/home/paul/.virtualenvs/piptests/local/lib/python2.7/site-packages/pip/req/req_set.py", line 436, in prepare_files
      req_to_install.extras):
    File "/home/paul/.virtualenvs/piptests/local/lib/python2.7/site-packages/pip/_vendor/pkg_resources/__init__.py", line 2504, in requires
      "%s has no such extra feature %r" % (self, ext)
  UnknownExtra: Twisted 14.0.2 has no such extra feature 'tls'

If we consider upgrades to latest Twisted, it works though, because latest Twisted has the extra:

$ pip install --upgrade 'twisted[tls]'
You are using pip version 6.0.8, however version 9.0.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
Collecting twisted[tls] from https://pypi.python.org/packages/d2/5d/ed5071740be94da625535f4333793d6fd238f9012f0fee189d0c5d00bd74/Twisted-17.1.0.tar.bz2#md5=5b4b9ea5a480bec9c1449ffb57b2052a
  Using cached Twisted-17.1.0.tar.bz2
    Installed /tmp/pip-build-RuAoHT/twisted/.eggs/incremental-16.10.1-py2.7.egg
Requirement already up-to-date: zope.interface>=3.6.0 in /home/paul/.virtualenvs/piptests/lib/python2.7/site-packages (from twisted[tls])
Collecting constantly>=15.1 (from twisted[tls])
  Using cached constantly-15.1.0-py2.py3-none-any.whl
Collecting incremental>=16.10.1 (from twisted[tls])
  Using cached incremental-16.10.1-py2.py3-none-any.whl
Collecting Automat>=0.3.0 (from twisted[tls])
  Using cached Automat-0.5.0-py2.py3-none-any.whl
Collecting pyopenssl>=16.0.0 (from twisted[tls])
  Using cached pyOpenSSL-16.2.0-py2.py3-none-any.whl
Collecting service-identity (from twisted[tls])
  Using cached service_identity-16.0.0-py2.py3-none-any.whl
Collecting idna>=0.6 (from twisted[tls])
  Using cached idna-2.5-py2.py3-none-any.whl
Requirement already up-to-date: setuptools in /home/paul/.virtualenvs/piptests/lib/python2.7/site-packages (from zope.interface>=3.6.0->twisted[tls])
Requirement already up-to-date: six in /home/paul/.virtualenvs/piptests/lib/python2.7/site-packages (from Automat>=0.3.0->twisted[tls])
Collecting attrs (from Automat>=0.3.0->twisted[tls])
  Using cached attrs-16.3.0-py2.py3-none-any.whl
Collecting cryptography>=1.3.4 (from pyopenssl>=16.0.0->twisted[tls])
  Using cached cryptography-1.8.1.tar.gz
Collecting pyasn1-modules (from service-identity->twisted[tls])
  Using cached pyasn1_modules-0.0.8-py2.py3-none-any.whl
Collecting pyasn1 (from service-identity->twisted[tls])
  Using cached pyasn1-0.2.3-py2.py3-none-any.whl
Requirement already up-to-date: packaging>=16.8 in /home/paul/.virtualenvs/piptests/lib/python2.7/site-packages (from setuptools->zope.interface>=3.6.0->twisted[tls])
Requirement already up-to-date: appdirs>=1.4.0 in /home/paul/.virtualenvs/piptests/lib/python2.7/site-packages (from setuptools->zope.interface>=3.6.0->twisted[tls])
Collecting asn1crypto>=0.21.0 (from cryptography>=1.3.4->pyopenssl>=16.0.0->twisted[tls])
  Using cached asn1crypto-0.22.0-py2.py3-none-any.whl
Collecting enum34 (from cryptography>=1.3.4->pyopenssl>=16.0.0->twisted[tls])
  Using cached enum34-1.1.6-py2-none-any.whl
Collecting ipaddress (from cryptography>=1.3.4->pyopenssl>=16.0.0->twisted[tls])
  Using cached ipaddress-1.0.18-py2-none-any.whl
Collecting cffi>=1.4.1 (from cryptography>=1.3.4->pyopenssl>=16.0.0->twisted[tls])
  Downloading cffi-1.10.0.tar.gz (418kB)
    100% |################################| 421kB 437kB/s 
Requirement already up-to-date: pyparsing in /home/paul/.virtualenvs/piptests/lib/python2.7/site-packages (from packaging>=16.8->setuptools->zope.interface>=3.6.0->twisted[tls])
Collecting pycparser (from cffi>=1.4.1->cryptography>=1.3.4->pyopenssl>=16.0.0->twisted[tls])
  Using cached pycparser-2.17.tar.gz
Installing collected packages: pycparser, cffi, ipaddress, enum34, asn1crypto, pyasn1, pyasn1-modules, cryptography, attrs, idna, service-identity, pyopenssl, Automat, incremental, constantly, twisted
(...)
Successfully installed Automat-0.5.0 asn1crypto-0.22.0 attrs-16.3.0 cffi-1.10.0 constantly-15.1.0 cryptography-1.8.1 enum34-1.1.6 idna-2.5 incremental-16.10.1 ipaddress-1.0.18 pyasn1-0.2.3 pyasn1-modules-0.0.8 pycparser-2.17 pyopenssl-16.2.0 service-identity-16.0.0 twisted-17.1.0

Is it fair to say that installing and upgrading via pip with twisted[tls] in the dependencies would work in this case (assuming Twisted>=15 is available from the package index being used)? I may be missing something.

kmike commented 7 years ago

I was asking about a different case:

  1. User already has Twisted < 15 installed (e.g. from system packages), but doesn't have Scrapy installed.
  2. Then user runs pip install scrapy, without --upgrade or specifying a version.

It seems it can fail (I've executed this in a clean virtualenv):

> pip install 'pip < 6.1.0'
..snip..
> pip install 'twisted<15'
..snip..
> pip install twisted[tls]
You are using pip version 6.0.8, however version 9.0.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
Requirement already satisfied (use --upgrade to upgrade): twisted[tls] in /Users/kmike/envs/tst-scrapy/lib/python2.7/site-packages
  Exception:
  Traceback (most recent call last):
    File "/Users/kmike/envs/tst-scrapy/lib/python2.7/site-packages/pip/basecommand.py", line 232, in main
      status = self.run(options, args)
    File "/Users/kmike/envs/tst-scrapy/lib/python2.7/site-packages/pip/commands/install.py", line 339, in run
      requirement_set.prepare_files(finder)
    File "/Users/kmike/envs/tst-scrapy/lib/python2.7/site-packages/pip/req/req_set.py", line 436, in prepare_files
      req_to_install.extras):
    File "/Users/kmike/envs/tst-scrapy/lib/python2.7/site-packages/pip/_vendor/pkg_resources/__init__.py", line 2504, in requires
      "%s has no such extra feature %r" % (self, ext)
  UnknownExtra: Twisted 14.0.2 has no such extra feature 'tls'

redapple commented 7 years ago

Alright, so the best we can do until Scrapy requires twisted[tls]>=15 is perhaps to document this in the FAQ, and maybe suggest either pip install --upgrade scrapy or downgrading Twisted. Thoughts?

kmike commented 7 years ago

For the record, both Debian Jessie and Ubuntu 14.04 ship pip 1.5 and Twisted < 15.0, so these baselines are affected.

Suggesting pip install -U scrapy is OK, but not always: it will upgrade requirements like pyOpenSSL, cryptography or lxml, and installation could fail (compiling may require too much RAM, or build dependencies may be absent). It may also fail at runtime, after installation. I recall upgrading Scrapy this way on Ubuntu 14.04 without a virtualenv (with pip3 install --user); the installation succeeded, but cryptography then failed to load, seemingly because pyOpenSSL could not use the OpenSSL version shipped with Ubuntu 14.04.

kmike commented 7 years ago

What do you think about providing a scrapy[tls] extra? After bumping requirements to Twisted[tls] >= 15.0 we can make it a no-op, and before that users can run pip install scrapy[tls]. I'm not sure it is possible to have the same package both in install_requires and extras_require, with a different version and extras (twisted); it needs to be checked.
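
The packaging idea above could look roughly like this. This is a hypothetical sketch with made-up names and version bounds, not Scrapy's actual setup.py metadata:

```python
# In setup.py these two values would be passed to setuptools.setup() as
# install_requires=... and extras_require=... (bounds illustrative only).
install_requires = ["Twisted>=13.1.0"]

extras_require = {
    # "pip install scrapy[tls]" would pull in Twisted with its own [tls]
    # extra, which on Twisted >= 16.4 in turn requires pyOpenSSL >= 16.0.0.
    "tls": ["Twisted[tls]>=15.0"],
}

print(sorted(extras_require))
```

Once the baseline requirement itself becomes Twisted[tls]>=15.0, the extra can stay as a no-op so pip install scrapy[tls] keeps working.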

redapple commented 7 years ago

I am not very fond of introducing a "tls" extra at the Scrapy level as well; I think it could be hard to explain that it does not mean TLS support ON or OFF, when to use it, etc. It's just a shame we cannot say something like twisted<15,twisted[tls]>=15 in the dependencies.

kmike commented 7 years ago

Fair enough. I'm fine with documenting this in the FAQ, or maybe in a new Troubleshooting section in the install docs ("Got an AttributeError: 'module' object has no attribute 'OP_NO_TLSv1_1' exception? This happens because Twisted dropped support for older pyOpenSSL versions. Either downgrade Twisted to ... or upgrade pyOpenSSL to 0.16+.").
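
Such an entry could even include a quick self-check. The sketch below uses only the standard library and a made-up function name; it probes the stdlib's copy of the same OpenSSL flag rather than pyOpenSSL itself:

```python
import ssl

def tls11_option_available():
    """Report whether this Python's OpenSSL bindings know the TLSv1.1 flag.

    The AttributeError in this thread means the installed pyOpenSSL
    (< 16.0) lacks the equivalent SSL.OP_NO_TLSv1_1 constant that
    Twisted expects.
    """
    return hasattr(ssl, "OP_NO_TLSv1_1")

print(tls11_option_available())
```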

sunilsharma07 commented 7 years ago

I had this problem too. The following steps worked for me:

pip install -U pip
pip install --upgrade scrapy
pip install --upgrade twisted
pip install --upgrade pyopenssl

redapple commented 7 years ago

+1 for a new Troubleshooting section in the install docs. It could be hard to keep updated, but I believe we have some common cases on StackOverflow and here.

jnikolak commented 7 years ago

RHEL 7 / CentOS 7 works for me:

pip install Twisted==16.4.1
Uninstalling Twisted-17.1.0:
  Successfully uninstalled Twisted-17.1.0

babyegern commented 7 years ago

@IAlwaysBeCoding you are a programming god. I just signed up to github, only to give a thumbs up. Your suggestion worked perfectly.

NatashaTing commented 6 years ago

I'm using Python 3.6.3 (Anaconda), Scrapy 1.4.0, Twisted 16.4.1 (downgraded from 17.9.0), and pyOpenSSL 17.4.0.

When I run pip install twisted[tls] it shows "Requirement already satisfied", but I'm still getting the AttributeError: 'module' object has no attribute 'OP_NO_TLSv1_1' error when trying to run a spider. Does anyone know what to do?

EDIT: just thought I'd mention that I've also tried putting from OpenSSL import SSL in my main.py file.

originalix commented 6 years ago

@wzpan Cool! You solved my problem, thanks.

tokinonagare commented 6 years ago

This problem still exists with scrapy==1.5.0; I needed to install Twisted==16.4.1.

diehummel commented 5 years ago

Hi, uninstall scrapy, twisted etc. from pip2 and install them with pip3. It works for me with Twisted 18.9 and Scrapy 1.6 using pip3.6 on CentOS. Give it a try; you may need to adjust the path (environment) from /usr/bin to /usr/local/bin.

Kunal614 commented 4 years ago

Hi, I'm trying to run scrapy from a script like this: [...]

AttributeError: 'module' object has no attribute 'OP_NO_TLSv1_1'

Does anyone know how to fix this? Thanks in advance.

Yes, I think your URL name was creating the problem. I made some changes and also uncommented the user agent in settings.py, and it's working well:

import scrapy
from scrapy.crawler import CrawlerProcess
from ..items import BasicItem

class MySpider(scrapy.Spider):
    name = "basic"
    allowed_domains = ["web"]
    start_urls = ['https://www.rd.com/funny-stuff/short-jokes/']

    def parse(self, response):
        item = BasicItem()

        title = response.css('.listicle-h2').extract()
        item['title'] = title

        yield item

https://github.com/scrapy/scrapy/pull/4391

tainangao commented 3 years ago

@noprom The site does not complete the response when you use the default user agent (or the one you are using).

$ scrapy shell 'http://jbk.39.net/bw_t1/' --set USER_AGENT=Mozilla --loglevel INFO
(...)

Then how do I use another user agent? I'm trying to scrape a real estate website where I'm just a guest. https://www.residentialpeople.com/za/property-for-sale/cape-town/?limit=10&offset=0&latitude=-33.9248685&longitude=18.4240553&radius=53.45541417432696&_location=Cape%20Town,%20South%20Africa&_radius_expansion=0.402

wRAR commented 2 years ago

I wonder if it's still needed with modern Scrapy?

Gallaecio commented 2 years ago

We actually covered this in the documentation as part of https://github.com/scrapy/scrapy/pull/3517

But now I wonder if we should remove that from the documentation, if it is no longer needed nowadays.