Closed necronet closed 1 year ago
Could you try with Scrapy 2.8?
I ran into the same issue when upgrading
Scrapy : 2.8.0
lxml : 4.6.3.0
libxml2 : 2.9.13
cssselect : 1.1.0
parsel : 1.6.0
w3lib : 1.22.0
Twisted : 21.7.0
Python : 3.10.4 (main, Jan 16 2023, 23:43:37) [Clang 14.0.0 (clang-1400.0.29.202)]
pyOpenSSL : 22.0.0 (OpenSSL 3.0.5 5 Jul 2022)
cryptography : 37.0.4
Platform : macOS-13.1-arm64-arm-64bit
I could not reproduce the issue with a fresh install. Maybe you need to upgrade additional deps? (e.g. cryptography, pyOpenSSL)
$ rm -rf venv/
$ python3 -m venv venv
$ . venv/bin/activate
$ pip install scrapy
[…]
Successfully installed Automat-22.10.0 PyDispatcher-2.0.7 Twisted-22.10.0 attrs-22.2.0 certifi-2022.12.7 cffi-1.15.1 charset-normalizer-3.0.1 constantly-15.1.0 cryptography-39.0.1 cssselect-1.2.0 filelock-3.9.0 hyperlink-21.0.0 idna-3.4 incremental-22.10.0 itemadapter-0.7.0 itemloaders-1.0.6 jmespath-1.0.1 lxml-4.9.2 packaging-23.0 parsel-1.7.0 protego-0.2.1 pyOpenSSL-23.0.0 pyasn1-0.4.8 pyasn1-modules-0.2.8 pycparser-2.21 queuelib-1.6.2 requests-2.28.2 requests-file-1.5.1 scrapy-2.8.0 service-identity-21.1.0 six-1.16.0 tldextract-3.4.0 typing-extensions-4.5.0 urllib3-1.26.14 w3lib-2.1.1 zope.interface-5.5.2
[…]
$ scrapy shell https://property-nicaragua.com/listing/casa-wahoo-above-the-surf/
[…]
2023-02-25 08:01:25 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://property-nicaragua.com/listing/casa-wahoo-above-the-surf/> (referer: None)
[…]
>>>
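One way to act on that suggestion is to compare the versions in the failing environment with the ones pulled in by the fresh install. The following is a minimal sketch, not part of Scrapy: `parse_version`, `outdated`, and `installed_versions` are hypothetical helper names, and the version numbers are copied from the version listing and the pip output above.

```python
from importlib import metadata


def parse_version(text):
    """Turn '22.0.0' into (22, 0, 0); non-numeric parts are ignored."""
    return tuple(int(part) for part in text.split(".") if part.isdigit())


def outdated(installed, reference):
    """Return {name: (installed, reference)} for packages older than reference."""
    return {
        name: (installed[name], wanted)
        for name, wanted in reference.items()
        if name in installed
        and parse_version(installed[name]) < parse_version(wanted)
    }


def installed_versions(names):
    """Look up the versions installed in the current environment."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            pass  # package is not installed here; skip it
    return found


# Versions reported by the failing environment in this thread.
failing = {
    "lxml": "4.6.3.0",
    "cssselect": "1.1.0",
    "parsel": "1.6.0",
    "w3lib": "1.22.0",
    "Twisted": "21.7.0",
    "pyOpenSSL": "22.0.0",
    "cryptography": "37.0.4",
}

# Versions pulled in by the fresh install shown above.
fresh = {
    "lxml": "4.9.2",
    "cssselect": "1.2.0",
    "parsel": "1.7.0",
    "w3lib": "2.1.1",
    "Twisted": "22.10.0",
    "pyOpenSSL": "23.0.0",
    "cryptography": "39.0.1",
}

stale = outdated(failing, fresh)
for name, (have, want) in sorted(stale.items()):
    print(f"{name}: {have} is older than {want}")
```

To check a live environment instead of the hard-coded listing, `installed_versions(fresh)` could be passed as the first argument to `outdated`. The version parsing here is deliberately naive and only handles plain dotted numbers.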
Thanks for the follow-up, @Gallaecio. I did a fresh install (I'm using pyenv for version management), but the problem persists. As you can see below, I even upgraded the Python version to get a completely fresh set of packages.
Scrapy : 2.8.0
lxml : 4.9.2.0
libxml2 : 2.10.3
cssselect : 1.1.0
parsel : 1.6.0
w3lib : 1.22.0
Twisted : 21.7.0
Python : 3.11.1 (main, Feb 26 2023, 11:38:14) [Clang 14.0.0 (clang-1400.0.29.202)]
pyOpenSSL : 22.0.0 (OpenSSL 3.0.5 5 Jul 2022)
cryptography : 37.0.4
Platform : macOS-13.1-arm64-arm-64bit
Here is the full stack trace of the error:
Traceback (most recent call last):
File "/Users/joseayerdis/.pyenv/versions/3.11.1/bin/scrapy", line 8, in <module>
sys.exit(execute())
^^^^^^^^^
File "/Users/joseayerdis/.pyenv/versions/3.11.1/lib/python3.11/site-packages/scrapy/cmdline.py", line 158, in execute
_run_print_help(parser, _run_command, cmd, args, opts)
File "/Users/joseayerdis/.pyenv/versions/3.11.1/lib/python3.11/site-packages/scrapy/cmdline.py", line 111, in _run_print_help
func(*a, **kw)
File "/Users/joseayerdis/.pyenv/versions/3.11.1/lib/python3.11/site-packages/scrapy/cmdline.py", line 166, in _run_command
cmd.run(args, opts)
File "/Users/joseayerdis/.pyenv/versions/3.11.1/lib/python3.11/site-packages/scrapy/commands/shell.py", line 84, in run
shell.start(url=url, redirect=not opts.no_redirect)
File "/Users/joseayerdis/.pyenv/versions/3.11.1/lib/python3.11/site-packages/scrapy/shell.py", line 44, in start
self.fetch(url, spider, redirect=redirect)
File "/Users/joseayerdis/.pyenv/versions/3.11.1/lib/python3.11/site-packages/scrapy/shell.py", line 119, in fetch
response, spider = threads.blockingCallFromThread(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/joseayerdis/.pyenv/versions/3.11.1/lib/python3.11/site-packages/twisted/internet/threads.py", line 119, in blockingCallFromThread
result.raiseException()
File "/Users/joseayerdis/.pyenv/versions/3.11.1/lib/python3.11/site-packages/twisted/python/failure.py", line 475, in raiseException
raise self.value.with_traceback(self.tb)
twisted.web._newclient.ResponseNeverReceived: [<twisted.python.failure.Failure OpenSSL.SSL.Error: [('SSL routines', '', 'unexpected eof while reading')]>]
Today I ran the command outside my project root; there has got to be a setting that is messing with the SSL handshake.
I'm going to go ahead and close this issue. Whenever I figure out what is making the crawler fail, I'll let you know!
Thanks again for helping me figure this out.
Edit 1:
Following up on this issue on StackOverflow.
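The suspicion that a project setting interferes with the handshake could be narrowed down by listing which TLS-related settings the project overrides. The sketch below is one hedged way to do that. `overridden`, `load_scrapy_settings`, and `TLS_SETTINGS` are hypothetical names introduced here, and the example values are made up for illustration rather than taken from the reporter's project. The setting names are Scrapy's documented TLS options.

```python
# Scrapy settings that influence how the TLS handshake is performed.
TLS_SETTINGS = (
    "DOWNLOADER_CLIENT_TLS_METHOD",
    "DOWNLOADER_CLIENT_TLS_CIPHERS",
    "DOWNLOADER_CLIENTCONTEXTFACTORY",
)


def overridden(project, defaults, names=TLS_SETTINGS):
    """Return {name: (default, project_value)} for settings that differ."""
    return {
        name: (defaults.get(name), project.get(name))
        for name in names
        if project.get(name) != defaults.get(name)
    }


def load_scrapy_settings():
    """Return (project, defaults) as plain dicts.

    Requires Scrapy to be installed and must be called from inside the
    project directory so that the project settings module is found.
    """
    from scrapy.settings import Settings
    from scrapy.utils.project import get_project_settings

    return get_project_settings().copy_to_dict(), Settings().copy_to_dict()


# Illustrative values only; not the reporter's actual configuration.
example_defaults = {
    "DOWNLOADER_CLIENT_TLS_METHOD": "TLS",
    "DOWNLOADER_CLIENT_TLS_CIPHERS": "DEFAULT",
}
example_project = {
    "DOWNLOADER_CLIENT_TLS_METHOD": "TLSv1.2",
    "DOWNLOADER_CLIENT_TLS_CIPHERS": "DEFAULT",
}

changed = overridden(example_project, example_defaults)
for name, (default, value) in changed.items():
    print(f"{name}: default={default!r}, project={value!r}")
```

Inside a real project, `overridden(*load_scrapy_settings())` would report the actual differences. Custom downloader middlewares or proxy configuration could also affect the connection and are not covered by this check.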
I'm still facing this issue, not just with one specific URL but with random URLs as well.
cryptography 38.0.4
Scrapy 2.5.0
pyOpenSSL 22.0.0
I'm behind a proxy.
It only makes sense to reopen the ticket if you are facing the issue with the latest version of both Scrapy and deps, which is not the case.
Description
Hi, I have been getting an error when trying to run scrapy shell on a site. Unfortunately, after trying to figure it out, I have failed to find even the root cause of what is going on. Here is the error I have:
Steps to Reproduce
scrapy shell https://property-nicaragua.com/listing/casa-wahoo-above-the-surf/
Expected behavior: Should return HTTP response with webpage data
Actual behavior:
<twisted.python.failure.Failure OpenSSL.SSL.Error: [('SSL routines', '', 'unexpected eof while reading')]>
Reproduces how often: Every time
Versions
Additional context
Other sites work correctly; the issue arises only on this site, so it's possible that it is an anti-crawler feature.