Closed: povilasb closed this issue 4 years ago
My hypothesis is that the server rejects the TLS ClientHello because of some of the specified cipher suites:
Cipher Suites (28 suites)
Cipher Suite: TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384 (0xc02c)
Cipher Suite: TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (0xc030)
Cipher Suite: TLS_DHE_RSA_WITH_AES_256_GCM_SHA384 (0x009f)
Cipher Suite: TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305_SHA256 (0xcca9)
Cipher Suite: TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256 (0xcca8)
Cipher Suite: TLS_DHE_RSA_WITH_CHACHA20_POLY1305_SHA256 (0xccaa)
Cipher Suite: TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256 (0xc02b)
Cipher Suite: TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (0xc02f)
Cipher Suite: TLS_DHE_RSA_WITH_AES_128_GCM_SHA256 (0x009e)
Cipher Suite: TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA384 (0xc024)
Cipher Suite: TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384 (0xc028)
Cipher Suite: TLS_DHE_RSA_WITH_AES_256_CBC_SHA256 (0x006b)
Cipher Suite: TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256 (0xc023)
Cipher Suite: TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256 (0xc027)
Cipher Suite: TLS_DHE_RSA_WITH_AES_128_CBC_SHA256 (0x0067)
Cipher Suite: TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA (0xc00a)
Cipher Suite: TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA (0xc014)
Cipher Suite: TLS_DHE_RSA_WITH_AES_256_CBC_SHA (0x0039)
Cipher Suite: TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA (0xc009)
Cipher Suite: TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA (0xc013)
Cipher Suite: TLS_DHE_RSA_WITH_AES_128_CBC_SHA (0x0033)
Cipher Suite: TLS_RSA_WITH_AES_256_GCM_SHA384 (0x009d)
Cipher Suite: TLS_RSA_WITH_AES_128_GCM_SHA256 (0x009c)
Cipher Suite: TLS_RSA_WITH_AES_256_CBC_SHA256 (0x003d)
Cipher Suite: TLS_RSA_WITH_AES_128_CBC_SHA256 (0x003c)
Cipher Suite: TLS_RSA_WITH_AES_256_CBC_SHA (0x0035)
Cipher Suite: TLS_RSA_WITH_AES_128_CBC_SHA (0x002f)
Cipher Suite: TLS_EMPTY_RENEGOTIATION_INFO_SCSV (0x00ff)
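As a side note, the hex code points above can be cross-checked against what the local OpenSSL build actually enables, using only the stdlib ssl module (Python 3.6+). This is a quick sketch, not part of the original report:

```python
import ssl

# List the cipher suites the local OpenSSL enables by default.
# Each entry carries a name and an id; the low two bytes of the id are
# the TLS code point shown in the Wireshark capture (e.g. 0xc02c).
ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
for cipher in ctx.get_ciphers():
    print(cipher['name'], hex(cipher['id'] & 0xFFFF))
```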
Wireshark shows me this response from the server:
TLSv1.2 Record Layer: Alert (Level: Fatal, Description: Handshake Failure)
Content Type: Alert (21)
Version: TLS 1.2 (0x0303)
Length: 2
Alert Message
Level: Fatal (2)
Description: Handshake Failure (40)
It comes immediately after the TLS ClientHello message.
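For reference, that fatal alert is only 7 bytes on the wire; a minimal Python 3 sketch of the record layout Wireshark is decoding above (the byte values are taken from the fields shown):

```python
import struct

# The raw bytes of the TLS alert record shown above:
# content type 21 (Alert), version 0x0303 (TLS 1.2), length 2,
# then level 2 (Fatal) and description 40 (Handshake Failure).
record = b"\x15\x03\x03\x00\x02\x02\x28"

content_type, major, minor, length = struct.unpack("!BBBH", record[:5])
level, description = record[5], record[6]
assert content_type == 21                # Alert
assert (major, minor) == (3, 3)          # TLS 1.2
assert length == 2
assert (level, description) == (2, 40)   # Fatal / Handshake Failure
```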
@redapple is the person who knows everything about such issues, but have you tried setting a different DOWNLOADER_CLIENT_TLS_METHOD option value?
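For anyone trying this suggestion: the setting goes in the project's settings.py. The value 'TLSv1.2' below is just an example; to my knowledge the values Scrapy accepted at the time were 'TLS' (the negotiating default), 'TLSv1.0', 'TLSv1.1', 'TLSv1.2', and 'SSLv3' (insecure, not recommended):

```python
# settings.py -- pin the TLS method used by Scrapy's HTTPS downloader.
# 'TLS' (the default) lets OpenSSL negotiate; the others force a version.
DOWNLOADER_CLIENT_TLS_METHOD = 'TLSv1.2'
```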
Unfortunately, changing TLS version does not help.
I think you're on the right track with the cipher suites. Did you compare the ClientHello messages for the success and failure cases? I cannot reproduce it with that URL, but I have an older OpenSSL; I'll try a more recent one tomorrow.
How do you make Scrapy/Python choose a specific OpenSSL version?
I haven't tried it myself yet, but I believe you can use https://cryptography.io/en/latest/installation/#static-wheels
I was planning on using a Debian 9 Sid docker image.
Alright, I just tried https://github.com/scrapy/scrapy/issues/2717#issuecomment-297404774 and I was able to reproduce the issue:
$ scrapy version -v
Scrapy : 1.3.3
lxml : 3.7.3.0
libxml2 : 2.9.3
cssselect : 1.0.1
parsel : 1.1.0
w3lib : 1.17.0
Twisted : 17.1.0
Python : 2.7.12+ (default, Sep 17 2016, 12:08:02) - [GCC 6.2.0 20160914]
pyOpenSSL : 17.0.0 (OpenSSL 1.1.0e 16 Feb 2017)
Platform : Linux-4.8.0-49-generic-x86_64-with-Ubuntu-16.10-yakkety
$ cat testssl.py
import scrapy

class FailingSpider(scrapy.Spider):
    name = 'Failing Spider'
    start_urls = ['https://www.skelbiu.lt/']

    def parse(self, response):
        pass
$ scrapy runspider testssl.py
2017-04-26 15:45:18 [scrapy.utils.log] INFO: Scrapy 1.3.3 started (bot: scrapybot)
2017-04-26 15:45:18 [scrapy.utils.log] INFO: Overridden settings: {'SPIDER_LOADER_WARN_ONLY': True}
(...)
2017-04-26 15:45:18 [scrapy.core.engine] INFO: Spider opened
2017-04-26 15:45:19 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2017-04-26 15:45:19 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6023
2017-04-26 15:45:19 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://www.skelbiu.lt/> (failed 1 times): [<twisted.python.failure.Failure OpenSSL.SSL.Error: [('SSL routines', 'ssl3_read_bytes', 'sslv3 alert handshake failure')]>]
2017-04-26 15:45:19 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://www.skelbiu.lt/> (failed 2 times): [<twisted.python.failure.Failure OpenSSL.SSL.Error: [('SSL routines', 'ssl3_read_bytes', 'sslv3 alert handshake failure')]>]
2017-04-26 15:45:19 [scrapy.downloadermiddlewares.retry] DEBUG: Gave up retrying <GET https://www.skelbiu.lt/> (failed 3 times): [<twisted.python.failure.Failure OpenSSL.SSL.Error: [('SSL routines', 'ssl3_read_bytes', 'sslv3 alert handshake failure')]>]
2017-04-26 15:45:19 [scrapy.core.scraper] ERROR: Error downloading <GET https://www.skelbiu.lt/>: [<twisted.python.failure.Failure OpenSSL.SSL.Error: [('SSL routines', 'ssl3_read_bytes', 'sslv3 alert handshake failure')]>]
2017-04-26 15:45:19 [scrapy.core.engine] INFO: Closing spider (finished)
2017-04-26 15:45:19 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/exception_count': 3,
'downloader/exception_type_count/twisted.web._newclient.ResponseNeverReceived': 3,
'downloader/request_bytes': 636,
'downloader/request_count': 3,
'downloader/request_method_count/GET': 3,
'finish_reason': 'finished',
'finish_time': datetime.datetime(2017, 4, 26, 13, 45, 19, 855881),
'log_count/DEBUG': 4,
'log_count/ERROR': 1,
'log_count/INFO': 7,
'scheduler/dequeued': 3,
'scheduler/dequeued/memory': 3,
'scheduler/enqueued': 3,
'scheduler/enqueued/memory': 3,
'start_time': datetime.datetime(2017, 4, 26, 13, 45, 19, 1654)}
2017-04-26 15:45:19 [scrapy.core.engine] INFO: Spider closed (finished)
For the record, I've collected .pcap files and expanded ClientHello message for Scrapy and OpenSSL client 1.0.2g and 1.1.0e in https://github.com/redapple/scrapy-issues/tree/master/2717
I'm leaning towards something to do with Elliptic Curves. I'll keep you updated.
Yeah, it looks like an EC thing:
-_defaultCurveName = u"prime256v1"
+_defaultCurveName = u"secp384r1"
made the connection to 'https://www.skelbiu.lt/' work for me.
Now, I'll have a look at how to properly configure this with Twisted Agent.
From what I see on https://www.ssllabs.com/ssltest/analyze.html?d=www.skelbiu.lt&s=92.62.130.22&hideResults=on, the website indeed requires (at least?) "secp384r1", which I tested in https://github.com/scrapy/scrapy/issues/2717#issuecomment-297440829
By default, openssl 1.1.0e client sends:
Elliptic curves (4 curves)
Elliptic curve: ecdh_x25519 (0x001d)
Elliptic curve: secp256r1 (0x0017)
Elliptic curve: secp521r1 (0x0019)
Elliptic curve: secp384r1 (0x0018)
but Scrapy 1.3.3 / Twisted 17.1 with OpenSSL 1.1.0e only sends:
Elliptic curves (1 curve)
Elliptic curve: secp256r1 (0x0017)
The code in Twisted that sets _defaultCurveName = u"prime256v1" was apparently added 3 years ago. Maybe OpenSSL now actually uses that setting; I'm not sure.
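One detail worth spelling out: prime256v1 is simply OpenSSL's name for the curve that TLS registers as secp256r1 (code point 0x0017), which is exactly the single curve in the Scrapy/Twisted ClientHello above. A small stdlib-free sketch of the relevant code points, per the IANA TLS supported-groups registry:

```python
# TLS "supported groups" (formerly "elliptic curves") code points from the
# IANA registry. OpenSSL's prime256v1 and TLS's secp256r1 are the same curve.
NAMED_CURVES = {
    0x0017: "secp256r1",  # a.k.a. prime256v1 / NIST P-256
    0x0018: "secp384r1",  # NIST P-384, the one www.skelbiu.lt requires
    0x0019: "secp521r1",  # NIST P-521
    0x001D: "x25519",
}

assert NAMED_CURVES[0x0017] == "secp256r1"
assert NAMED_CURVES[0x0018] == "secp384r1"
```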
A couple of (non-exclusive) options:
fyi, I've sent a message on Twisted Web mailing list: https://twistedmatrix.com/pipermail/twisted-web/2017-April/005293.html
I just tested with Twisted 17.5.0rc2 and this does NOT look fixed.
For me the issue is https://bugs.python.org/issue29697.
The patch postdates all current stable Python releases, and it causes the same error for urllib2.urlopen on Python 2.7 here. Applying the patch from that issue fixes it for me.
Twisted bug: https://twistedmatrix.com/trac/ticket/9210 (I had not opened it at the time)
I'm having the same issue with the following versions:
Scrapy : 1.4.0
lxml : 3.8.0.0
libxml2 : 2.9.4
cssselect : 1.0.1
parsel : 1.2.0
w3lib : 1.17.0
Twisted : 17.5.0
Python : 3.6.0 (v3.6.0:41df79263a11, Dec 22 2016, 17:23:13) - [GCC 4.2.1 (Apple Inc. build 5666) (dot 3)]
pyOpenSSL : 17.1.0 (OpenSSL 1.1.0f 25 May 2017)
Platform : Darwin-16.6.0-x86_64-i386-64bit
Is there a workaround?
@werdlv, I don't know of any workaround. Can you say which website is showing this failure? (To check whether it's indeed related to OpenSSL 1.1 with Twisted.)
@redapple sure. At least these are giving the SSL error:
Here are some that are working without errors:
Thanks @werdlv . So it appears that https://www.skelbiu.lt/ and https://www.cvbankas.lt/ are served by the same machines 92.62.130.22 and 92.62.130.23. https://www.skelbiu.lt/ is the host in this very issue (https://github.com/scrapy/scrapy/issues/2717#issue-224196154)
The site https://www.teplodvor.ru/ also gives the error.
see also #2944
Right @tonal . https://www.teplodvor.ru/ does not look compatible with OpenSSL 1.1 (some weak ciphers were removed).
Downgrading to cryptography<2, which ships with OpenSSL 1.0.2 (at least for me on Ubuntu), makes it work.
@redapple I have run pip install --upgrade 'cryptography<2', but it does not work.
url: https://www.archdaily.com
Scrapy : 1.4.0
lxml : 4.1.1.0
libxml2 : 2.9.7
cssselect : 1.0.1
parsel : 1.2.0
w3lib : 1.18.0
Twisted : 17.9.0
Python : 3.6.3 (default, Oct 24 2017, 14:48:20) - [GCC 7.2.0]
pyOpenSSL : 17.5.0 (OpenSSL 1.1.0g 2 Nov 2017)
Platform : Linux-4.9.66-1-MANJARO-x86_64-with-arch-Manjaro-Linux
<GET https://www.archdaily.com>
2017-12-10 16:14:21 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://www.archdaily.com> (failed 1 times): [<twisted.python.failure.Failure OpenSSL.SSL.Error: [('SSL routines', 'ssl3_read_bytes', 'sslv3 alert handshake failure')]>]
2017-12-10 16:14:26 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://www.archdaily.com> (failed 2 times): [<twisted.python.failure.Failure OpenSSL.SSL.Error: [('SSL routines', 'ssl3_read_bytes', 'sslv3 alert handshake failure')]>]
2017-12-10 16:14:27 [scrapy.downloadermiddlewares.retry] DEBUG: Gave up retrying <GET https://www.archdaily.com> (failed 3 times): [<twisted.python.failure.Failure OpenSSL.SSL.Error: [('SSL routines', 'ssl3_read_bytes', 'sslv3 alert handshake failure')]>]
2017-12-10 16:14:27 [scrapy.core.scraper] ERROR: Error downloading <GET https://www.archdaily.com>: [<twisted.python.failure.Failure OpenSSL.SSL.Error: [('SSL routines', 'ssl3_read_bytes', 'sslv3 alert handshake failure')]>]
@sulangsss it seems that you are still using OpenSSL 1.1.0:
pyOpenSSL : 17.5.0 (OpenSSL 1.1.0g 2 Nov 2017)
Try installing OpenSSL 1.0.x.
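A quick way to check which OpenSSL a given Python interpreter is built against is the stdlib ssl module (note that pyOpenSSL/cryptography may bundle a different OpenSSL than the stdlib, so scrapy version -v remains the authoritative check for Scrapy itself):

```python
import ssl

# Report the OpenSSL (or LibreSSL) version the stdlib ssl module links
# against. If this still says 1.1.x after the downgrade, the downgrade
# did not take effect in this environment.
print(ssl.OPENSSL_VERSION)       # e.g. "OpenSSL 1.0.2g  1 Mar 2016"
print(ssl.OPENSSL_VERSION_INFO)  # version tuple
```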
I just installed Twisted==18.4.0rc1 and www.skelbiu.lt seems to work for me.
Closing since this has been fixed in Twisted 18.4.0.
I'm experiencing this in Ubuntu 18.04 (Twisted 17.9.0, OpenSSL 1.1.1). I cannot update to newer packages, but I do control my entire application. I've made this workaround in my main file, after imports:
from twisted.internet import _sslverify
def _raise(_):
raise NotImplementedError()
_sslverify._OpenSSLECCurve = _raise
This should probably be used only as a last resort if libraries cannot be updated.
It's working for version 1.4.0.
I have this simple spider:
On Debian 9 it fails with:
On Debian 8 it works well, and https://www.skelbiu.lt is the only target with which I can reproduce the problem.
Some more context:
Any ideas what I should look for? :)