wfdd opened this issue 5 years ago.
The error Python 3.6 raises when calling urlopen('https://ina.gl/inatsisartut/sammensaetning-af-inatsisartut/') is:
ssl.SSLError: [SSL: UNKNOWN_PROTOCOL] unknown protocol (_ssl.c:841)
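A minimal sketch for reproducing this from Python and telling the UNKNOWN_PROTOCOL failure apart from other errors, using only the standard library. It assumes that urlopen may wrap the handshake failure in urllib.error.URLError (it re-raises OSError subclasses, which include ssl.SSLError, that way), so the helper unwraps .reason first. The names is_unknown_protocol and probe are hypothetical.

```python
import ssl
import urllib.error
import urllib.request


def is_unknown_protocol(exc):
    """Return True if exc (or the error it wraps) is an SSL UNKNOWN_PROTOCOL failure."""
    # urlopen() may re-raise socket/SSL errors as URLError(reason=<original error>);
    # HTTPError.reason is a plain string, so only unwrap real exception objects.
    if isinstance(exc, urllib.error.URLError) and isinstance(exc.reason, Exception):
        exc = exc.reason
    if not isinstance(exc, ssl.SSLError):
        return False
    # Errors raised by the ssl module carry a .reason such as "UNKNOWN_PROTOCOL";
    # fall back to the message text for errors that lack that attribute.
    return getattr(exc, "reason", None) == "UNKNOWN_PROTOCOL" or "UNKNOWN_PROTOCOL" in str(exc)


def probe(url, timeout=30):
    """Fetch url once and describe the outcome as a short string (diagnostic only)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return "ok: HTTP {}".format(resp.status)
    except Exception as exc:  # report instead of crashing, since this is a probe
        if is_unknown_protocol(exc):
            return "unknown protocol: {}".format(exc)
        return "other error: {!r}".format(exc)
```

Calling print(probe('https://ina.gl/inatsisartut/sammensaetning-af-inatsisartut/')) inside the scraper should then separate this failure from, say, a certificate-verification error.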
The last time a similar problem happened, it was an issue with the MITM proxy: https://help.morph.io/t/certificate-verify-failed/338
I wonder whether it's related again.
Similar to #1201
I've created a very small test "scraper" that doesn't actually scrape; it just checks that mitmproxy is returning a certificate that the system trusts.
https://morph.io/jamezpolley/ssl_test
Decoded, the CA cert there is:
Issuer: CN = mitmproxy, O = mitmproxy
Validity
    Not Before: Mar 30 19:35:55 2018 GMT
    Not After : Mar 31 19:35:55 2021 GMT
Subject: CN = mitmproxy, O = mitmproxy
and the certificate for www.yahoo.com is:
Issuer: CN = mitmproxy, O = mitmproxy
Validity
    Not Before: Jan 29 05:48:08 2019 GMT
    Not After : Jan 30 05:48:08 2024 GMT
Subject: CN = *.www.yahoo.com
X509v3 extensions:
    X509v3 Subject Alternative Name:
        DNS:*.www.yahoo.com, DNS:add.my.yahoo.com, DNS:*.amp.yimg.com, DNS:au.yahoo.com, DNS:be.yahoo.com, DNS:br.yahoo.com, DNS:ca.my.yahoo.com, DNS:ca.rogers.yahoo.com, DNS:ca.yahoo.com, DNS:ddl.fp.yahoo.com, DNS:de.yahoo.com, DNS:en-maktoob.yahoo.com, DNS:espanol.yahoo.com, DNS:es.yahoo.com, DNS:fr-be.yahoo.com, DNS:fr-ca.rogers.yahoo.com, DNS:frontier.yahoo.com, DNS:fr.yahoo.com, DNS:gr.yahoo.com, DNS:hk.yahoo.com, DNS:hsrd.yahoo.com, DNS:ideanetsetter.yahoo.com, DNS:id.yahoo.com, DNS:ie.yahoo.com, DNS:in.yahoo.com, DNS:it.yahoo.com, DNS:maktoob.yahoo.com, DNS:malaysia.yahoo.com, DNS:mbp.yimg.com, DNS:my.yahoo.com, DNS:nz.yahoo.com, DNS:ph.yahoo.com, DNS:qc.yahoo.com, DNS:ro.yahoo.com, DNS:se.yahoo.com, DNS:sg.yahoo.com, DNS:tw.yahoo.com, DNS:uk.yahoo.com, DNS:us.yahoo.com, DNS:verizon.yahoo.com, DNS:vn.yahoo.com, DNS:www.yahoo.com, DNS:yahoo.com, DNS:za.yahoo.com, DNS:106.10.250.10
So I don't think this is related to https://help.morph.io/t/certificate-verify-failed/338, because I can see mitmproxy serving the certificate correctly.
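A check along the lines of the ssl_test scraper can be sketched in Python with only the standard library: open a CONNECT tunnel through the proxy, let the default SSL context verify the presented certificate against the system trust store, and summarize the issuer, subject, dates and SANs. This is not the ssl_test code. It assumes the proxy is advertised through the standard https_proxy environment variable, and every function name here is hypothetical.

```python
import http.client
import ssl
import urllib.parse
import urllib.request


def proxy_address(proxies):
    """Return (host, port) of the HTTPS proxy in a getproxies()-style dict, or None."""
    url = proxies.get("https")
    if not url:
        return None
    parts = urllib.parse.urlsplit(url if "://" in url else "http://" + url)
    return parts.hostname, parts.port or 80  # 80 is the default for an http:// proxy URL


def name_fields(name):
    """Flatten getpeercert()'s nested RDN tuples into 'CN = x, O = y' form."""
    short = {"commonName": "CN", "organizationName": "O"}
    return ", ".join(
        "{} = {}".format(short.get(key, key), value) for rdn in name for key, value in rdn
    )


def summarize_peer_cert(cert):
    """Pick out the fields openssl prints in the decoded certificates above."""
    return {
        "issuer": name_fields(cert.get("issuer", ())),
        "subject": name_fields(cert.get("subject", ())),
        "not_before": cert.get("notBefore"),
        "not_after": cert.get("notAfter"),
        "san": [value for _kind, value in cert.get("subjectAltName", ())],
    }


def fetch_verified_cert(host, port=443, timeout=30):
    """Connect to host (through the env proxy, if any) and return its verified cert."""
    context = ssl.create_default_context()  # verifies against the system CA store
    proxy = proxy_address(urllib.request.getproxies())
    if proxy:
        conn = http.client.HTTPSConnection(proxy[0], proxy[1], timeout=timeout, context=context)
        conn.set_tunnel(host, port)  # CONNECT host:port, then TLS with SNI = host
    else:
        conn = http.client.HTTPSConnection(host, port, timeout=timeout, context=context)
    try:
        conn.connect()  # raises ssl.SSLError if the certificate is not trusted
        return conn.sock.getpeercert()
    finally:
        conn.close()
```

For example, summarize_peer_cert(fetch_verified_cert("www.yahoo.com")) should report a mitmproxy issuer when run behind the proxy; if the mitmproxy CA were not in the system store, connect() would raise instead of returning a certificate.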
See #1202 for (misplaced) details on this issue, which is unrelated to the web driver (be it PhantomJS or Chrome).
@wfdd Did you mean #1201?
I did.
I am also getting the unknown protocol error with my Ruby scraper: https://morph.io/reitermarkus/heizoelpreise-oesterreich
As was reported in [1], HTTPS requests either fail (as in the case of vanilla Python) or return exactly this payload:
<html xmlns="http://www.w3.org/1999/xhtml"><head></head><body></body></html>
(e.g. with Selenium, presumably because the status code is ignored and this is the markup headless Chrome has hardcoded for a blank page). This appears to have been happening to inatsisartut-scraper for the past eight months; it slipped under my radar because it did not cause the scraper to fail. Possibly related: #1201
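Since the blank payload did not make the scraper fail, one option is to check for it explicitly. A minimal sketch, assuming the page source is available as a string (for example from Selenium's driver.page_source); the names BLANK_PAGE, is_blank_page, require_content and BlankPageError are hypothetical.

```python
import re

# The exact markup returned instead of the real page, as quoted in this thread.
BLANK_PAGE = '<html xmlns="http://www.w3.org/1999/xhtml"><head></head><body></body></html>'


class BlankPageError(RuntimeError):
    """Raised when a fetched page is the empty placeholder document."""


def is_blank_page(html):
    """True if html is the empty placeholder, ignoring whitespace between tags."""
    return re.sub(r">\s+<", "><", html.strip()) == BLANK_PAGE


def require_content(html, url):
    """Return html unchanged, or raise so the scraper fails loudly on a blank page."""
    if is_blank_page(html):
        raise BlankPageError("blank page returned for {}".format(url))
    return html
```

Wrapping each fetched page in require_content makes a run fail as soon as the placeholder appears, instead of silently saving nothing.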