scrapy / scrapely

A pure-python HTML screen-scraping library
1.86k stars 315 forks source link

error [SSL: CERTIFICATE_VERIFY_FAILED] on travel sites #109

Open ghost opened 6 years ago

ghost commented 6 years ago

Im just starting with this tool and im trying to scrape travel prices but i got error [SSL: CERTIFICATE_VERIFY_FAILED].

from scrapely import Scraper

s = Scraper()
url1 = 'XXXXX' # URL of site
data = {'price': '16.929'}
s.train(url1, data)

url2 = 'XXXXX' # URL of same site but different search params, same destination and origin just one month later
print(s.scrape(url2))

Full console log:

Traceback (most recent call last): File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 1318, in do_open encode_chunked=req.has_header('Transfer-encoding')) File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/http/client.py", line 1239, in request self._send_request(method, url, body, headers, encode_chunked) File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/http/client.py", line 1285, in _send_request self.endheaders(body, encode_chunked=encode_chunked) File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/http/client.py", line 1234, in endheaders self._send_output(message_body, encode_chunked=encode_chunked) File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/http/client.py", line 1026, in _send_output self.send(msg) File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/http/client.py", line 964, in send self.connect() File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/http/client.py", line 1400, in connect server_hostname=server_hostname) File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/ssl.py", line 401, in wrap_socket _context=self, _session=session) File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/ssl.py", line 808, in init self.do_handshake() File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/ssl.py", line 1061, in do_handshake self._sslobj.do_handshake() File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/ssl.py", line 683, in do_handshake self._sslobj.do_handshake() ssl.SSLError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:749)

During handling of the above exception, another exception occurred:

Traceback (most recent call last): File "spider.py", line 6, in s.train(url1, data) File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/scrapely/init.py", line 48, in train page = url_to_page(url, encoding) File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/scrapely/htmlpage.py", line 183, in url_to_page fh = urlopen(url) File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 223, in urlopen return opener.open(url, data, timeout) File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 526, in open response = self._open(req, data) File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 544, in _open '_open', req) File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 504, in _call_chain result = func(*args) File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 1361, in https_open context=self._context, check_hostname=self._check_hostname) File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 1320, in do_open raise URLError(err) urllib.error.URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:749)>

Any idea what could be the problem?

stefano-bragaglia commented 6 years ago

Do you use Python v3.6?

I've seen something similar happening with package 'boto'... apparently some SSL certificates are missing or different than expected in v3.6...

I've resolved by downgrading to v3.5.*...

Hope it helps!

Ciao, Stefano

Il 6 apr 2018, 20:04 +0100, Marco Porracin notifications@github.com, ha scritto:

Im just starting with this tool and im trying to scrape travel prices but i got error [SSL: CERTIFICATE_VERIFY_FAILED]. from scrapely import Scraper

s = Scraper() url1 = 'XXXXX' # URL of site data = {'price': '16.929'} s.train(url1, data)

url2 = 'XXXXX' # URL of same site but different search params, same destination and origin just one month later print(s.scrape(url2)) Full console log:

Traceback (most recent call last): File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 1318, in do_open encode_chunked=req.has_header('Transfer-encoding')) File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/http/client.py", line 1239, in request self._send_request(method, url, body, headers, encode_chunked) File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/http/client.py", line 1285, in _send_request self.endheaders(body, encode_chunked=encode_chunked) File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/http/client.py", line 1234, in endheaders self._send_output(message_body, encode_chunked=encode_chunked) File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/http/client.py", line 1026, in _send_output self.send(msg) File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/http/client.py", line 964, in send self.connect() File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/http/client.py", line 1400, in connect server_hostname=server_hostname) File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/ssl.py", line 401, in wrap_socket _context=self, _session=session) File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/ssl.py", line 808, in init self.do_handshake() File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/ssl.py", line 1061, in do_handshake self._sslobj.do_handshake() File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/ssl.py", line 683, in do_handshake self._sslobj.do_handshake() ssl.SSLError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:749) During handling of the above exception, another exception occurred: Traceback (most recent call last): File "spider.py", line 6, in s.train(url1, data) File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/scrapely/init.py", line 48, in train page = url_to_page(url, encoding) File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/scrapely/htmlpage.py", line 183, in url_to_page fh = urlopen(url) File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 223, in urlopen return opener.open(url, data, timeout) File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 526, in open response = self._open(req, data) File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 544, in _open '_open', req) File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 504, in _call_chain result = func(*args) File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 1361, in https_open context=self._context, check_hostname=self._check_hostname) File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 1320, in do_open raise URLError(err) urllib.error.URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:749)> Any idea what could be the problem? — You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub, or mute the thread.

ghost commented 6 years ago

@stefano-bragaglia yes im using Python 3.6.1

I ve a 2.7 install two but im not being able to install scrapely on that version.

Could not find a version that satisfies the requirement scrapely (from versions: ) No matching distribution found for scrapely

stefano-bragaglia commented 6 years ago

Try Python 3.5!

Il 6 apr 2018, 20:20 +0100, Marco Porracin notifications@github.com, ha scritto:

@stefano-bragaglia yes im using Python 3.6.1 I ve a 2.7 install two but im not being able to install scrapely on that version.

Could not find a version that satisfies the requirement scrapely (from versions: ) No matching distribution found for scrapely — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or mute the thread.