psf / requests-html

Pythonic HTML Parsing for Humans™
http://html.python-requests.org
MIT License

Error Downloading html.arender #384

Open alejandrohdo opened 4 years ago

alejandrohdo commented 4 years ago

I tried the following:

>>> async def get_pyclock():
...     r = await asession.get('https://pythonclock.org/')
...     await r.html.arender()
...     return r
...
>>> results = asession.run(get_pyclock, get_pyclock, get_pyclock)

and I get the error below when trying to render the HTML. Do you know why?

[W:pyppeteer.chromium_downloader] start chromium download.
Download may take a few minutes.

Traceback (most recent call last):
  File "/home/alejandro/envs/env_scrapy/lib/python3.6/site-packages/urllib3/contrib/pyopenssl.py", line 472, in wrap_socket
    cnx.do_handshake()
  File "/home/alejandro/envs/env_scrapy/lib/python3.6/site-packages/OpenSSL/SSL.py", line 1915, in do_handshake
    self._raise_ssl_error(self._ssl, result)
  File "/home/alejandro/envs/env_scrapy/lib/python3.6/site-packages/OpenSSL/SSL.py", line 1647, in _raise_ssl_error
    _raise_current_error()
  File "/home/alejandro/envs/env_scrapy/lib/python3.6/site-packages/OpenSSL/_util.py", line 54, in exception_from_error_queue
    raise exception_type(errors)
OpenSSL.SSL.Error: [('SSL routines', 'tls_process_server_certificate', 'certificate verify failed')]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/alejandro/envs/env_scrapy/lib/python3.6/site-packages/urllib3/connectionpool.py", line 603, in urlopen
    chunked=chunked)
  File "/home/alejandro/envs/env_scrapy/lib/python3.6/site-packages/urllib3/connectionpool.py", line 344, in _make_request
    self._validate_conn(conn)
  File "/home/alejandro/envs/env_scrapy/lib/python3.6/site-packages/urllib3/connectionpool.py", line 843, in _validate_conn
    conn.connect()
  File "/home/alejandro/envs/env_scrapy/lib/python3.6/site-packages/urllib3/connection.py", line 370, in connect
    ssl_context=context)
  File "/home/alejandro/envs/env_scrapy/lib/python3.6/site-packages/urllib3/util/ssl_.py", line 355, in ssl_wrap_socket
    return context.wrap_socket(sock, server_hostname=server_hostname)
  File "/home/alejandro/envs/env_scrapy/lib/python3.6/site-packages/urllib3/contrib/pyopenssl.py", line 478, in wrap_socket
    raise ssl.SSLError('bad handshake: %r' % e)
ssl.SSLError: ("bad handshake: Error([('SSL routines', 'tls_process_server_certificate', 'certificate verify failed')],)",)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/alejandro/envs/env_scrapy/lib/python3.6/site-packages/requests_html.py", line 775, in run
    return [t.result() for t in done]
  File "/home/alejandro/envs/env_scrapy/lib/python3.6/site-packages/requests_html.py", line 775, in <listcomp>
    return [t.result() for t in done]
  File "<stdin>", line 3, in get_pyclock
  File "/home/alejandro/envs/env_scrapy/lib/python3.6/site-packages/requests_html.py", line 615, in arender
    self.browser = await self.session.browser
  File "/home/alejandro/envs/env_scrapy/lib/python3.6/site-packages/requests_html.py", line 714, in browser
    self._browser = await pyppeteer.launch(ignoreHTTPSErrors=not(self.verify), headless=True, args=self.__browser_args)
  File "/home/alejandro/envs/env_scrapy/lib/python3.6/site-packages/pyppeteer/launcher.py", line 311, in launch
    return await Launcher(options, **kwargs).launch()
  File "/home/alejandro/envs/env_scrapy/lib/python3.6/site-packages/pyppeteer/launcher.py", line 125, in __init__
    download_chromium()
  File "/home/alejandro/envs/env_scrapy/lib/python3.6/site-packages/pyppeteer/chromium_downloader.py", line 136, in download_chromium
    extract_zip(download_zip(get_url()), DOWNLOADS_FOLDER / REVISION)
  File "/home/alejandro/envs/env_scrapy/lib/python3.6/site-packages/pyppeteer/chromium_downloader.py", line 78, in download_zip
    data = http.request('GET', url, preload_content=False)
  File "/home/alejandro/envs/env_scrapy/lib/python3.6/site-packages/urllib3/request.py", line 68, in request
    **urlopen_kw)
  File "/home/alejandro/envs/env_scrapy/lib/python3.6/site-packages/urllib3/request.py", line 89, in request_encode_url
    return self.urlopen(method, url, **extra_kw)
  File "/home/alejandro/envs/env_scrapy/lib/python3.6/site-packages/urllib3/poolmanager.py", line 326, in urlopen
    response = conn.urlopen(method, u.request_uri, **kw)
  File "/home/alejandro/envs/env_scrapy/lib/python3.6/site-packages/urllib3/connectionpool.py", line 670, in urlopen
    **response_kw)
  File "/home/alejandro/envs/env_scrapy/lib/python3.6/site-packages/urllib3/connectionpool.py", line 670, in urlopen
    **response_kw)
  File "/home/alejandro/envs/env_scrapy/lib/python3.6/site-packages/urllib3/connectionpool.py", line 670, in urlopen
    **response_kw)
  File "/home/alejandro/envs/env_scrapy/lib/python3.6/site-packages/urllib3/connectionpool.py", line 641, in urlopen
    _stacktrace=sys.exc_info()[2])
  File "/home/alejandro/envs/env_scrapy/lib/python3.6/site-packages/urllib3/util/retry.py", line 399, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='storage.googleapis.com', port=443): Max retries exceeded with url: /chromium-browser-snapshots/Linux_x64/575458/chrome-linux.zip (Caused by SSLError(SSLError("bad handshake: Error([('SSL routines', 'tls_process_server_certificate', 'certificate verify failed')],)",),))
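Reading the traceback bottom-up: arender() lazily launches headless Chromium through pyppeteer, and on first use pyppeteer downloads a Chromium snapshot from storage.googleapis.com. It is that download, not the page fetch itself, that fails certificate verification. A minimal sketch to isolate this, replaying the same urllib3 call the traceback shows (get_url comes from pyppeteer's own downloader; the bare PoolManager is an assumption about how the downloader builds its client):

    # Replay the failing step from the traceback: fetch the Chromium snapshot
    # URL with a plain urllib3 client. If this also raises an SSL error, the
    # problem is certificate verification in this environment, not requests-html.
    import urllib3
    from pyppeteer.chromium_downloader import get_url

    http = urllib3.PoolManager()
    r = http.request('GET', get_url(), preload_content=False)
    print(r.status)  # 200 means the TLS handshake works from here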

oldani commented 4 years ago

@alejandrohdo can you share the versions of requests-html and pyppeteer you're using?
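(If it helps, one way to grab both versions programmatically, using pkg_resources from setuptools; the strings here are the PyPI distribution names, not the import names:)

    # Print the installed versions of the two packages in question.
    import pkg_resources

    for name in ('requests-html', 'pyppeteer'):
        print(name, pkg_resources.get_distribution(name).version)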

Cally99 commented 4 years ago

@oldani who deleted my comment? This is a simple fix. I said what needs to be done to resolve the issue @alejandrohdo faced.

Here's your code, fixed so it runs as-is:

import pyppdf.patch_pyppeteer  # must come first: patches pyppeteer's Chromium download (the failing step above)
from requests_html import AsyncHTMLSession

asession = AsyncHTMLSession()

async def get_pyclock():
    r = await asession.get('https://pythonclock.org/')
    await r.html.arender()  # renders JavaScript in headless Chromium
    print(r)

results = asession.run(get_pyclock, get_pyclock, get_pyclock)
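A note on that first import (going by pyppdf's own description): the import is itself the fix. pyppdf.patch_pyppeteer monkey-patches pyppeteer's Chromium download machinery as a side effect of being imported, so it has to run before the first arender() call, since that call is what triggers the download.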

Here's the pip freeze output:

appdirs==1.4.3
appnope==0.1.0
attrs==19.3.0
Automat==20.2.0
backcall==0.1.0
beautifulsoup4==4.9.0
bs4==0.0.1
certifi==2020.4.5.1
cffi==1.14.0
chardet==3.0.4
click==7.1.2
constantly==15.1.0
cryptography==2.9.2
cssselect==1.1.0
decorator==4.4.2
fake-useragent==0.1.11
hyperlink==19.0.0
idna==2.9
incremental==17.5.0
ipykernel==5.2.1
ipython==7.14.0
ipython-genutils==0.2.0
jedi==0.17.0
jupyter-client==6.1.3
jupyter-core==4.6.3
litereval==0.0.11
lxml==4.5.0
parse==1.15.0
parsel==1.5.2
parso==0.7.0
pexpect==4.8.0
pickleshare==0.7.5
prompt-toolkit==3.0.5
Protego==0.1.16
ptyprocess==0.6.0
pyasn1==0.4.8
pyasn1-modules==0.2.8
pycparser==2.20
PyDispatcher==2.0.5
pyee==7.0.1
Pygments==2.6.1
PyHamcrest==2.0.2
pyOpenSSL==19.1.0
pyppdf==0.0.12
pyppeteer==0.0.25
pyquery==1.4.1
python-dateutil==2.8.1
pyzmq==19.0.0
queuelib==1.5.0
requests==2.23.0
requests-html==0.10.0
Scrapy==2.1.0
scrapy-splash==0.7.2
selenium==3.141.0
service-identity==18.1.0
six==1.14.0
soupsieve==2.0
tornado==6.0.4
tqdm==4.46.0
traitlets==4.3.3
Twisted==20.3.0
urllib3==1.25.9
w3lib==1.21.0
wcwidth==0.1.9
websockets==8.1
zope.interface==5.1.0

You've probably fixed this already, since it's simple. @psf, don't delete comments and create unnecessary issues.

kursataktas commented 4 years ago

For those who want to solve this issue with a workaround, see:

https://github.com/psf/requests-html/issues/325#issuecomment-629671988

ayse6060 commented 4 years ago

Try running pyppeteer-install; see issue #399.
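For completeness: pyppeteer-install is a console script shipped with pyppeteer that downloads Chromium ahead of time, so arender() never has to trigger the download during a request. A rough sketch of the same thing from Python, assuming the helpers in pyppeteer's chromium_downloader module (the module the traceback above goes through):

    # Pre-download Chromium once so requests-html finds it already installed.
    from pyppeteer import chromium_downloader

    if not chromium_downloader.check_chromium():
        chromium_downloader.download_chromium()
    print(chromium_downloader.chromium_executable())  # path arender() will use

Note that if certificate verification is broken environment-wide, this download fails with the same SSL error, and the pyppdf patch above is still the way out.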