psf / requests-html

Pythonic HTML Parsing for Humans™
http://html.python-requests.org
MIT License

Got a runtime error while render() #278

Open Areso opened 5 years ago

Areso commented 5 years ago
session = HTMLSession()
r = session.get(url)
r.html.render()
my = r.html.text

My Python version is 3.7.1. Sometimes the error occurs on the first run of the loop, sometimes around the tenth. Either way, it fails with this error:

Traceback (most recent call last):
  File "C:\Software\Python\lib\site-packages\requests_html.py", line 512, in _async_render
    await page.goto(url, options={'timeout': int(timeout * 1000)})
  File "C:\Software\Python\lib\site-packages\pyppeteer\page.py", line 862, in goto
    raise error
pyppeteer.errors.TimeoutError: Navigation Timeout Exceeded: 8000 ms exceeded.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:/git-projects/Stock-watcher/stock-watcher.py", line 177, in <module>
    main_loop(ld_params, ld_assets, ld_exchanges, ld_threshold_min, ld_threshold_max, ld_sms_counter_dict)
  File "C:/git-projects/Stock-watcher/stock-watcher.py", line 158, in main_loop
    price = parsing_tradingview(each_asset, loop_exchanges[myiterator])
  File "C:/git-projects/Stock-watcher/stock-watcher.py", line 114, in parsing_tradingview
    r.html.render()
  File "C:\Software\Python\lib\site-packages\requests_html.py", line 598, in render
    content, result, page = self.session.loop.run_until_complete(self._async_render(url=self.url, script=script, sleep=sleep, wait=wait, content=self.html, reload=reload, scrolldown=scrolldown, timeout=timeout, keep_page=keep_page))
  File "C:\Software\Python\lib\asyncio\base_events.py", line 573, in run_until_complete
    return future.result()
  File "C:\Software\Python\lib\site-packages\requests_html.py", line 537, in _async_render
    await page.close()
  File "C:\Software\Python\lib\site-packages\pyppeteer\page.py", line 1458, in close
    raise PageError('Protocol Error: Connectoin Closed. '
pyppeteer.errors.PageError: Protocol Error: Connectoin Closed. Most likely the page has been closed.
Error in atexit._run_exitfuncs:
Traceback (most recent call last):
  File "C:\Software\Python\lib\site-packages\pyppeteer\launcher.py", line 262, in killChrome
    self._cleanup_tmp_user_data_dir()
  File "C:\Software\Python\lib\site-packages\pyppeteer\launcher.py", line 154, in _cleanup_tmp_user_data_dir
    raise IOError('Unable to remove Temporary User Data')
OSError: Unable to remove Temporary User Data

bisguzar commented 5 years ago

Can you run your script with full privileges? For example, as root, or, if you are using cmd, by opening the terminal as administrator. Please share the traceback again after you try.

Areso commented 5 years ago

Did it.

Future exception was never retrieved
future: <Future finished exception=NetworkError('Protocol error Target.sendMessageToTarget: Target closed.')>
pyppeteer.errors.NetworkError: Protocol error Target.sendMessageToTarget: Target closed.
Traceback (most recent call last):
  File "C:\Software\Python\lib\site-packages\requests_html.py", line 512, in _async_render
    await page.goto(url, options={'timeout': int(timeout * 1000)})
  File "C:\Software\Python\lib\site-packages\pyppeteer\page.py", line 862, in goto
    raise error
pyppeteer.errors.TimeoutError: Navigation Timeout Exceeded: 8000 ms exceeded.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "stock-watcher.py", line 177, in <module>
    main_loop(ld_params, ld_assets, ld_exchanges, ld_threshold_min, ld_threshold_max, ld_sms_counter_dict)
  File "stock-watcher.py", line 158, in main_loop
    price = parsing_tradingview(each_asset, loop_exchanges[myiterator])
  File "stock-watcher.py", line 114, in parsing_tradingview
    r.html.render()
  File "C:\Software\Python\lib\site-packages\requests_html.py", line 598, in render
    content, result, page = self.session.loop.run_until_complete(self._async_render(url=self.url, script=script, sleep=sleep, wait=wait, content=self.html, reload=reload, scrolldown=scrolldown, timeout=timeout, keep_page=keep_page))
  File "C:\Software\Python\lib\asyncio\base_events.py", line 573, in run_until_complete
    return future.result()
  File "C:\Software\Python\lib\site-packages\requests_html.py", line 537, in _async_render
    await page.close()
  File "C:\Software\Python\lib\site-packages\pyppeteer\page.py", line 1458, in close
    raise PageError('Protocol Error: Connectoin Closed. '
pyppeteer.errors.PageError: Protocol Error: Connectoin Closed. Most likely the page has been closed.
Error in atexit._run_exitfuncs:
Traceback (most recent call last):
  File "C:\Software\Python\lib\site-packages\pyppeteer\launcher.py", line 262, in killChrome
    self._cleanup_tmp_user_data_dir()
  File "C:\Software\Python\lib\site-packages\pyppeteer\launcher.py", line 154, in _cleanup_tmp_user_data_dir
    raise IOError('Unable to remove Temporary User Data')
OSError: Unable to remove Temporary User Data

bisguzar commented 5 years ago

The render() method takes a timeout parameter (see https://html.python-requests.org/). The default value is 8 seconds. Let's set it to a larger value. Can you share the target URL? We can try it ourselves.

Maybe you can try this solution for OSError.
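
A minimal sketch of the "larger timeout" advice above: retry the render call with progressively larger timeouts. The helper name `render_with_retry`, the timeout schedule, and the broad `retry_on` default are illustrative assumptions, not part of requests_html; in practice you would probably narrow `retry_on` to the pyppeteer error classes that appear in the tracebacks.

```python
import time


def render_with_retry(render, timeouts=(8, 20, 30), pause=1.0,
                      retry_on=(Exception,), sleep=time.sleep):
    """Call render(timeout=t) for each t in timeouts until one succeeds.

    `render` is any callable that accepts a `timeout` keyword, for example
    `r.html.render` from requests_html. Every other name is hypothetical.
    """
    last_error = None
    for attempt, timeout in enumerate(timeouts):
        if attempt:
            sleep(pause)  # brief pause before each retry
        try:
            return render(timeout=timeout)
        except retry_on as exc:
            last_error = exc  # remember the failure and try a longer timeout
    if last_error is None:
        raise ValueError("timeouts must not be empty")
    raise last_error
```

Usage would look like `render_with_retry(r.html.render)` after `r = session.get(url)`. Whether a retry helps once the browser connection has already closed is not something this sketch can guarantee.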

Areso commented 5 years ago

I did the following:

    session = HTMLSession()
    r = session.get(url)
    r.html.render(timeout=30)
    my = r.html.text
    r.close()
    session.close()

I still ran it as administrator. The target I am rendering is https://www.tradingview.com/symbols/NASDAQ-TSLA

Traceback (most recent call last):
  File "stock-watcher.py", line 178, in <module>
    main_loop(ld_params, ld_assets, ld_exchanges, ld_threshold_min, ld_threshold_max, ld_sms_counter_dict)
  File "stock-watcher.py", line 159, in main_loop
    price = parsing_tradingview(each_asset, loop_exchanges[myiterator])
  File "stock-watcher.py", line 134, in parsing_tradingview
    session.close()
  File "C:\Software\Python\lib\site-packages\requests_html.py", line 736, in close
    self.loop.run_until_complete(self._browser.close())
  File "C:\Software\Python\lib\asyncio\base_events.py", line 573, in run_until_complete
    return future.result()
  File "C:\Software\Python\lib\site-packages\pyppeteer\browser.py", line 251, in close
    await self._closeCallback()  # Launcher.killChrome()
  File "C:\Software\Python\lib\site-packages\pyppeteer\launcher.py", line 262, in killChrome
    self._cleanup_tmp_user_data_dir()
  File "C:\Software\Python\lib\site-packages\pyppeteer\launcher.py", line 154, in _cleanup_tmp_user_data_dir
    raise IOError('Unable to remove Temporary User Data')
OSError: Unable to remove Temporary User Data

Areso commented 5 years ago

How can I find the directory where it downloaded Chromium? I want to try deleting it.

mrr010 commented 5 years ago

I ran into similar issues when rendering http://global.krx.co.kr/contents/GLB/05/0503/0503050600/GLB0503050600.jsp. I also got a popup window saying "Chromium has stopped working".

Traceback (most recent call last):
  File "C:\Users\mrr010\PycharmProjects\Project_LearnPython\venv\lib\site-packages\requests_html.py", line 512, in _async_render
    await page.goto(url, options={'timeout': int(timeout * 1000)})
  File "C:\Users\mrr010\PycharmProjects\Project_LearnPython\venv\lib\site-packages\pyppeteer\page.py", line 862, in goto
    raise error
pyppeteer.errors.TimeoutError: Navigation Timeout Exceeded: 8000 ms exceeded.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:/Users/mrr010/PycharmProjects/Project_LearnPython/webscraping.py", line 14, in <module>
    r.html.render()
  File "C:\Users\mrr010\PycharmProjects\Project_LearnPython\venv\lib\site-packages\requests_html.py", line 598, in render
    content, result, page = self.session.loop.run_until_complete(self._async_render(url=self.url, script=script, sleep=sleep, wait=wait, content=self.html, reload=reload, scrolldown=scrolldown, timeout=timeout, keep_page=keep_page))
  File "C:\Users\mrr010\AppData\Local\Programs\Python\Python37\lib\asyncio\base_events.py", line 568, in run_until_complete
    return future.result()
  File "C:\Users\mrr010\PycharmProjects\Project_LearnPython\venv\lib\site-packages\requests_html.py", line 537, in _async_render
    await page.close()
  File "C:\Users\mrr010\PycharmProjects\Project_LearnPython\venv\lib\site-packages\pyppeteer\page.py", line 1458, in close
    raise PageError('Protocol Error: Connectoin Closed. '
pyppeteer.errors.PageError: Protocol Error: Connectoin Closed. Most likely the page has been closed.
Error in atexit._run_exitfuncs:
Traceback (most recent call last):
  File "C:\Users\mrr010\PycharmProjects\Project_LearnPython\venv\lib\site-packages\pyppeteer\launcher.py", line 262, in killChrome
    self._cleanup_tmp_user_data_dir()
  File "C:\Users\mrr010\PycharmProjects\Project_LearnPython\venv\lib\site-packages\pyppeteer\launcher.py", line 154, in _cleanup_tmp_user_data_dir
    raise IOError('Unable to remove Temporary User Data')
OSError: Unable to remove Temporary User Data

Then, the second time I ran the script, the error became this:

Traceback (most recent call last):
  File "C:\Users\mrr010\PycharmProjects\Project_LearnPython\venv\lib\site-packages\requests_html.py", line 512, in _async_render
    await page.goto(url, options={'timeout': int(timeout * 1000)})
  File "C:\Users\mrr010\PycharmProjects\Project_LearnPython\venv\lib\site-packages\pyppeteer\page.py", line 862, in goto
    raise error
pyppeteer.errors.TimeoutError: Navigation Timeout Exceeded: 8000 ms exceeded.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:/Users/mrr010/PycharmProjects/Project_LearnPython/webscraping.py", line 14, in <module>
    r.html.render()
  File "C:\Users\mrr010\PycharmProjects\Project_LearnPython\venv\lib\site-packages\requests_html.py", line 598, in render
    content, result, page = self.session.loop.run_until_complete(self._async_render(url=self.url, script=script, sleep=sleep, wait=wait, content=self.html, reload=reload, scrolldown=scrolldown, timeout=timeout, keep_page=keep_page))
  File "C:\Users\mrr010\AppData\Local\Programs\Python\Python37\lib\asyncio\base_events.py", line 568, in run_until_complete
    return future.result()
  File "C:\Users\mrr010\PycharmProjects\Project_LearnPython\venv\lib\site-packages\requests_html.py", line 537, in _async_render
    await page.close()
  File "C:\Users\mrr010\PycharmProjects\Project_LearnPython\venv\lib\site-packages\pyppeteer\page.py", line 1458, in close
    raise PageError('Protocol Error: Connectoin Closed. '
pyppeteer.errors.PageError: Protocol Error: Connectoin Closed. Most likely the page has been closed.

mrr010 commented 5 years ago

How can I find the directory where it downloaded Chromium? I want to try deleting it.

C:\Users\Username\AppData\Local\pyppeteer\pyppeteer

I tried deleting the pyppeteer directory, but the error still exists.
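
For anyone else looking for the download location, here is a small sketch that asks pyppeteer itself instead of guessing the path. It assumes the installed pyppeteer exposes `__pyppeteer_home__` and `chromium_downloader.chromium_executable()`, which is the case in the versions I am aware of; the function name `find_chromium_paths` is made up for this example.

```python
import importlib


def find_chromium_paths():
    """Return pyppeteer's data directory and Chromium executable path.

    Returns None when pyppeteer is not installed. Nothing is downloaded;
    the paths are only computed.
    """
    try:
        pyppeteer = importlib.import_module("pyppeteer")
        downloader = importlib.import_module("pyppeteer.chromium_downloader")
    except ImportError:
        return None
    return {
        # Assumed attribute; falls back to None if this version lacks it.
        "home": getattr(pyppeteer, "__pyppeteer_home__", None),
        "executable": str(downloader.chromium_executable()),
    }


if __name__ == "__main__":
    print(find_chromium_paths())
```

As far as I know, the location can also be overridden with the `PYPPETEER_HOME` environment variable, which may help if the default directory has permission problems.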

toskip commented 5 years ago

Modify pyppeteer/launcher.py, line 102:

        if 'headless' not in self.options or self.options.get('headless'):
            self.chrome_args.extend([
                '--headless',
                '--disable-gpu',
                '--hide-scrollbars',
                '--mute-audio',
                # add the following two lines
                '--proxy-server="direct://"',
                '--proxy-bypass-list=*'
            ])

I found the solution in https://github.com/GoogleChrome/puppeteer/issues/2391 and it worked for me
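
A possible alternative to editing the installed pyppeteer source, sketched under assumptions: newer requests_html releases accept a `browser_args` list on `HTMLSession` and forward it to the Chromium launch. If your installed version lacks that parameter, this will not work. The names `build_browser_args` and `make_session` are hypothetical, and the `--no-sandbox` base flag is an assumption about the library default.

```python
# Flags copied verbatim from the workaround above.
PROXY_BYPASS_ARGS = ('--proxy-server="direct://"', '--proxy-bypass-list=*')


def build_browser_args(base=("--no-sandbox",)):
    """Return the base Chromium flags plus the proxy-bypass flags, without duplicates."""
    args = list(base)
    for flag in PROXY_BYPASS_ARGS:
        if flag not in args:
            args.append(flag)
    return args


def make_session():
    """Create an HTMLSession with the extra flags.

    Assumes requests-html is installed and that HTMLSession supports
    the browser_args keyword.
    """
    from requests_html import HTMLSession  # imported lazily on purpose
    return HTMLSession(browser_args=build_browser_args())
```

With this, `session = make_session()` would replace `session = HTMLSession()` in the original script, and the library files stay untouched across upgrades.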

collinsanele commented 5 years ago

There is definitely a problem with the render() function in requests_html. I always get PermissionError: Access is denied. Other than that, it's a great library.

razak17 commented 4 years ago

This worked for me too

PhilipWerz commented 4 years ago

I can't find this line in launcher.py... Any idea?