Open skamensky opened 6 years ago
When asyncio.get_event_loop() is called inside a thread other than the main thread, it raises this error. Do you need sessions to be unique per thread? If not, just do this:
>>> from threading import Thread
>>> from requests_html import HTMLSession
>>> session = HTMLSession()
>>> session.browser
>>> def render_html():
...     r = session.get('http://python-requests.org/')
...     r.html.render()
...
>>> t = Thread(target=render_html)
>>> t.start()
Otherwise, let me know and a fix could be made to allow what you want.
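For reference, the underlying error can be reproduced with the standard library alone, without requests_html. The following is a minimal sketch; `worker` and `errors` are illustrative names, not part of any library.

```python
import asyncio
import threading

errors = []

def worker():
    # A non-main thread has no event loop set by default,
    # so asking for the current loop raises RuntimeError.
    try:
        asyncio.get_event_loop()
    except RuntimeError as exc:
        errors.append(exc)

t = threading.Thread(target=worker)
t.start()
t.join()

raised = len(errors) == 1
print(raised)
```

The exact wording of the message may vary between Python versions, so the sketch only checks the exception type.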
That gets rid of the error and works as far as I can tell.
How does the package know which browser tab to parse when other threads are accessing the same session instance? Am I at risk of the wrong virtual tabs/windows being parsed, since threads could by their nature be switching virtual tabs at the same time? I had this issue when I was using a single instance of a virtual Chrome browser with the selenium package.
Thanks for the tip!
Each time you call r.html.render, it creates a new browser page (a "tab"), does everything you want (e.g. evaluates JS), and closes that tab, unless you want to interact with the page, in which case you pass keep_page=True to render. That behavior should keep each thread from interfering with another thread's tab.
One suggestion is to keep the number of simultaneous threads low, since each page represents a process in Chrome and a high number will consume a lot of resources.
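One way to keep the number of simultaneous threads low is a bounded thread pool. This is only a sketch: `MAX_WORKERS`, `render_one`, and the example.com URLs are made-up placeholders, and `render_one` stands in for the real fetch-and-render work rather than calling requests_html.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_WORKERS = 2  # assumed limit; tune it to the memory and CPU available

def render_one(url):
    # Placeholder for the real work, e.g. fetching the URL and rendering it.
    return f"rendered {url}"

urls = [f"http://example.com/page{i}" for i in range(5)]

# At most MAX_WORKERS calls run at once; the rest wait in the queue.
with ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
    results = list(pool.map(render_one, urls))
```

`pool.map` returns results in the same order as `urls`, which makes it easy to pair each page with its source.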
I understand. So now my only question is: can we expect r.html.render to function properly if two separate threads open two tabs simultaneously and attempt to render the page in the virtual browser at the same time?
The reason I ask is because in selenium, you can only inject/execute javascript into a "tab" if the tab is active (i.e. selected) which means threads cannot inject/execute javascript into two tabs at the same moment.
I ran into the same problem: RuntimeError: There is no current event loop in thread 'Thread-1'.
I tried @oldani's snippet in cmd and it's not working for me.
I'm using the latest Python (3.6.5) and the latest requests_html (0.9.0).
@eladbitton you forgot to run session.browser; look closely at the code above.
However, @skamensky, I realized there is another issue, related to the event loop, that won't allow what you want to achieve: each thread would need to create its own event loop. That is what I was thinking of as a fix, even though it won't let you run many threads before running out of resources (a fix like this would run one Chromium process per thread). I suggest you wait for #146 to be merged and do this asynchronously instead of with threads.
I'm thinking of making this possible and adding a warning not to do it unless you are willing to sacrifice resources.
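The idea of one event loop per thread can be sketched with the standard library. This is not the library's fix, just an illustration of the pattern: `fake_render` is a hypothetical stand-in for an awaitable such as a page render, and `worker` and `results` are illustrative names.

```python
import asyncio
import threading

results = {}

async def fake_render(name):
    # Stand-in for real asynchronous work.
    await asyncio.sleep(0)
    return f"done in {name}"

def worker(name):
    # Each thread creates and registers its own event loop.
    loop = asyncio.new_event_loop()
    asyncio.set_event_loop(loop)
    try:
        results[name] = loop.run_until_complete(fake_render(name))
    finally:
        # Close the loop and unregister it so nothing leaks.
        loop.close()
        asyncio.set_event_loop(None)

threads = [threading.Thread(target=worker, args=(f"worker-{i}",)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

With a real browser behind each loop, every thread would carry its own Chromium process, which is the resource cost mentioned above.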
I have also encountered the same problem, There is no current event loop in thread 'Thread-4', except mine is in a Django app class. The render() function always raises an error. I've tried running render(keep_page=True) and session.browser with no success.
I'm running Django 2.0.3, Python 3.6.3, requests_html 0.9 and PyCharm Pro 2018.1. I'm using PyCharm's default virtual environment for Django.
I will add a fix for this
I have the same error, but it only happens when I use it inside Django. If I run it locally it works. Do you have any idea why?
Hi guys,
Yesterday we released v0.10.0, which now has full support for AsyncHTMLSession. You can use that session instead of the normal one and you won't have this kind of issue.
I have yet to investigate the issue around Django. Can any of you give me more context on it, @cfournies @Commito?
I got a similar error when starting multiple threads. Can you help? By the way, you are doing great work @oldani.
class Loader:
    def __init__(self, user_agent=UserAgent, proxies=None, retries=RETRIES,
                 rest=REST, opener=None, cache=None, headers=None, fast=False):
        self.user_agent = user_agent
        self.proxies = proxies
        self.retries = retries
        self.opener = opener
        self.cache = cache
        self.headers = headers
        self.session = Session()
        self.empty = set()
        self.queue = dict()
        self.base = None
        self.htmlsession = HTMLSession()
        self.htmlsession.browser

    def ajaxload(self, url):
        r = self.htmlsession.get(url)
        r.html.render()
        pac = dict()
        pac['html'] = r.text
        pac['code'] = r.status_code
        print(r.url)
        return pac
Hi @oldani, I can help you with the Django error; let me know what you need. The code doesn't work when it is used within the Django framework.
I think I know the key to the error here. It is the event loop policy: we're going to have to create a new event loop per thread in these cases.
Hello @oldani
I have the same error using Flask: RuntimeError: There is no current event loop in thread 'Thread-2'.
It happens when I use HTMLSession and then call session.browser inside a route, or when I try to use AsyncHTMLSession; both raise the error.
I'm not using threads or asyncio in my project. It's a simple Flask app with one route. Tell me if you want me to provide more logs/output/screenshots.
Hello @cfournies
Can you run r.html.render() in Django? I have tried many ways, but I still get the exception There is no current event loop in thread.
I have the same issue in my Flask application.
I have the same issue on Flask
@CarreyC It works when run from the command line.
I have the same issue now that I use it in Django. When I add a loop in Django, I get the error: no signal in main thread
Can you please suggest how I can use this in the Django framework?
I found this on stackoverflow.
Here is my workaround with Flask.
from requests_html import AsyncHTMLSession
import asyncio
import pyppeteer

async def get_post():
    # Create and register a fresh event loop for the current thread.
    new_loop = asyncio.new_event_loop()
    asyncio.set_event_loop(new_loop)
    session = AsyncHTMLSession()
    # Launch the browser with its signal handlers disabled.
    browser = await pyppeteer.launch({
        'ignoreHTTPSErrors': True,
        'headless': True,
        'handleSIGINT': False,
        'handleSIGTERM': False,
        'handleSIGHUP': False
    })
    session._browser = browser
    # your_query_url is a placeholder for the URL you want to fetch.
    resp_page = await session.get(your_query_url)
    await resp_page.html.arender()
    return resp_page
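The handleSIGINT, handleSIGTERM and handleSIGHUP flags in this workaround probably matter because Python only allows signal handlers to be installed from the main thread, while web frameworks often serve requests from worker threads. A minimal standard-library sketch of that restriction (`install_handler` and `errors` are illustrative names):

```python
import signal
import threading

errors = []

def install_handler():
    # Installing a signal handler outside the main thread raises ValueError.
    try:
        signal.signal(signal.SIGINT, signal.default_int_handler)
    except ValueError as exc:
        errors.append(str(exc))

t = threading.Thread(target=install_handler)
t.start()
t.join()

print(errors)
```

Turning the flags off presumably lets the browser launch skip that step, which would explain why the workaround helps inside Flask or Django.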
Was there a fix for this issue?
Hello, just wondering... was this issue fixed? So shall I just reinstall the package?
@Têng Ûi may I know the full code for how you call this function? I still cannot make it work.
It's giving me this error:
RuntimeError: Event loop is closed
sys:1: RuntimeWarning: coroutine 'Launcher.killChrome' was never awaited
The workaround gives me a coroutine object instead of an HTML object. Did you run into that?
UPD: You probably need to call asyncio.run() on that function to get the result. Check whether you have done that.
await resp_page.html.arender() never returns...
The Flask workaround above worked for me! Make sure to call asyncio.run(get_post()) to get the result instead of a coroutine.
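The difference between getting a coroutine object and getting the result can be shown with a small stand-in. In this sketch `get_post` is a dummy that returns a string, not the real workaround, and asyncio.run requires Python 3.7 or later.

```python
import asyncio
import inspect

async def get_post():
    # Dummy body standing in for the fetch-and-render workaround.
    await asyncio.sleep(0)
    return "page"

# Calling an async function only creates a coroutine object; nothing runs yet.
coro = get_post()
is_coro = inspect.iscoroutine(coro)
coro.close()  # discard it cleanly to avoid a "never awaited" warning

# asyncio.run drives the coroutine to completion and returns its result.
result = asyncio.run(get_post())
```

Note that asyncio.run takes the coroutine object, so the function has to be called with parentheses.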
I'm having an issue calling the render function within a thread. It works perfectly for me outside of a thread but within a thread I get an error.
If this is truly a bug it should be reproducible using this snippet: