rafyzg / scrapy-requests

Scrapy middleware to handle javascript pages using requests-html
MIT License

This event loop is already running #4

Closed justquick closed 2 years ago

justquick commented 2 years ago

Using the default config from the readme, I'm getting an asyncio error.

Python 3.9.10, Scrapy 2.5.1, scrapy-requests 0.2.0

Traceback (most recent call last):
  File "/Users/jquick/.virtualenvs/st-search-SuqNoVD4/lib/python3.9/site-packages/scrapy/utils/defer.py", line 120, in iter_errback
    yield next(it)
  File "/Users/jquick/.virtualenvs/st-search-SuqNoVD4/lib/python3.9/site-packages/scrapy/utils/python.py", line 353, in __next__
    return next(self.data)
  File "/Users/jquick/.virtualenvs/st-search-SuqNoVD4/lib/python3.9/site-packages/scrapy/utils/python.py", line 353, in __next__
    return next(self.data)
  File "/Users/jquick/.virtualenvs/st-search-SuqNoVD4/lib/python3.9/site-packages/scrapy/core/spidermw.py", line 56, in _evaluate_iterable
    for r in iterable:
  File "/Users/jquick/.virtualenvs/st-search-SuqNoVD4/lib/python3.9/site-packages/scrapy/spidermiddlewares/offsite.py", line 29, in process_spider_output
    for x in result:
  File "/Users/jquick/.virtualenvs/st-search-SuqNoVD4/lib/python3.9/site-packages/scrapy/core/spidermw.py", line 56, in _evaluate_iterable
    for r in iterable:
  File "/Users/jquick/.virtualenvs/st-search-SuqNoVD4/lib/python3.9/site-packages/scrapy/spidermiddlewares/referer.py", line 342, in <genexpr>
    return (_set_referer(r) for r in result or ())
  File "/Users/jquick/.virtualenvs/st-search-SuqNoVD4/lib/python3.9/site-packages/scrapy/core/spidermw.py", line 56, in _evaluate_iterable
    for r in iterable:
  File "/Users/jquick/.virtualenvs/st-search-SuqNoVD4/lib/python3.9/site-packages/scrapy/spidermiddlewares/urllength.py", line 40, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "/Users/jquick/.virtualenvs/st-search-SuqNoVD4/lib/python3.9/site-packages/scrapy/core/spidermw.py", line 56, in _evaluate_iterable
    for r in iterable:
  File "/Users/jquick/.virtualenvs/st-search-SuqNoVD4/lib/python3.9/site-packages/scrapy/spidermiddlewares/depth.py", line 58, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "/Users/jquick/.virtualenvs/st-search-SuqNoVD4/lib/python3.9/site-packages/scrapy/core/spidermw.py", line 56, in _evaluate_iterable
    for r in iterable:
  File "/Users/jquick/Projects/st-search/scrape/scrape/spiders/tableau.py", line 20, in parse
    page.html.render()
  File "/Users/jquick/.virtualenvs/st-search-SuqNoVD4/lib/python3.9/site-packages/requests_html.py", line 598, in render
    content, result, page = self.session.loop.run_until_complete(self._async_render(url=self.url, script=script, sleep=sleep, wait=wait, content=self.html, reload=reload, scrolldown=scrolldown, timeout=timeout, keep_page=keep_page))
  File "/usr/local/Cellar/python@3.9/3.9.10/Frameworks/Python.framework/Versions/3.9/lib/python3.9/asyncio/base_events.py", line 618, in run_until_complete
    self._check_running()
  File "/usr/local/Cellar/python@3.9/3.9.10/Frameworks/Python.framework/Versions/3.9/lib/python3.9/asyncio/base_events.py", line 578, in _check_running
    raise RuntimeError('This event loop is already running')
RuntimeError: This event loop is already running

I tried adding nest_asyncio and got a somewhat better(?) outcome.
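(nest_asyncio was applied the usual way; roughly this, shown here only for context:)

```python
import nest_asyncio

# Patch the already-running asyncio loop so that nested
# run_until_complete() calls, like the one inside requests_html's render(),
# don't raise "This event loop is already running".
nest_asyncio.apply()
```

With that in place the render call gets further, but then fails with: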

  File "/Users/jquick/.virtualenvs/st-search-SuqNoVD4/lib/python3.9/site-packages/requests_html.py", line 598, in render
    content, result, page = self.session.loop.run_until_complete(self._async_render(url=self.url, script=script, sleep=sleep, wait=wait, content=self.html, reload=reload, scrolldown=scrolldown, timeout=timeout, keep_page=keep_page))
  File "/Users/jquick/.virtualenvs/st-search-SuqNoVD4/lib/python3.9/site-packages/nest_asyncio.py", line 81, in run_until_complete
    return f.result()
  File "/usr/local/Cellar/python@3.9/3.9.10/Frameworks/Python.framework/Versions/3.9/lib/python3.9/asyncio/futures.py", line 201, in result
    raise self._exception
  File "/usr/local/Cellar/python@3.9/3.9.10/Frameworks/Python.framework/Versions/3.9/lib/python3.9/asyncio/tasks.py", line 256, in __step
    result = coro.send(None)
  File "/Users/jquick/.virtualenvs/st-search-SuqNoVD4/lib/python3.9/site-packages/requests_html.py", line 505, in _async_render
    page = await self.browser.newPage()
AttributeError: 'coroutine' object has no attribute 'newPage'
rafyzg commented 2 years ago

Hey Justin,

This is not a problem with the scrapy-requests package; it's a problem with the way you called the render function. See this issue: https://github.com/psf/requests-html/issues/330

You need to replace the render function with arender. I realize the README might be a bit misleading; I will update it accordingly. Thanks.
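For reference, a minimal sketch of what that change can look like. This is not the exact spider from the traceback; the spider name, URL, and the way the page object is built are placeholders for illustration:

```python
import scrapy
from requests_html import AsyncHTMLSession


class TableauSpider(scrapy.Spider):
    name = "tableau"                      # placeholder name
    start_urls = ["https://example.com"]  # placeholder URL

    async def parse(self, response):
        # arender() is the coroutine counterpart of render(), so it can be
        # awaited from an async callback instead of trying to start a second
        # event loop inside the one Scrapy is already running.
        session = AsyncHTMLSession()
        page = await session.get(response.url)
        await page.html.arender()         # instead of page.html.render()
        yield {"title": page.html.find("title", first=True).text}
```

Note that awaiting asyncio-based coroutines from a Scrapy callback may also require the asyncio reactor, e.g. `TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"` in settings.py, if it isn't enabled already.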