Closed sseveran closed 2 years ago
Is there a workaround for it?
I didn't find one. I built my own solution with docker, cron and some notebooks.
@sseveran I've just found one. In the scrapyd package used by the scrapyd instance, edit the file runner.py
(for me the path was /opt/virtualenv/lib/python3.8/site-packages/scrapyd/runner.py)
Just below all the existing imports, add this code:
from scrapy.utils.reactor import install_reactor
install_reactor('twisted.internet.asyncioreactor.AsyncioSelectorReactor')
I'm not yet sure how the scrapers behave with that setting, but so far I was able to deploy them, launch them, and scrape a few items.
@inakrin to clarify, must this be done within the library's source code, or can a custom runner be specified from scrapyd's default config?
@namiwa I wasn't aware of this setting so I haven't tried. But now I believe that using this setting is much better than editing the source code of the library. Thank you for the hint!
Runner-based workaround implementation: https://github.com/VitalyVen/scrapy-cookiecutter/commit/9ab0105bb3355a967f0a27012a8dd14d08928e77
@inakrin happy to help! and thanks @VitalyVen for the possible implementation
Upon further inspection, it seems that scrapyd has a reactor flag at launch. Perhaps a complete approach would be to run scrapyd --reactor=asyncio along with the scrapyd.runner override.
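For anyone who wants to try the runner override without editing site-packages, here is a minimal sketch of how it might be set up. It assumes scrapyd's `runner` option in the `[scrapyd]` section of scrapyd.conf accepts a module name, and that the module is importable from the directory scrapyd is started in. The module name `asyncio_runner` and the helper `write_workaround` are hypothetical names chosen for illustration; this is not necessarily what the linked commit does.

```python
import configparser
from pathlib import Path

# Hypothetical module name; any importable name works if the config matches it.
RUNNER_MODULE = "asyncio_runner"

# Source of the custom runner: install the asyncio reactor, then hand over
# to scrapyd's stock runner.
RUNNER_SOURCE = '''\
"""Custom scrapyd runner that installs the asyncio reactor first."""
from scrapy.utils.reactor import install_reactor

# Must run before anything imports twisted.internet.reactor.
install_reactor("twisted.internet.asyncioreactor.AsyncioSelectorReactor")

from scrapyd.runner import main  # noqa: E402

if __name__ == "__main__":
    main()
'''


def write_workaround(target_dir):
    """Write the runner module and a scrapyd.conf that points at it."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)

    runner_path = target / f"{RUNNER_MODULE}.py"
    runner_path.write_text(RUNNER_SOURCE, encoding="utf-8")

    # Assumption: scrapyd picks up scrapyd.conf from its working directory.
    config = configparser.ConfigParser()
    config["scrapyd"] = {"runner": RUNNER_MODULE}
    conf_path = target / "scrapyd.conf"
    with conf_path.open("w", encoding="utf-8") as handle:
        config.write(handle)

    return runner_path, conf_path
```

Calling `write_workaround(".")` in the directory scrapyd is launched from would produce both files. Whether the spiders then run correctly is untested here, so treat it as a starting point only.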
It doesn't work. Although the log shows that it uses twisted.internet.asyncioreactor.AsyncioSelectorReactor, the same error occurs when you request the /schedule endpoint. Does anyone have a solution?
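To reproduce this outside the web UI, a request like the one below might help. It is only a sketch: it assumes scrapyd's documented schedule.json endpoint and the default address http://localhost:6800, and the project and spider names are placeholders. The helper names `build_schedule_request` and `send` are hypothetical.

```python
import urllib.parse
import urllib.request


def build_schedule_request(base_url, project, spider):
    """Build (but do not send) a POST request for scrapyd's schedule.json."""
    body = urllib.parse.urlencode({"project": project, "spider": spider})
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/schedule.json",
        data=body.encode("ascii"),
        method="POST",
    )


def send(request, timeout=10):
    """Send the request and return the raw response body as text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With a running instance, `send(build_schedule_request("http://localhost:6800", "myproject", "myspider"))` should return scrapyd's JSON reply, which makes it easier to compare the error before and after changing the reactor.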
Hi @u23a, try running my fork at https://github.com/namiwa/scrapyd-authenticated, which has a simple example AsyncIO Reactor-based spider with docker-compose.
Hope the above implementation helps!
@namiwa Thanks for your reply, I'll try it.
Currently scrapyd does not support spiders that use asyncio coroutines. When you upload the spider to scrapyd it fails with the following error. I didn't see a way to override the twisted reactor implementation in scrapyd.
The Twisted application runner logs the default reactor for the platform when it starts up, so I think we would need a way to load a reactor before calling run. However, I am not a Twisted expert, and that is just my guess from having stepped through the code.
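One way to check that guess is to look at whether a reactor has already been installed at a given point in the process. Twisted registers the installed reactor under the key `twisted.internet.reactor` in `sys.modules`, and installing a second one fails once that key exists. The sketch below only inspects that key; the helper names are hypothetical and it does not import Twisted itself.

```python
import sys

# Key under which Twisted registers the installed reactor.
REACTOR_MODULE = "twisted.internet.reactor"


def reactor_already_installed(modules=None):
    """Return True if a Twisted reactor is already registered."""
    modules = sys.modules if modules is None else modules
    return REACTOR_MODULE in modules


def describe_installed_reactor(modules=None):
    """Return the dotted class name of the installed reactor, or None."""
    modules = sys.modules if modules is None else modules
    reactor = modules.get(REACTOR_MODULE)
    if reactor is None:
        return None
    cls = type(reactor)
    return f"{cls.__module__}.{cls.__qualname__}"
```

Logging `describe_installed_reactor()` at the top of the runner, before any reactor is installed explicitly, would show whether something earlier in the import chain has already picked the platform default.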