avlm closed this issue 2 years ago.
Update: I think I found the problem.
As documented in the Scrapy docs section on installing the asyncio reactor, when using `CrawlerRunner` we need to install the asyncio reactor manually with `install_reactor('twisted.internet.asyncioreactor.AsyncioSelectorReactor')`.
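The constraint behind this is ordering. Twisted treats a reactor as installed once `twisted.internet.reactor` appears in `sys.modules`, and a plain `from twisted.internet import reactor` installs the default reactor as a side effect. Any later `install_reactor(...)` call then fails. Below is a minimal sketch of a guard that makes this failure explicit. It assumes Scrapy is importable at the time the helper is called, and `install_asyncio_reactor_early` is a hypothetical name, not part of Scrapy or ScrapyRT.

```python
import sys

# Reactor import path used throughout this thread.
ASYNCIO_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"


def reactor_already_installed() -> bool:
    """Return True if some Twisted reactor is already installed.

    Twisted registers the installed reactor under the module name
    "twisted.internet.reactor" in sys.modules.
    """
    return "twisted.internet.reactor" in sys.modules


def install_asyncio_reactor_early(reactor_path: str = ASYNCIO_REACTOR) -> None:
    """Hypothetical helper: install the requested reactor or fail with a clear message."""
    if reactor_already_installed():
        raise RuntimeError(
            "A Twisted reactor is already installed; call this before anything "
            "imports twisted.internet.reactor"
        )
    # Imported lazily so this module can be loaded without Scrapy installed.
    from scrapy.utils.reactor import install_reactor

    install_reactor(reactor_path)
```

Call it at the very top of the entry point, before any import that pulls in `twisted.internet.reactor`.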
So I went into the ScrapyRT code and added a try/except block to `scrapyrt/core.py`:
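```python
from scrapy.crawler import CrawlerRunner
from scrapy.utils.reactor import install_reactor, verify_installed_reactor


class ScrapyrtCrawlerProcess(CrawlerRunner):
    def __init__(self, settings, scrapyrt_manager):
        super(ScrapyrtCrawlerProcess, self).__init__(settings)
        # Settings is dict-like: read TWISTED_REACTOR with .get(), not attribute access
        reactor_path = settings.get("TWISTED_REACTOR")
        if reactor_path:
            try:
                # Raises if a different reactor is already installed
                verify_installed_reactor(reactor_path)
            except Exception:
                # Still fails if twisted.internet.reactor was imported earlier
                install_reactor(reactor_path)
        self.scrapyrt_manager = scrapyrt_manager
```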
`verify_installed_reactor()` raises the same exception I get when making a request to my ScrapyRT server.
But this code probably isn't in the right place: I tested it with quotesbot and got the same error.
I'll keep trying; any help is appreciated. Thanks.
Paste this at the top of ScrapyRT's `cmdline.py`:
```python
# -*- coding: utf-8 -*-
from configparser import (
    ConfigParser, NoOptionError, NoSectionError
)
import argparse
import os
import sys

# asyncio reactor installation (CORRECT): twisted.internet.reactor must not be imported before this point
# https://docs.scrapy.org/en/latest/_modules/scrapy/utils/reactor.html?highlight=asyncio%20reactor#
from scrapy.utils.reactor import (
    install_reactor,
    is_asyncio_reactor_installed,
    verify_installed_reactor,
)

ASYNCIO_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"
install_reactor(ASYNCIO_REACTOR)
verify_installed_reactor(ASYNCIO_REACTOR)  # raises if a different reactor ended up installed
print(f"Is asyncio reactor installed: {is_asyncio_reactor_installed()}")

# Only now is it safe to import the reactor; this returns the asyncio reactor installed above
from twisted.internet import reactor
```
If you want to take screenshots with Playwright, you need to send a POST request:
```shell
curl localhost:9081/crawl.json -d '{"request": {"url": "http://www.google.com/", "meta": {"playwright": true, "playwright_context": "new", "playwright_include_page": true}}, "spider_name": "playwright"}'
```
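The same request can be built programmatically. Serializing the body with `json.dumps` rules out hand-written JSON errors such as a trailing comma, and it emits real JSON booleans instead of the string `"True"`. This is a minimal standard-library sketch. It assumes ScrapyRT is listening on `localhost:9081` and the project has a spider named `playwright`, as in the curl command. `build_crawl_payload` and `post_crawl` are hypothetical helper names.

```python
import json
import urllib.request

# Assumed endpoint, matching the curl example (ScrapyRT's default port).
SCRAPYRT_URL = "http://localhost:9081/crawl.json"


def build_crawl_payload(url: str, spider_name: str) -> dict:
    """Build the POST body for /crawl.json with scrapy-playwright meta keys."""
    return {
        "request": {
            "url": url,
            "meta": {
                "playwright": True,
                "playwright_context": "new",
                "playwright_include_page": True,
            },
        },
        "spider_name": spider_name,
    }


def post_crawl(payload: dict, endpoint: str = SCRAPYRT_URL) -> dict:
    """POST the payload to ScrapyRT and decode its JSON response."""
    body = json.dumps(payload).encode("utf-8")
    req = urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Usage, with a running server: `post_crawl(build_crawl_payload("http://www.google.com/", "playwright"))`.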
Hey @avlm, thanks for posting this information. We haven't tried using ScrapyRT with other reactors. By default, ScrapyRT uses the default Twisted reactor to run its Twisted web server, and Scrapy also uses the default Twisted reactor. To override this, we'd have to adjust it somewhere around here: https://github.com/scrapinghub/scrapyrt/blob/a3bf17f02297215a7fc5766f1f7e1b24d165562c/scrapyrt/cmdline.py#L93, that is, very early in startup.
Scrapy is moving toward allowing different reactors, and I think ScrapyRT should allow them too. We would have to review what needs to be updated and respect the `TWISTED_REACTOR` setting described here: https://github.com/scrapy-plugins/scrapy-playwright#configuration
We should probably use the reactor configured in the user's settings instead of just assuming the default.
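One way to "respect the user's setting" is to read `TWISTED_REACTOR` at startup and install that reactor only if it is set, leaving Twisted's default alone otherwise. This is a minimal sketch under that assumption. `resolve_reactor_path` and `install_configured_reactor` are hypothetical names, not existing ScrapyRT functions, and `settings` can be a plain dict or a Scrapy `Settings` object (both support `.get()`).

```python
from typing import Mapping, Optional

ASYNCIO_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"


def resolve_reactor_path(settings: Mapping[str, object]) -> Optional[str]:
    """Return the configured reactor import path, or None to keep Twisted's default."""
    path = settings.get("TWISTED_REACTOR")
    if not path:
        return None
    if not isinstance(path, str) or "." not in path:
        raise ValueError(f"TWISTED_REACTOR must be a dotted import path, got {path!r}")
    return path


def install_configured_reactor(settings: Mapping[str, object]) -> Optional[str]:
    """Hypothetical startup hook: install the configured reactor, if any.

    Must run before anything imports twisted.internet.reactor.
    """
    path = resolve_reactor_path(settings)
    if path is None:
        # Nothing configured: leave Twisted's default reactor selection alone.
        return None
    # Imported lazily so the resolver above works without Scrapy installed.
    from scrapy.utils.reactor import install_reactor

    install_reactor(path)
    return path
```

Keeping resolution separate from installation lets the server validate the setting without touching the reactor.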
I added the "help wanted" label to invite others to research this further and post their findings here; I'll do my own research too and post it later. Supporting different reactors would definitely be a welcome feature in future releases.
Hello! I've been using ScrapyRT for about a month now and it works great, but today I added a new spider to the project that uses scrapy-playwright.
For this to work I had to change Scrapy's default reactor to
`twisted.internet.asyncioreactor.AsyncioSelectorReactor`
, but when I run the ScrapyRT server and make a request to run this spider, it breaks with this error:

Is it possible to make ScrapyRT use the reactor specified in the Scrapy settings?
[EDIT] Sorry, I didn't pay enough attention to the README, so I'm opening a new issue with the right label.
Old issue #131