jschnurr / scrapyscript

Run a Scrapy spider programmatically from a script or a Celery task - no project required.
MIT License

twisted.internet.error.ReactorAlreadyInstalledError: reactor already installed #23

Closed · nemuihitojf closed this issue 2 years ago

nemuihitojf commented 2 years ago

I can't run multiple jobs. Please tell me what I need to do.

My code:

import scrapy
from scrapyscript import Job, Processor

settings = scrapy.settings.Settings(values={"LOG_LEVEL": "WARNING"})
processor = Processor(settings=None)

class PythonSpider(scrapy.spiders.Spider):
    name = "myspider"

    def start_requests(self):
        yield scrapy.Request(self.url)

    def parse(self, response):
        return {"title": 0}

jobs = [Job(PythonSpider, url="http://www.python.org") for i in range(50)]

results = Processor().run(jobs)

print(results)

Result:

2022-05-29 15:37:39 [scrapy.utils.log] INFO: Scrapy 2.6.1 started (bot: scrapybot)
2022-05-29 15:37:39 [scrapy.utils.log] INFO: Versions: lxml 4.8.0.0, libxml2 2.9.12, cssselect 1.1.0, parsel 1.6.0, w3lib 1.22.0, Twisted 22.4.0, Python 3.10.4 (main, May 25 2022, 00:14:12) [GCC 11.2.0], pyOpenSSL 22.0.0 (OpenSSL 3.0.3 3 May 2022), cryptography 37.0.2, Platform Linux-5.15.0-1008-raspi-aarch64-with-glibc2.35
2022-05-29 15:37:39 [scrapy.crawler] INFO: Overridden settings: {}
2022-05-29 15:37:39 [scrapy.utils.log] DEBUG: Using reactor: twisted.internet.epollreactor.EPollReactor
2022-05-29 15:37:40 [scrapy.extensions.telnet] INFO: Telnet Password: b16d3b0a35179414
2022-05-29 15:37:40 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
 'scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.memusage.MemoryUsage',
 'scrapy.extensions.logstats.LogStats']
2022-05-29 15:37:40 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
 'scrapy.downloadermiddlewares.retry.RetryMiddleware',
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2022-05-29 15:37:40 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
 'scrapy.spidermiddlewares.referer.RefererMiddleware',
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2022-05-29 15:37:40 [scrapy.middleware] INFO: Enabled item pipelines: []
2022-05-29 15:37:40 [scrapy.core.engine] INFO: Spider opened
2022-05-29 15:37:40 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2022-05-29 15:37:40 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2022-05-29 15:37:40 [scrapy.crawler] INFO: Overridden settings: {}
Process Process-1:
Traceback (most recent call last):
  File "/home/hamashou/.local/share/virtualenvs/test-5h1bg2cX/lib/python3.10/site-packages/billiard/process.py", line 327, in _bootstrap
    self.run()
  File "/home/hamashou/.local/share/virtualenvs/test-5h1bg2cX/lib/python3.10/site-packages/billiard/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/home/hamashou/.local/share/virtualenvs/test-5h1bg2cX/lib/python3.10/site-packages/scrapyscript/__init__.py", line 69, in _crawl
    self.crawler.crawl(req.spider, *req.args, **req.kwargs)
  File "/home/hamashou/.local/share/virtualenvs/test-5h1bg2cX/lib/python3.10/site-packages/scrapy/crawler.py", line 205, in crawl
    crawler = self.create_crawler(crawler_or_spidercls)
  File "/home/hamashou/.local/share/virtualenvs/test-5h1bg2cX/lib/python3.10/site-packages/scrapy/crawler.py", line 238, in create_crawler
    return self._create_crawler(crawler_or_spidercls)
  File "/home/hamashou/.local/share/virtualenvs/test-5h1bg2cX/lib/python3.10/site-packages/scrapy/crawler.py", line 313, in _create_crawler
    return Crawler(spidercls, self.settings, init_reactor=True)
  File "/home/hamashou/.local/share/virtualenvs/test-5h1bg2cX/lib/python3.10/site-packages/scrapy/crawler.py", line 82, in __init__
    default.install()
  File "/home/hamashou/.local/share/virtualenvs/test-5h1bg2cX/lib/python3.10/site-packages/twisted/internet/epollreactor.py", line 256, in install
    installReactor(p)
  File "/home/hamashou/.local/share/virtualenvs/test-5h1bg2cX/lib/python3.10/site-packages/twisted/internet/main.py", line 32, in installReactor
    raise error.ReactorAlreadyInstalledError("reactor already installed")
twisted.internet.error.ReactorAlreadyInstalledError: reactor already installed

thorncorona commented 2 years ago

You create Processor twice. You can only create it once per process.
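For reference, here's a minimal single-Processor rewrite of the snippet from the original post. It's the same spider and the same 50 jobs; the only change is that the settings and the Processor are built once and reused:

import scrapy
from scrapyscript import Job, Processor

class PythonSpider(scrapy.spiders.Spider):
    name = "myspider"

    def start_requests(self):
        yield scrapy.Request(self.url)

    def parse(self, response):
        return {"title": 0}

# Build the settings and the Processor exactly once per process.
settings = scrapy.settings.Settings(values={"LOG_LEVEL": "WARNING"})
processor = Processor(settings=settings)

# All 50 jobs go through the same Processor instance, so the
# Twisted reactor is only installed once in this process.
jobs = [Job(PythonSpider, url="http://www.python.org") for i in range(50)]
results = processor.run(jobs)
print(results)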

deneirgits commented 2 years ago

I also faced this issue. I had to downgrade to scrapy==2.5.1.
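For anyone applying the same workaround, pinning the dependency (assuming a pip-managed virtualenv; adapt for pipenv or poetry) looks like:

pip install "scrapy==2.5.1"

That keeps Scrapy on the last 2.5.x release, which, per this thread, doesn't hit the error.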