yolile closed this issue 2 years ago
I think we can wait for 2.6.2, right?
We are also affected by this other bug: https://github.com/scrapy/scrapy/issues/5435. E.g.:
scrapy pluck --release-pointer /date
Traceback (most recent call last):
  File "kingfisher-collect/.ve/bin/scrapy", line 8, in <module>
    sys.exit(execute())
  File "kingfisher-collect/.ve/lib/python3.6/site-packages/scrapy/cmdline.py", line 145, in execute
    _run_print_help(parser, _run_command, cmd, args, opts)
  File "kingfisher-collect/.ve/lib/python3.6/site-packages/scrapy/cmdline.py", line 100, in _run_print_help
    func(*a, **kw)
  File "kingfisher-collect/.ve/lib/python3.6/site-packages/scrapy/cmdline.py", line 153, in _run_command
    cmd.run(args, opts)
  File "kingfisher-collect/kingfisher_scrapy/commands/pluck.py", line 59, in run
    release_pointer=opts.release_pointer, truncate=opts.truncate)
  File "kingfisher-collect/.ve/lib/python3.6/site-packages/scrapy/crawler.py", line 205, in crawl
    crawler = self.create_crawler(crawler_or_spidercls)
  File "kingfisher-collect/.ve/lib/python3.6/site-packages/scrapy/crawler.py", line 238, in create_crawler
    return self._create_crawler(crawler_or_spidercls)
  File "kingfisher-collect/.ve/lib/python3.6/site-packages/scrapy/crawler.py", line 313, in _create_crawler
    return Crawler(spidercls, self.settings, init_reactor=True)
  File "kingfisher-collect/.ve/lib/python3.6/site-packages/scrapy/crawler.py", line 82, in __init__
    default.install()
  File "kingfisher-collect/.ve/lib/python3.6/site-packages/twisted/internet/epollreactor.py", line 246, in install
    installReactor(p)
  File "kingfisher-collect/.ve/lib/python3.6/site-packages/twisted/internet/main.py", line 32, in installReactor
    raise error.ReactorAlreadyInstalledError("reactor already installed")
twisted.internet.error.ReactorAlreadyInstalledError: reactor already installed
> I think we can wait for 2.6.2, right?
But since I run that command locally, I guess I can also change the dependency locally and wait until Scrapy 2.6.2 is released.
Hmm, we are now getting an error in the registry's scrapyd:
2022-05-06T14:20:07+0000 [-] Process started: project='kingfisher' spider='chile_compra_api_releases' job='9efca674cd4711ecbf91a8a159689b50' pid=4270 log='/home/collect/scrapyd/logs/kingfisher/chile_compra_api_releases/9efca674cd4711ecbf91a8a159689b50.log' items=None
2022-05-06T14:20:07+0000 [Launcher,4270/stderr] /home/collect/scrapyd/.ve/lib/python3.8/site-packages/scrapy/utils/project.py:81: ScrapyDeprecationWarning: Use of environment variables prefixed with SCRAPY_ to override settings is deprecated. The following environment variables are currently defined: JOB, LOG_FILE, SLOT, SPIDER
warnings.warn(
2022-05-06T14:20:07+0000 [Launcher,4270/stderr] Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/collect/scrapyd/.ve/lib/python3.8/site-packages/scrapyd/runner.py", line 40, in <module>
    main()
  File "/home/collect/scrapyd/.ve/lib/python3.8/site-packages/scrapyd/runner.py", line 37, in main
    execute()
  File "/home/collect/scrapyd/.ve/lib/python3.8/site-packages/scrapy/cmdline.py", line 145, in execute
    _run_print_help(parser, _run_command, cmd, args, opts)
  File "/home/collect/scrapyd/.ve/lib/python3.8/site-packages/scrapy/cmdline.py", line 100, in _run_print_help
    func(*a, **kw)
  File "/home/collect/scrapyd/.ve/lib/python3.8/site-packages/scrapy/cmdline.py", line 153, in _run_command
    cmd.run(args, opts)
  File "/home/collect/scrapyd/.ve/lib/python3.8/site-packages/scrapy/commands/crawl.py", line 22, in run
    crawl_defer = self.crawler_process.crawl(spname, **opts.spargs)
  File "/home/collect/scrapyd/.ve/lib/python3.8/site-packages/scrapy/crawler.py", line 205, in crawl
    crawler = self.create_crawler(crawler_or_spidercls)
  File "/home/collect/scrapyd/.ve/lib/python3.8/site-packages/scrapy/crawler.py", line 238, in create_crawler
    return self._create_crawler(crawler_or_spidercls)
  File "/home/collect/scrapyd/.ve/lib/python3.8/site-packages/scrapy/crawler.py", line 313, in _create_crawler
    return Crawler(spidercls, self.settings, init_reactor=True)
  File "/home/collect/scrapyd/.ve/lib/python3.8/site-packages/scrapy/crawler.py", line 82, in __init__
    default.install()
  File "/home/collect/scrapyd/.ve/lib/python3.8/site-packages/twisted/internet/epollreactor.py", line 256, in install
    installReactor(p)
  File "/home/collect/scrapyd/.ve/lib/python3.8/site-packages/twisted/internet/main.py", line 32, in installReactor
    raise error.ReactorAlreadyInstalledError("reactor already installed")
twisted.internet.error.ReactorAlreadyInstalledError: reactor already installed
2022-05-06T14:20:07+0000 [-] Process died: exitstatus=1 project='kingfisher' spider='chile_compra_api_releases' job='9efca674cd4711ecbf91a8a159689b50' pid=4270 log='/home/collect/scrapyd/logs/kingfisher/chile_compra_api_releases/9efca674cd4711ecbf91a8a159689b50.log' items=None
On the kingfisher server, scrapyd is working well. Both servers are using Scrapy 2.6.1; the difference is that kingfisher is using Twisted 20.3.0 and the registry Twisted 22.4.0 (the latest).
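A quick way to compare what each server actually has installed is to query package metadata from the standard library (a sketch; the distribution names below are the ones published on PyPI, and it needs Python 3.8+ for importlib.metadata):

```python
from importlib import metadata

# Print the versions of the two suspects; run this on each server and
# diff the output (the thread reports Twisted 20.3.0 on kingfisher and
# 22.4.0 on the registry, with Scrapy 2.6.1 on both).
for dist in ("Scrapy", "Twisted"):
    try:
        print(dist, metadata.version(dist))
    except metadata.PackageNotFoundError:
        print(dist, "not installed")
```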
Let's update the Registry's deployment then?
Do you mean as a temporary fix? Otherwise, we will need to change the requirements file each time we deploy the registry (or kingfisher)
I thought you were saying the problem is that Twisted is old on the Registry. If so, we should update it (and use the same versions on both servers).
If you mean we need to downgrade Scrapy, then downgrade it on both servers until 2.6.2 is released.
> In the kingfisher server, scrapyd is working well, both are using scrapy 2.6.1, the difference is that kingfisher is using Twisted 20.3.0, and the registry Twisted 22.4.0 (the latest one)
Nope, the registry is using the latest one (22.4.0) and kingfisher an older version (20.3.0).
Okay, can't we get them to both use the same one?
I upgraded to 22.4.0 in c97d8cbe6 because there was a security warning.
Okay, so the twisted.internet.error.ReactorAlreadyInstalledError is not related to Twisted, but to Scrapy 2.6.x.
We can either downgrade Scrapy to 2.5 (as I suggested), or we can use the HEAD from GitHub.
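Either option is a one-line change to the requirements file (illustrative lines, not the project's actual pins; the project's requirements files may constrain differently):

```
# Pick one, not both:
Scrapy<2.6                                            # option 1: downgrade to 2.5.x
# Scrapy @ git+https://github.com/scrapy/scrapy.git   # option 2: install GitHub HEAD
```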
Noting that we also needed to upgrade Scrapyd to 1.3.0.
This bug is also affecting us: https://github.com/scrapy/scrapy/issues/5437. For example, when using the sample mode:
scrapy crawl uruguay_releases -a sample=1
Although the data is downloaded, there is an annoying exception when the spider is closed: