scrapy / scrapyd

A service daemon to run Scrapy spiders
https://scrapyd.readthedocs.io/en/stable/
BSD 3-Clause "New" or "Revised" License
2.96k stars · 569 forks

[runner.py] don't call get_application #411

Closed · pawelmhm closed this 2 years ago

pawelmhm commented 3 years ago

This is because get_application launches the default Twisted reactor when called from the runner, and we want to respect the reactor settings from Scrapy. I also think there is some odd circularity in the current code.

Fixes #377

How the code in the master branch works now

  1. the Twisted Application is launched with some reactor settings; get_application() is called once
  2. the server listens for requests
  3. scrapyd-client sends a deploy request: a POST to the addversion endpoint arrives
  4. the webservice launches a scrapyd.runner subprocess with an argument list, to list the Scrapy spiders
  5. this subprocess calls get_application() a second time, this time without the twistd-specific settings, so even if you pass asyncioreactor to twistd it won't be respected, and the default Twisted reactor (epoll on Linux) is launched
  6. Scrapy's execute() is called; CrawlerProcess tries to install the reactor, but it finds the default Twisted epoll reactor already running, and the error is suppressed
  7. Scrapy raises an exception when it verifies that the reactor from settings matches the running reactor: The installed reactor (twisted.internet.epollreactor.EPollReactor) does not match the requested one (twisted.internet.asyncioreactor.AsyncioSelectorReactor)

To fix this:

At point 5, instead of calling get_application we just get the egg storage, without an application object. I don't think the runner needs an application object: it is already launched as a subprocess by the Application, and if the root process creates a subprocess, I don't expect that subprocess to create another object that manages the root process. So now the runner won't create an application.

TODO

pawelmhm commented 3 years ago

fixes https://github.com/scrapy/scrapyd/issues/377

pawelmhm commented 2 years ago

Waiting for https://github.com/scrapy/scrapyd/pull/412 to be merged; after that I'll add a unit test here.

jpmckinney commented 2 years ago

Sounds good to me. I've done a quick review of #412. Let me know if you need anything else.

codecov[bot] commented 2 years ago

Codecov Report

Merging #411 (388953d) into master (2eb5409) will increase coverage by 0.03%. The diff coverage is 100.00%.


@@            Coverage Diff             @@
##           master     #411      +/-   ##
==========================================
+ Coverage   79.63%   79.67%   +0.03%     
==========================================
  Files          23       23              
  Lines        1051     1053       +2     
==========================================
+ Hits          837      839       +2     
  Misses        214      214              
| Flag | Coverage Δ |
| --- | --- |
| unittests | 79.67% <100.00%> (+0.03%) :arrow_up: |


| Impacted Files | Coverage Δ |
| --- | --- |
| scrapyd/runner.py | 97.22% <100.00%> (+0.16%) :arrow_up: |


pawelmhm commented 2 years ago

The Windows tests for 3.8+ are hanging for some reason. I tested on my local Windows laptop and it worked fine, so I'm not sure what it is about the environment here. I can't tell whether it actually runs this test or just hangs before the tests start.

[screenshot of the hanging Windows CI jobs]

pawelmhm commented 2 years ago

Removed the Windows jobs and created an issue for investigating this.

pawelmhm commented 2 years ago

@jpmckinney so I'll integrate this PR with the current master and merge. Then we can release version 1.3; there is a lot of fresh code available. As a next step, in 2022 we can release 2.0 with https://github.com/scrapy/scrapyd/pull/421, possibly also https://github.com/scrapy/scrapyd/pull/304?

jpmckinney commented 2 years ago

> @jpmckinney so I'll integrate this PR with current master and merge. Then we can release 1.3 version, there is a lot of fresh code available. Next step in 2022 we can release 2.0 with this #421, possibly also this #304?

Sounds good to me!