scrapinghub / spidermon

Scrapy Extension for monitoring spiders execution.
https://spidermon.readthedocs.io
BSD 3-Clause "New" or "Revised" License

issue: Wrong previous jobs count in ZyteJobsComparisonMonitor #442

Status: Closed (curita closed this issue 2 months ago)

curita commented 2 months ago

Background

The method to get the previous jobs used in ZyteJobsComparisonMonitor can be seen here:

https://github.com/scrapinghub/spidermon/blob/a7e195f951a4b9a4837943dff0b2ae727b76f237/spidermon/contrib/scrapy/monitors/monitors.py#L557-L579

where the value of number_of_jobs is defined by the SPIDERMON_JOBS_COMPARISON setting.

Issue

According to the docs, the monitor should fetch exactly SPIDERMON_JOBS_COMPARISON previous jobs to compute the average, but it can fetch more because of how those jobs are iterated.

Debugging

For example, if some spider has 3500 jobs that meet the requested criteria and SPIDERMON_JOBS_COMPARISON is set to 10, then this will happen:

The result is 40 previous jobs, when only 10 were requested.
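
A count of 40 is consistent with a loop that requests 10 jobs per call, advances the offset by 1000 each time, and stops only when a call returns nothing. The sketch below simulates that shape against a fake job list. It is an assumption made for illustration, not the code at the linked commit: list_jobs, PAGE_STEP and get_previous_jobs_buggy are hypothetical names, and no real Scrapy Cloud client call is used.

```python
# Hypothetical stand-in for a paginated job listing; NOT the real client API.
TOTAL_MATCHING_JOBS = 3500  # jobs that meet the requested criteria
PAGE_STEP = 1000            # assumed offset step between requests


def list_jobs(start, count):
    """Return up to `count` fake job ids starting at offset `start`."""
    end = min(start + count, TOTAL_MATCHING_JOBS)
    return list(range(start, end))


def get_previous_jobs_buggy(number_of_jobs):
    """Assumed loop shape: keeps paging until a request comes back empty."""
    jobs = []
    start = 0
    while True:
        page = list_jobs(start=start, count=number_of_jobs)
        if not page:
            break
        jobs.extend(page)
        # The loop never checks whether enough jobs were already collected.
        start += PAGE_STEP
    return jobs


previous_jobs = get_previous_jobs_buggy(10)
print(len(previous_jobs))  # 40: ten jobs each at offsets 0, 1000, 2000, 3000
```

Under these assumptions the jobs collected are not only too many but also not the latest ones, since three of the four batches come from older offsets.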

Expected Behaviour

Exactly SPIDERMON_JOBS_COMPARISON jobs should be fetched as previous jobs (or fewer, if fewer exist), and they should be the latest ones that meet the requested criteria.
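
A minimal sketch of a loop with this behaviour, written against a fake listing rather than the real client. The names list_jobs, MAX_PAGE_SIZE and get_previous_jobs are hypothetical, the per-request cap of 1000 is an assumption, and the fake listing is assumed to return the newest jobs first. It is one possible shape, not necessarily the fix applied in spidermon.

```python
# Hypothetical stand-in for a paginated job listing; NOT the real client API.
TOTAL_MATCHING_JOBS = 3500  # jobs that meet the requested criteria
MAX_PAGE_SIZE = 1000        # assumed cap on jobs returned per request


def list_jobs(start, count):
    """Fake listing, newest first: id 0 is the latest job."""
    end = min(start + count, TOTAL_MATCHING_JOBS)
    return list(range(start, end))


def get_previous_jobs(number_of_jobs):
    """Collect at most `number_of_jobs` of the latest matching jobs."""
    jobs = []
    start = 0
    while len(jobs) < number_of_jobs:
        # Ask only for what is still missing, within the assumed cap.
        count = min(number_of_jobs - len(jobs), MAX_PAGE_SIZE)
        page = list_jobs(start=start, count=count)
        if not page:
            break  # no more matching jobs exist
        jobs.extend(page)
        # Advance by what was actually received, so no jobs are skipped.
        start += len(page)
    return jobs


print(len(get_previous_jobs(10)))    # 10
print(len(get_previous_jobs(2500)))  # 2500, gathered over three requests
```

The two stopping conditions (enough jobs collected, or an empty page) keep the result bounded by the setting while still handling spiders with fewer matching jobs than requested.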