scrapinghub / spidermon

Scrapy extension for monitoring spider executions.
https://spidermon.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Add a script to check for jobs with close reason failed #401

Closed mrwbarg closed 1 year ago

mrwbarg commented 1 year ago

There are cases where jobs can fail abruptly in such a way that Spidermon (or any other extensions that run at the end of Scrapy) won’t run.

In these situations, we won’t be alerted that something happened because Spidermon didn’t run at the end, so it won’t generate alerts, and ScrapyCloud doesn’t warn about them.
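A minimal sketch of the kind of check being proposed: list the jobs in a Scrapy Cloud project and flag those whose `close_reason` is `failed`. This is not the code from the PR; it assumes python-scrapinghub's `ScrapinghubClient`, and the function names (`filter_failed_jobs`, `fetch_failed_jobs`) are illustrative. The filtering step is a pure function so it can be exercised without network access.

```python
def filter_failed_jobs(job_summaries):
    """Return the keys of jobs that closed with reason 'failed'."""
    return [
        job["key"]
        for job in job_summaries
        if job.get("close_reason") == "failed"
    ]


def fetch_failed_jobs(project_id, apikey):
    """Hypothetical fetch using python-scrapinghub (an assumption, not the
    PR's actual script); jobs.iter() is assumed to yield job summary dicts
    containing 'key' and 'close_reason'."""
    from scrapinghub import ScrapinghubClient  # pip install scrapinghub

    project = ScrapinghubClient(apikey).get_project(project_id)
    return filter_failed_jobs(project.jobs.iter(state="finished"))


if __name__ == "__main__":
    # Local example with fabricated job summaries, no API call involved.
    sample = [
        {"key": "123/1/1", "close_reason": "finished"},
        {"key": "123/1/2", "close_reason": "failed"},
        {"key": "123/1/3", "close_reason": "cancelled"},
    ]
    print(filter_failed_jobs(sample))  # ['123/1/2']
```

Because such a script runs outside the spider process, it can catch jobs that died before Spidermon's close-phase hooks ever executed.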

codecov[bot] commented 1 year ago

Codecov Report

Patch coverage has no change and project coverage change: -0.85 :warning:

Comparison is base (44d5316) 76.54% compared to head (7aea780) 75.69%.

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##           master     #401      +/-   ##
==========================================
- Coverage   76.54%   75.69%   -0.85%
==========================================
  Files          76       77       +1
  Lines        3214     3250      +36
  Branches      384      390       +6
==========================================
  Hits         2460     2460
- Misses        683      719      +36
  Partials       71       71
```

| Impacted Files | Coverage Δ | |
|---|---|---|
| [spidermon/scripts/check\_failed\_jobs.py](https://app.codecov.io/gh/scrapinghub/spidermon/pull/401?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=scrapinghub#diff-c3BpZGVybW9uL3NjcmlwdHMvY2hlY2tfZmFpbGVkX2pvYnMucHk=) | `0.00% <0.00%> (ø)` | |


rennerocha commented 1 year ago

Even though we already have a few actions related to Scrapy Cloud, those are useful while monitoring a spider's execution.

The script you are proposing is not related to spider monitoring, but to job execution inside Scrapy Cloud, and nothing from Spidermon is used to execute it. As an extension, we should avoid adding coupling with specific spider-running platforms.

I understand the problem, and it is important to monitor jobs that didn't even start! But this script could probably be added to some internal helper library, inside Scrapy Cloud, or to https://github.com/scrapinghub/python-scrapinghub, as that is the library used to interact with the platform. It doesn't seem like something we should add to the core of Spidermon.