scrapinghub / spidermon

Scrapy Extension for monitoring spiders execution.
https://spidermon.readthedocs.io
BSD 3-Clause "New" or "Revised" License
526 stars 94 forks source link

Addition of SPIDERMON_MONITOR_SKIPPING_RULES #384

Closed shafiq-muhammad closed 12 months ago

shafiq-muhammad commented 1 year ago

Example of adding rule in the spider settings using stats value condition

"SPIDERMON_MONITOR_SKIPPING_RULES": {
            "Field Coverage Monitor": [["item_scraped_count", "==", 0]],
        }

Example of adding rule in the spider settings using custom function

"SPIDERMON_MONITOR_SKIPPING_RULES": {
            "Field Coverage Monitor": [skip_function],
        }

def skip_function(monitor):
    return "item_scraped_count" not in monitor.data.stats
curita commented 1 year ago

Hi @shafiq-muhammad! Wonder if it might be better to discuss the design in the ticket itself 🤔 We could agree on how this feature would look first and involve Spidermon maintainers, and then proceed with the implementation.

I visualize this feature in these two ways:

  1. For skip_if to be a decorator, which we can add over the monitors
    • This is a design preference, but I thought it would be nice for it to mimic the skip decorator from pytest.
  2. Rather than defining static rules over the stats, skip_if could ask for a function. The function could take the monitor class as a parameter (we have access to the stats, spider, etc., inside it) and return True or False. If it returns True, that monitor will be skipped, and when it returns False, it will run as usual.

But that's just what I'd like to see on my end. We should gather some more feedback to help us define this.

curita commented 1 year ago

I kept thinking about it, and now the external SPIDERMON_MONITOR_SKIPPING_RULES setting makes sense to me 🙏

The only thing I'd add is the support for user-defined functions for rules, but otherwise, I think we could go this route to add skip rules for any monitor we'd like without an explicit @skip_if decorator.

I left a comment in the original ticket for reference.

codecov[bot] commented 1 year ago

Codecov Report

Patch coverage: 100.00% and project coverage change: +0.18% :tada:

Comparison is base (e6509dd) 78.32% compared to head (89b903f) 78.51%.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## master #384 +/- ## ========================================== + Coverage 78.32% 78.51% +0.18% ========================================== Files 73 73 Lines 3124 3151 +27 Branches 520 528 +8 ========================================== + Hits 2447 2474 +27 Misses 600 600 Partials 77 77 ``` | [Files Changed](https://app.codecov.io/gh/scrapinghub/spidermon/pull/384?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=scrapinghub) | Coverage Δ | | |---|---|---| | [spidermon/contrib/scrapy/monitors/base.py](https://app.codecov.io/gh/scrapinghub/spidermon/pull/384?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=scrapinghub#diff-c3BpZGVybW9uL2NvbnRyaWIvc2NyYXB5L21vbml0b3JzL2Jhc2UucHk=) | `95.00% <100.00%> (+2.69%)` | :arrow_up: | | [spidermon/contrib/scrapy/monitors/suites.py](https://app.codecov.io/gh/scrapinghub/spidermon/pull/384?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=scrapinghub#diff-c3BpZGVybW9uL2NvbnRyaWIvc2NyYXB5L21vbml0b3JzL3N1aXRlcy5weQ==) | `100.00% <100.00%> (ø)` | |

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

curita commented 1 year ago

@shafiq-muhammad hi! I'm just jumping in regarding the tests we talked about last time. I'm not finding in Spidermon's codebase examples where we test the monitors + suites as intended, but we have a few examples in Zyte's codebase that might help here: https://codesearch.scrapinghub.com/source/search?q=&defs=&refs=&path=test_monitors.py&hist=&type=python [private].

It would be nice to add some tests to check the cases from the two examples in the docstring of these changes. The end goal would be to add tests so all the lines reported by codecov are covered. So we can add a suite that uses the coverage monitor, and add the "SPIDERMON_MONITOR_SKIPPING_RULES": {"Field Coverage Monitor": [["item_scraped_count", "==", 0]]} and "SPIDERMON_MONITOR_SKIPPING_RULES": {"Field Coverage Monitor": [skip_function]} as two test-cases. We could have a third control test case that shows what happens when SPIDERMON_MONITOR_SKIPPING_RULES isn't set.

Hope this helps, let us know if you have any doubts!

VMRuiz commented 1 year ago

@Gallaecio are we good to merge?