scrapinghub / spidermon

Scrapy Extension for monitoring spiders execution.
https://spidermon.readthedocs.io
BSD 3-Clause "New" or "Revised" License

idea: Margin of error for coverage thresholds #375

curita closed this issue 3 months ago

curita commented 1 year ago

Background

FieldCoverageMonitor is configured via the SPIDERMON_FIELD_COVERAGE_RULES setting. This dictionary defines how much coverage we expect from each field (expressed as a float number between 0 and 1, 1 being 100% coverage).
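For reference, this is roughly how that setting looks in a project's settings.py (the field names below are just placeholders; the dict/ prefix applies when the spider yields plain dicts):

```python
# settings.py
SPIDERMON_FIELD_COVERAGE_RULES = {
    "dict/title": 1.0,   # "title" expected in 100% of items
    "dict/price": 0.95,  # "price" expected in at least 95% of items
}
```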

These thresholds are sometimes too strict. For example, if we set a field coverage threshold of 95% (0.95) and the actual coverage ends up at 94.9%, we'll get an alert from Spidermon. Those kinds of close coverage drops usually aren't an issue, and devs tend to ignore them, but they still trigger alerts that get mixed in with more severe errors.

Proposal

We could allow a certain margin of error or tolerance over these thresholds. Say, for example, a ±5% allowance, which wouldn't trigger an alert for the case defined before.

This could be configured via a general setting that applies to all fields, though a per-field tolerance could also be supported (see the sketch below).
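A rough sketch of what that could look like; these tolerance settings are hypothetical and don't exist in Spidermon today:

```python
SPIDERMON_FIELD_COVERAGE_RULES = {
    "dict/price": 0.95,
}
# Hypothetical global tolerance: coverage of 0.90 or higher would not alert.
SPIDERMON_FIELD_COVERAGE_TOLERANCE = 0.05
# Hypothetical per-field override.
SPIDERMON_FIELD_COVERAGE_TOLERANCES = {
    "dict/price": 0.02,
}
```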

Alternatives

We've discussed one alternative: lowering the existing thresholds. This should work, but in practice it is sometimes hard to calculate precisely how much they should be reduced, and jobs with high coverage variability require constant updates. A new tolerance setting might also need occasional updates, but it would give more flexibility.

Another solution could be to use past jobs to calculate the expected thresholds dynamically. For example, in one of our projects, we don't have fixed coverage thresholds, but thresholds for how much we allow the coverage to be reduced between jobs. However, this is an entirely different approach compared to FieldCoverageMonitor (it depends on ScrapyCloud, and FieldCoverageMonitor doesn't), so this could co-exist as another monitor.
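To illustrate that second idea, here is a very rough sketch of the job-to-job check; the previous-job lookup is left as a hypothetical stub, since it would rely on ScrapyCloud:

```python
# Sketch only: get_previous_job_coverage() is a hypothetical stub that would
# fetch the field coverage from the last finished job's stats (e.g. through
# the ScrapyCloud API).
MAX_ALLOWED_DROP = 0.05  # allow at most a 5-point coverage drop between jobs


def get_previous_job_coverage(field):
    raise NotImplementedError


def coverage_dropped_too_much(field, current_coverage):
    previous = get_previous_job_coverage(field)
    return (previous - current_coverage) > MAX_ALLOWED_DROP
```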

Let me know what you think! We've been discussing this in the monitoring chapter at Zyte and thought this could be beneficial for a lot of existing projects /cc @Chandral @mrwbarg

Gallaecio commented 1 year ago

If the result will be the same, i.e. an alert, I would rather lower 0.95 to 0.9 than add a new setting that effectively does the same.

I would understand if we wanted to have different levels of alerts. If we want to make 0.9-0.94 a warning and 0.89 and lower an error, then a threshold makes sense to me. Although it could be implemented as an absolute value (e.g. 0.9) rather than a relative one (0.05).

VMRuiz commented 1 year ago

I would rather lower 0.95 to 0.9 than add a new setting that effectively does the same.

This is what I did on my jobs.

We've discussed one alternative: lowering the existing thresholds. This should work, but in practice it is sometimes hard to calculate precisely how much they should be reduced

You still have to calculate this even if you reduce it to a single variable.

and jobs with high coverage variability require constant updates. A new tolerance setting might also need occasional updates, but it would give more flexibility.

I don't agree with this approach. If the job has high variability, the lowered value should be calculated before the job goes into production. If the variability starts after it is already in production, then the monitors need to be triggered to alert about the change, so developers can evaluate the new situation and decide whether the previous tolerances are still valid or whether they need to figure out new ones.

Another solution could be to use past jobs to calculate the expected thresholds dynamically. For example, in one of our projects, we don't have fixed coverage thresholds, but thresholds for how much we allow the coverage to be reduced between jobs

This is a dangerous approach, as it would allow a slow degradation over time that is not picked up by any monitor.

curita commented 1 year ago

Thanks for the replies!

Maybe we could have different severity levels for those alerts, and developers could configure how each severity level is forwarded to them. But I imagine that would require careful consideration of how to implement it so monitors can support it, as well as some analysis of which other cases could benefit from it. I like that idea, though!

I wonder if we could approach this threshold problem another way? Maybe we could develop some tools to help calculate and set those thresholds more easily, as it seems to be a common problem among internal projects.

This is a dangerous approach, as it would allow a slow degradation over time that is not picked up by any monitor.

There are ways to mitigate this, like comparing against the average from different timeframes (last job, last month, last year). It does indeed open the door to missing gradual degradations, but maybe that's a risk some projects are fine with taking. I wouldn't use that approach to replace the current coverage monitor, but it could be a viable alternative offered in parallel.

VMRuiz commented 3 months ago

I think this can be solved by using a custom monitor that inherits from the default coverage monitor. Different Monitor suites can take care of the different alerting levels.
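For anyone landing here later, a minimal sketch of that kind of custom monitor, assuming SPIDERMON_ADD_FIELD_COVERAGE is enabled so the per-field coverage stats are present; SPIDERMON_FIELD_COVERAGE_TOLERANCE is a made-up project-level setting, not part of Spidermon:

```python
from spidermon import Monitor, MonitorSuite, monitors


@monitors.name("Field coverage with tolerance")
class TolerantFieldCoverageMonitor(Monitor):
    """Re-checks SPIDERMON_FIELD_COVERAGE_RULES, allowing a margin of error.

    SPIDERMON_FIELD_COVERAGE_TOLERANCE is a hypothetical project setting.
    """

    @monitors.name("All fields meet their threshold minus the tolerance")
    def test_field_coverage_with_tolerance(self):
        settings = self.data.crawler.settings
        rules = settings.getdict("SPIDERMON_FIELD_COVERAGE_RULES")
        tolerance = settings.getfloat("SPIDERMON_FIELD_COVERAGE_TOLERANCE", 0.0)
        failures = []
        for field, threshold in rules.items():
            # Coverage stats added by Spidermon when the spider finishes.
            coverage = self.data.stats.get(f"spidermon_field_coverage/{field}", 0)
            if coverage < threshold - tolerance:
                failures.append(
                    f"{field}: {coverage:.2%} < {threshold:.2%} "
                    f"(tolerance {tolerance:.2%})"
                )
        self.assertFalse(failures, msg="; ".join(failures))


# One suite per alerting level could reuse monitors like this one,
# each with its own actions (e.g. a warning channel vs. an error channel).
class FieldCoverageSuite(MonitorSuite):
    monitors = [TolerantFieldCoverageMonitor]
```

The suite would then be registered through SPIDERMON_SPIDER_CLOSE_MONITORS as usual, and a second suite with stricter thresholds and different actions could cover the error level.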

I will close this issue as I don't think we actually need to change Spidermon core just for this, please reopen if you think otherwise.