opendatateam / udata

Customizable and skinnable social platform dedicated to open data.
http://udata.readthedocs.org
GNU Affero General Public License v3.0
239 stars 87 forks source link

Notify on DBRef errors #3026

Closed ThibaudDauce closed 4 months ago

ThibaudDauce commented 6 months ago

Fix https://github.com/datagouv/data.gouv.fr/issues/478

The goal is to run the job every night to check for integrity errors inside the database. In case of new errors, we should investigate to find the root cause, fix it, then create a migration to fix the integrity problem.

ThibaudDauce commented 4 months ago

I am not entirely convinced about this PR :

* I don't think adding a new canal to manage errors is the best approach

Another approach would be to use sentry_sdk.capture_message("You caught me!") to report errors? So no new Mattermost channel and a central reporting place.

* I think using fix is risky as we would probably prefer an approach of understanding the issue and applying a dedicated migration for it

Ok to remove the --fix option

* I think we have only one case of error still in production ([this one](https://github.com/datagouv/data.gouv.fr/issues/478#issuecomment-2063745029)),

I would prefer an approach that fix this issue with a migration, as was done for other integrity issues (example). It would benefit all other udata users as well.

Will do it for this specific issue. Done in https://github.com/opendatateam/udata/pull/3057/

I also suggest, as discussed, that we improve Sentry to monitor AND/OR notify errors better, in particular on recurring errors.

Yes, we should improve the Sentry monitoring by reducing the number of false positive. We shouldn't receive mail for errors that we cannot de anything about them.

A simple way to get integrity errors could be to search for DBRef : example query

This approach doesn't warn you if nobody is accessing the page so the delay between the introduction of the integrity bug and the debugging to find out why it was added inside Mongo could be important. The point of having a cron every night is that we can correlate the new integrity problem with a recent deploy or a special manual API request done, etc.