microbiomedata / nmdc-server

Data portal client and server for NMDC.
https://data.microbiomedata.org

Data integrity action plan #765

Open jeffbaumes opened 1 year ago

jeffbaumes commented 1 year ago

Diagnosis. When the recent submission table migrations ran, some submission rows could not satisfy a constraint that required a user ID. The migration did not succeed at that point, but because it ran in its own process, it hung without reporting an error, leaving those rows unmigrated. On the next data ingest, the incomplete data was forcibly made durable by the A/B database switch. So the primary cause of the data loss was a partial migration that failed silently without sending any alert.

Resolution. The resolution is to ensure that mutations to the database can never be made durable without complete guarantee checks. This can be done by (1) putting the data changes into a single transaction that either succeeds completely with all changes or fails with no changes, (2) alerting us of any failure by stopping server startup on migration failure instead of silently letting the server start up, and (3) as a failsafe, automatically backing up any data that the data or submission portal authors (e.g. the submissions table) before any data change.
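
As a rough illustration of the three points above, here is a minimal sketch using Python's standard-library `sqlite3` as a stand-in for the real database. The table names (`submission`, `submission_new`, `submission_backup`) and the helpers `backup_table` and `migrate_or_abort` are hypothetical and are not taken from the nmdc-server code; the actual migration tooling may look quite different.

```python
import sqlite3


def backup_table(conn, table, backup):
    """Failsafe: copy `table` into `backup` before any data change.

    Table names are interpolated directly, so they must be trusted identifiers.
    """
    conn.execute(f"DROP TABLE IF EXISTS {backup}")
    conn.execute(f"CREATE TABLE {backup} AS SELECT * FROM {table}")


def migrate_or_abort(conn, statements):
    """Apply every statement in one transaction, or none of them.

    On any failure the transaction is rolled back and the exception is
    re-raised, so the caller (e.g. server startup) stops instead of
    continuing with a half-migrated database.
    """
    try:
        conn.execute("BEGIN")
        for sql in statements:
            conn.execute(sql)
        conn.execute("COMMIT")
    except Exception:
        if conn.in_transaction:
            conn.execute("ROLLBACK")
        raise


def demo():
    # isolation_level=None puts sqlite3 in autocommit mode, so the explicit
    # BEGIN/COMMIT/ROLLBACK above are the only transaction boundaries.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE submission (id INTEGER PRIMARY KEY, user_id TEXT)")
    conn.executemany(
        "INSERT INTO submission (id, user_id) VALUES (?, ?)",
        [(1, "u1"), (2, None)],  # row 2 has no user ID
    )

    backup_table(conn, "submission", "submission_backup")

    # Hypothetical migration that requires a user ID on every row.
    statements = [
        "CREATE TABLE submission_new (id INTEGER PRIMARY KEY, user_id TEXT NOT NULL)",
        "INSERT INTO submission_new SELECT id, user_id FROM submission",
        "DROP TABLE submission",
        "ALTER TABLE submission_new RENAME TO submission",
    ]
    try:
        migrate_or_abort(conn, statements)
        failed = False
    except sqlite3.IntegrityError:
        # A real server would let this propagate and refuse to start.
        failed = True

    tables = {r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")}
    rows = conn.execute("SELECT COUNT(*) FROM submission").fetchone()[0]
    backup_rows = conn.execute("SELECT COUNT(*) FROM submission_backup").fetchone()[0]
    return {"failed": failed, "tables": tables, "rows": rows, "backup_rows": backup_rows}
```

In this sketch the row without a user ID makes the migration fail loudly, the rollback leaves the original `submission` table untouched, and the backup copy exists regardless of the outcome.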

The following specific tasks will be taken to accomplish the resolution plan:

There are a few related items that are not a direct part of the resolution but will help in data and state stability:

aclum commented 2 months ago

@jeffbaumes would you please update this ticket with how much is resolved vs. outstanding?