microbiomedata / nmdc-schema

National Microbiome Data Collaborative (NMDC) unified data model
https://microbiomedata.github.io/nmdc-schema/
Creative Commons Zero v1.0 Universal

Provide generalizable infrastructure for id-based migrations as well as pattern-based migrations #1248

Open turbomam opened 9 months ago

turbomam commented 9 months ago

semi-related: id-based configs and data should go in one, standardized assets/misc file

eecavanna commented 9 months ago

I thought we were talking about _id-based migration ("underscore ID")—as in, the ID value generated by Mongo, which is database-specific and, if it even exists in a different database, might refer to a different document, which we might want to treat differently. I was uncomfortable with that type of migration because I thought it coupled the migrations too closely to the current production NMDC database, in its current state.

If we are talking about id-based migrations ("ID")—where those IDs are NMDC IDs—I'm more comfortable with that. Those id values are controlled by humans (I think) and may not be derived from any underlying information that the migration script could base its behavior upon.
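To make the distinction concrete, here is a minimal sketch. The documents, `_id` strings, and `id` values are made-up placeholders, and `select_by_field` is a hypothetical helper, not part of nmdc-schema. It shows the same NMDC record stored in two databases: a selector keyed on the Mongo-generated `_id` only matches in the database it was written against, while a selector keyed on the NMDC `id` matches in both.

```python
# Hypothetical copies of one NMDC record as stored in two separate databases.
# The `_id` values stand in for Mongo-generated ObjectIds; all values are invented.
prod_doc = {"_id": "650f1c2ab1", "id": "nmdc:example-0001", "name": "sample A"}
dev_doc = {"_id": "7a33e09cd4", "id": "nmdc:example-0001", "name": "sample A"}


def select_by_field(docs, field, values):
    """Return the documents whose `field` value is one of `values`."""
    wanted = set(values)
    return [doc for doc in docs if doc.get(field) in wanted]


# Keyed on `_id`: matches only in the database the selector was written against.
prod_hits_by_underscore_id = select_by_field([prod_doc], "_id", ["650f1c2ab1"])
dev_hits_by_underscore_id = select_by_field([dev_doc], "_id", ["650f1c2ab1"])

# Keyed on the NMDC `id`: matches in both databases.
prod_hits_by_id = select_by_field([prod_doc], "id", ["nmdc:example-0001"])
dev_hits_by_id = select_by_field([dev_doc], "id", ["nmdc:example-0001"])
```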

eecavanna commented 9 months ago

For id-based migrations, I would like the migration process to first validate that the documents present are exactly the ones the migration's author expected, and to alert the user (or abort) if they are not.

The migration process could also re-validate the documents afterward, to find out whether any new documents were introduced while the above validation and the subsequent transformations were being performed (typically a period of a few seconds). If any were introduced, the migration process could alert the user.

If the Mongo migrations were being done in a Mongo transaction, I think that transaction could be rolled back in this case—but that is not how migrations are being done today.
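A rough sketch of the validate, transform, re-validate flow described in this comment. It is illustrative only: a plain list of dicts stands in for a Mongo collection, the names `diff_ids`, `run_id_based_migration`, and `UnexpectedDocumentsError` are hypothetical, and it assumes the migration author declares the complete set of NMDC `id` values expected in the collection.

```python
class UnexpectedDocumentsError(Exception):
    """Raised when the documents present differ from the ones the author expected."""


def diff_ids(actual_ids, expected_ids):
    """Return (missing, unexpected) ids, each as a sorted list."""
    actual, expected = set(actual_ids), set(expected_ids)
    return sorted(expected - actual), sorted(actual - expected)


def run_id_based_migration(collection, expected_ids, transform):
    """Validate, transform each document in place, then re-validate.

    `collection` is an in-memory list of dicts standing in for a Mongo collection
    (an assumption of this sketch). Returns the ids that appeared after the
    initial check, so the caller can alert the user.
    """
    # Pre-check: abort before touching anything if the ids are not as expected.
    missing, unexpected = diff_ids([doc["id"] for doc in collection], expected_ids)
    if missing or unexpected:
        raise UnexpectedDocumentsError(f"missing={missing}, unexpected={unexpected}")

    # Transform a snapshot so documents added meanwhile are not silently migrated.
    for doc in list(collection):
        transform(doc)

    # Post-check: report any ids introduced since the pre-check.
    _, introduced = diff_ids([doc["id"] for doc in collection], expected_ids)
    return introduced


# Usage with made-up placeholder ids.
docs = [{"id": "nmdc:example-1", "name": "a"}, {"id": "nmdc:example-2", "name": "b"}]
introduced = run_id_based_migration(
    docs,
    ["nmdc:example-1", "nmdc:example-2"],
    lambda doc: doc.update(name=doc["name"].upper()),
)
```

This in-memory version cannot undo the transformations if the post-check finds new documents. If the migration ran inside a database transaction, the caller could abort the transaction at that point instead of only alerting.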

eecavanna commented 9 months ago

Replying to myself:

> I thought we were talking about _id-based migration ("underscore ID")—as in, the ID value generated by Mongo

We weren't. We were talking about NMDC IDs.

An example is in this PR commit: