Open turbomam opened 9 months ago
I thought we were talking about _id-based migration ("underscore ID")—as in, the ID value generated by Mongo, which is database-specific and, if it even exists in a different database, might refer to a different document, which we might want to treat differently. I was uncomfortable with that type of migration because I thought it coupled the migrations too closely to the current production NMDC database, in its current state.
If we are talking about id-based migrations ("ID")—where those IDs are NMDC IDs—I'm more comfortable with that. Those id values are controlled by humans (I think) and may not be derived from any underlying information that the migration script could base its behavior upon.
For id-based migrations, I would like to have the migration process first validate that the documents present are exactly the ones that the author of the migration expected, and alert the user (or abort) if that is not the case.
The migration process could also re-validate the documents afterward, to find out whether any new documents were introduced since the initial validation and the subsequent transformations were performed (typically a period of a few seconds). If any were introduced, the migration process could alert the user.
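A minimal sketch of that before-and-after check, in Python. Everything named here is hypothetical: the function `check_expected_ids`, the exception class, and the sample id values are made up for illustration, and the documents are plain dicts standing in for whatever the migration would actually read from Mongo.

```python
from typing import Iterable, Mapping, Set


class UnexpectedDocumentsError(Exception):
    """The documents found differ from the ones the migration author expected."""


def check_expected_ids(documents: Iterable[Mapping], expected_ids: Set[str]) -> None:
    """Raise unless the `id` values of `documents` are exactly `expected_ids`."""
    found_ids = {doc["id"] for doc in documents}
    missing = expected_ids - found_ids      # expected, but not present
    unexpected = found_ids - expected_ids   # present, but not expected
    if missing or unexpected:
        raise UnexpectedDocumentsError(
            f"missing={sorted(missing)}, unexpected={sorted(unexpected)}"
        )


# Hypothetical ids the migration author hard-coded into the migration.
expected = {"nmdc:example-0001", "nmdc:example-0002"}

# Check before transforming anything; this passes silently.
docs_before = [{"id": "nmdc:example-0001"}, {"id": "nmdc:example-0002"}]
check_expected_ids(docs_before, expected)

# ... transformations would happen here ...

# Check again afterward; a document added in the meantime triggers the alert.
docs_after = docs_before + [{"id": "nmdc:example-0003"}]
try:
    check_expected_ids(docs_after, expected)
except UnexpectedDocumentsError as error:
    print(f"Alert: {error}")
```

Whether the second check should only alert or should also abort is a policy decision; the sketch just surfaces the difference.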
If the Mongo migrations were being done in a Mongo transaction, I think that transaction could be rolled back in this case—but that is not how migrations are being done today.
Replying to myself:
I thought we were talking about _id-based migration ("underscore ID")—as in, the ID value generated by Mongo
We weren't. We were talking about NMDC IDs.
An example is in this PR commit:
semi-related: id-based configs and data should go in one, standardized assets/misc file