Closed by aclum 3 weeks ago
Hi @JamesTessmer, all of the migrators, whether written for the nmdc-schema schema or the berkeley-schema-fy24 schema, can be found in the berkeley-schema-fy24 repository, here: https://github.com/microbiomedata/berkeley-schema-fy24/tree/main/nmdc_schema/migrators
Hi @aclum, I have a question. There are a few places in a migrator where schema version numbers are indicated; for example, each migrator's name has the format migrator_from_{initial_schema_version}_to_{final_schema_version}.py, and each migrator has a variable named _from_version and a variable named _to_version, etc. What are the "from version" and "to version" in this case? In other words, what schema versions will this migrator be used to migrate the database from and to?
@JamesTessmer, when the person writing a migrator doesn't yet know what the specific schema versions will be, I usually recommend that they either (a) make up some nonsensical versions (e.g. 0.0.0) and then mention in the PR that they are placeholder versions that will be updated to match the eventual starting/ending schema versions that go along with the migrator; or (b) specify the starting version as the currently released schema version and specify the ending version as a PR number (the number of the schema repository PR that introduced the relevant schema change).
Here's a (hypothetical) example:
migrator_from_10_3_0_to_PR123.py
The version numbers can remain as placeholders until the migrator is in a PR. In other words, they can remain as placeholders while you are writing and testing the migrator.
The migration will be from 10.3.0 to whatever version of nmdc-schema is released at the end of June. I propose 10.4.0 unless @turbomam objects.
Thanks, @aclum.
FYI @JamesTessmer, when writing the migrator, I recommend naming it migrator_from_10_3_0_to_10_4_0.py and setting [its class variables] _from_version = "10.3.0" and _to_version = "10.4.0". We can go back and edit those things during the PR review phase, if needed.
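To make the naming convention above concrete, here is a minimal sketch. The helper `migrator_filename` and the bare `Migrator` class are hypothetical illustrations, not code from the repository; a real migrator would subclass whatever base class the existing migrators use.

```python
def migrator_filename(from_version: str, to_version: str) -> str:
    """Build a migrator file name from two version strings.

    Dots are replaced with underscores, matching the file names quoted
    in this thread (e.g. "10.3.0" becomes "10_3_0").
    """
    def slug(version: str) -> str:
        return version.replace(".", "_")

    return f"migrator_from_{slug(from_version)}_to_{slug(to_version)}.py"


class Migrator:
    """Hypothetical stand-in showing only the two version variables."""

    # Placeholder values; they can be edited during PR review.
    _from_version = "10.3.0"
    _to_version = "10.4.0"


if __name__ == "__main__":
    print(migrator_filename(Migrator._from_version, Migrator._to_version))
```

The same helper also covers the placeholder style mentioned earlier, where the ending version is a PR number such as PR123.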
Added PR for this issue here: https://github.com/microbiomedata/nmdc-schema/pull/2059
@aclum @eecavanna What's the best way to test the migrator before marking the PR as ready for review?
Hi @JamesTessmer,
The test approach I consider to be the "lowest-hanging fruit" is to run the doctests. You can do that by running $ poetry run python -m doctest -v /path/to/the/migrator.py.
There is also a make target that can be used to run the doctests in all migration-related code ($ make migration-doctests), but error messages can be hard to spot in its output because it prints a large quantity of messages, all in the same color. I use the more specific $ poetry run python -m doctest ... command when working on a specific migrator.
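For anyone unfamiliar with doctests, here is a small sketch of what one looks like. The function `has_version_suffix` and the identifiers in its examples are made up for illustration, and the rule "a version is a trailing dot followed by digits" is an assumption, not something taken from the schema.

```python
import re

# Assumption: a versioned identifier ends with a dot followed by digits.
_VERSION_SUFFIX = re.compile(r"\.\d+$")


def has_version_suffix(identifier: str) -> bool:
    """Report whether an identifier already ends with a version suffix.

    The interactive-style examples below are the doctests; the doctest
    module runs each ">>>" line and compares the result with the line
    that follows it.

    >>> has_version_suffix("nmdc:example-abc")
    False
    >>> has_version_suffix("nmdc:example-abc.1")
    True
    """
    return _VERSION_SUFFIX.search(identifier) is not None
```

Running python -m doctest on a file without the -v flag prints nothing when every example passes and prints only the failing examples otherwise, which can make failures easier to spot than in verbose output.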
We need a migrator that will search for NomAnalysisActivity records whose ID does not have a version appended. For each of those records, it should update ID by appending .1 to the existing value in slot ID, and move the existing (unversioned) value of ID to alternative_identifiers.
Example before:
Example after:
Example migrators can be found at https://github.com/microbiomedata/nmdc-schema/tree/main/nmdc_schema/migrators
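The per-record transformation requested in this issue could be sketched roughly as follows. This is an illustration under stated assumptions, not the migrator from the PR: the function name `add_version_to_id` is hypothetical, the identifier in the doctest is invented, records are assumed to be plain dictionaries with an `id` key, and "has a version appended" is assumed to mean "ends with a dot followed by digits". How such a function is wired into a migrator depends on the base class in the repository.

```python
import re

# Assumption: a versioned id ends with a dot followed by digits (e.g. ".1").
_VERSION_SUFFIX = re.compile(r"\.\d+$")


def add_version_to_id(record: dict) -> dict:
    """Append ".1" to an unversioned id and keep the old id as an alternative.

    >>> add_version_to_id({"id": "nmdc:example-abc", "type": "nmdc:NomAnalysisActivity"})
    {'id': 'nmdc:example-abc.1', 'type': 'nmdc:NomAnalysisActivity', 'alternative_identifiers': ['nmdc:example-abc']}

    Records that already carry a version are returned unchanged:

    >>> add_version_to_id({"id": "nmdc:example-abc.1"})
    {'id': 'nmdc:example-abc.1'}
    """
    old_id = record["id"]
    if _VERSION_SUFFIX.search(old_id):
        return record  # already versioned; nothing to do

    record["id"] = f"{old_id}.1"

    # Preserve the original id, creating the list if it does not exist yet
    # and avoiding a duplicate entry if the migrator is run twice.
    alternatives = record.setdefault("alternative_identifiers", [])
    if old_id not in alternatives:
        alternatives.append(old_id)
    return record
```

Skipping already-versioned records makes the function safe to re-run, which is convenient when testing a migrator repeatedly against the same data.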
Target completion for this is 6/17. This migrator is needed for the 6/24 release or the records will be invalid b/c that release will have more stringent pattern matches on IDs. cc @ssarrafan