microbiomedata / nmdc-runtime

Runtime system for NMDC data management and orchestration
https://microbiomedata.github.io/nmdc-runtime/

Migrations: Implement notebook that runs all Berkeley schema migrators #519

Open eecavanna opened 1 month ago

eecavanna commented 1 month ago

Here's a link to the meta issue in which all the migrators are listed (in order): https://github.com/microbiomedata/nmdc-schema/issues/1607

Here's the path to the previous notebooks (each one only ran one migrator) in this repo:

demo/metadata_migration/notebooks

The sooner the "migrated" data is available (even in a non-production environment), the sooner people can start checking it. It will also make it easier to update dependent software.

eecavanna commented 1 month ago

I'm still working on this notebook. So far, I've implemented a preliminary one that I've been using to run all the migrators in series (which is also what I expect the final version of the notebook to do).
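For readers unfamiliar with the idea, here is a minimal sketch of what "running the migrators in series" could look like. It is not the notebook's code: the migrator functions, the `MIGRATORS` list, and the dict-based "database" are all hypothetical stand-ins, and the real nmdc-schema migrators may expose a different interface.

```python
from typing import Callable, Dict, List, Tuple

# Assumption: a "database" is modeled as a dict mapping collection names to lists of
# documents, and a "migrator" is any callable that returns a migrated copy of it.
Database = Dict[str, List[dict]]
Migrator = Callable[[Database], Database]


def run_migrators_in_series(
    database: Database, migrators: List[Tuple[str, Migrator]]
) -> Database:
    """Apply each migrator in order, feeding each one's output to the next."""
    for name, migrate in migrators:
        print(f"Running migrator: {name}")
        database = migrate(database)
    return database


# Hypothetical migrators, for illustration only.
def add_type_field(db: Database) -> Database:
    # Tag every document with the name of the collection it lives in.
    return {coll: [{**doc, "type": coll} for doc in docs] for coll, docs in db.items()}


def uppercase_ids(db: Database) -> Database:
    # Rewrite every document's "id" value in upper case.
    return {
        coll: [{**doc, "id": doc["id"].upper()} for doc in docs]
        for coll, docs in db.items()
    }


# The order of this list is the order in which the migrators run.
MIGRATORS = [("add_type_field", add_type_field), ("uppercase_ids", uppercase_ids)]

original = {"study_set": [{"id": "abc"}]}
migrated = run_migrators_in_series(original, MIGRATORS)
```

Keeping the ordered list in one place makes it straightforward to append a migrator later without touching the loop.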

I'm still unable to run them all, though. I think the nmdc-schema package, when installed directly from GitHub (not PyPI), doesn't include all the files needed to support some of the functions it exposes, which some of the migrators use. Here's the latest error I am getting:

[Image: screenshot of the error]

I am waiting for a new version of the Berkeley schema flavor of the nmdc-schema package to be published to PyPI. I expect that to resolve the above issues. At that point, I think any errors I encounter will be specific to the migrators as opposed to the general structure of the nmdc-schema package.
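One way to test the "missing files" hypothesis, independent of any particular migrator, is to ask Python which resource files an installed package actually ships. The sketch below uses only the standard library (`importlib.resources.files`, available in Python 3.9+). The helper name `find_missing_resources` is hypothetical, and the stdlib `json` package is used as a stand-in because the specific files nmdc-schema needs are not named in this thread.

```python
from importlib import resources
from typing import List


def find_missing_resources(package: str, relative_paths: List[str]) -> List[str]:
    """Return the subset of relative_paths that are not files inside the installed package."""
    root = resources.files(package)
    missing = []
    for relative_path in relative_paths:
        # Walk one path segment at a time so nested paths like "a/b.yaml" also work.
        node = root
        for part in relative_path.split("/"):
            node = node.joinpath(part)
        if not node.is_file():
            missing.append(relative_path)
    return missing


# Stand-in example: the stdlib "json" package, with one real file and one made-up file.
missing = find_missing_resources("json", ["__init__.py", "no_such_file.yaml"])
print(missing)
```

Running the same check against a GitHub-based install and a PyPI-based install, with the package's real import name and the paths the failing functions read, would show whether the two installs differ.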

eecavanna commented 1 month ago

A new migrator was implemented earlier today. Once it gets merged and included in a PyPI package, it should be added to this notebook.

eecavanna commented 1 month ago

I'll still be working on this next sprint, so I'll move it there now.