microbiomedata / nmdc-runtime

Runtime system for NMDC data management and orchestration
https://microbiomedata.github.io/nmdc-runtime/

nmdc-runtime should use materialized_patterns version of JSON Schema #552

Open aclum opened 2 weeks ago

aclum commented 2 weeks ago

I believe the runtime code is currently loading nmdc.schema.json, which does not have the regular expression patterns we need. Those are in nmdc_materialized_patterns.schema.json, which is also part of the PyPI distribution.
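To illustrate why the missing patterns matter, here is a minimal sketch of how the same invalid document passes when the schema has no pattern keyword and fails when it does. Everything in it is an assumption for illustration: the two schema fragments, the id regular expression, and the pattern_errors helper are hypothetical, and the helper mimics only the JSON Schema pattern keyword rather than performing full validation.

```python
import re

# Hypothetical, heavily simplified schema fragments (NOT the real NMDC schemas).
# They differ only in whether the "id" property carries a "pattern" keyword.
SCHEMA_WITHOUT_PATTERN = {"properties": {"id": {"type": "string"}}}
SCHEMA_WITH_PATTERN = {
    "properties": {
        # Assumed example pattern, invented for this sketch.
        "id": {"type": "string", "pattern": r"^nmdc:bsm-\d+-[a-z0-9]+$"}
    }
}


def pattern_errors(schema: dict, document: dict) -> list:
    """Toy check covering only `pattern` on top-level string properties."""
    errors = []
    for name, rules in schema.get("properties", {}).items():
        value = document.get(name)
        pattern = rules.get("pattern")
        # JSON Schema patterns are unanchored, hence re.search rather than re.fullmatch.
        if pattern is not None and isinstance(value, str):
            if re.search(pattern, value) is None:
                errors.append(f"{name!r} does not match {pattern!r}")
    return errors


bad_document = {"id": "not-a-valid-id"}
errors_without = pattern_errors(SCHEMA_WITHOUT_PATTERN, bad_document)
errors_with = pattern_errors(SCHEMA_WITH_PATTERN, bad_document)
```

With the pattern-free fragment the bad id goes unreported, which matches the symptom of a migrator producing no errors where some were expected.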

We discovered this while debugging migrators for the Berkeley schema, when I didn't get the errors I was expecting.

Eric said that from nmdc_schema.nmdc_data import get_nmdc_jsonschema_dict fetches nmdc.schema.json under the hood, so we need to figure out how to load nmdc_materialized_patterns.schema.json instead.

cc @eecavanna @turbomam @shreddd

It would be great if this could be updated in the next few days so it can be tested next week for the June 2024 release on the 24th. cc @pkalita-lbl

eecavanna commented 2 weeks ago

nmdc-schema v11.* PyPI package versions newer than v11.0.0rc11 will allow consumers to specify which variant of the JSON Schema they want to load: either the default variant or the "materialized patterns" variant. This new capability was implemented via https://github.com/microbiomedata/berkeley-schema-fy24/pull/204.

eecavanna commented 2 weeks ago

Looks to me like the Runtime is already, at least to some extent, accessing the "materialized patterns" variant of the schema (pulling it directly from the nmdc-schema package's file tree instead of accessing it via the nmdc_schema.nmdc_data module).

https://github.com/microbiomedata/nmdc-runtime/blob/8de5ef16ed05ca3876538dc17ef5fc2de76282c2/nmdc_runtime/util.py#L103-L113
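For readers who do not follow the link, reading a JSON file straight out of an installed package's file tree can be sketched roughly as below. This is not a copy of the linked util.py code. The load_packaged_json helper is a hypothetical name, and the sketch assumes the JSON file sits at the top level of the package, which may not hold for every nmdc-schema release.

```python
import json
from importlib.resources import files  # available in Python 3.9+


def load_packaged_json(package: str, filename: str) -> dict:
    """Read and parse a JSON file shipped inside an installed package.

    Hypothetical helper: assumes `filename` lives at the package's top level.
    """
    resource = files(package).joinpath(filename)
    return json.loads(resource.read_text(encoding="utf-8"))
```

Under those assumptions, a call such as load_packaged_json("nmdc_schema", "nmdc_materialized_patterns.schema.json") would return the "materialized patterns" variant as a dict, bypassing nmdc_schema.nmdc_data entirely.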