Re: #154 (will close the issue once a new version is released and integrated into nmdc-server)
Summary
Add a new src/nmdc_submission_schema/datamodel/gold.py module with a function and CLI that extract GOLD ecosystem classification path terms from GOLD's JSON file and inject them into a schema as enum permissible values. It currently creates two sets of 5 enums, one enum per classification path level in each set. The first set contains all possible terms at each level. The second set contains the reduced set of terms that NMDC has identified (through manual curation) as applicable to the soil template.
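For reviewers who want a picture of the extraction step, here is a minimal sketch and not the code in gold.py. It assumes the GOLD JSON is a tree of nodes with "name" and "children" keys, and the level names, the "_enum" suffix, and both function names are hypothetical, chosen only for illustration.

```python
from collections import defaultdict

# Assumed names for the 5 classification path levels (hypothetical here).
LEVELS = [
    "ecosystem",
    "ecosystem_category",
    "ecosystem_type",
    "ecosystem_subtype",
    "specific_ecosystem",
]


def collect_terms_by_level(node, depth=0, terms=None):
    """Walk the tree and gather the set of term names seen at each level."""
    if terms is None:
        terms = defaultdict(set)
    if depth >= len(LEVELS):
        return terms
    for child in node.get("children", []):
        terms[LEVELS[depth]].add(child["name"])
        collect_terms_by_level(child, depth + 1, terms)
    return terms


def build_enums(terms):
    """Turn per-level term sets into enum definitions keyed by enum name."""
    return {
        f"{level}_enum": {
            "permissible_values": {term: {} for term in sorted(values)}
        }
        for level, values in terms.items()
    }


# Tiny made-up tree in the assumed shape; not real GOLD data.
sample_tree = {
    "name": "Ecosystem",
    "children": [
        {
            "name": "Environmental",
            "children": [
                {"name": "Terrestrial", "children": [{"name": "Soil", "children": []}]},
                {"name": "Aquatic", "children": []},
            ],
        },
        {"name": "Engineered", "children": []},
    ],
}

terms = collect_terms_by_level(sample_tree)
enums = build_enums(terms)
```

The resulting dictionary could then be merged into the schema's enums section before the final YAML file is written. How the real module performs that merge is not shown here.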
Update project.Makefile to download the GOLD JSON file and perform the enum injection as part of building the final src/nmdc_submission_schema/schema/nmdc_submission_schema.yaml file.
The schema itself has no concept of enforcing valid path component combinations (it never did, so nothing has changed in that respect). Front-end code in nmdc-server dynamically controls the dropdowns for the 5 ecosystem classification path columns so that you can only choose values that form valid combinations. It uses its own copy of the GOLD JSON file to drive that logic. Since we want that JSON file to agree with the schema, I've bundled it here (in project/thirdparty) and nmdc-server will pick up that version.
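To illustrate why the front end needs the JSON tree and not just the flat enums, here is a rough sketch of the kind of lookup that drives dependent dropdowns. It is not the nmdc-server code (which is not Python), it assumes the same "name"/"children" tree shape as a guess at the GOLD file's layout, and the function name is hypothetical.

```python
def valid_next_values(tree, chosen):
    """Return the allowed values for the next level, given the values chosen so far.

    An empty list means the chosen values do not form a valid path,
    or the path has no further levels.
    """
    node = tree
    for name in chosen:
        # Descend to the child matching the value picked at this level.
        node = next(
            (child for child in node.get("children", []) if child["name"] == name),
            None,
        )
        if node is None:
            return []
    return sorted(child["name"] for child in node.get("children", []))


# Tiny made-up tree in the assumed shape; not real GOLD data.
sample_tree = {
    "name": "Ecosystem",
    "children": [
        {
            "name": "Environmental",
            "children": [
                {"name": "Terrestrial", "children": [{"name": "Soil", "children": []}]},
                {"name": "Aquatic", "children": []},
            ],
        },
        {"name": "Engineered", "children": []},
    ],
}

first_level = valid_next_values(sample_tree, [])
second_level = valid_next_values(sample_tree, ["Environmental"])
invalid = valid_next_values(sample_tree, ["Engineered", "Soil"])
```

A flat per-level enum would accept "Soil" at the third level regardless of the first two choices, which is why the combination check lives outside the schema.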
Remove the hardcoded enums in schemasheets/tsv_in/enums.tsv and update the slot ranges in sheets_and_friends/tsv_in/modifications_long.tsv to use the new enum names.
Comment
With this setup, whenever we do a clean build (which should happen at least once before each release) we'll get a new copy of the GOLD JSON file and build new enums based on it. That means there is no explicit step to sync with GOLD; it should just happen transparently.
cc: @aclum @mslarae13