Closed: turbomam closed this issue 11 months ago
> document how NMDC SubmissionPortal contents are converted into nmdc-schema objects and inserted into MongoDB
The code that does the translation is here: https://github.com/microbiomedata/nmdc-runtime/blob/46d6543339d2436524475a624644652d901a6517/nmdc_runtime/site/translation/submission_portal_translator.py. If you want to understand the particulars of what it does, the main "entry point" to that class is the `get_database` method.
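To make the flow concrete, here is a minimal, hypothetical sketch of how such a translator class is driven. The constructor arguments and internal logic shown are assumptions for illustration, not the real `SubmissionPortalTranslator` implementation; only the class name and the `get_database` entry point come from the discussion above.

```python
# Hypothetical stand-in for the real translator class; the actual
# constructor signature and translation logic in nmdc-runtime differ.
class SubmissionPortalTranslator:
    def __init__(self, metadata_submission: dict):
        self.metadata_submission = metadata_submission

    def get_database(self) -> dict:
        """Translate the raw submission into an nmdc:Database-shaped dict."""
        study_form = self.metadata_submission.get("studyForm", {})
        return {"study_set": [study_form] if study_form else []}

translator = SubmissionPortalTranslator({"studyForm": {"studyName": "Example"}})
database = translator.get_database()
print(database["study_set"][0]["studyName"])  # Example
```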
Once the `nmdc:Database` object is prepared, it is submitted to MongoDB via the `/metadata/json:submit` API endpoint.
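As a rough sketch of what that submission step looks like from a client's perspective, the snippet below assembles the pieces of the POST request. The base URL and the helper function are assumptions for illustration; the real call would also need an `Authorization` bearer token, which is omitted here.

```python
import json

API_BASE = "https://api.microbiomedata.org"  # assumed Runtime base URL

def build_submit_request(database_obj: dict) -> tuple[str, dict, str]:
    """Assemble the URL, headers, and body of a POST to /metadata/json:submit.

    Hypothetical helper: the real client would also attach an
    Authorization header and actually perform the HTTP request.
    """
    url = f"{API_BASE}/metadata/json:submit"
    headers = {"Content-Type": "application/json"}
    body = json.dumps(database_obj)
    return url, headers, body

url, headers, body = build_submit_request({"study_set": []})
print(url)  # https://api.microbiomedata.org/metadata/json:submit
```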
The process of fetching from the submission portal, translating, and submitting to MongoDB is orchestrated by a Dagster job. (There's actually a second Dagster job that does the first two steps followed by only a validate step, which is useful for testing.)
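The shape of those two jobs can be sketched in plain Python. These function names and bodies are illustrative stubs, not the real Dagster op names or implementations; the point is just the step ordering and how the test-oriented job drops the final submit step.

```python
# Plain-Python sketch of the orchestrated steps; the real pipeline is a
# Dagster job, and these names are illustrative, not the actual op names.

def fetch_submission(submission_id: str) -> dict:
    # In the real job, this fetches from the submission portal API.
    return {"id": submission_id, "metadata_submission": {}}

def translate(submission: dict) -> dict:
    # In the real job, this uses SubmissionPortalTranslator.get_database().
    return {"study_set": [], "biosample_set": []}

def validate(database: dict) -> bool:
    # In the real job, this validates against the nmdc-schema.
    return isinstance(database, dict)

def submit(database: dict) -> str:
    # In the real job, this POSTs to /metadata/json:submit.
    return "accepted"

# Full job: fetch -> translate -> validate -> submit.
# Test-only job: fetch -> translate -> validate (no submit).
db = translate(fetch_submission("sub-123"))
assert validate(db)
print(submit(db))  # accepted
```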
I can add something to that effect to the runtime documentation. But I'd caution against documenting at the level of "this field from the submission gets capitalized and reversed and put in this field of the `Study` object," because documentation that detailed will go stale very quickly.
> identify data patterns that might pass through that process but fail `linkml-validate` against `src/schema/nmdc.yaml`
I don't know if what you're saying is actually possible. The data is validated against the schema at multiple points in the process before going into MongoDB. Regarding your example, as far as I can tell, the `env_broad_scale` slot has a description that recommends using certain ENVO terms, but nothing in the schema technically enforces that. I guess this should have been caught by a manual review of the submission up front, but it wasn't in this case.
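The distinction being made here is between prose guidance and machine-enforced constraints. The snippet below illustrates it with a made-up slot record: the slot metadata shown (including the assumption that `env_broad_scale` carries no `pattern`) is illustrative, not copied from `src/schema/nmdc.yaml`.

```python
import re

# Illustrative only: a slot description can *recommend* ENVO terms while the
# schema carries no machine-enforced constraint (no pattern, no enum).
slot = {
    "name": "env_broad_scale",
    "description": "Recommend subclasses of biome [ENVO:00000428].",  # prose guidance
    "pattern": None,  # assumption: no regex constraint on this slot
}

def passes_validation(value: str, slot: dict) -> bool:
    """Enforce only machine-readable constraints; a description is not one."""
    pattern = slot.get("pattern")
    return pattern is None or re.fullmatch(pattern, value) is not None

print(passes_validation("definitely not an ENVO term", slot))  # True
```

This is why a value that ignores the ENVO recommendation can still pass validation: validators act on constraints like `pattern` or permissible enum values, not on free-text descriptions.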
@pkalita-lbl can you please help me update or close this issue?
Maybe this should be rephrased or split into two issues. Possibly at least one of those could be immediately closed.