microbiomedata / nmdc-runtime

Runtime system for NMDC data management and orchestration
https://microbiomedata.github.io/nmdc-runtime/

Modify GOLD ETL code to check existing records and only create missing records. #756

Open aclum opened 2 weeks ago

aclum commented 2 weeks ago

Is your feature request related to a problem? Please describe.
We have studies that will be in a hybrid state: their biosamples come from the submission portal to Mongo ETL, and we need to make the corresponding data_generation_set records.

Describe the solution you'd like
The ETL code should check the existing records for a study, determine which records are missing, and create those. An example project is https://github.com/microbiomedata/issues/issues/813

Acceptance Criteria
Updated GOLD ETL code that can be run to make data_generation_set records for https://github.com/microbiomedata/issues/issues/813

mslarae13 commented 1 week ago

@aclum what does "only create missing records" mean? Are there missing biosamples??? There shouldn't be :(

aclum commented 6 days ago

See the linked ticket #813. The use case there is that we have biosample records (from the submission portal to Mongo ETL) but no data_generation_set records. The code needs to look at the existing records in NMDC's Mongo and use those to set the appropriate has_input for the new data_generation_set records. This will be the predominant use case going forward.
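
To make the request concrete, here is a rough sketch of the check I have in mind. It is not the existing ETL code. The function name is made up, and the field names (projectGoldId and biosampleGoldId on the GOLD side, gold_biosample_identifiers and gold_sequencing_project_identifiers on the NMDC side) are my assumptions about the record shapes and should be checked against the GOLD API and nmdc-schema. Minting NMDC ids and filling the other required slots is left out.

```python
from typing import Iterable


def plan_missing_data_generation_records(
    gold_projects: Iterable[dict],
    existing_biosamples: Iterable[dict],
    existing_data_generations: Iterable[dict],
    study_id: str,
) -> list[dict]:
    """Hypothetical helper: build skeleton data_generation_set records
    only for GOLD projects that NMDC does not already have."""
    # Map each GOLD biosample identifier to the NMDC biosample id already in Mongo.
    # Assumes biosamples carry a `gold_biosample_identifiers` list.
    biosample_by_gold_id = {
        gold_id: biosample["id"]
        for biosample in existing_biosamples
        for gold_id in biosample.get("gold_biosample_identifiers", [])
    }

    # GOLD sequencing projects that already have a data_generation_set record.
    already_loaded = {
        gold_id
        for record in existing_data_generations
        for gold_id in record.get("gold_sequencing_project_identifiers", [])
    }

    missing = []
    for project in gold_projects:
        # Assumed GOLD field names; adjust to the real API payload.
        project_curie = f"gold:{project['projectGoldId']}"
        if project_curie in already_loaded:
            continue  # record exists, nothing to create

        biosample_id = biosample_by_gold_id.get(f"gold:{project['biosampleGoldId']}")
        if biosample_id is None:
            continue  # no existing biosample to link to; needs manual review

        missing.append(
            {
                "gold_sequencing_project_identifiers": [project_curie],
                "has_input": [biosample_id],  # reuse the existing biosample id
                "associated_studies": [study_id],
            }
        )
    return missing
```

In a real run the three input collections would come from the GOLD API and from queries against the biosample and data generation collections for the study. Projects whose biosample cannot be found are skipped here rather than creating a new biosample, since per this ticket the biosamples should already exist.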