microbiomedata / nmdc_automation

Prototype automation

Interrupted update-study command results in incomplete update #115 (Bug)

Closed: mbthornton-lbl closed this issue 4 months ago

mbthornton-lbl commented 4 months ago

linkml-validate finds metagenome omics_processing records that were left in their original (not re-IDed) state, presumably because responses from the minter API were lost while update-study was running.

Currently, update-study fails if it cannot find a study with the legacy study ID.
To fix this, update the command so that it can find and update records when the Study tree has been only partially updated.
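A minimal sketch of the tolerant lookup described above. It assumes study records are plain dicts with an `id` field and that the caller knows the legacy ID and, if one was already minted, the new NMDC ID. `find_study` and its parameters are hypothetical names, not the nmdc_automation API.

```python
from typing import Optional


def find_study(studies: list[dict], legacy_id: str, current_id: Optional[str] = None) -> Optional[dict]:
    """Return the study whose `id` is either the legacy or the current ID.

    A partially updated study may still carry its legacy ID (the update never
    reached it) or may already carry its new ID (the update was interrupted
    further down the tree), so both are accepted instead of requiring the
    legacy ID. Hypothetical helper, for illustration only.
    """
    candidates = {legacy_id}
    if current_id is not None:
        candidates.add(current_id)
    for study in studies:
        if study.get("id") in candidates:
            return study
    # No match under either ID: let the caller decide how to report it.
    return None
```

Returning `None` rather than raising leaves the "study really is missing" case to the caller, which is the situation where failing is still appropriate.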

Example linkml-validate errors for invalid omics_processing records:

[ERROR] [local/nmdc:sty-11-33fbta56.yaml/0] 'gold:Gb0150237' does not match '^nmdc:(bsm|procsm)-[0-9][a-z]{0,6}[0-9]-[A-Za-z0-9]{1,}(\\.[A-Za-z0-9]{1,})*(_[A-Za-z0-9_\\.-]+)?$' in /omics_processing_set/0/has_input/0
[ERROR] [local/nmdc:sty-11-33fbta56.yaml/0] 'nmdc:0df5ac2c9052a2b45cfd9578aaa562f7' does not match '^nmdc:dobj-[0-9][a-z]{0,6}[0-9]-[A-Za-z0-9]{1,}(\\.[A-Za-z0-9]{1,})*(_[A-Za-z0-9_\\.-]+)?$' in /omics_processing_set/0/has_output/0
[ERROR] [local/nmdc:sty-11-33fbta56.yaml/0] 'gold:Gs0110138' does not match '^nmdc:sty-[0-9][a-z]{0,6}[0-9]-[A-Za-z0-9]{1,}(\\.[A-Za-z0-9]{1,})*(_[A-Za-z0-9_\\.-]+)?$' in /omics_processing_set/0/part_of/0
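The three failures can be reproduced with Python's `re` module. This sketch copies the patterns from the messages above, assuming the doubled backslashes there are escaping in the log output (so a raw string uses a single `\.`). The helper `conforms` and the `ID_PATTERNS` table are hypothetical and not part of linkml-validate.

```python
import re

# Patterns copied from the linkml-validate messages, keyed by the slot
# they were reported for. The "\\." in the log becomes "\." in a raw string.
ID_PATTERNS = {
    "has_input": r"^nmdc:(bsm|procsm)-[0-9][a-z]{0,6}[0-9]-[A-Za-z0-9]{1,}(\.[A-Za-z0-9]{1,})*(_[A-Za-z0-9_\.-]+)?$",
    "has_output": r"^nmdc:dobj-[0-9][a-z]{0,6}[0-9]-[A-Za-z0-9]{1,}(\.[A-Za-z0-9]{1,})*(_[A-Za-z0-9_\.-]+)?$",
    "part_of": r"^nmdc:sty-[0-9][a-z]{0,6}[0-9]-[A-Za-z0-9]{1,}(\.[A-Za-z0-9]{1,})*(_[A-Za-z0-9_\.-]+)?$",
}


def conforms(slot: str, value: str) -> bool:
    """True if `value` matches the typed NMDC ID pattern reported for `slot`."""
    return re.fullmatch(ID_PATTERNS[slot], value) is not None
```

Legacy GOLD IDs fail because they lack the `nmdc:` prefix, and the old hash-style data object ID fails because it lacks the `dobj-` typecode, while a re-IDed study ID such as `nmdc:sty-11-33fbta56` passes the `part_of` pattern.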

Example omics_processing record (omics_type Metagenome):

{
    "_id" : ObjectId("649b009773e824995934a08b"),
    "id" : "gold:Gp0208359",
    "name" : "Peatland microbial communities from Houghton, MN, USA - PEATcosm2014_Bin01_10_metaG",
    "description" : "Peatland microbial communities from PEATcosm Experiment in MTU Mesocosm Facility, Houghton",
    "has_input" : [
        "gold:Gb0150237"
    ],
    "part_of" : [
        "gold:Gs0110138"
    ],
    "add_date" : "2017-03-10",
    "mod_date" : "2020-04-05",
    "ncbi_project_name" : "Peatland microbial communities from Houghton, MN, USA - PEATcosm2014_Bin01_10_metaG",
    "omics_type" : {
        "has_raw_value" : "Metagenome"
    },
    "principal_investigator" : {
        "has_raw_value" : "Erik Lilleskov"
    },
    "processing_institution" : "JGI",
    "type" : "nmdc:OmicsProcessing",
    "has_output" : [
        "nmdc:0df5ac2c9052a2b45cfd9578aaa562f7"
    ],
    "gold_sequencing_project_identifiers" : [

    ]
}
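The record above still carries its legacy references in `id`, `has_input`, `part_of`, and `has_output`. A minimal sketch of the rewrite that apparently did not run for it, assuming the legacy-to-new ID mapping is already known (for example, from IDs minted before the interruption). `remap_ids` and `REFERENCE_SLOTS` are hypothetical names, and any mapping used with it here is made up.

```python
import copy

# Slots of an omics_processing record that hold references to other records.
REFERENCE_SLOTS = ("has_input", "has_output", "part_of")


def remap_ids(record: dict, id_map: dict[str, str]) -> tuple[dict, list[str]]:
    """Return a re-IDed copy of `record` plus the reference values left unchanged.

    `id_map` maps legacy IDs (gold:..., old nmdc:<hash>) to newly minted IDs.
    The record's own `id` is remapped too; the input record is not modified.
    Unchanged values are either already re-IDed or still missing a mapping,
    so the caller should check them (e.g. against the ID patterns).
    """
    updated = copy.deepcopy(record)
    unchanged: list[str] = []
    if updated.get("id") in id_map:
        updated["id"] = id_map[updated["id"]]
    for slot in REFERENCE_SLOTS:
        if slot not in updated:
            continue
        new_values = []
        for value in updated[slot]:
            if value in id_map:
                new_values.append(id_map[value])
            else:
                new_values.append(value)
                unchanged.append(value)
        updated[slot] = new_values
    return updated, unchanged
```

Working on a copy and reporting unchanged values makes the rewrite safe to re-run on a partially updated record: IDs that were already replaced simply pass through.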
mbthornton-lbl commented 4 months ago

Testing shows that the update-study command is now descending the tree by current or legacy identifiers. This does not fix the existing non-conforming OmicsProcessing records.
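Descending the tree by either identifier can be sketched as a filter over child records. This assumes omics_processing records are dicts whose `part_of` lists study IDs, as in the example record; `omics_for_study` is a hypothetical helper, not the command's actual code.

```python
def omics_for_study(omics_records: list[dict], study_ids: set[str]) -> list[dict]:
    """Return omics_processing records whose part_of references any of `study_ids`.

    Passing both the legacy and the current study ID finds child records
    whether or not their part_of reference has already been rewritten.
    """
    return [r for r in omics_records if study_ids.intersection(r.get("part_of", []))]
```

Locating the records does not by itself rewrite their legacy `has_input` or `has_output` references; that is a separate update step.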

Running the data_qc query omics_processing_has_input_biosample_or_processed_sample.js finds all 24 of these OmicsProcessing records, for example:

{
    "_id" : ObjectId("649b009773e824995934a094"),
    "id" : "gold:Gp0208365",
    "name" : "Peatland microbial communities from Houghton, MN, USA - PEATcosm2014_Bin05_10_metaG",
    "description" : "Peatland microbial communities from PEATcosm Experiment in MTU Mesocosm Facility, Houghton",
    "has_input" : [
        "gold:Gb0150243"
    ],
    "part_of" : [
        "gold:Gs0110138"
    ],
    "add_date" : "2017-03-10",
    "mod_date" : "2020-04-05",
    "ncbi_project_name" : "Peatland microbial communities from Houghton, MN, USA - PEATcosm2014_Bin05_10_metaG",
    "omics_type" : {
        "has_raw_value" : "Metagenome"
    },
    "principal_investigator" : {
        "has_raw_value" : "Erik Lilleskov"
    },
    "processing_institution" : "JGI",
    "type" : "nmdc:OmicsProcessing",
    "has_output" : [
        "nmdc:104c9da6da3a685e5b1b8a3b2652bdd7"
    ],
    "gold_sequencing_project_identifiers" : [

    ],
    "biosamples" : [

    ],
    "processed_samples" : [

    ]
}
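The contents of omics_processing_has_input_biosample_or_processed_sample.js are not shown here. Judging by its name and the empty `biosamples` and `processed_samples` arrays in the record above, it appears to join `has_input` against the biosample and processed-sample collections and flag records with no match. A rough Python approximation under that assumption; the function name and arguments are hypothetical.

```python
def omics_without_sample_input(
    omics_records: list[dict],
    biosample_ids: set[str],
    processed_sample_ids: set[str],
) -> list[dict]:
    """Return omics_processing records none of whose has_input IDs resolve to a
    known biosample or processed sample, with the (empty) join results attached
    as `biosamples` and `processed_samples`. Input records are not modified."""
    flagged = []
    for record in omics_records:
        inputs = record.get("has_input", [])
        biosamples = [i for i in inputs if i in biosample_ids]
        processed = [i for i in inputs if i in processed_sample_ids]
        if not biosamples and not processed:
            flagged.append({**record, "biosamples": biosamples, "processed_samples": processed})
    return flagged
```

A record whose `has_input` still holds a legacy `gold:` biosample ID matches neither collection once the biosamples themselves have been re-IDed, which is consistent with these records being flagged.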