microbiomedata / nmdc-schema

National Microbiome Data Collaborative (NMDC) unified data model
https://microbiomedata.github.io/nmdc-schema/
Creative Commons Zero v1.0 Universal

document has_calibration in metabolomics analysis activity #304

Closed cmungall closed 2 months ago

cmungall commented 2 years ago

I'm comparing our schema to mzML. We have:

  metabolomics analysis activity:
    is_a: workflow execution activity
    in_subset: 
      - workflow subset
    slot_usage:
      used:
        range: instrument
        multivalued: false
        description: >-
          The instrument used to collect the data used in the analysis
      has metabolite quantifications:
        range: metabolite quantification
        multivalued: true
      has calibration:
        description: >-
          TODO: Yuri to fill in

What goes in the calibration field?

Currently the range of this field is the default range (string), but it looks like it should be a non-inlined reference to another object?

Example data from the API:

  "results": [
    {
      "type": "nmdc:MetabolomicsAnalysisActivity",
      "has_input": [
        "emsl:output_646802"
      ],
      "has_output": [
        "nmdc:c0f8177e881e53d2fd9305597be7a400"
      ],
      "id": "nmdc:8969f454c3944f1eac9da499fb950a18",
      "ended_at_time": "2021-01-08T11:51:33Z",
      "execution_resource": "EMSL-RZR",
      "git_url": "https://github.com/microbiomedata/metaMS",
      "has_calibration": "emsl:output_646437",
      "started_at_time": "2021-01-08T11:51:33Z",
      "used": "Agilent_GC_MS",
      "was_informed_by": "emsl:646802",
      "has_metabolite_quantifications": [
        {
          "highest_similarity_score": 0.55246881446802,
          "metabolite_quantified": "chebi:17724",
          "alternative_identifiers": [
            "kegg:C01026",
            "cas:1118-68-9"
          ]
        },

But we don't seem to have emsl:output_646437 in the database?
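One way to surface this kind of gap is to scan activity records for has_calibration values that do not match any known record id. This is a minimal sketch, assuming records are plain dicts shaped like the API excerpt above and that the set of known ids has already been loaded from the database. The helper name dangling_calibration_refs and the example id set are hypothetical, not part of nmdc-schema or its tooling.

```python
# Hypothetical sketch: find has_calibration values that do not resolve to any
# known record id. Record shape follows the API excerpt; the helper and the
# in-memory id set are illustrative assumptions.
from collections.abc import Iterable


def dangling_calibration_refs(
    activities: Iterable[dict], known_ids: set[str]
) -> list[tuple[str, str]]:
    """Return (activity id, has_calibration value) pairs whose target id is unknown."""
    dangling = []
    for activity in activities:
        ref = activity.get("has_calibration")
        # With the default string range the value is a bare CURIE, so nothing
        # in the schema guarantees that it points at an existing record.
        if ref is not None and ref not in known_ids:
            dangling.append((activity["id"], ref))
    return dangling


activities = [
    {
        "id": "nmdc:8969f454c3944f1eac9da499fb950a18",
        "type": "nmdc:MetabolomicsAnalysisActivity",
        "has_calibration": "emsl:output_646437",
    }
]
# Hypothetical id set: the activity's output exists, the calibration target does not.
known_ids = {"nmdc:c0f8177e881e53d2fd9305597be7a400"}

print(dangling_calibration_refs(activities, known_ids))
# → [('nmdc:8969f454c3944f1eac9da499fb950a18', 'emsl:output_646437')]
```

Declaring the slot's range as a class and marking it non-inlined would let schema validation express the same expectation, but an id-resolution check like this is still needed against live data.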

turbomam commented 9 months ago

I question whether emsl:output_646437, even if it were defined in our database, would satisfy the has_calibration description:

A reference to a file that holds calibration information.

@corilo, can you please explain how has_calibration should be used? Or can you suggest one of your team members for that task?

@aclum, how does this scenario fit into your understanding of undefined mentions/dangling IDs from the Napa re-ID squad?

aclum commented 9 months ago

So far in the schema, the emsl:output_* pattern appears only in the has_output slot of class OmicsProcessing. There are some referential integrity issues here, which will be addressed by the re-IDing. A better example, one where the referenced DataObject exists, is this OmicsProcessing record:

{
  "_id" : ObjectId("649b009773e824995934a065"),
  "id" : "emsl:771493",
  "name" : "EMSL_49991_Brodie_381_Lipids_Neg_14Aug19_Lola-WCSH417820",
  "description" : "High res MS with high res HCD MSn and low res CID MSn",
  "has_input" : [ "igsn:IEWFS001H" ],
  "has_output" : [ "emsl:output_771493" ],
  "part_of" : [ "gold:Gs0135149" ],
  "instrument_name" : "VOrbiETD04",
  "omics_type" : { "has_raw_value" : "Lipidomics" },
  "processing_institution" : "EMSL",
  "type" : "nmdc:OmicsProcessing",
  "gold_sequencing_project_identifiers" : [ ]
}

Then the DataObject record is:

{
  "_id" : ObjectId("649b003c1ae706d7b5b14c5b"),
  "id" : "emsl:output_771493",
  "name" : "output: EMSL_49991_Brodie_381_Lipids_Neg_14Aug19_Lola-WCSH417820",
  "description" : "High res MS with high res HCD MSn and low res CID MSn",
  "file_size_bytes" : NumberInt(75696267),
  "type" : "nmdc:DataObject"
}
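Because has_output holds non-inlined references, a consumer has to join the OmicsProcessing record to its DataObject records by id. The sketch below shows that lookup using trimmed copies of the two records above, with the Mongo-specific _id fields dropped and NumberInt written as a plain integer. The function name resolve_outputs is a hypothetical helper for illustration, not an nmdc-schema or NMDC API function.

```python
# Hypothetical sketch: resolve non-inlined has_output references to DataObject
# records by id. Records are trimmed copies of the two documents in this comment.


def resolve_outputs(process: dict, data_objects: list[dict]) -> list[dict]:
    """Return the DataObject record for each id in process["has_output"].

    Raises KeyError if a referenced id has no matching DataObject (a dangling id).
    """
    by_id = {obj["id"]: obj for obj in data_objects}
    return [by_id[ref] for ref in process.get("has_output", [])]


omics_processing = {
    "id": "emsl:771493",
    "type": "nmdc:OmicsProcessing",
    "has_output": ["emsl:output_771493"],
}
data_objects = [
    {
        "id": "emsl:output_771493",
        "name": "output: EMSL_49991_Brodie_381_Lipids_Neg_14Aug19_Lola-WCSH417820",
        "file_size_bytes": 75696267,
        "type": "nmdc:DataObject",
    }
]

for obj in resolve_outputs(omics_processing, data_objects):
    print(obj["id"], obj["file_size_bytes"])  # → emsl:output_771493 75696267
```

Letting the lookup raise KeyError is a deliberate choice here: a dangling reference fails loudly at the point of use instead of silently yielding a missing file.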

turbomam commented 2 months ago

closing based on https://github.com/microbiomedata/berkeley-schema-fy24/pull/133#issuecomment-2087763687