microbiomedata / nmdc-schema

National Microbiome Data Collaborative (NMDC) unified data model
https://microbiomedata.github.io/nmdc-schema/
Creative Commons Zero v1.0 Universal

Add slots to Biosample class to disambiguate standard MG metadata vs long-read MG metadata #1937

Open pkalita-lbl opened 2 months ago

pkalita-lbl commented 2 months ago

Montana correctly pointed out in this comment that our implementation of long-read metagenomics was somewhat incomplete.

The changes implemented for https://github.com/microbiomedata/submission-schema/issues/168 added a new JgiMgLrInterface class. It reuses slots that are also used by the JgiMgInterface class. That makes sense from a pure LinkML perspective, but unfortunately it misses an important point about how submission data is brought into MongoDB where it adheres to nmdc-schema.

In the submission data, one sample's metadata might be spread across multiple submission-schema class instances (e.g. a SoilInterface instance and a JgiMgInterface instance), linked together by the unique sample name. When going into Mongo, those instances get collapsed into one instance of the nmdc-schema Biosample class. The issue is that if, in the submission-schema data, one sample has both a JgiMgInterface instance and a JgiMgLrInterface instance, the slot values for one will overwrite those of the other when squashing into a Biosample instance.
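A minimal sketch of the collision, assuming the collapse behaves like a plain dictionary merge keyed on sample name. The `collapse` function, the `samp_name` key, and all values here are hypothetical illustrations, not the actual ingest code:

```python
def collapse(instances):
    """Naively merge interface instances that share a sample name.

    `instances` is a list of (interface_name, slots) pairs. This is a
    hypothetical stand-in for the real submission-to-Mongo translation.
    """
    biosamples = {}
    for _interface_name, slots in instances:
        # Later instances silently overwrite slots set by earlier ones.
        biosamples.setdefault(slots["samp_name"], {}).update(slots)
    return biosamples


# Illustrative values only.
instances = [
    ("SoilInterface", {"samp_name": "sample-1", "depth": "0.1"}),
    ("JgiMgInterface", {"samp_name": "sample-1", "dna_absorb1": 1.85}),
    ("JgiMgLrInterface", {"samp_name": "sample-1", "dna_absorb1": 1.92}),
]

biosamples = collapse(instances)
print(biosamples["sample-1"])
```

With this naive merge, only one `dna_absorb1` value survives on the collapsed record, so the standard MG value and the long-read value can no longer be told apart.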

This is the reason why we currently need to have pairs of slots like dna_absorb1 and rna_absorb1 instead of just absorb1. With the introduction of long-read MG metadata, these need to become triples of slots (e.g. rna_absorb1, dna_absorb1, and the new dna_lr_absorb1).
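One way to picture the proposed disambiguation is a per-interface rename applied before merging. This is only a sketch: the `RENAMES_BY_INTERFACE` table, the `collapse_with_renames` function, the `samp_name` key, and the values are all hypothetical, and the real fix might instead rename the slots in the schemas themselves:

```python
# Hypothetical rename table: long-read slots get a distinct dna_lr_ name.
RENAMES_BY_INTERFACE = {
    "JgiMgLrInterface": {"dna_absorb1": "dna_lr_absorb1"},
}


def collapse_with_renames(instances):
    """Merge instances per sample, renaming slots per source interface."""
    biosamples = {}
    for interface_name, slots in instances:
        renames = RENAMES_BY_INTERFACE.get(interface_name, {})
        # Slots without a rename entry keep their original name.
        renamed = {renames.get(name, name): value for name, value in slots.items()}
        biosamples.setdefault(slots["samp_name"], {}).update(renamed)
    return biosamples


# Illustrative values only.
instances = [
    ("JgiMgInterface", {"samp_name": "sample-1", "dna_absorb1": 1.85}),
    ("JgiMgLrInterface", {"samp_name": "sample-1", "dna_absorb1": 1.92}),
]

merged = collapse_with_renames(instances)
print(merged["sample-1"])
```

Because each interface now writes to its own slot name, both values are preserved on the single collapsed record.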

mslarae13 commented 2 months ago

Checking with Alicia if NMDC needs to store these slots. If so, which ones?

https://github.com/microbiomedata/issues/issues/413#issuecomment-2075464756

pkalita-lbl commented 1 month ago

Removing this from Sprint 35. Not adding to a future sprint right now because it sounds like we need further input before proceeding.

mslarae13 commented 2 weeks ago

Decision was made on 06/12 during the metadata meeting

From @aclum in https://github.com/microbiomedata/issues/issues/413#issuecomment-2096644352

would like to keep dna_isolate_meth and map it to a slot on NMDC's Extraction class.

We want to track dna_isolate_meth in NMDC, but it is the only one of these slots we want to keep. We need to:

POST BERK