Add slots to Biosample class to disambiguate standard MG metadata vs long-read MG metadata

pkalita-lbl commented 2 months ago

Montana correctly pointed out in this comment that our implementation of long-read metagenomics was somewhat incomplete.

The changes implemented for https://github.com/microbiomedata/submission-schema/issues/168 added a new JgiMgLrInterface class. It reuses slots that are also used by the JgiMgInterface class. That makes sense from a pure LinkML perspective, but unfortunately it misses an important point about how submission data is brought into MongoDB where it adheres to nmdc-schema.

In the submission data, one sample's metadata might be spread across multiple submission-schema class instances (e.g. a SoilInterface instance and a JgiMgInterface instance), linked together by the unique sample name. When going into Mongo, those instances get collapsed into one instance of the nmdc-schema Biosample class. The issue is that if, in the submission-schema data, one sample has both a JgiMgInterface instance and a JgiMgLrInterface instance, the slot values from one will overwrite those from the other when squashing into a Biosample instance.
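To make the collision concrete, here is a minimal sketch of the squashing step. It is not the actual ingest code: the `squash` function, the `samp_name` key, and all values are hypothetical, and plain dicts stand in for the class instances. It only assumes that both JGI interfaces populate the same slot name (`dna_absorb1` is used as the example).

```python
def squash(instances):
    """Collapse per-interface records into one record per sample name.

    Hypothetical stand-in for the submission -> Biosample collapse.
    """
    biosamples = {}
    for inst in instances:
        merged = biosamples.setdefault(inst["samp_name"], {})
        # dict.update keeps only the most recent value for a repeated key
        merged.update(inst)
    return biosamples


rows = [
    {"samp_name": "sample-1", "some_soil_slot": "x"},  # SoilInterface-like record
    {"samp_name": "sample-1", "dna_absorb1": 1.8},     # JgiMgInterface-like record
    {"samp_name": "sample-1", "dna_absorb1": 2.1},     # JgiMgLrInterface-like record
]

result = squash(rows)
print(result["sample-1"])
```

After the merge, only one `dna_absorb1` value survives for `sample-1`, and nothing in the merged record says which interface it came from.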

This is why we currently need pairs of slots like dna_absorb1 and rna_absorb1 instead of just absorb1. With the introduction of long-read MG metadata, these need to become triples of slots (e.g. rna_absorb1, dna_absorb1, and the new dna_lr_absorb1).
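A rough sketch of how per-interface prefixes avoid the overwrite. The prefix table and the `squash_with_prefixes` helper are assumptions for illustration only; in the real schemas the prefixed names are declared as separate slots rather than computed at merge time.

```python
# Hypothetical mapping from submission-schema interface to slot-name prefix
PREFIX_BY_INTERFACE = {
    "JgiMgInterface": "dna_",
    "JgiMgLrInterface": "dna_lr_",
}


def squash_with_prefixes(records):
    """Merge (interface, slots) pairs for one sample into a single dict.

    Each base slot name is prefixed according to its source interface,
    so values from different interfaces land in different keys.
    """
    merged = {}
    for interface, slots in records:
        prefix = PREFIX_BY_INTERFACE[interface]
        for base_name, value in slots.items():
            merged[prefix + base_name] = value
    return merged


merged = squash_with_prefixes([
    ("JgiMgInterface", {"absorb1": 1.8}),
    ("JgiMgLrInterface", {"absorb1": 2.1}),
])
print(merged)
```

Both measurements are retained, under `dna_absorb1` and `dna_lr_absorb1` respectively.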

mslarae13 commented 2 months ago

Checking with Alicia if NMDC needs to store these slots. If so, which ones?

https://github.com/microbiomedata/issues/issues/413#issuecomment-2075464756

pkalita-lbl commented 1 month ago

Removing this from Sprint 35. Not adding to a future sprint right now because it sounds like we need further input before proceeding.

mslarae13 commented 2 weeks ago

A decision was made on 06/12 during the metadata meeting.

From @aclum in https://github.com/microbiomedata/issues/issues/413#issuecomment-2096644352

would like to keep dna_isolate_meth and map it to a slot on NMDC's Extraction class.

We want to track dna_isolate_meth in NMDC, but it is the only one of these slots we need to keep. We need to:

- [ ] Add dna_isolate_meth_long and rename dna_isolate_meth to dna_isolate_meth_short

POST BERK

- [ ] Map these two method slots to their corresponding berk-schema slots
  - [ ] Update to an enum with controlled values if needed to map to https://microbiomedata.github.io/berkeley-schema-fy24/nucl_acid_ext/

microbiomedata / nmdc-schema

Add slots to Biosample class to disambiguate standard MG metadata vs long-read MG metadata #1937