microbiomedata / nmdc-schema

National Microbiome Data Collaborative (NMDC) unified data model
https://microbiomedata.github.io/nmdc-schema/
Creative Commons Zero v1.0 Universal

Create `ChromatographicSeparationConfiguration` class and modify existing `ChromatographicSeparation` to accommodate #1920

Closed: kheal closed this issue 2 months ago

kheal commented 3 months ago

Similar to the challenges laid out in PR https://github.com/microbiomedata/berkeley-schema-fy24/pull/130, LC- and GC-based omics workflows need more information about the chromatography aspect of the DataGeneration.

As currently modeled, each MassSpectrometry instance is connected to a FluidHandling instance through the eluent_introduction slot. For GC- and LC-based omics (metabolomics, lipidomics, and proteomics), the eluent_introduction slot corresponds to a ChromatographicSeparationProcess instance, which does not adequately model the ChromatographicSeparation processes (column specifications, gradient information, flow rate) for metabolomics/lipidomics metadata-informed workflow execution.

Similar to the MassSpectrometryConfiguration class, we propose a configuration class (ChromatographyConfiguration) that houses the LC or GC configuration information. This would 1) greatly reduce repeated data on ChromatographicSeparationProcess instances, 2) allow workflows to be configured based on a ChromatographyConfiguration instance, and 3) allow users to query samples that were run with similar configurations for downstream comparisons. This class would be associated with a MassSpectrometry instance via a new slot (has_chromatography_configuration).
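To illustrate points 1) and 3), here is a minimal Python sketch of the idea, not the schema itself. The class and slot names follow the proposal, but the dataclasses, the `group_runs_by_configuration` helper, and all ids and descriptions are hypothetical stand-ins for illustration only.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ChromatographyConfiguration:
    """Shared LC/GC method record (hypothetical stand-in for the proposed class)."""
    id: str
    name: str
    description: str


@dataclass(frozen=True)
class MassSpectrometry:
    """One run; it references a shared configuration by id instead of repeating it."""
    id: str
    has_chromatography_configuration: str


def group_runs_by_configuration(runs):
    """Map each configuration id to the ids of the runs that used it."""
    grouped = defaultdict(list)
    for run in runs:
        grouped[run.has_chromatography_configuration].append(run.id)
    return dict(grouped)


# All ids and text below are made-up placeholders.
config = ChromatographyConfiguration(
    id="example:chromcfg-1",
    name="Example LC method",
    description="Placeholder text for column, gradient, and flow rate.",
)
runs = [
    MassSpectrometry("example:ms-1", config.id),
    MassSpectrometry("example:ms-2", config.id),
    MassSpectrometry("example:ms-3", "example:chromcfg-2"),
]
runs_by_config = group_runs_by_configuration(runs)
```

Because each run stores only a reference, the method details live in one place, and finding runs that share a method is a simple grouping on that reference.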

An issue comes up because the ChromatographicSeparationProcess class is currently set up to model both the LC/GC introduction aspect of a MassSpectrometry instance and solid phase extractions (see https://github.com/microbiomedata/berkeley-schema-fy24/blob/main/src/data/valid/ChromatographicSeparationProcess-SPE.yaml). Solid phase extractions (SPEs) are sample preparative steps and should not be associated with a Configuration, as they are not controlled by a programmable and shared method file. Furthermore, the ChromatographicSeparationProcess class sits under the FluidHandling class, which is defined as "the process that defines how a processed sample is introduced into the mass spectrometer". That definition does not apply to the solid phase extractions currently modeled by ChromatographicSeparationProcess.

Thus, we propose to change ChromatographicSeparationProcess to inherit directly from MaterialProcessing in order to model preparative chromatographic processes (including SPE).

We would then remove the FluidHandling and DirectInfusion classes and instead use an eluent_introduction_category slot with an Enum to designate the type of eluent introduction.
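As a rough sketch of how an Enum-valued slot could replace the FluidHandling and DirectInfusion classes, the Python below uses hypothetical permissible values and a hypothetical `needs_chromatography_configuration` helper. The rule that direct infusion runs carry no chromatography configuration is also an assumption made for illustration; none of this is taken from the schema.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class EluentIntroductionCategoryEnum(Enum):
    """Hypothetical permissible values; the real Enum may differ."""
    LIQUID_CHROMATOGRAPHY = "liquid_chromatography"
    GAS_CHROMATOGRAPHY = "gas_chromatography"
    DIRECT_INFUSION = "direct_infusion"


@dataclass
class MassSpectrometry:
    """Run record with the category slot in place of a FluidHandling instance."""
    id: str
    eluent_introduction_category: EluentIntroductionCategoryEnum
    has_chromatography_configuration: Optional[str] = None


def needs_chromatography_configuration(run):
    """Assumed rule: only LC/GC runs need a chromatography configuration."""
    return (
        run.eluent_introduction_category
        is not EluentIntroductionCategoryEnum.DIRECT_INFUSION
    )


# Placeholder ids for illustration only.
lc_run = MassSpectrometry(
    id="example:ms-1",
    eluent_introduction_category=EluentIntroductionCategoryEnum.LIQUID_CHROMATOGRAPHY,
    has_chromatography_configuration="example:chromcfg-1",
)
infusion_run = MassSpectrometry(
    id="example:ms-2",
    eluent_introduction_category=EluentIntroductionCategoryEnum.DIRECT_INFUSION,
)
```

A workflow could branch on the category value alone, without inspecting a separate process instance.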

This results in more accurate modeling of solid phase extractions, enables unambiguous use of a ChromatographyConfiguration class specifically for MassSpectrometry instances, and yields sufficient information about the LC/GC introduction for workflows to parameterize accordingly.

Current schema: [screenshot, 2024-05-13]

Proposed changes: [screenshot, 2024-05-14]

mslarae13 commented 2 months ago

Dependent upon https://github.com/microbiomedata/nmdc-schema/issues/1913. Once #1913 is merged in, this will follow easily.

kheal commented 2 months ago

Small note: fix Database-nucleic-extraction.yaml to use a protocol_link slot to capture the extraction_method info, since extraction_method was deprecated per conversation with @aclum.

brynnz22 commented 2 months ago

@kheal This looks great to me! One question: are you adding new slots to the ChromatographicSeparationProcess? You mentioned column specifications, gradient information, and flow rate. Will these be added?

kheal commented 2 months ago

Good question @brynnz22. I do not plan on adding new slots to the ChromatographyConfiguration class at this time, but abstracting that info out as a Config would make adding it much easier later. For now, I am planning on populating the description slot on the configurations with a robust description that would include column specifications, gradient information, and flow rate. This is similar to the approach we're taking for the MassSpectrometryConfiguration class.

kheal commented 2 months ago

After reviewing the proposed changes with @corilo, I am revising the original proposal to not change the Extraction class, but rather to change the inheritance and definition of the existing ChromatographicSeparationProcess. The Configuration classes and MassSpectrometry classes are still good.