microbiomedata / nmdc-schema

National Microbiome Data Collaborative (NMDC) unified data model
https://microbiomedata.github.io/nmdc-schema/
Creative Commons Zero v1.0 Universal

Expand raw data types (FileTypeEnum) to support EMSL workflows #1432

Closed: aclum closed this issue 8 months ago

aclum commented 9 months ago

Ticket to expand FileTypeEnum permissible values to accommodate making data objects for metaproteomics

aclum commented 9 months ago

Metaproteomics data objects currently have no data_object_type (example data_object_set id: emsl:output_598852). Lipidomics is the same story (example data_object: emsl:output_769754). Metabolomics is the same. Organic matter characterization data objects use data_object_type "Direct Infusion FT ICR-MS Raw Data".

SamuelPurvine commented 9 months ago

So I would submit a data_object_type with value "LC-DDA-MS/MS Raw Data" with a description "Liquid chromatographically separated MS1 and Data-Dependent MS2 binary instrument file" which was wordsmithed with @pdpiehowski.
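For readers who want to picture the proposal, here is a minimal Python sketch, not the actual schema change. It assumes the enum can be modeled as a plain mapping from permissible value to description. The names `FILE_TYPE_ENUM`, `is_permissible`, and `untyped` are hypothetical; the real FileTypeEnum is defined in the nmdc-schema source, which this thread does not quote.

```python
# Hypothetical stand-in for FileTypeEnum: permissible value -> description.
# The real enum is defined in the nmdc-schema source, not in this dict.
FILE_TYPE_ENUM = {
    # Already used for organic matter characterization, per this thread.
    # Its schema description is not quoted here, so it is left as None.
    "Direct Infusion FT ICR-MS Raw Data": None,
    # Value and description proposed in this comment.
    "LC-DDA-MS/MS Raw Data": (
        "Liquid chromatographically separated MS1 and "
        "Data-Dependent MS2 binary instrument file"
    ),
}


def is_permissible(data_object_type):
    """Return True if the value is one of the enum's permissible values."""
    return data_object_type in FILE_TYPE_ENUM


def untyped(data_objects):
    """Return ids of data objects whose data_object_type is missing or not permissible."""
    return [
        obj["id"]
        for obj in data_objects
        if not is_permissible(obj.get("data_object_type"))
    ]


# Illustrative records: the ids come from this thread, the dict shape is assumed.
records = [
    {"id": "emsl:output_598852"},  # no data_object_type yet
    {"id": "emsl:output_769754", "data_object_type": "LC-DDA-MS/MS Raw Data"},
]
print(untyped(records))
```

With the proposed value in place, only the record that still lacks a data_object_type is reported.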

I think this might work for some metabolomics data as well as all proteomics data currently on the portal. I'll need to check with Yuri to see if this data_object_type would work to describe his LC-MS/MS metabolomics data (if there is any).

The DDA is there to differentiate from Data-Independent MS2 acquisitions which are likely coming to an experimental design near you.

Would this also be the place where we could/would place attributes of the mass spectra in the datafile that were set by the instrument method? These would include things like high resolution MS1 (HMS in EMSL speak) and high resolution MS2 (HMSn, to allow for multiple levels of MS/MS) versus low resolution MS2 (MSn in our parlance), or the m/z range over which the data were collected (e.g. 400 m/z to 2000 m/z), or the ionization used (MALDI (matrix-assisted laser desorption ionization) versus ESI (electrospray ionization)). There's... more ;0) Or, no, that has to wait for the Monterrey atomization, doesn't it?

SamuelPurvine commented 9 months ago

@aclum the lipidomics datafile you reference above, 769754, is an example of LC-DDA-MS/MS Raw Data, but oh look at that, it was collected in negative ESI mode, so we may need to add that to the datafile type? The above value assumes positive ionization mode, which is typically used for proteomics. But both styles are LC-DDA-MS/MS, so maybe that would be placed down into the instrument metadata once we have such an animal in a few weeks, likely similar to the errata mentioned above.
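A rough sketch of that second option, with polarity and the other method-set attributes held in separate instrument metadata so that one file type covers both modes. Everything here is an assumption for illustration: the class `AcquisitionSettings` and all of its field names are hypothetical and do not correspond to any existing nmdc-schema class or slot.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class AcquisitionSettings:
    """Hypothetical instrument-method metadata, kept apart from data_object_type."""

    ionization: str                                 # e.g. "ESI" or "MALDI"
    polarity: str                                   # "positive" or "negative"
    ms1_resolution: Optional[str] = None            # e.g. "high" (HMS)
    ms2_resolution: Optional[str] = None            # e.g. "high" (HMSn) or "low" (MSn)
    mz_range: Optional[Tuple[float, float]] = None  # (low, high) in m/z

    def __post_init__(self):
        # Reject anything other than the two polarities discussed in the thread.
        if self.polarity not in ("positive", "negative"):
            raise ValueError(f"unknown polarity: {self.polarity!r}")
        # An m/z range must run from low to high.
        if self.mz_range is not None and self.mz_range[0] >= self.mz_range[1]:
            raise ValueError("mz_range must be (low, high) with low < high")


# Both files share one file type; only the instrument metadata differs.
FILE_TYPE = "LC-DDA-MS/MS Raw Data"

# Illustrative values only, not read from any real datafile.
proteomics_settings = AcquisitionSettings("ESI", "positive", mz_range=(400.0, 2000.0))
lipidomics_settings = AcquisitionSettings("ESI", "negative")
```

The design point is that the enum stays small: adding negative mode, MALDI, or a new m/z range would not require a new permissible value.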

SamuelPurvine commented 9 months ago

and, if anyone cares, https://ontobee.org/ontology/CHMO?iri=http://purl.obolibrary.org/obo/CHMO_0000738 describes what I'm going on about tho it's part of a planned process

ssarrafan commented 9 months ago

Appears to be active. I'll move to the next sprint for completion. @aclum @SamuelPurvine

aclum commented 9 months ago

Pull request is ready for review.