Informed metabolomics and NOM search space
Due to the metabolome's highly complex and still largely undiscovered molecular composition, de novo
analysis strategies, often used in proteomics and genomics data workflows, are not feasible for mass
spectrometry-based metabolomics data annotation. Instead, metabolomics data workflows rely on
comparison of experimental data to analytical data collected on molecular standard compounds or to
predictive models of analytical features, such as molecular fragmentation patterns, fine isotopic
structures, and chromatographic retention time. Molecular reference databases, both
molecular standards-based and in silico, are therefore fundamental components that support metabolite
identification. However, selecting the molecular search space for both NOM and metabolomics is a
manual, empirical process that relies on knowledge of the sample and experimental parameters (e.g.,
sample extraction solvent and chromatographic separation technique). We will leverage
NMDC’s sample metadata, together with the detailed sample preparation metadata being added to our schema (see
Development of metabolomics and metaproteomics standards), to automate the parameterization and
reference data selection for the NOM and metabolomics workflows (Milestone 2.18). This effort will
increase the accuracy of metabolite annotation and greatly reduce false positive assignments.
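
As an illustration only, metadata-driven selection of the search space could be sketched as follows. This is a minimal sketch under stated assumptions, not the NMDC implementation: the field names (extraction_solvent, chromatography), the lookup tables, and the reference library names are all hypothetical placeholders, and real rules would be curated from domain knowledge and the finalized schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SamplePrep:
    """Hypothetical subset of sample preparation metadata (not the NMDC schema)."""
    extraction_solvent: str  # e.g. "methanol", "chloroform", "water"
    chromatography: str      # e.g. "reverse_phase", "hilic", "gc"


# Hypothetical rule tables mapping metadata values to a narrowed search space.
SOLVENT_TO_CLASSES = {
    "water": {"polar"},
    "methanol": {"polar", "semi_polar"},
    "chloroform": {"lipid"},
}
CHROMATOGRAPHY_TO_LIBRARY = {
    "reverse_phase": "lcms_rp_reference",
    "hilic": "lcms_hilic_reference",
    "gc": "gcms_reference",
}


def select_search_space(prep: SamplePrep) -> dict:
    """Pick a reference library and compound classes from sample metadata.

    Unrecognized metadata values are flagged for manual review rather than
    silently mapped to a default search space.
    """
    solvent = prep.extraction_solvent.strip().lower()
    chromatography = prep.chromatography.strip().lower()

    classes = SOLVENT_TO_CLASSES.get(solvent)
    library = CHROMATOGRAPHY_TO_LIBRARY.get(chromatography)

    if classes is None or library is None:
        return {
            "library": None,
            "compound_classes": set(),
            "needs_manual_review": True,
        }
    return {
        "library": library,
        "compound_classes": set(classes),
        "needs_manual_review": False,
    }
```

In this sketch, a sample extracted in methanol and separated by HILIC would be searched against the hypothetical HILIC reference library, restricted to polar and semi-polar compound classes, while a sample with unrecognized metadata would fall back to the current manual process.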