Closed mslarae13 closed 4 months ago
@kheal
I'm looking at.
ionization_source:
range: IonizationSourceEnum
description: which kind of ionization source is used to introduce analytes into a mass spectrometer
I'm looking at https://www.ebi.ac.uk/ols4/ontologies/ms/classes/http%253A%252F%252Fpurl.obolibrary.org%252Fobo%252FMS_1000008 & all the childran.
Please confirm if this is what you're looking for!
Also, if the children are correct for the enum.
Also, should athmospheric_pressure_chemical_ionization
be `atmospheric_pressure_chemical_ionization
Yes, that is a direct mapping. Yes, please fix that typo! Good catch
@kheal another one!
mass_analyzers:
range: MassAnalyzerEnum
description: which kind of mass analyzer(s) used during the spectra collection.
multivalued: true
Confirm?
Yes, another direct mapping. You are a much better ontology detective than I am.
resolution_categories:
range: ResolutionCategoryEnum
description: whether mass spectra are collected at higher than unit resolution
multivalued: true
This description reads weird to me... the enum has high & low where you can enter values.
Whether mass spectra sounds like a yes or no?
Is this meant to be
Or
Or neither?
@kheal :)
range: AcquisitionStrategyEnum
description: whether spectra are dependent on or independent of any other spectra during the experiment
Need to chat with Mark about the ontology structure for https://www.ebi.ac.uk/ols4/ontologies/ms/classes/http%253A%252F%252Fpurl.obolibrary.org%252Fobo%252FMS_1003213?lang=en
@kheal this ontology term has many independent children types
data-independent acquisition (7) data independent acquisition from dissociation of full mass range data independent acquisition from dissociation of full mass range after ion mobility separation data independent acquisition from dissociation of scanning quadrupole across mass range data independent acquisition from dissociation of sequential mass ranges data independent acquisition from dissociation of sequential mass ranges after ion mobility separation
Does NMDC care about these? I assume just independent or dependent is sufficient?
re: resolution_categories
slot. I agree re: the word whether, that indicates a bool range. Since there are only two, we could swap to a bool and get rid of that Enum or we should modify the description. The slot and Enum were designed by Sam and Yuri so I didn't attempt to change it as part of the MassSpecConfig work, but I actually think it would be better as a bool. But I don't know if bool slots can be multivalued (both True and False, which we would need here)!
Neither of those mappings are direct, I think the first is an indirect mapping. Those mappings would require a float range, we are just trying to capture high or low within that range.
@kheal this ontology term has many independent children types
data-independent acquisition (7) data independent acquisition from dissociation of full mass range data independent acquisition from dissociation of full mass range after ion mobility separation data independent acquisition from dissociation of scanning quadrupole across mass range data independent acquisition from dissociation of sequential mass ranges data independent acquisition from dissociation of sequential mass ranges after ion mobility separation
Does NMDC care about these? I assume just independent or dependent is sufficient?
Those are important, but as far as I know unused with the current data in NMDC. The description data independent acquisition from dissociation of full mass range
is the "classic" DIA and would capture the most common type of DIA data we would get for proteomics, so I suggest that one as our description for the exisiting permissible value. The others we could add later as different permissible values if needed.
Update
eluent_introduction_category: description: A high-level description for how the processed sample is introduced into a mass spectrometer. to "high level categorization" from Mark
Update
has_mass_spectrometry_configuration: description: An identifier for the associated MassSpectrometryConfiguration.
to "The identifier of..."
Update
has_chromatography_configuration: description: An identifier for the associated ChromatographyConfiguration, providing information about how a sample was introduced into the mass spectrometer.
To I suggest "The identifier of..."
@turbomam how do I know what files are generated vs what I should edit for changing a slot?
Katherine has a typo, athomispheric should be atmospheric.
Do I need to update all files? Like enum_pv_results.tsv? Or just .yaml files?
Just the yaml, where the Enum is defined and the values are listed.
@turbomam
See https://github.com/microbiomedata/issues/issues/705#issuecomment-2125316993
Katherine has confirmed that our term mass_analyzers
has an exact mapping to https://www.ebi.ac.uk/ols4/ontologies/ms/classes/http%253A%252F%252Fpurl.obolibrary.org%252Fobo%252FMS_1000443
The mass analyzers listed under this term match what we have in MassAnalyzerEnum
and has more. Should we provide a way that says "this enum is a subset from MS_1000443 ? Or should we provide mappings for each of these enum permissible values as they match the ontology?
@kheal where is has_calibration
defined in the schema? https://github.com/microbiomedata/berkeley-schema-fy24/blame/705-metabolink-standard-definitions/src/schema/nmdc.yaml#:~:text=has_calibration-,%2D,has_calibration,-last%20month
@corilo @kheal
For acquisition_category
we have "whether only survey or tandem mass spectra are part of the experiment"
For it's range: AcquisitionCategoryEnum
we have
full_scan:
aliases:
- MS
tandem_mass_spectrum:
aliases:
-MSn
Suggested resolution for https://github.com/microbiomedata/issues/issues/705#issuecomment-2125365745
" AcquisitionStrategyEnum in nmdc berk schema for metabolink. I found this term in bioportal Which seems to fit for acquisition_strategy And it has children data-dependent and data-independent acquisition. I asked Katherine, and she said that the data dependent child fits, and for the children of data-independent acquisition "data independent acquisition from dissociation of full mass range is the "classic" DIA and would capture the most common type of DIA data we would get for proteomics, so I suggest that one as our description for the exisiting permissible value" I want to be sure I'm referencing the ontology correctly in our schema... 8:41 So... 8:44 I'd add MS:1003213 as an exact_mapping for acquisition_strategy .. I'd also like to update the description... "Mode of running a mass spectrometer method by which mass ranges are selected and possibly dissociated." seems better than what we currently have "whether spectra are dependent on or independent of any other spectra during the experiment" ... I hate whether in the description, makes me feel like it's T//F and not a value And for the enum AcquisitionStrategyEnum I can add the descriptions and a mapping for data_independent_acquisition == MS:1003227 & data_dependent_acquisition == MS:1003221 "
From @turbomam
" Is it possible that we would want to use the acquisition_strategy slot in other classes in the future? Do we have other classes that might acquire things? Would we need to document they strategy they used? I'm a little concerned about over-specializing slots with such generic names. One solution would be to make a ms_acquisition_strategy child slot and specialize it. Or just rename acquisition_strategy to ms_acquisition_strategy . "
Decision: Change acquisition_strategy
to acquisition_strategy_ms
& add the ontology mappings
@kheal @corilo @anastasiyaprymolenna @SamuelPurvine
@kheal where is
has_calibration
defined in the schema? https://github.com/microbiomedata/berkeley-schema-fy24/blame/705-metabolink-standard-definitions/src/schema/nmdc.yaml#:~:text=has_calibration-,%2D,has_calibration,-last%20month
has_calibration
is defined here: https://github.com/microbiomedata/berkeley-schema-fy24/blob/8a3f51b16c725000b06b5a5bc4b21824c2d48144/src/schema/workflow_execution_activity.yaml#L527 . We are aware this is an odd placement, but we need it in this yaml until ingest - see this PR for follow ups: https://github.com/microbiomedata/berkeley-schema-fy24/pull/133
Decision: Change
acquisition_strategy
toacquisition_strategy_ms
I have no problem changing acquisition_strategy
to be more specific and I see @turbomam 's point! I have a slight preference to spell out mass_spectrometry
bc I don't think ms
is a well known acronym.
If we move forward with this, we should update all these names:
acquisition_strategy
-> mass_spectrometry_acquisition_strategy
AcquisitionCategoryEnum
-> MassSpectrometryAcquisitionCategoryEnum
acquisition_category
-> mass_spectrometry_acquisition_category
AcquisitionStrategyEnum
-> MassSpectrometryAcquisitionStrategyEnum
Now is the time to do this bc we don't need a migration since these classes are not ingested yet.
@corilo, @SamuelPurvine et al agreed to collapse these slots for simplicity:
acquisition_category
slot and AcquisitionCategoryEnum
full scan only
as a permissible value to AcquisitionStrategyEnum
Resolves https://github.com/microbiomedata/issues/issues/705#issuecomment-2140081596 See https://github.com/microbiomedata/nmdc-schema/issues/2018
As part of metabolink2, @anastasiyaprymolenna , @brynnz22 , & @kheal make some new classes and slots or did some re-organizing. This issue is looking at these changes and determining if there's a standardized term that should be used