levitsky / pyteomics

Pyteomics is a collection of lightweight and handy tools for Python that help to handle various sorts of proteomics data. Pyteomics provides a growing set of modules to facilitate the most common tasks in proteomics data analysis.
http://pyteomics.readthedocs.io
Apache License 2.0

Correct precursor m/z accession ID? #160

Closed rcatzel closed 1 week ago

rcatzel commented 1 week ago

Hi there,

I am writing a method to read an mzML file as part of a de novo peptide sequencing pipeline, and I am using the controlled vocabulary functionality documented here. I was wondering which accession number is the correct one for the precursor m/z, since "MS:1000040", "MS:1000827", and "MS:1000744" look similar to me, but I am new to the field. If they are interchangeable, I will allow any of these accession IDs to be mapped to the precursor m/z. Are there cases where they are not interchangeable? If so, which ID is the most correct to use?

On a similar note, I am unsure which accession ID to use for a peptide sequence, in case one is included in the mzML file. Is "MS:1000889" correct?

Thanks very much.

levitsky commented 1 week ago

Hi @rcatzel,

I can't say a whole lot for certain, but MS:1000040 appears to be basically a unit accession for everything measured in m/z units, and the other two commonly appear together with identical values. I don't know of specific cases where they differ; perhaps @mobiusklein has more insight.

mobiusklein commented 1 week ago

Hello,

Yes, there are many terms for measurements in m/z units, to cover the different parts of a mass spectrum. MS:1000744|selected ion m/z is the precursor ion's m/z, although it is not guaranteed to be the monoisotopic m/z for that ion; often it's just the most abundant peak in the precursor's isotopic pattern. The monoisotopic m/z is the one that, after conversion to neutral mass, will match the mass you'd calculate from the lightest-isotope masses of the elements, for organic compounds like most peptides. For peptides under 2,500 Da, you're usually safe treating the selected ion as monoisotopic, and newer instruments tend to be better about these things.
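As a rough illustration of the conversion to neutral mass, here is a minimal sketch. It assumes positive-mode, protonated ions, and the helper names (`neutral_mass`, `candidate_monoisotopic_masses`) are made up for this example. The second helper lists the masses you would get if the selected peak were actually the first or second isotopic peak rather than the monoisotopic one.

```python
PROTON_MASS = 1.007276466   # Da, mass of a proton
C13_C12_DIFF = 1.0033548    # Da, approximate spacing between isotopic peaks


def neutral_mass(mz, charge):
    """Convert an m/z value to a neutral mass, assuming [M + zH]^z+ ions."""
    return (mz - PROTON_MASS) * charge


def candidate_monoisotopic_masses(mz, charge, max_offset=2):
    """Neutral masses assuming the selected peak is isotopic peak 0..max_offset.

    Offset 0 treats the selected peak as monoisotopic; offset k assumes it is
    the k-th isotopic peak and shifts the mass down accordingly.
    """
    mass = neutral_mass(mz, charge)
    return [mass - k * C13_C12_DIFF for k in range(max_offset + 1)]


if __name__ == "__main__":
    for candidate in candidate_monoisotopic_masses(500.0, 2):
        print(round(candidate, 4))
```

A search engine or de novo tool could try each candidate mass when the instrument's monoisotopic peak assignment is in doubt; whether that is worth doing depends on the instrument and the peptide size.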

MS:1000827|isolation window target m/z is often the same because it is (usually) the center of the isolation window for precursor selection, but it's not required to be centered on the precursor ion, and sometimes it's not associated with an ion at all, just a preprogrammed coordinate.
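One way to act on this in a reader is to prefer the selected ion m/z and fall back to the isolation window target only when no selected ion is recorded. The sketch below assumes the nested dict layout that Pyteomics produces for an mzML spectrum (keys named after the CV terms); the function name `precursor_mz` and the example spectrum are made up for illustration.

```python
def precursor_mz(spectrum):
    """Return (m/z, term name) for the first precursor, or (None, None).

    Prefers 'selected ion m/z' (MS:1000744) and falls back to
    'isolation window target m/z' (MS:1000827).
    """
    precursors = spectrum.get('precursorList', {}).get('precursor', [])
    if not precursors:
        return None, None
    precursor = precursors[0]

    ions = precursor.get('selectedIonList', {}).get('selectedIon', [])
    if ions and 'selected ion m/z' in ions[0]:
        return float(ions[0]['selected ion m/z']), 'selected ion m/z'

    window = precursor.get('isolationWindow', {})
    if 'isolation window target m/z' in window:
        return (float(window['isolation window target m/z']),
                'isolation window target m/z')
    return None, None


# Hand-built example mimicking a parsed MS2 spectrum (values are invented).
example = {
    'ms level': 2,
    'precursorList': {'count': 1, 'precursor': [{
        'isolationWindow': {'isolation window target m/z': 500.25},
        'selectedIonList': {'count': 1, 'selectedIon': [
            {'selected ion m/z': 500.2531, 'charge state': 2}]},
    }]},
}

if __name__ == "__main__":
    print(precursor_mz(example))
    # With Pyteomics installed, the same function should work on real spectra:
    #   from pyteomics import mzml
    #   for spectrum in mzml.read('your_file.mzML'):  # hypothetical path
    #       print(precursor_mz(spectrum))
```

Returning the term name alongside the value lets downstream code know when it is working with a window coordinate rather than a measured ion.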

To your second question, MS:1000889|peptidoform sequence is appropriate, although it doesn't tell the reader which format you're using to write down the peptide sequence and its modifications. A child term, MS:1003169|proforma peptidoform sequence, also tells the reader you're using the ProForma 2 notation, which many tools know how to parse.
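If you end up writing the sequence out yourself, a cvParam for it might be built like the sketch below. It uses only the standard library; the helper name `peptide_cv_param` and the example peptide are invented for illustration, and where the element belongs in your file is a separate question this does not address.

```python
import xml.etree.ElementTree as ET


def peptide_cv_param(sequence, proforma=True):
    """Build a <cvParam> element carrying a peptidoform sequence.

    Uses MS:1003169 when the string is ProForma 2 notation, and the more
    general parent term MS:1000889 otherwise.
    """
    if proforma:
        accession, name = "MS:1003169", "proforma peptidoform sequence"
    else:
        accession, name = "MS:1000889", "peptidoform sequence"
    return ET.Element("cvParam", {
        "cvRef": "MS",
        "accession": accession,
        "name": name,
        "value": sequence,
    })


# Example peptide with two modifications written in ProForma-style brackets.
param = peptide_cv_param("EM[Oxidation]EVEES[Phospho]PEK")

if __name__ == "__main__":
    print(ET.tostring(param, encoding="unicode"))
```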

rcatzel commented 1 week ago

Thanks!