rformassspectrometry / Spectra

Low level infrastructure to handle MS spectra
https://rformassspectrometry.github.io/Spectra/

Find all isotope peaks in a spectrum #185

Open jorainer opened 3 years ago

jorainer commented 3 years ago

Given a spectrum, find all sets of peaks that could represent isotope groups (e.g. C12, C13 peaks). This functionality could then be used, e.g., in a filterIsotopes function or another function to extract just the isotope peaks from a Spectra object (e.g. to pass them to functions that predict the chemical formula from the isotope pattern).

ococrook commented 3 years ago

@jorainer I have highly accurate isotope distributions and background proportions if you need them, and also code to simulate the isotope distribution given the sequence.

jorainer commented 3 years ago

@andreavicini is currently calculating distributions based on the chemical formulas of all metabolites in HMDB (Human Metabolome Database). What did you base your calculations on?

Our approach is currently simpler than simulating isotope distributions: we're essentially looking for peaks whose m/z difference matches the expected difference for an isotope (e.g. C12, C13), allowing a user-defined ppm tolerance, and checking that the intensity is lower than a certain threshold. Would you have a different idea for identifying isotope peaks in a peak matrix (i.e. the m/z values and intensities from one spectrum)?
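A minimal sketch of the m/z-difference approach described above, written in Python with NumPy rather than R. The function name `find_isotope_pairs`, its parameters, and the reading of "intensity lower than a certain threshold" as a maximum ratio relative to the lighter peak are assumptions for illustration. This is not the Spectra or MetaboCoreUtils API.

```python
import numpy as np

# Mass difference between 13C and 12C in Da (13.0033548378 - 12.0).
C13_DELTA = 1.0033548378

def find_isotope_pairs(mz, intensity, ppm=10.0, max_ratio=1.0, charge=1):
    """Hypothetical helper: return (i, j) index pairs where peak j could be
    the 13C isotope peak of peak i.

    A pair qualifies when mz[j] lies within `ppm` of mz[i] + C13_DELTA / charge
    (tolerance relative to that expected m/z) and intensity[j] is at most
    max_ratio * intensity[i] (assumed interpretation of the threshold).
    """
    mz = np.asarray(mz, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    delta = C13_DELTA / charge
    pairs = []
    for i in range(len(mz)):
        target = mz[i] + delta          # expected m/z of the heavier isotope
        tol = target * ppm * 1e-6       # ppm tolerance converted to Da
        for j in np.flatnonzero(np.abs(mz - target) <= tol):
            if intensity[j] <= max_ratio * intensity[i]:
                pairs.append((i, int(j)))
    return pairs
```

For instance, `find_isotope_pairs([100.0, 101.0033548, 102.0067097, 150.0], [1000, 110, 6, 500])` returns `[(0, 1), (1, 2)]`: the first three peaks form a 13C chain, and the peak at 150.0 has no partner. A real implementation would likely also need to handle other elements' isotopes and unknown charge states.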

ococrook commented 3 years ago

So currently I'm using, for example, 12C and 13C with masses c(12.0000000, 13.0033548378) and proportions prob = c(0.9893, 0.0107), etc. I can then take any sequence and charge, simulate what the isotope distribution looks like as a spectrum, and then match the peaks within 2 ppm error of each peak in the reference.
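The simulate-and-match idea can be sketched for the simplest case, a molecule whose only isotopic variation comes from carbon. There, the number of 13C atoms among n carbons follows a binomial distribution with the abundances quoted above. The function names, the `rest_mass` parameter (mass of all non-carbon atoms), and the assumed protonated [M+zH]z+ ion are illustrative assumptions. This is not the code referred to in the comment.

```python
from math import comb

# Masses and natural abundances of 12C and 13C, as quoted above.
C_MASSES = (12.0000000, 13.0033548378)
C_PROBS = (0.9893, 0.0107)
PROTON = 1.007276  # proton mass in Da; assumes [M+zH]z+ ions

def carbon_isotope_pattern(n_carbon, rest_mass=0.0, charge=1, min_prob=1e-4):
    """Hypothetical helper: theoretical (m/z, probability) peaks from carbon
    isotopes only. Isotopes of all other elements are ignored in this sketch."""
    peaks = []
    for k in range(n_carbon + 1):  # k = number of 13C atoms
        p = comb(n_carbon, k) * C_PROBS[1] ** k * C_PROBS[0] ** (n_carbon - k)
        if p < min_prob:
            continue  # drop negligible peaks
        mass = rest_mass + (n_carbon - k) * C_MASSES[0] + k * C_MASSES[1]
        peaks.append(((mass + charge * PROTON) / charge, p))
    return peaks

def match_peaks(reference, observed_mz, ppm=2.0):
    """Hypothetical helper: for each reference peak, return the index of the
    closest observed m/z within `ppm`, or None if nothing is close enough."""
    matches = []
    for ref_mz, _ in reference:
        tol = ref_mz * ppm * 1e-6
        hits = [i for i, mz in enumerate(observed_mz) if abs(mz - ref_mz) <= tol]
        matches.append(min(hits, key=lambda i: abs(observed_mz[i] - ref_mz)) if hits else None)
    return matches
```

With six carbons, `carbon_isotope_pattern(6)` yields three peaks (M, M+1, M+2) spaced by 1.0033548378. `match_peaks` then reports which of them are found in an observed spectrum. Tools such as enviPat compute full multi-element patterns; this sketch only shows the principle.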

It looks like your use case is slightly different, but I thought I'd share in case it's useful to discuss.

jorainer commented 3 years ago

OK, if I understand you correctly, in your case the sequence (= chemical formula) and the charge are known beforehand. That's definitely also a good use case. Is that somewhat similar to what enviPat and Rdisop do?

My use case at present is a completely unsupervised one: given a spectrum, identify groups of peaks that could represent isotope peaks of a (yet unknown) compound.
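In the unsupervised setting, one way to turn candidate isotope pairs (i, j) into groups of peaks is to merge pairs that share a peak, so that chains such as M, M+1, M+2 end up in one group. The sketch below does this with a union-find structure. The function name `group_isotope_peaks` and the input format (a list of index pairs from some earlier pair-detection step) are assumptions for illustration, not an existing Spectra function.

```python
def group_isotope_peaks(n_peaks, pairs):
    """Hypothetical helper: merge (i, j) candidate isotope pairs into groups
    of peak indices. Peaks linked directly or through a chain of pairs end up
    in the same group; peaks that appear in no pair are not returned."""
    parent = list(range(n_peaks))

    def find(x):
        # Follow parent links to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in pairs:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri  # union the two groups

    groups = {}
    for i, j in pairs:
        for idx in (i, j):
            groups.setdefault(find(idx), set()).add(idx)
    return sorted(sorted(g) for g in groups.values())
```

For example, `group_isotope_peaks(6, [(0, 1), (1, 2), (4, 5)])` gives `[[0, 1, 2], [4, 5]]`. Because merging is purely transitive, overlapping patterns from two compounds could be fused into one group. A practical version would probably need an additional consistency check on intensities along each chain.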

ococrook commented 3 years ago

Yep, exactly, mine is more similar to enviPat, except it returns a Spectra object, so it's easier to use. Though it would also be cool if your unsupervised approach could identify a glyco or phospho group (because that is unknown for us).

Thanks for clarification - look forward to the development!

sgibb commented 3 years ago

Similar issue: https://github.com/rformassspectrometry/MetaboCoreUtils/issues/10

jorainer commented 3 years ago

Thanks @sgibb! I had completely forgotten about that one!

hechth commented 2 years ago

Maybe you can get inspired here: https://github.com/RECETOX/recetox-xMSannotator/blob/main/xmsannotator/R/compute_isotopes.R

The RDKit cheminformatics library gives you the pattern, so with some spectral matching you might be able to identify those peaks.