Open jorainer opened 3 years ago
@jorainer I have highly accurate isotope distributions and background proportions if you need them, and also code to simulate the isotope distribution given the sequence.
@andreavicini is currently calculating distributions based on all chemical formulas of metabolites from HMDB (Human Metabolome Database). What did you base your calculations on?
Our approach is currently simpler than isotope distribution simulation - we're essentially looking for peaks whose difference in m/z matches the expected difference for an isotope (e.g. C12, C13) within a user-defined ppm tolerance, and checking that the intensity is lower than a certain threshold. Would you have a different idea for identifying isotope peaks in a peak matrix (i.e. m/z values and intensities from one spectrum)?
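The pairwise check described above can be sketched roughly as follows. This is only an illustration in Python, not the code under development: the function name `find_isotope_pairs`, its parameters, and the choice to express the intensity threshold as a fraction of the candidate monoisotopic peak's intensity are all assumptions.

```python
# Mass difference between 13C and 12C in Da.
C13_C12_DIFF = 13.0033548378 - 12.0


def find_isotope_pairs(mz, intensity, ppm=10.0, max_ratio=1.0, charge=1):
    """Return (i, j) index pairs where peak j could be a 13C isotope peak of peak i.

    Hypothetical helper: a pair is reported when mz[j] lies within `ppm` of
    mz[i] + C13_C12_DIFF / charge and intensity[j] is below
    max_ratio * intensity[i] (assumed form of the intensity threshold).
    """
    expected = C13_C12_DIFF / charge
    pairs = []
    for i in range(len(mz)):
        target = mz[i] + expected
        tol = target * ppm * 1e-6  # absolute tolerance in m/z units
        for j in range(len(mz)):
            if i == j:
                continue
            if abs(mz[j] - target) <= tol and intensity[j] < max_ratio * intensity[i]:
                pairs.append((i, j))
    return pairs


# Toy peak matrix: only the first two peaks form a candidate isotope pair.
example_mz = [100.0, 101.00335, 150.0, 200.0, 201.5]
example_intensity = [1000.0, 60.0, 500.0, 300.0, 400.0]
print(find_isotope_pairs(example_mz, example_intensity))  # → [(0, 1)]
```

The `charge` argument scales the expected spacing, so doubly charged ions would be searched with a spacing of roughly 0.5017 m/z.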
So currently I'm using, for example, 12C and 13C, whose masses are c(12.0000000, 13.0033548378)
and whose proportions are prob = c(0.9893, 0.0107)
etc. I can then take any sequence and charge, simulate what the isotope distribution looks like as a spectrum, and then match the peaks within 2 ppm of each peak in the reference.
It looks like your use case is slightly different, but I thought I'd share in case it's useful to discuss.
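For illustration, the simulate-then-match idea can be sketched for carbon alone, using the 12C/13C masses and proportions quoted above. This is a simplified Python sketch and not the commenter's code: the helper names `carbon_pattern` and `match_within_ppm` are hypothetical, only carbon isotopes are modelled (via a binomial distribution), and the monoisotopic m/z is passed in directly rather than computed from a sequence.

```python
from math import comb

# 12C and 13C masses and natural proportions, as quoted in the thread.
MASS = (12.0000000, 13.0033548378)
PROB = (0.9893, 0.0107)


def carbon_pattern(n_carbons, mono_mz, charge=1, n_peaks=3):
    """Return [(mz, probability)] for the first isotope peaks, carbon only.

    Peak k holds k 13C atoms; its probability is the binomial term
    C(n, k) * p13^k * p12^(n - k). Other elements are ignored (assumption).
    """
    spacing = (MASS[1] - MASS[0]) / charge
    peaks = []
    for k in range(min(n_peaks, n_carbons + 1)):
        p = comb(n_carbons, k) * PROB[1] ** k * PROB[0] ** (n_carbons - k)
        peaks.append((mono_mz + k * spacing, p))
    return peaks


def match_within_ppm(reference, observed_mz, ppm=2.0):
    """For each reference peak, return the index of the closest observed
    m/z within `ppm`, or None if no observed peak is close enough."""
    matches = []
    for ref_mz, _ in reference:
        best, best_err = None, None
        for idx, mz in enumerate(observed_mz):
            err = abs(mz - ref_mz) / ref_mz * 1e6  # error in ppm
            if err <= ppm and (best_err is None or err < best_err):
                best, best_err = idx, err
        matches.append(best)
    return matches


# Six carbons, made-up monoisotopic m/z; the third simulated peak has no match.
pattern = carbon_pattern(6, 181.0707)
print(match_within_ppm(pattern, [181.0707, 182.07406, 183.2]))  # → [0, 1, None]
```

A full simulation would convolve the distributions of every element in the formula; the binomial term here only covers the carbon contribution.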
OK, if I get you correctly, in your case the sequence (= chemical formula) and the charge are known beforehand. That's definitely also a good use case. Is that somewhat similar to what envipat and Rdisop are doing?
My use case at present is a completely unsupervised one: given a spectrum, identify groups of peaks that could represent isotope peaks of a (yet unknown) compound.
Yep, exactly, mine is more similar to envipat, just it returns a spectra object so it's easier to use. Though it would also be cool if your unsupervised approach could identify a glyco or phospho group (because that is unknown for us).
Thanks for clarification - look forward to the development!
Thanks @sgibb ! I completely forgot about that one!
Maybe you can get inspired here: https://github.com/RECETOX/recetox-xMSannotator/blob/main/xmsannotator/R/compute_isotopes.R
The rdkit chem library gives you the pattern, so with some spectral matching you could maybe identify those peaks.
Given a spectrum, find all sets of peaks that could represent isotope groups (e.g. C12, C13 peaks). This functionality could then be used e.g. in a filterIsotopes function or another function to extract just isotope peaks from a Spectra (e.g. to pass it to functions to predict the formula based on the isotope pattern).
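As a rough illustration of the grouping step and of a filter built on top of it, the Python sketch below chains peaks separated by the 13C/12C spacing. It is not the proposed implementation: `isotope_groups` and `filter_isotopes` are hypothetical names (the latter only borrows the name suggested above), and the rule that each isotope peak must be less intense than the previous one is a simplifying assumption that does not hold for large molecules.

```python
# Spacing between consecutive carbon isotope peaks for charge 1, in Da.
ISOTOPE_SPACING = 13.0033548378 - 12.0


def isotope_groups(peaks, ppm=10.0, charge=1):
    """Greedily group peaks into candidate isotope groups.

    `peaks` is a list of (mz, intensity) tuples sorted by m/z. Returns a
    list of index lists; each list starts with the candidate monoisotopic
    peak. Assumes intensities decrease along a group (simplification).
    """
    spacing = ISOTOPE_SPACING / charge
    used = set()
    groups = []
    for i in range(len(peaks)):
        if i in used:
            continue
        group, last = [i], i
        for j in range(i + 1, len(peaks)):
            if j in used:
                continue
            target = peaks[last][0] + spacing
            tol = target * ppm * 1e-6
            if peaks[j][0] > target + tol:
                break  # peaks are sorted, so no later peak can match
            if abs(peaks[j][0] - target) <= tol and peaks[j][1] < peaks[last][1]:
                group.append(j)
                last = j
        if len(group) > 1:
            used.update(group)
            groups.append(group)
    return groups


def filter_isotopes(peaks, ppm=10.0, charge=1):
    """Drop every grouped peak except the first (monoisotopic) one."""
    drop = {idx for g in isotope_groups(peaks, ppm, charge) for idx in g[1:]}
    return [p for idx, p in enumerate(peaks) if idx not in drop]


example_peaks = [(100.0, 1000.0), (101.00335, 60.0), (102.00671, 5.0),
                 (150.0, 500.0), (200.0, 300.0)]
print(isotope_groups(example_peaks))   # → [[0, 1, 2]]
print(filter_isotopes(example_peaks))  # keeps the peaks at 100.0, 150.0 and 200.0
```

Returning index groups rather than filtered peaks keeps both use cases open: removing isotope peaks, or extracting them to pass on to formula prediction.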