Add: Isotopic Grouper

dyrlund commented 8 years ago

I have two suggestions for improving the isotopic grouper module compared to the implementation in MZmine 2.

1. Instead of assuming that the feature table only contains one data file, it should be possible to run the isotopic grouper on feature tables with multiple samples.
2. Instead of removing isotopes, they should be grouped to the representative isotope using the Group ID column supported in the updated feature tables.

If we want to support removing isotopes from the tables, then I suggest that we add an option to the row filter module which can remove anything but the main ion from a group.

What do you think?

tomas-pluskal commented 8 years ago

I agree with 1. Regarding 2., I think the removing option should be a checkbox parameter in the isotopic peak grouper module (internally, it can be implemented by running the MSDK row filter module).
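A minimal sketch of what "remove anything but the main ion from a group" could look like. The row layout (`group_id`, `mz`, `height`) and the choice of the most intense row as the main ion are assumptions made for illustration; they are not the MSDK feature table API.

```python
def keep_main_ions(rows):
    """Keep ungrouped rows and, per group, only the most intense row.

    Each row is a dict with hypothetical keys "group_id" (None if the row
    is not part of an isotope group), "mz" and "height".
    """
    main = {}
    for row in rows:
        gid = row["group_id"]
        if gid is None:
            continue
        if gid not in main or row["height"] > main[gid]["height"]:
            main[gid] = row
    return [r for r in rows if r["group_id"] is None or main[r["group_id"]] is r]


# Made-up rows: two isotopes in group 1, one ungrouped row.
rows = [
    {"group_id": 1, "mz": 500.0000, "height": 1000.0},
    {"group_id": 1, "mz": 501.0033, "height": 300.0},
    {"group_id": None, "mz": 620.1234, "height": 50.0},
]
filtered = keep_main_ions(rows)
print([r["mz"] for r in filtered])
```

With this split, the grouper only has to write group IDs, and the checkbox in the grouper module can call the filter afterwards.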

dyrlund commented 8 years ago

In MZmine 2, the charge of a feature is calculated by selecting one feature (X) at a time and then calculating how many features (Y) have a mass within 1.0033 Da ± the m/z tolerance and RT tolerance. It is then assumed that all the Y features will have the same charge as the X feature. Is this assumption correct?

Or would it be better to initially calculate the charge of all features independently and then afterwards treat only features with identical charge, and with mass and RT within the tolerances, as isotopes?

tomas-pluskal commented 8 years ago

You cannot calculate the charge of all features first and then decide the isotopes, because the charge is calculated from the distance between the isotopes.

The way it works in MZmine 2 is that the algorithm considers all possible charges and checks how many peaks would be matched as isotopes if the charge was X. It then picks the charge for which the number of identified isotopes is the highest.
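The idea of trying every charge and counting matches can be sketched as follows. This is not the MZmine 2 code: it ignores RT, only looks towards higher m/z, and the function names, tolerance and peak values are made up for illustration.

```python
ISOTOPE_SPACING = 1.0033  # Da, the spacing value quoted in this thread


def count_isotopes(mz, peaks, charge, mz_tol=0.001, max_isotopes=5):
    """Count consecutive peaks found at mz + k * spacing / charge."""
    matched = 0
    for k in range(1, max_isotopes + 1):
        target = mz + k * ISOTOPE_SPACING / charge
        if any(abs(p - target) <= mz_tol for p in peaks):
            matched += 1
        else:
            break  # stop at the first gap in the pattern
    return matched


def best_charge(mz, peaks, max_charge=3, mz_tol=0.001):
    """Return (charge with the most matched isotopes, counts per charge)."""
    counts = {z: count_isotopes(mz, peaks, z, mz_tol)
              for z in range(1, max_charge + 1)}
    return max(counts, key=counts.get), counts


# Made-up peaks of a doubly charged ion.
peaks = [500.0000, 500.5017, 501.0033, 501.5050]
charge, counts = best_charge(500.0000, peaks)
print(charge, counts)  # 2 {1: 1, 2: 3, 3: 0}
```

Note that charge 1 still matches one peak here, because every second isotope of a 2+ ion sits at a 1+ spacing; the decision only works because the counts are compared across charges.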

dyrlund commented 8 years ago

The issue I see is that the algorithm only checks all possible charges for the current feature, not for all the features it is trying to fit in the isotope pattern. Let me try to explain the issue I see with the current implementation with an example.

The lipid SQDG(21:1) can be matched to the mass 641.3566 for the ion [M+H]+ and the peptide ISSIQSIVPALEIANAHR can be matched to the mass 640.3618 for the ion [M+3H]3+.

The difference between the two values is 0.9948, i.e. 1.0033 - 0.0085, so the two peaks look like neighbouring isotopes whenever the m/z tolerance is at least 0.0085. Suppose a list of peaks sorted by descending height looks like this: (1) 641.3566 (2) 640.3618 (3) 642.3599 (4) 640.0273 (5) 639.6929 (6) 640.6962

Then (1), (2) and (3) will get the charge 1+ and (4), (5) and (6) will get the charge 3+. The correct charge for (2) is however 3+. The result is that (1) and (4) are kept where it should actually be (1) and (2). This could be correctly identified if the charge of (2) was not assumed from the charge of (1).
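The arithmetic behind this example can be checked directly. The peak values and the 1.0033 spacing come from the comment above; the helper name and the two tolerances are assumptions chosen for illustration.

```python
SPACING = 1.0033  # Da, isotope spacing used in this thread

PEAKS = {1: 641.3566, 2: 640.3618, 3: 642.3599,
         4: 640.0273, 5: 639.6929, 6: 640.6962}


def explains(a, b, charge, tol):
    """True if peaks a and b are one isotope step apart at the given charge."""
    gap = abs(PEAKS[a] - PEAKS[b])
    return abs(gap - SPACING / charge) <= tol


# (2) looks like a 1+ neighbour of (1) only if the tolerance is loose enough.
print(explains(1, 2, charge=1, tol=0.01))   # True  (off by 0.0085)
print(explains(1, 2, charge=1, tol=0.005))  # False

# (5), (4), (2) and (6) line up as a 3+ series even with the tighter tolerance.
print(explains(4, 5, charge=3, tol=0.005))  # True
print(explains(2, 4, charge=3, tol=0.005))  # True
print(explains(2, 6, charge=3, tol=0.005))  # True
```

So (2) has three neighbours at the 3+ spacing but is claimed by (1) first, simply because (1) is higher and is processed earlier.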

tomas-pluskal commented 8 years ago

Well, that is true, but how do you think the algorithm should decide the charge of (2)?

dyrlund commented 8 years ago

To calculate the correct charge of (2) I think we could do the following:

1. Find all possible isotope peaks for each possible charge state for a given feature.
2. For each charge state, loop through all of its isotope peaks and repeat step 1 for each of them. If the charge with the highest number of isotopes is the same for the feature in question and the current isotope, accept the charge. Otherwise remove the isotope from that charge state of the feature.
3. Save the charge with the highest number of isotopes to the feature and all the associated isotopes.
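One possible reading of these steps, sketched on the six peaks from the earlier example. Several things here are assumptions: the function names, the 0.01 tolerance, matching in both m/z directions, ignoring RT and peak height, and doing the check in a single pass (each candidate is judged by its own unfiltered best charge, which avoids any recursion).

```python
SPACING = 1.0033  # Da, isotope spacing used in this thread


def chain(base, peaks, charge, tol, max_k=5):
    """Peaks at base +/- k * SPACING / charge, walking outward to the first gap."""
    found = []
    for direction in (+1, -1):
        for k in range(1, max_k + 1):
            target = base + direction * k * SPACING / charge
            hits = [p for p in peaks if abs(p - target) <= tol]
            if not hits:
                break
            found.append(min(hits, key=lambda p: abs(p - target)))
    return found


def raw_best_charge(base, peaks, charges, tol):
    """Step 1: the charge whose pattern matches the most peaks (ties: lowest)."""
    return max(charges, key=lambda z: len(chain(base, peaks, z, tol)))


def consistent_charge(base, peaks, charges=(1, 2, 3), tol=0.01):
    """Steps 2 and 3: drop candidates that prefer another charge, then pick."""
    kept = {z: [p for p in chain(base, peaks, z, tol)
                if raw_best_charge(p, peaks, charges, tol) == z]
            for z in charges}
    best = max(charges, key=lambda z: len(kept[z]))
    return (best, kept[best]) if kept[best] else (None, [])


peaks = [641.3566, 640.3618, 642.3599, 640.0273, 639.6929, 640.6962]
print(consistent_charge(641.3566, peaks))  # (1, [642.3599])
print(consistent_charge(640.3618, peaks)[0])  # 3
```

In this sketch (2) is no longer pulled into the 1+ pattern of (1), because its own best charge is 3+. Whether a single pass is enough on real data, where the candidates' charges depend on each other, is not something this example shows.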

dyrlund commented 8 years ago

Should we consider splitting the current algorithm into two separate packages in MSDK? In MZmine, we can always implement it as one method.

1. Find the charge of a feature
2. Group features by isotopes

This will allow MSDK to handle different algorithms for identifying the charge of a feature. These could include:

- Comparing the features in a list (= current method)
- Using the raw MS data
- Using the charge set by the instruments for MS2 data (this is available in mzML files)

By splitting the algorithm, we could allow the user to first identify the charge using the MS data. The features without a charge could then be identified by comparing the features in the list. Finally, the isotopes could be grouped.
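The split could be sketched as a list of interchangeable charge detectors that are tried in order, with grouping as a separate later step. All names and the dict-based feature are hypothetical placeholders, not MSDK classes; the two detectors only read precomputed values to keep the example short.

```python
from typing import Callable, Optional, Sequence

Feature = dict  # hypothetical stand-in for a feature table row
ChargeDetector = Callable[[Feature], Optional[int]]


def charge_from_ms2_metadata(feature: Feature) -> Optional[int]:
    """Charge reported by the instrument for an MS2 precursor, if any."""
    return feature.get("precursor_charge")


def charge_from_feature_list(feature: Feature) -> Optional[int]:
    """Placeholder for the list-based method (comparing features in a list)."""
    return feature.get("list_based_charge")


def assign_charge(feature: Feature,
                  detectors: Sequence[ChargeDetector]) -> Optional[int]:
    """Return the result of the first detector that can decide a charge."""
    for detect in detectors:
        charge = detect(feature)
        if charge is not None:
            return charge
    return None


detectors = [charge_from_ms2_metadata, charge_from_feature_list]
print(assign_charge({"precursor_charge": 3, "list_based_charge": 1}, detectors))  # 3
print(assign_charge({"list_based_charge": 2}, detectors))  # 2
print(assign_charge({}, detectors))  # None
```

Features that still have no charge after all detectors have run would simply be left out of the isotope grouping step, or handled by whatever default the grouping module chooses.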

tomas-pluskal commented 8 years ago

I agree we can split the process.

But I am afraid the algorithm you proposed might run into infinite loops. It might be worth implementing a previously published and tested algorithm, e.g. http://www.ncbi.nlm.nih.gov/pubmed/?term=9879360

Take a look at this (somewhat related) issue in OpenMS Github: https://github.com/OpenMS/OpenMS/issues/877

dyrlund commented 8 years ago

I think it is a good idea to implement a published and tested algorithm instead.

photocyte commented 8 years ago

Moved from redundant issue #109

Given how high-resolution accurate mass instruments are becoming more common, it would be nice if isotopes were handled more discretely in MZmine3. A couple ideas below:

1. MZmine2 currently picks isotopes in the isotope grouping module by looking in a parameter-defined m/z range around the "average" neutron mass (at least that is my understanding of how the module works). In principle it is possible to look for exact isotopic differences (+15N neutron, +13C neutron etc.), which have subtle but resolvable differences in m/z on a HRAM instrument run at a "reasonable" mass resolution setting (e.g. 70,000). Ideally such isotopes, when linked to the molecular ion, would be annotated & queryable as the likely possible isotope. The set of possible isotopes or elements to look for should be definable by the user, as clearly all isotopes on the periodic table is a bit much, but restricting it to what is currently interesting to the user makes it manageable & informative. Ideally, relative abundances of discrete isotopes could be quickly computationally queried (e.g. what is the relative abundance of +15N, +3x 13C, as compared to the molecular ion), which would be very useful for stable isotope assisted metabolomics. Currently, such an analysis of unusual isotope deviations, as in stable isotope assisted metabolomics, can be performed by not running the isotopic grouping module & using the adduct search module to pick out particular isotopic peaks.

2. Local calculation/calibration of the mass accuracy for the isotope grouping. I haven't done a careful analysis of how accurate the mass difference between isotopes is over the small m/z range of the possible isotopes (e.g. if I expect a particular isotope to be 5.012351 m/z away from the precursor mass, how accurate is this mass difference). My expectation is that it is spot on, or that it will be highly amenable to calibration given the small m/z range & defined isotopes (bootstrap calibration with common or unusual isotopes such as multiple 13C, or combinations with 15N).

3. Calibration of the divergence of isotopic ratios away from expected ratios. The relative abundances of specific isotopes as defined by chemical data sources are (I believe) based on what you might find in "rocks". Biology has some well described isotopic preferences (e.g. preference of Rubisco for C12 over C13/C14, which is the basis for C14 radioisotope dating), which may produce measurable and reproducible divergences from expected isotopic ratios as defined by chemical data sources. E.g. isotope patterns may turn out to be measurably different for animal cells, yeasts, plants etc., which have greatly varying core metabolisms. If particular isotopic ions can be detected & annotated, then it should be possible to calibrate how much each particular ion diverges from the expected isotopic ratio (if stable isotopic tracing is not being performed, which would lead to unpredictable deviations). This could lead to greater specificity in the "isotopic pattern scoring" of the chemical formula prediction. This of course should be configurable by the user, as not all workflows rely on biological metabolites.

With respect to preexisting algorithms, I haven't seen such manipulations in other software. I can look through OpenMS and briefly search the literature to see if there is anything like this.

In the case of the calibrations (2 & 3), I'd imagine that the implementation would be workable with a variety of algorithms, if ions to calibrate against could be confidently annotated. In the case of 1), I imagine that where the module currently checks for peaks in an m/z-defined window (+ 1 neutron), it would instead enumerate discrete isotope differences with a relatively small m/z window (+13C neutron, +15N neutron... +/- ~2ppm), and annotate the isotope with the appropriate identification. Then 2) might be able to calibrate the m/z deviations of the isotopes finely, and an "isotopic gap filling module" could try to annotate missed peaks.
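A minimal sketch of idea 1), enumerating discrete isotope differences with a ppm window. The mass differences are rounded values derived from standard isotope masses; the function name, the default 2 ppm window, the user-editable dictionary and the peak list are assumptions made for illustration.

```python
# Mass difference (Da) between the heavy and the most abundant isotope.
ISOTOPE_SHIFTS = {
    "13C": 1.003355,
    "15N": 0.997035,
    "2H": 1.006277,
    "18O": 2.004245,
    "34S": 1.995796,
}


def annotate_isotopes(mono_mz, peaks, charge=1, ppm=2.0, shifts=ISOTOPE_SHIFTS):
    """Label peaks that sit at mono_mz + shift / charge within a ppm window."""
    annotations = {}
    for label, shift in shifts.items():
        target = mono_mz + shift / charge
        tol = target * ppm / 1e6
        for p in peaks:
            if abs(p - target) <= tol:
                annotations[p] = label
    return annotations


# Made-up peaks next to a monoisotopic ion at m/z 300.
peaks = [300.997035, 301.003355, 301.500000]
labels = annotate_isotopes(300.0, peaks)
print(labels)  # {301.003355: '13C', 300.997035: '15N'}
```

At m/z 301 the 13C and 15N peaks are about 0.0063 apart, roughly ten times the 2 ppm window used here, so the two labels do not collide. Restricting `shifts` to the elements of interest corresponds to the user-defined isotope set suggested above.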

mzmine / old-mzmine3

Add: Isotopic Grouper #50