mobiusklein / ms_deisotope

A library for deisotoping and charge state deconvolution of complex mass spectra
https://mobiusklein.github.io/ms_deisotope
Apache License 2.0

Deisotope on ms peak picker peaklist #9

Closed Kawue closed 5 years ago

Kawue commented 5 years ago

Hello, I used your ms_peak_picker repo to pick peaks on my metabolite data sets. The data originates from either MALDI-TOF or MALDI-Orbitrap instruments and is converted from imzML to HDF5 to a simple Pandas DataFrame. I used the picker on the mean and median spectra. Is it possible to use your deconvolute_peaks method on the DataFrame or the picked peak list? I tried a few things, but I either got the error TypeError: __init__() got multiple values for argument 'use_subtraction' or my deconvoluted peak set was empty. Is it just not applicable to this type of data, or do I have to use some special configuration?

mobiusklein commented 5 years ago

deconvolute_peaks should take any type of Iterable of FittedPeak objects, which is what pick_peaks returns (A C-extension type for fast sorted access to individual peaks), or an Iterable of pairs of (m/z, intensity) values. What were the precise types (and layouts of the DataFrame) you were passing it?

deconvolute_peaks may return an empty peak list if it cannot find any satisfactory isotopic patterns. What were the other parameters you passed to deconvolute_peaks? By default, deconvolute_peaks uses an averagine-based deconvolution strategy using Senko's peptide averagine [1], which may not be appropriate for your data. If you have a database of known compositions, it can use those to search for exact isotopic patterns rather than rely on a linearly extrapolated isotopic model.

Even given an isotopic pattern, the default scoring method assumes a certain intensity range, above 5e2, for reliable deconvolution. Is that appropriate for your data? If not, a different scoring function can be used.

[1] Senko, M. W., Beu, S. C., & McLafferty, F. W. (1995). Determination of monoisotopic masses and ion populations for large biomolecules from resolved isotopic distributions. Journal of the American Society for Mass Spectrometry, 6(4), 229–233. https://doi.org/10.1016/1044-0305(95)00017-8

Kawue commented 5 years ago

I tried the picked_peaks object, picked on the average spectrum, and a list of (m/z, intensity) pairs, so the concrete DataFrame layout does not matter for the input.

I only tried the default parameters, but I already suspected that the averagine strategy is not appropriate and therefore tried deconvoluter_type = PeakDependenceGraphDeconvoluterBase, which gave me the error stated above.

I do not have a database of known compositions, so I thought there might be a function that estimates the isotopic pattern in a data-driven way. Or is it simply not possible to do this without prior knowledge? I am very inexperienced when it comes to isotopic patterns and have no idea whether your package is applicable to this type of problem.

After normalization my intensities are in a range of [0,10].

mobiusklein commented 5 years ago

Ah, yes, the normalization explains the problem. If you scale those intensities by a factor of 1e4 to 1e5, you should see something with the default settings. To defend against noisy, poorly centroided data, the deconvoluter ignores peaks with an intensity <= 5. The scoring functions appropriate for approximating isotopic patterns also use thresholds on log- or square-root-scaled intensities, which would not work well on that scale either.

Short Answer

Scale up the intensities prior to picking peaks and the default method should work.
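A minimal sketch of the rescaling step, assuming the spectrum is available as parallel lists of m/z and intensity values. The helper name `scale_intensities` and the target maximum of 1e5 are illustrative choices, not part of ms_deisotope or ms_peak_picker; the only figures taken from this thread are the normalized range of [0, 10] and the suggested factor of 1e4 to 1e5.

```python
def scale_intensities(intensities, target_max=1e5):
    """Linearly rescale intensities so the largest value equals target_max.

    Hypothetical helper, not part of ms_deisotope or ms_peak_picker.
    """
    peak = max(intensities)
    if peak <= 0:
        raise ValueError("spectrum has no positive intensity to scale")
    factor = target_max / peak
    return [value * factor for value in intensities]


# Toy normalized spectrum in the [0, 10] range described in this thread.
mz = [500.10, 501.10, 502.11]
normalized = [10.0, 5.5, 1.2]

scaled = scale_intensities(normalized)
pairs = list(zip(mz, scaled))  # (m/z, intensity) pairs
```

Because the scaling is linear, the ratios between isotopic peaks are unchanged; only the absolute scale moves above the intensity thresholds mentioned above. The scaled values would then go through peak picking and deconvolution as before.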

Long Answer

Isotopic pattern fitting tries to minimize the mismatch between the theoretical isotopic pattern and the experimental pattern while maximizing the amount of signal used. ms_deisotope provides several options for isotopic pattern fitting, both in isotopic pattern search strategy (the -Deconvoluter classes in ms_deisotope.deconvolution) and in evaluating pattern fitting (the -Fitter classes in ms_deisotope.scoring).
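To make the "minimize the mismatch" idea concrete, here is a toy goodness-of-fit score, assuming both patterns are given as equal-length intensity lists aligned peak by peak. It is a plain sum of squared differences on normalized intensities, written for illustration only; it is not the implementation of any -Fitter class in ms_deisotope.scoring, and the name `pattern_mismatch` and the example abundances are made up.

```python
def pattern_mismatch(experimental, theoretical):
    """Sum of squared differences after scaling each pattern to sum to 1.

    Lower is better; 0.0 means identical relative intensities.
    Illustrative only, not one of ms_deisotope's scoring functions.
    """
    if len(experimental) != len(theoretical):
        raise ValueError("patterns must have the same number of peaks")
    exp_total = sum(experimental)
    theo_total = sum(theoretical)
    if exp_total <= 0 or theo_total <= 0:
        raise ValueError("patterns must have positive total intensity")
    return sum(
        (e / exp_total - t / theo_total) ** 2
        for e, t in zip(experimental, theoretical)
    )


theoretical = [0.60, 0.30, 0.10]   # hypothetical relative abundances
good = [6000.0, 3100.0, 900.0]     # close to the theoretical shape
poor = [3000.0, 3000.0, 4000.0]    # wrong shape

good_score = pattern_mismatch(good, theoretical)
poor_score = pattern_mismatch(poor, theoretical)
```

A scale-invariant score like this one captures only the mismatch half of the trade-off. It says nothing about how much signal a fit explains, which is one reason practical scoring functions also depend on absolute intensity.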

The averagine-based approach is "data-driven", but it assumes that all your molecules' elemental compositions change linearly in the same way as they grow in mass. This assumption holds for peptides, and to a limited extent for other classes of biomolecule (lipids, some glycans, nucleotides). Most of my work was on glycans and glycopeptides, where this technique works reasonably well most of the time. When there are several trends, you can specify a list of averagine models, and instead of using AveraginePeakDependenceGraphDeconvoluter you would use MultiAveraginePeakDependenceGraphDeconvoluter.
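The linear-growth assumption can be sketched in a few lines. The composition and average mass below are the averagine values published by Senko et al. (1995); the function `scale_averagine` is a simplified illustration that omits any mass-correction step, and the averagine model inside ms_deisotope may scale and round differently.

```python
# Senko et al. (1995) averagine: average amino-acid residue composition.
AVERAGINE = {"C": 4.9384, "H": 7.7583, "N": 1.3577, "O": 1.4773, "S": 0.0417}
AVERAGINE_MASS = 111.1254  # average mass of one averagine unit, in Da


def scale_averagine(mass, unit=AVERAGINE, unit_mass=AVERAGINE_MASS):
    """Estimate an elemental composition for a molecule of the given mass.

    Each element count grows linearly with mass and is rounded to a whole
    atom. Simplified sketch that omits any mass-correction step.
    """
    n_units = mass / unit_mass
    return {element: round(count * n_units) for element, count in unit.items()}


small = scale_averagine(1111.254)  # 10 averagine units
large = scale_averagine(2222.508)  # 20 averagine units
```

A theoretical isotopic pattern is then computed from the estimated composition. If the real molecule's element ratios differ from the model's, that pattern will be off by a corresponding amount.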

Metabolites are different because there is no "common pattern", so people usually search libraries of known molecules against their spectra. Common ones I've heard of are MassBank, NIST Spectral Libraries, and Metlin. If you use an averagine on them (after you fix the intensity scale issue), you'll get results, though the fits won't be as good as those from known compositions. If your metabolites' compositions have large proportions of elements like S, Na, Ca, K, Fe, or other metals relative to C, N, O, and H, the isotopic pattern will need to be more intense to compensate. Because metabolites are usually all rather small, the choice of averagine doesn't matter that much.

Notes

Should you find yourself in possession of a list of compositions you want to deconvolute, you could use CompositionListPeakDependenceGraphDeconvoluter with the list of elemental compositions represented as mappings of element name to count, and it will fit those compositions exactly. This also requires you to deal with adducts explicitly during deconvolution.
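As an illustration of "mappings of element name to count" and of why adducts matter, the sketch below builds a composition as a plain dict and computes the m/z expected for singly charged protonated and sodiated ions. The helper names are hypothetical and nothing here calls ms_deisotope; how CompositionListPeakDependenceGraphDeconvoluter expects adducts to be supplied is not shown in this thread.

```python
# Monoisotopic masses in Da for the elements used below.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503,
    "N": 14.00307401,
    "O": 15.99491462,
    "Na": 22.98976928,
}
ELECTRON_MASS = 0.00054858


def monoisotopic_mass(composition):
    """Neutral monoisotopic mass of an {element: count} mapping."""
    return sum(MONOISOTOPIC[el] * count for el, count in composition.items())


def singly_charged_mz(composition, adduct):
    """m/z of [M + adduct]+: add one adduct atom, remove one electron."""
    ion = dict(composition)
    ion[adduct] = ion.get(adduct, 0) + 1
    return monoisotopic_mass(ion) - ELECTRON_MASS


glucose = {"C": 6, "H": 12, "O": 6}

neutral = monoisotopic_mass(glucose)          # about 180.0634
protonated = singly_charged_mz(glucose, "H")  # about 181.0707
sodiated = singly_charged_mz(glucose, "Na")   # about 203.0526
```

The same neutral composition appears about 21.98 m/z higher as a sodium adduct than as a protonated ion, so each adduct expected in the data has to be accounted for when fitting compositions exactly.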

PeakDependenceGraphDeconvoluterBase is an abstract base class that provides methods for derived classes like AveraginePeakDependenceGraphDeconvoluter. That it complained about use_subtraction is a bit odd though.

Kawue commented 5 years ago

Thanks a lot for this very detailed answer. I fixed the intensity values and indeed got some results. I will check the quality of the results over the next few days. Since all my questions are answered, I will close this issue.