Kawue closed this issue 5 years ago.
`deconvolute_peaks` should accept any `Iterable` of `FittedPeak` objects, which is what `pick_peaks` returns (a C-extension type for fast sorted access to individual peaks), or an `Iterable` of `(m/z, intensity)` pairs. What were the precise types (and the layout of the `DataFrame`) you were passing it?
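Given that, a `DataFrame` with m/z and intensity columns can be turned into the pair form with a few lines. This is a minimal sketch: the column names `mz` and `intensity` and the helper name `to_mz_intensity_pairs` are hypothetical, and sorting by m/z is a precaution rather than a documented requirement.

```python
from typing import Iterable, List, Mapping, Tuple


def to_mz_intensity_pairs(
    table: Mapping[str, Iterable[float]],
    mz_column: str = "mz",
    intensity_column: str = "intensity",
) -> List[Tuple[float, float]]:
    """Convert column-oriented data (a pandas DataFrame or a dict of lists)
    into a list of (m/z, intensity) tuples sorted by m/z.

    The default column names are hypothetical; adjust them to your table.
    """
    pairs = [
        (float(mz), float(intensity))
        for mz, intensity in zip(table[mz_column], table[intensity_column])
    ]
    # Sort by m/z so the list has the same ordering a peak picker produces.
    pairs.sort(key=lambda pair: pair[0])
    return pairs
```

The resulting list is the "`Iterable` of `(m/z, intensity)` pairs" form described above.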
`deconvolute_peaks` may return an empty peak list if it cannot find any satisfactory isotopic patterns. What other parameters did you pass to `deconvolute_peaks`? By default, `deconvolute_peaks` uses an averagine-based deconvolution strategy with Senko's peptide averagine [1], which may not be appropriate for your data. If you have a database of known compositions, it can use those to search for exact isotopic patterns rather than rely on a linearly extrapolated isotopic model.
Even given an isotopic pattern, the default scoring method assumes intensities above 5e2 for reliable deconvolution. Is that appropriate for your data? If not, a different scoring function can be used.
[1] Senko, M. W., Beu, S. C., & McLafferty, F. W. (1995). Determination of monoisotopic masses and ion populations for large biomolecules from resolved isotopic distributions. Journal of the American Society for Mass Spectrometry, 6(4), 229–233. https://doi.org/10.1016/1044-0305(95)00017-8
I tried both the `picked_peaks` object, picked on the average spectrum, and a list of `(m/z, intensity)` pairs, so the exact `DataFrame` layout should not matter.
I only tried the default parameters, but I already suspected that the averagine strategy is not appropriate, so I also tried `deconvoluter_type=PeakDependenceGraphDeconvoluterBase`, which gave me the `use_subtraction` error from my original question.
I do not have a database of known compositions, so I thought there might be a function that estimates the isotopic pattern in a data-driven way. Or is that simply not possible without prior knowledge? I am very inexperienced with isotopic patterns and have no idea whether your package is applicable to this type of problem.
After normalization my intensities are in a range of [0,10].
Ah, yes, the normalization explains the problem. If you scale those intensities by 1e4 or 1e5, you should see something with the default settings. To defend against noisy, poorly centroided data, the deconvoluter ignores peaks with an intensity <= 5. The scoring functions appropriate for approximating isotopic patterns also apply thresholds to log- or square-root-scaled intensities, which would not work well on that scale either.
Scale up the intensities prior to picking peaks and the default method should work.
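A minimal sketch of that fix, assuming intensities normalized to roughly [0, 10] as described above: multiply them back up before peak picking, and check how many peaks clear the `<= 5` cutoff the deconvoluter applies. The helper names and the default factor of 1e4 are illustrative choices, not part of `ms_peak_picker` or `ms_deisotope`.

```python
from typing import Iterable, List

# Cutoff quoted above: peaks with intensity <= 5 are ignored by the deconvoluter.
MIN_INTENSITY = 5.0


def rescale_intensities(intensities: Iterable[float], factor: float = 1e4) -> List[float]:
    """Multiply normalized intensities back up to a raw-count-like scale.

    The factor is an assumption; 1e4 or 1e5 was the suggested range.
    """
    return [float(value) * factor for value in intensities]


def count_usable(intensities: Iterable[float], threshold: float = MIN_INTENSITY) -> int:
    """Count peaks that would survive an `intensity > threshold` filter."""
    return sum(1 for value in intensities if value > threshold)
```

On normalized data most peaks fall under the cutoff; after rescaling they clear it. The rescaled values would then be passed, together with the m/z array, to `ms_peak_picker.pick_peaks`, and the resulting peaks to `deconvolute_peaks`.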
Isotopic pattern fitting tries to minimize the mismatch between the theoretical isotopic pattern and the experimental pattern while maximizing the amount of signal used. `ms_deisotope` provides several options for isotopic pattern fitting, both in isotopic pattern search strategy (the `-Deconvoluter` classes in `ms_deisotope.deconvolution`) and in evaluating pattern fitting (the `-Fitter` classes in `ms_deisotope.scoring`).
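To make that trade-off concrete, here is a toy score that rewards the total observed signal and penalizes the relative squared mismatch after scaling the theoretical pattern to the observed total. It illustrates the idea only, under the assumption that both patterns are already aligned peak-for-peak; it is not one of the `-Fitter` classes, whose formulas differ.

```python
from typing import Sequence


def toy_fit_score(theoretical: Sequence[float], experimental: Sequence[float]) -> float:
    """Illustrative score: signal used, discounted by shape mismatch.

    Not an ms_deisotope scorer; both sequences must be aligned by isotope index.
    """
    if len(theoretical) != len(experimental):
        raise ValueError("patterns must be aligned peak-for-peak")
    total_exp = sum(experimental)
    total_theo = sum(theoretical)
    if total_exp <= 0 or total_theo <= 0:
        return 0.0
    # Scale the theoretical pattern so its total matches the observed total.
    scale = total_exp / total_theo
    residual = sum((e - scale * t) ** 2 for t, e in zip(theoretical, experimental))
    # Relative mismatch: 0.0 for a perfect shape match.
    mismatch = residual / sum(e ** 2 for e in experimental)
    # More signal raises the score; a worse shape lowers it (never below 0).
    return total_exp * max(0.0, 1.0 - mismatch)
```

A perfect shape match scores the full observed intensity, and a distorted pattern with the same total scores less.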
The averagine-based approach is "data-driven", but it assumes that all your molecules' elemental compositions change linearly in the same way as they grow in mass. This assumption holds for peptides, and to a limited extent for other classes of biomolecules (lipids, some glycans, nucleotides). Most of my work was on glycans and glycopeptides, where this technique works reasonably well most of the time. When there are several trends, you can specify a list of averagine models and use `MultiAveraginePeakDependenceGraphDeconvoluter` instead of `AveraginePeakDependenceGraphDeconvoluter`.
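The linear extrapolation itself can be sketched in a few lines: scale an averagine's fractional element counts to the target mass, round to whole atoms, and let hydrogen absorb the rounding error. The counts below are Senko's peptide averagine [1]; the function names and the hydrogen-fill rule are assumptions for illustration, and `ms_deisotope` has its own averagine objects that this does not reproduce.

```python
from typing import Dict, Mapping

# Average element masses (Da).
AVERAGE_MASS = {"C": 12.0107, "H": 1.00794, "N": 14.0067, "O": 15.9994, "S": 32.065}

# Senko's peptide averagine: fractional element counts per average residue.
PEPTIDE_AVERAGINE = {"C": 4.9384, "H": 7.7583, "N": 1.3577, "O": 1.4773, "S": 0.0417}


def composition_mass(composition: Mapping[str, float]) -> float:
    """Average mass of a (possibly fractional) elemental composition."""
    return sum(AVERAGE_MASS[element] * count for element, count in composition.items())


def scale_averagine(
    target_mass: float, averagine: Mapping[str, float] = PEPTIDE_AVERAGINE
) -> Dict[str, int]:
    """Linearly scale an averagine to target_mass and round to whole atoms.

    Hydrogen absorbs the rounding error (an assumed convention for this sketch).
    """
    scale = target_mass / composition_mass(averagine)
    composition = {e: round(n * scale) for e, n in averagine.items() if e != "H"}
    remainder = target_mass - composition_mass(composition)
    composition["H"] = max(0, round(remainder / AVERAGE_MASS["H"]))
    return composition
```

A theoretical isotopic pattern would then be generated from the rounded composition; a "multi-averagine" approach repeats this for each model and keeps the best-fitting one.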
Metabolites are different because there is no "common pattern", so people usually search libraries of known molecules against their spectra. Common ones I've heard of are MassBank, NIST Spectral Libraries, and Metlin. If you use an averagine on them (after you fix the intensity scale issue), you'll get results, though the fits won't be as good as with the known compositions. If your metabolites' compositions have large proportions of elements like S, Na, Ca, K, Fe, or other metals relative to C, N, O, and H, their isotopic patterns will diverge from the averagine's, so they will need to be more intense to compensate for the poorer fit. Because metabolites are usually rather small, the choice of averagine doesn't matter that much.
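At its simplest, a library search compares an observed neutral monoisotopic mass against each known composition within a ppm tolerance. A minimal sketch, assuming neutral masses (adducts already accounted for) and a tiny hypothetical two-entry library standing in for a real resource such as MassBank; the function names are illustrative.

```python
from typing import Dict, List, Mapping, Tuple

# Monoisotopic element masses (Da).
MONO_MASS = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048, "O": 15.99491461956}

# Hypothetical example library: name -> elemental composition.
EXAMPLE_LIBRARY: Dict[str, Dict[str, int]] = {
    "glucose": {"C": 6, "H": 12, "O": 6},
    "alanine": {"C": 3, "H": 7, "N": 1, "O": 2},
}


def monoisotopic_mass(composition: Mapping[str, int]) -> float:
    return sum(MONO_MASS[element] * count for element, count in composition.items())


def ppm_error(observed: float, theoretical: float) -> float:
    return (observed - theoretical) / theoretical * 1e6


def search_library(
    observed_mass: float,
    library: Mapping[str, Mapping[str, int]],
    tolerance_ppm: float = 10.0,
) -> List[Tuple[str, float]]:
    """Return (name, ppm error) for entries within tolerance, best match first."""
    hits = []
    for name, composition in library.items():
        error = ppm_error(observed_mass, monoisotopic_mass(composition))
        if abs(error) <= tolerance_ppm:
            hits.append((name, error))
    return sorted(hits, key=lambda hit: abs(hit[1]))
```

Real searches also have to consider isomers (glucose and fructose share a composition) and fragment or MS/MS evidence, which this sketch ignores.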
Should you find yourself in possession of a list of compositions you want to deconvolute, you could use `CompositionListPeakDependenceGraphDeconvoluter` with the list of elemental compositions represented as mappings of element name to count, and it will fit those compositions exactly. This also requires you to deal with adducts explicitly during deconvolution.
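One way to handle adducts explicitly is to expand the composition list before handing it to `CompositionListPeakDependenceGraphDeconvoluter`, adding one adducted variant per composition. The sketch below assumes a proton-charged model, in which [M+Na]+ corresponds to the composition M + Na - H carrying a proton; the helper names and the adduct table are hypothetical.

```python
from typing import Dict, List, Mapping, Tuple

Composition = Dict[str, int]


def apply_adduct(composition: Mapping[str, int], delta: Mapping[str, int]) -> Composition:
    """Add atoms (positive counts) or remove them (negative counts)."""
    result = dict(composition)
    for element, count in delta.items():
        result[element] = result.get(element, 0) + count
        if result[element] < 0:
            raise ValueError(f"adduct removes more {element} than present")
        if result[element] == 0:
            del result[element]  # keep the mapping free of zero counts
    return result


def expand_with_adducts(
    compositions: Mapping[str, Mapping[str, int]],
    adducts: Mapping[str, Mapping[str, int]],
) -> List[Tuple[str, Composition]]:
    """Return each original composition followed by its adducted variants."""
    expanded: List[Tuple[str, Composition]] = []
    for name, composition in compositions.items():
        expanded.append((name, dict(composition)))
        for adduct_name, delta in adducts.items():
            expanded.append((f"{name}{adduct_name}", apply_adduct(composition, delta)))
    return expanded
```

The element-to-count mappings in the expanded list are the form the composition-list deconvoluter is described as taking; how they are wrapped for that class is not shown here.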
`PeakDependenceGraphDeconvoluterBase` is an abstract base class that provides methods for derived classes like `AveraginePeakDependenceGraphDeconvoluter`. That it complained about `use_subtraction` is a bit odd, though.
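For what it's worth, that `TypeError` is generic Python behavior: it is raised whenever a parameter receives a value both positionally and by keyword. A minimal reproduction with a hypothetical stand-in class (not `ms_deisotope`'s code) shows the mechanism; one plausible cause here is that the base class's positional parameters do not line up with the arguments its caller passes, so an extra positional argument lands in `use_subtraction`.

```python
from typing import Optional


class ToyDeconvoluterBase:
    """Hypothetical stand-in for a deconvoluter base class."""

    def __init__(self, peaklist, scorer=None, use_subtraction=False):
        self.peaklist = peaklist
        self.scorer = scorer
        self.use_subtraction = use_subtraction


def reproduce_error() -> Optional[str]:
    """Return the TypeError message, or None if no error was raised."""
    try:
        # Three positional arguments already fill peaklist, scorer AND
        # use_subtraction, so the keyword below collides with the third one.
        ToyDeconvoluterBase([], "scorer", True, use_subtraction=True)
    except TypeError as error:
        return str(error)
    return None
```

Using a concrete subclass such as `AveraginePeakDependenceGraphDeconvoluter`, whose signature the caller is written against, avoids this kind of mismatch.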
Thanks a lot for this very detailed answer. I fixed the intensity values and indeed got some results. I will check the quality of the results over the next few days. Since all my questions are answered, I will close this issue.
Hello, I used your `ms_peak_picker` repo to pick some peaks on my metabolite data sets. The data originates from either MALDI-TOF or MALDI-Orbitrap instruments and was converted from imzML to HDF5 and then to a simple pandas DataFrame. I used the picker on the mean and median spectra. Is it possible to use your `deconvolute_peaks` method on the DataFrame or the picked peak list? I tried a few things, but either got the error `TypeError: __init__() got multiple values for argument 'use_subtraction'` or my deconvoluted peak set was empty. Is it just not applicable to this type of data, or do I have to use some special configuration?