stefanks closed this 6 years ago
It should also be possible to validate deconvolution results by comparing the spectrum to a theoretical isotopic distribution generated for the atomic composition of the molecule (protein in this case).
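As a rough illustration of what "a theoretical isotopic distribution generated for the atomic composition" could look like, here is a minimal Python sketch. It is not mzLib code. It works on nominal neutron offsets only (no exact isotope masses), the abundance values are rounded textbook figures, and the function names and the example composition are hypothetical.

```python
# Approximate natural isotope abundances, indexed by number of extra neutrons
# (0, +1, +2, ...). Rounded values, assumed good enough for an illustration.
ISOTOPES = {
    "C": [0.9893, 0.0107],
    "H": [0.999885, 0.000115],
    "N": [0.99636, 0.00364],
    "O": [0.99757, 0.00038, 0.00205],
    "S": [0.9499, 0.0075, 0.0425, 0.0, 0.0001],
}


def convolve(a, b, max_len):
    """Polynomial product of two abundance lists, truncated to max_len terms."""
    out = [0.0] * min(len(a) + len(b) - 1, max_len)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            if i + j < len(out):
                out[i + j] += x * y
    return out


def power(dist, n, max_len):
    """Distribution for n atoms of one element, by repeated squaring."""
    result = [1.0]
    base = dist
    while n > 0:
        if n & 1:
            result = convolve(result, base, max_len)
        base = convolve(base, base, max_len)
        n >>= 1
    return result


def isotopic_distribution(composition, max_len=40):
    """Relative abundance of the +0, +1, +2, ... isotopologue peaks."""
    dist = [1.0]
    for element, count in composition.items():
        dist = convolve(dist, power(ISOTOPES[element], count, max_len), max_len)
    return dist


# Hypothetical small-protein-sized composition, for illustration only.
envelope = isotopic_distribution({"C": 378, "H": 630, "N": 105, "O": 118, "S": 1})
```

Truncating to `max_len` terms does not change the values of the peaks that are kept, because each coefficient of a polynomial product depends only on lower-order coefficients.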
So another method in imzspectrum that takes in a molecule and outputs the confidence that it is present in the scan? What would this be useful for?
Thermo sometimes provides a charge state guess for some peaks; this knowledge could be leveraged to get masses as well.
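A minimal sketch of how a per-peak charge guess turns into a neutral mass, assuming positive-mode protonated ions and that a guess of 0 means "no guess available". The function name and the input layout are hypothetical, not the Thermo or mzLib API.

```python
PROTON_MASS = 1.007276  # Da, approximate


def masses_from_charge_guesses(peaks):
    """peaks: iterable of (mz, charge_guess) pairs.

    Peaks with a charge guess of 0 are skipped, since no mass can be
    derived from them without further work.
    """
    masses = []
    for mz, charge in peaks:
        if charge > 0:
            # Neutral mass of an ion carrying `charge` protons.
            masses.append(charge * (mz - PROTON_MASS))
    return masses


# Two peaks with guesses, one without.
example = masses_from_charge_guesses([(1001.007276, 10), (910.098185, 11), (500.0, 0)])
```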
We have done this before manually to validate intact protein identifications. Visually, it is clear in the few cases I've seen when a match is incorrect. We could use it to assign confidence in an intact MS (no fragmentation) identification. It might even be useful for calibrating intact files, since we could filter out low quality identifications.
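One way such a confidence could be computed is a cosine similarity between the observed and theoretical isotope peak intensities. This is only a sketch of one possible metric, not what was done manually above; it assumes the two lists are already aligned peak by peak, and the function name is hypothetical.

```python
import math


def cosine_score(observed, theoretical):
    """Cosine similarity between two aligned intensity lists (1.0 = same shape)."""
    dot = sum(o * t for o, t in zip(observed, theoretical))
    norm = math.sqrt(sum(o * o for o in observed)) * math.sqrt(
        sum(t * t for t in theoretical)
    )
    return dot / norm if norm > 0 else 0.0


theoretical = [0.05, 0.20, 0.35, 0.25, 0.15]
good_match = [50.0, 200.0, 350.0, 250.0, 150.0]  # same shape, different scale
bad_match = [350.0, 250.0, 150.0, 50.0, 200.0]   # same peaks, wrong shape

good = cosine_score(good_match, theoretical)
bad = cosine_score(bad_match, theoretical)
```

Because the score is scale-invariant, it compares only the shape of the envelope, which is what the visual check is doing.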
We could also do rank analysis like FDR on proteoform identifications from MS1 only with a confidence score. We don't have a metric like that as it stands.
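A sketch of what that rank analysis could look like, assuming a target-decoy setup (the decoys are my assumption, nothing in this thread says how the false matches would be estimated). Identifications are ranked by confidence score and each gets a q-value; the function name is hypothetical.

```python
def q_values(scored):
    """scored: list of (score, is_decoy) pairs.

    Returns (score, is_decoy, q_value) tuples sorted by descending score.
    """
    ranked = sorted(scored, key=lambda s: s[0], reverse=True)

    # Running FDR estimate at each rank: decoys seen / targets seen.
    fdrs = []
    targets = decoys = 0
    for _, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))

    # q-value: the lowest FDR at this rank or any rank below it.
    q = []
    running = float("inf")
    for fdr in reversed(fdrs):
        running = min(running, fdr)
        q.append(running)
    q.reverse()

    return [(s, d, qv) for (s, d), qv in zip(ranked, q)]


ranked = q_values([(0.9, False), (0.8, False), (0.7, True), (0.6, False)])
```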
I see. This is a much easier task than deconvolution, since looking for a known mass is easier than looking for unknown masses.
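To illustrate why the known-mass case is easier: the expected m/z at every charge can be computed up front and simply looked up in the scan. A minimal sketch, assuming positive-mode protonated ions, a single (e.g. monoisotopic) mass per molecule, and a ppm tolerance; the names are hypothetical.

```python
PROTON_MASS = 1.007276  # Da, approximate


def find_known_mass(peak_mzs, mass, min_charge, max_charge, ppm=10.0):
    """Return the charges whose expected m/z has an observed peak within ppm."""
    matched = []
    for z in range(min_charge, max_charge + 1):
        expected = mass / z + PROTON_MASS
        tolerance = expected * ppm * 1e-6
        if any(abs(mz - expected) <= tolerance for mz in peak_mzs):
            matched.append(z)
    return matched


# A 10 kDa molecule observed at charges 10 and 11, plus an unrelated peak.
scan = [500.0, 10000.0 / 11 + PROTON_MASS, 10000.0 / 10 + PROTON_MASS]
charges = find_known_mass(scan, 10000.0, min_charge=8, max_charge=12)
```

A real version would check the isotope peaks at each charge as well, not just one m/z per charge.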
A few weeks ago when I tried using the Thermo charge state guess, it didn't work for intact proteins (it returned 0). I don't know if that's because the peaks were so close together at higher charge states. We could test the confidence method on label-free top-down data, since we know what the right answer is: it should return a high score for the correct protein and a low score for others of a similar mass but a different sequence. For label-free, I think some sort of filter step like this will be necessary because of the sheer volume of masses present.
Moved my old attempt to mzLib https://github.com/smith-chem-wisc/mzLib/commit/ced4db7f04072a720d83053ffafbbffec5671edf
The Deconvolute method is in ThermoSpectrum, and it relies on charge guesses provided in Thermo raw files.
Removed that old attempt; there is new code sitting in a pull request: https://github.com/smith-chem-wisc/mzLib/pull/183 What would be some good tests to validate the code? Let's think of a rigorous way to validate it, and if it passes I will include deconvolution in mzLib.
Here are some thoughts on possible tests --
Generate a few theoretical isotopic distributions from a sample of ~100,000 molecules:
Checks:
Say you have 1e6 proteoforms A and 1e7 proteoforms B, injected in a single scan.
What are the intensities of the peaks in the mass spectrum? In theory? I know the m/z values for each isotope for each charge, but what about the intensities? We want to reconstruct the number of proteoforms using intensity measurements, right?
Since there are 10 times more proteoform B than A, the ratios of some aggregated intensity measurements should be 1:10. Is it the ratio of the sum of all relevant peak intensities across all charge states? Or the ratio of the summed peak intensities of the most abundant charges? Or the ratio of the most abundant peak intensities?
Or maybe this can/should be relegated to FlashLFQ, and not be a part of deconvolution at all?
Or maybe they ionize differently, and the ratios of amounts have nothing to do with intensities?
Then, say we have another condition with 2e6 of A and 1e7 of B. What's the formula to compute the 1:2 ratio of A in condition 1 vs condition 2, if the inputs to the formula are peak intensities?
I guess any of the three methods should give the correct fraction...
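A toy model to check that guess. It assumes that every peak intensity is proportional to the amount of the proteoform, with a fixed (unknown) response factor and fixed charge-state and isotope fractions that do not change between conditions. Under those assumptions all three aggregates give the same ratio across conditions. All names and numbers here are made up for illustration.

```python
def peak_intensities(amount, response, charge_fractions, isotope_fractions):
    """Intensity of every isotope peak at every charge, keyed by charge."""
    return {
        z: [amount * response * cf * f for f in isotope_fractions]
        for z, cf in charge_fractions.items()
    }


def total_intensity(peaks):
    """Sum of all peak intensities across all charge states."""
    return sum(sum(v) for v in peaks.values())


def top_charge_intensity(peaks):
    """Summed peak intensities of the most abundant charge state."""
    return max(sum(v) for v in peaks.values())


def top_peak_intensity(peaks):
    """Intensity of the single most abundant peak."""
    return max(max(v) for v in peaks.values())


charges = {9: 0.2, 10: 0.5, 11: 0.3}
isotopes = [0.1, 0.3, 0.4, 0.2]

a_cond1 = peak_intensities(1e6, 0.5, charges, isotopes)
a_cond2 = peak_intensities(2e6, 0.5, charges, isotopes)

ratios = [
    f(a_cond2) / f(a_cond1)
    for f in (total_intensity, top_charge_intensity, top_peak_intensity)
]
```

The A:B comparison within one scan is a different story: if A and B have different response factors (they ionize differently), the intensity ratio is the amount ratio times the ratio of response factors, so it only equals 1:10 when the two ionize equally well.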
So Anthony, what do you mean by correct integrated intensity?
I think that deconvolution could be input to FlashLFQ: treat each mass ID in each spectrum as a PSM, and FlashLFQ would do the peak finding and aggregate the intensities together. It would take some effort to get everything to communicate, but I think it would be a good division of labor. I have not looked at any NeuCode data though, so I am not sure how that would interface with ProteoformSuite's current quantification system.
Done
In the imzspectrum interface, add a deconvolution method that returns a list of masses. Parameters could specify the maximum possible charge (useful for MS2, where precursor charges are known) and a confidence level (for intact, we only want confidently identified masses; for MS2, we are OK with many low-confidence IDs that might even correspond to a single isotope peak). Another parameter could be the deconvolution result of a neighboring spectrum, which would increase confidence in matched masses (useful for intact, but useless for MS2).
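A sketch of how those three parameters could interact, written in Python only for brevity. It is not the mzLib interface: it assumes candidate masses with a charge and a confidence between 0 and 1 have already been produced by some deconvolution step, and the names, the ppm tolerance, and the fixed `neighbor_bonus` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MassCandidate:
    mass: float        # neutral mass in Da
    charge: int        # highest charge state used for this candidate
    confidence: float  # 0.0 to 1.0


def select_masses(candidates, max_charge, min_confidence,
                  neighbor_masses=(), ppm=10.0, neighbor_bonus=0.2):
    """Return the masses that pass the charge and confidence filters.

    A candidate that also appears in the neighboring spectrum's result
    (within ppm) gets its confidence raised by neighbor_bonus.
    """
    selected = []
    for c in candidates:
        if c.charge > max_charge:
            continue
        confidence = c.confidence
        if any(abs(c.mass - m) <= c.mass * ppm * 1e-6 for m in neighbor_masses):
            confidence = min(1.0, confidence + neighbor_bonus)
        if confidence >= min_confidence:
            selected.append(c.mass)
    return selected


candidates = [
    MassCandidate(10000.0, 10, 0.7),
    MassCandidate(12000.0, 12, 0.7),
    MassCandidate(15000.0, 30, 0.99),
]
intact = select_masses(candidates, max_charge=20, min_confidence=0.8,
                       neighbor_masses=[10000.05])
```

For MS2, the same call would use the precursor charge as `max_charge`, a low `min_confidence`, and no `neighbor_masses`.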