
smith-chem-wisc / mzLib

Library for mass spectrometry projects

GNU Lesser General Public License v3.0


deconvolution wishlist #264

Open leahvschaffer opened 7 years ago

leahvschaffer commented 7 years ago

Some output I would like to be able to access in PS:

specific charge state info (intensity, number of charges, monoisotopic m/z)
apex RT (the retention time at which the proteoform was most abundant)

Some options I would like for input parameters:

(What is the intensity ratio setting?)
setting for the minimum signal-to-noise (S/N) ratio of peaks to consider for deconvolution
setting for the minimum charge state to consider for deconvolution (set to 5 when deconvoluting proteins with Thermo Deconvolution)
setting for the RT window size over which to aggregate features
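To make the requested settings concrete, here is a minimal sketch of how they might be grouped and applied. The class name, field names, function names, and the default S/N and RT window values are all hypothetical illustrations, not mzLib's API (mzLib is C#). Only the minimum charge of 5 comes from the request above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeconvolutionSettings:
    # Hypothetical names and defaults, for illustration only.
    min_charge: int = 5                # e.g. 5 for intact proteins, as with Thermo Deconvolution
    min_signal_to_noise: float = 3.0   # illustrative value, not a recommendation
    rt_window: float = 1.0             # minutes; features closer than this may be aggregated


def passes_peak_filters(charge: int, intensity: float, noise: float,
                        s: DeconvolutionSettings) -> bool:
    """True if a peak at this charge state meets the minimum charge and S/N settings."""
    if charge < s.min_charge:
        return False
    if noise <= 0:
        raise ValueError("noise estimate must be positive")
    return intensity / noise >= s.min_signal_to_noise


def within_rt_window(rt_a: float, rt_b: float, s: DeconvolutionSettings) -> bool:
    """True if two features are close enough in RT to be aggregated."""
    return abs(rt_a - rt_b) <= s.rt_window
```

A frozen dataclass keeps one run's settings immutable, so every filter sees the same values.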

stefanks commented 7 years ago

Regarding the first two, I believe the deconvolution output returns enough to extract this information.

The intensity ratio setting is a hard limit on how far the observed relative isotope intensities may deviate from averagine intensities. The more molecules we have, the more closely the observed intensities should match the theoretical intensities, which are similar to averagine intensities.
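One possible reading of this hard limit is sketched below. Both envelopes are normalized to their most intense peak. The envelope is rejected if any peak's observed and theoretical relative intensities differ by more than a factor of `intensity_ratio` in either direction. This interpretation and the function name are assumptions, not mzLib's actual implementation. The default of 5 is the value suggested later in this thread.

```python
def passes_intensity_ratio(observed, theoretical, intensity_ratio=5.0):
    """Hypothetical check: each isotope peak's relative intensity must lie within a
    factor of `intensity_ratio` of the averagine-predicted relative intensity."""
    if len(observed) != len(theoretical):
        raise ValueError("envelopes must have the same number of isotope peaks")
    obs_max, theo_max = max(observed), max(theoretical)
    for o, t in zip(observed, theoretical):
        o_rel, t_rel = o / obs_max, t / theo_max   # normalize to the tallest peak
        if o_rel <= 0 or t_rel <= 0:
            return False                           # this sketch requires every peak present
        if max(o_rel / t_rel, t_rel / o_rel) > intensity_ratio:
            return False
    return True
```

Taking the larger of the two ratios makes the limit symmetric, so a peak that is too tall is penalized the same way as one that is too short.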

How would you define S/N? In mzML files, the "noise" array is not present, so it must be estimated in some way.

Min charge state - I will add this in.

RT window size: how is this useful? I feel that by restricting RT, we will only deteriorate our deconvolution results.

leahvschaffer commented 7 years ago

For the first two, not really. I can get the charges seen and calculate the m/z of each charge state, but I don't know the intensity of each charge state. In PS we weight the monoisotopic mass calculation by the intensity of each charge state. I don't know whether a similar calculation is already being done.
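The calculation described here can be sketched as follows. It assumes positive-mode protonated ions [M + zH]^z+ and a proton mass of 1.007276 Da. Each charge state's monoisotopic m/z is converted back to a neutral mass, and those masses are averaged with charge-state intensities as weights. The function names and the `(z, mono_mz, intensity)` tuple layout are hypothetical, not Proteoform Suite's or mzLib's code.

```python
PROTON_MASS = 1.007276  # Da


def mz_for_charge(mono_mass: float, z: int) -> float:
    """Monoisotopic m/z of the protonated ion [M + zH]z+."""
    return (mono_mass + z * PROTON_MASS) / z


def weighted_mono_mass(charge_states) -> float:
    """Intensity-weighted monoisotopic neutral mass from (z, mono_mz, intensity) tuples."""
    total = sum(intensity for _, _, intensity in charge_states)
    if total <= 0:
        raise ValueError("total intensity must be positive")
    # Convert each charge state back to a neutral mass, then weight by intensity.
    return sum((mz * z - z * PROTON_MASS) * intensity
               for z, mz, intensity in charge_states) / total
```

For example, charge states implying 10000 Da (intensity 1) and 10002 Da (intensity 3) average to 10001.5 Da.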

Rob has found that the minimum peak intensity in the y (intensity) array is a good approximation of the noise level.
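A minimal sketch of that heuristic: the noise level is taken as the smallest intensity in the spectrum, and each peak's S/N is its intensity divided by that value. Skipping zero-intensity points is an added assumption, because profile-mode data may contain zero padding. The function names are hypothetical.

```python
def estimate_noise(intensities):
    """Approximate the noise level as the smallest nonzero intensity in the y array.
    (Ignoring zeros is an assumption made for this sketch.)"""
    positive = [y for y in intensities if y > 0]
    if not positive:
        raise ValueError("spectrum has no positive intensities")
    return min(positive)


def signal_to_noise(intensities):
    """S/N of every point, relative to the estimated noise level."""
    noise = estimate_noise(intensities)
    return [y / noise for y in intensities]
```

The resulting per-peak S/N values could then be compared against a minimum S/N setting.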

Is it deconvoluting a given mass across the entire RT range of the run? I need to know where each proteoform is most abundant in RT space (the apex RT mentioned above), because we only aggregate and make EE comparisons with experimentals nearby in RT...

leahvschaffer commented 7 years ago

What is a good starting point for intensity ratio?

stefanks commented 7 years ago

Ah, you're right, the peak objects are hidden in private fields. I will make those public to expose all the information.

Yes, it is deconvoluting across the entire RT range. Once you get all the peak info, you could extract the most abundant time.
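Extracting the most abundant time from exposed peak information could look like the sketch below. It assumes the peaks for one proteoform are available as `(retention_time, intensity)` pairs. That layout and the function name are hypothetical and do not reflect mzLib's peak objects. Intensities from different charge states in the same scan are summed before taking the maximum.

```python
from collections import defaultdict


def apex_rt(peaks):
    """Return the retention time with the highest summed intensity.

    `peaks` is an iterable of (retention_time, intensity) pairs, e.g. one entry
    per charge-state peak assigned to the proteoform (hypothetical layout)."""
    per_scan = defaultdict(float)
    for rt, intensity in peaks:
        per_scan[rt] += intensity  # combine charge states observed in the same scan
    if not per_scan:
        raise ValueError("no peaks supplied")
    return max(per_scan, key=per_scan.get)
```

Summing per scan keeps a scan with many moderate charge-state peaks from losing to a scan with a single tall one.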

An intensity ratio of 5 worked well for me.

stefanks commented 7 years ago

All issues fixed here: https://github.com/smith-chem-wisc/mzLib/commit/91b805c762f3dbffa356b6f07bbf086260fbf482

rmillikin commented 7 years ago

I'd like to add my 2c and say that deconvolution probably shouldn't assume the monoisotopic mass is below the limit of detection if the theoretical monoisotopic intensity is above noise.

This is a complicated way of saying: "Is the monoisotopic peak observed? and if not, should it be observed given the intensity of the isotope envelope and the noise level?"

stefanks commented 7 years ago

Sorry, I don't understand

rmillikin commented 7 years ago

let's discuss when you're in

stefanks commented 7 years ago

The way I understand this:

We would like to check whether imputed peaks are sometimes above the expected noise level and thus should have been present in the spectrum in the first place. In that case, the detected match may be wrong.
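One way to phrase this check, as a sketch: scale the theoretical (averagine) envelope to the isotope peaks that were observed, predict the monoisotopic intensity, and flag the match if that prediction exceeds the noise level even though no monoisotopic peak was seen. The least-squares scaling and the function name are assumptions made for illustration. The thread does not specify how the envelope should be fitted.

```python
def missing_mono_is_suspicious(observed, theoretical, noise):
    """observed[k], theoretical[k]: intensity of isotope peak k (k = 0 is monoisotopic).
    observed[0] == 0 means the monoisotopic peak was not detected (i.e. it was imputed).
    Returns True if the imputed peak should have been above the noise."""
    if observed[0] > 0:
        return False  # the monoisotopic peak was actually observed
    # Least-squares scale s minimizing sum (o - s*t)^2 over observed isotopes k >= 1.
    pairs = [(o, t) for o, t in zip(observed[1:], theoretical[1:]) if o > 0]
    num = sum(o * t for o, t in pairs)
    den = sum(t * t for _, t in pairs)
    if den == 0:
        return False  # nothing to scale against
    expected_mono = (num / den) * theoretical[0]
    return expected_mono > noise
```

A `True` result does not prove the match is wrong. It only marks cases where the missing peak is hard to explain by detection limits alone.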

rmillikin commented 7 years ago

Part of the score could be how Gaussian the elution profile looks.
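A simple Gaussian-likeness score is sketched below. It is the Pearson correlation between the elution profile and a Gaussian whose mean and variance are the intensity-weighted moments of that same profile. This moment-matching approach is a swapped-in technique chosen for simplicity rather than anything proposed in the thread. A full nonlinear curve fit would be an alternative. Function names are hypothetical.

```python
import math


def _pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 for constant inputs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def gaussian_similarity(rts, intensities):
    """Correlation between an elution profile and a moment-matched Gaussian (1 = ideal)."""
    total = sum(intensities)
    if total <= 0 or len(rts) < 3:
        return 0.0
    mu = sum(r * i for r, i in zip(rts, intensities)) / total
    var = sum(i * (r - mu) ** 2 for r, i in zip(rts, intensities)) / total
    if var == 0:
        return 0.0
    model = [math.exp(-((r - mu) ** 2) / (2 * var)) for r in rts]
    return _pearson(intensities, model)
```

A clean, single-peaked profile scores close to 1. A split or ragged profile scores lower and can even go negative.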

stefanks commented 6 years ago

Another thought: the "intensity ratio" parameter, which decides how strictly intensities must match averagine, should be computed automatically/dynamically, based on things like the TIC, the mass, the ratio of isotope envelope intensity to TIC, or something else, instead of just being hard-coded.