lgatto / synapter

Label-free data analysis pipeline for optimal identification and quantitation
https://lgatto.github.io/synapter/
4 stars 2 forks source link

plotFragmentMatching (): framgents from master database too low #101

Closed pavel-shliaha closed 8 years ago

pavel-shliaha commented 8 years ago

When plotting fragment matching data, the relative intensities of fragments in quant run is reasonable,while the intensities of fragments in fragment library are almost 0. See print screen 20 8 15 bug low fragment intensity in library

I presume it is an error while making fragment library. I have put the files necessary to make master into

data_for_synapter_2.0/bug/fragment_library_intensity

lgatto commented 8 years ago

Try adding plotFragmentMatching(..., norm=TRUE).

pavel-shliaha commented 8 years ago

nope.

plotFragmentMatching (synapterAnalysis, column = "peptide.seq", key = "SASDIATAELAPTHPIR", norm=TRUE)

|====================================================================================================================================| 100% Warning messages: 1: In plot.window(...) : "norm" is not a graphical parameter 2: In plot.xy(xy, type, ...) : "norm" is not a graphical parameter 3: In axis(side = side, at = at, labels = labels, ...) : "norm" is not a graphical parameter 4: In axis(side = side, at = at, labels = labels, ...) : "norm" is not a graphical parameter 5: In box(...) : "norm" is not a graphical parameter 6: In title(...) : "norm" is not a graphical parameter

lgatto commented 8 years ago

@sgibb the inner function .plotSpectrumVsSpectrum has a norm argument. Apparently, the way the ... are passed from plotFragmentMatching to .plotSpectrumVsSpectrum (and further) fail to apply norm where needed.

Now I see that the default is norm = TRUE, which is however not applied.

sgibb commented 8 years ago

@lgatto:

@sgibb the inner function .plotSpectrumVsSpectrum has a norm argument.

No, it has not. I don't remember the reason but synapter:::.plotSpectrumVsSpectrum is not the same as MSnbase:::.plotSpectrumVsSpectrum (which has a norm argument). I first developed these functions for synapter and we moved them to MSnbase later. Currently I have no idea why we kept the modified synapter version as well (which is in fact indentically to MSnbase:::.plotSpectrumVsSpectrum except the little addition of the grid search result to the label and the missing norm argument). I will fix this mess (maybe I have to touch MSnbase:::.plotSpectrumVsSpectrum e.g. add a label argument or something similar)

@pavel-shliaha

while the intensities of fragments in fragment library are almost 0

Currently all spectra are normalized to their precursor in synapter:::.plotFragmentsVsSpectrum:

spectra <- lapply(spectra, normalize, method="precursor")

I need some time to investigate the reason why its not working for fragment libraries.

lgatto commented 8 years ago

I hadn't reaslised that there wereMSnbase:::.plotSpectrumVsSpectrum and synapter:::.plotSpectrumVsSpectrum - confusing indeed.

(maybe I have to touch MSnbase:::.plotSpectrumVsSpectrum e.g. add a label argument or something similar)

Sure, go ahead

sgibb commented 8 years ago

@pavel-shliaha: while the intensities of fragments in fragment library are almost 0

In fact almost all intensities are exact zero! But it has nothing to do with the plotting or the normalisation before the plotting. The reason is in the data itself and the way we create the fragment library in https://github.com/lgatto/synapter/blob/2.0/R/fragmentlibrary-functions.R#L53-L79.

## sum fragment intensity for equal fragments of each peptide
product.inten <- ave(fragments$product.inten,
                     fragments$peptide.seq, fragments$fragment.str,
                     FUN=sum)

## sum precursor intensity for equal fragments of each peptide
precursor.inten <- ave(fragments$precursor.inten,
                       fragments$peptide.seq, fragments$fragment.str,
                       FUN=sum)
summary(product.inten)
#    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
#      30      92     232    1819     768  280400
summary(precursor.inten)
#    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
#     750    6460   19220   82750   62700 1715000

## calculate new intensity and round to nearest integer
product.inten <- round(product.inten / precursor.inten)

summary(product.inten)
#     Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
# 0.000000 0.000000 0.000000 0.007069 0.000000 1.000000

Maybe it is enough to just remove the round statement at this line and readd it in https://github.com/lgatto/synapter/blob/2.0/R/fragmentlibrary-functions.R#L75 (or remove it completely)? @pavel-shliaha could you review the calculation of the new intensities once more? Currently we doing the following (step 1-3 according to #63; all the other steps are according to email/skype correspondence):

  1. Sum fragment intensities (product.inten) for equal fragments in each peptide https://github.com/lgatto/synapter/blob/2.0/R/fragmentlibrary-functions.R#L55-L57.
  2. Do the same for all precursor intensities (precursor.inten) https://github.com/lgatto/synapter/blob/2.0/R/fragmentlibrary-functions.R#L59-L61.
  3. Divide each summed product.inten by the corresponding (summed) precursor.inten, resulting in a normalised product.inten for each fragment in each peptide.
  4. Round this to the nearest integer (number without decimal places; :exclamation: this is the reason for the error above).
  5. Remove duplicated fragments (fragments that occur in more than one run; their intensities are already summed up) https://github.com/lgatto/synapter/blob/2.0/R/fragmentlibrary-functions.R#L63.
  6. Calculate a new intensity for the precursors by taking the mean precursor.inten of all fragments in a peptide https://github.com/lgatto/synapter/blob/2.0/R/fragmentlibrary-functions.R#L70-L73
  7. Multiply the intensities of the fragments by their new precursor intensity (IHMO that would be the right place to round to the nearest integer): https://github.com/lgatto/synapter/blob/2.0/R/fragmentlibrary-functions.R#L75
  8. Turn product.rank in a continuous sequence again https://github.com/lgatto/synapter/blob/2.0/R/fragmentlibrary-functions.R#L76-78.

IMHO I have misread some of our discussion and used round in step 4 instead of step 7. But before I will fix this I want you to verify that the whole algorithm is correct.

pavel-shliaha commented 8 years ago

For every fragment please divide the sum of its intensities by the sum of precursor intensities in which the fragment has been observed.

If a fragment is seen in run 1 and 2,but a peptide is seen in run 1, 2 and 3 the formula for fragment intensity is:

(frag_int_1 + frag_int_2)/ (prec_int_1 + prec_int_2)

not precursor intensity in run 3 is omitted

sgibb commented 8 years ago

In general that is the way we do it already.

To summarize the following is done:

  1. Sum intensities over all equal fragments and equal peptides over all runs:

    (frag1_pep1_run1_int + frag1_pep1_run2_int + ... + frag1_pep1_runN_int)

  2. Divide per the corresponding precursor intensity

    (frag1_pep1_run1_int + frag1_pep1_run2_int + ... + frag1_pep1_runN_int) / (pep1_run1_precInt + pep1_run2_precInt + ... + pep1_runN_precInt) = new_frag1_int

    (runs with where the frag1 is missing are ignored (but only for this fragment particular fragment)

  3. Use the mean precursor int of all runs of a specific peptide as new precursor int

    (pep1_run1_precInt + pep1_run2_precInt + ... + pep1_runN_precInt)/N = new_prec_int

  4. Calculate new fragInt and round to nearest integer

    new_frag1_int = new_frag1_int (step2) * new_prec_int (step3) new_frag1_int = round(new_frag1_int) // please note this is not just the opposite of step2 because step2 has only the runs where the fragment is present and step3 combines all precursors from the same peptide across all runs

See https://github.com/lgatto/synapter/blob/2.0/tests/testthat/test_fragmentlibrary-functions.R for an easy example with numbers.

pavel-shliaha commented 8 years ago

Checked it. Its right. Thanks Sebastian!