lgatto / synapter

Label-free data analysis pipeline for optimal identification and quantitation
https://lgatto.github.io/synapter/

Correct saturation by only quantifying on unsaturated isotopes in all samples under comparison #39

Closed pavel-shliaha closed 7 years ago

pavel-shliaha commented 11 years ago

The prototype of suggested function:

requantify <- function (synapterObj , satThreshold, minIsotopes)

synapterObj - list of synapter objects
satThreshold - intensity above which accurate ion recording is not possible due to saturation
minIsotopes - minimum number of isotopes accepted for requantitation

For every peptide the function will find the unsaturated ions common to all supplied synapter objects and requantify the EMRT using these ions only. satThreshold is used to determine which ions are saturated. minIsotopes is the minimum number of isotopes sufficient for requantitation (i.e. sometimes only a few isotopes will be seen in all samples below saturation and above the LOD).
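A minimal base R sketch of the idea described above, for a single peptide. The helper name `requantifyCommon`, its signature and the toy intensities are made up for illustration and are not part of synapter; the sketch assumes an isotope counts as saturated when its intensity is above `satThreshold`.

```r
# Hypothetical helper (not part of synapter): for one peptide, keep only the
# isotopes that are observed and unsaturated in every run, then sum them.
requantifyCommon <- function(isom, satThreshold, minIsotopes = 2L) {
  # an isotope (column) is usable if no run (row) is missing or above threshold
  usable <- colSums(is.na(isom) | isom > satThreshold) == 0L
  if (sum(usable) < minIsotopes) {
    # too few shared isotopes: return NA instead of guessing
    return(setNames(rep(NA_real_, nrow(isom)), rownames(isom)))
  }
  rowSums(isom[, usable, drop = FALSE])
}

# toy intensities, made up for illustration (rows: runs, columns: isotopes)
isom <- matrix(c(1000,  500, 20000,  9000,
                 4000, 2000, 45000, 15000),
               nrow = 2, byrow = TRUE,
               dimnames = list(c("runA", "runB"),
                               c("1_0", "1_1", "2_0", "2_1")))

res <- requantifyCommon(isom, satThreshold = 3e4)
res
```

Here isotope 2_0 is dropped for both runs because it is saturated in runB, so both runs are quantified on the same three isotopes.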

Important: the information on peptide isotopes is not present in current synapter objects! The Pep3D file that is loaded into synapter is filtered so that only one isotope is preserved for each EMRT. This means that the synapter object will have to be modified to include this information. The current consensus is that a list should be created, each element of which represents a single EMRT and contains the ion intensities from that EMRT.

pavel-shliaha commented 8 years ago

I am having a problem applying the rescaleForTop3() function. Below is a figure of three datasets: 1) not requantified 2) requantified using the SUM method 3) requantified using the SUM method with the intensities repredicted by rescaleForTop3()

image

I had a look at a peptide I think is requantified incorrectly:

image

As you can see, the sum functionality has decreased the ratio from 0.2883538 to 0.207168 as expected, but top3 requantitation has put it back up to 0.3473387. This probably should not happen. The code is in: ...\synapter2paper\kuharev2015\bugs_investigation\for_bug_investigation_requant_top3

sgibb commented 8 years ago

It seems that this is not a bug but a feature. rescaleForTop3 has an argument onlyForSaturatedRuns (default TRUE) that controls whether unsaturated runs should be rescaled or not. I modified your plotting function and marked all unsaturated peptides with black dots, and you can see that they cause the weird pattern after rescaling (because they are not requantified or rescaled at all): rescaletop3 (the bottom right panel was generated with rescaleForTop3(..., saturationThreshold=3e4, onlyForSaturatedRuns=FALSE))

You could find the modified code in the same directory.

sgibb commented 8 years ago

Another figure for the same fact (please note that the colours have different meanings now: blue: no saturation; red: saturation in one sample (A or B; this is the important fraction that causes the "strange" behaviour in the bottom left panel); green: saturation in both samples): rescaletop3

sgibb commented 8 years ago

basic information

library (synapter)
library (MSnbase)

combMSNset <- readRDS ("refCombMSNSet.RDS")
satThreshold <- 3e4

## samples
a <- 1:5
b <- 6:10

## fetch peptide of interest
poi <- combMSNset[grep("^AGMVAGVIVNR$", featureNames(combMSNset)),]
## grep isotopic distribution
iso <- fData(poi)[, grep("isotopicDistr", fvarLabels(combMSNset))]
isom <- synapter:::.isotopicDistr2matrix(iso)
isom
##                            1_0  1_1  1_2 1_3   2_0   2_1   2_2  2_3 2_4
## isotopicDistr.S130423_05  1378  608  273  NA 25649 15016  5707 1150  NA
## isotopicDistr.S130423_07  1246  728  255  NA 25150 11517  4869 1230  NA
## isotopicDistr.S130423_09   857  357  180  NA 19434  9894  3891  589  NA
## isotopicDistr.S130423_11   930  365   NA  NA 19231 10580  4375   NA  NA
## isotopicDistr.S130423_13   759  400  147  NA 14761  8035  3321  813  NA
## isotopicDistr.S130423_06 10708 4943 1390 341 88884 63955 27550 6491  NA
## isotopicDistr.S130423_08  7132 3480  760 184 61927 45300 21154 5227 938
## isotopicDistr.S130423_10  5054 2233  644 139 55808 40075 17012 4064 678
## isotopicDistr.S130423_12  4741 2000  746 155 50171 35098 14078 3730 670
## isotopicDistr.S130423_14  3451 1723  500 154 39062 25263 11036 3078  NA
## all "a" samples are unsaturated but all "b" samples are saturated
synapter:::.runsUnsaturated(t(isom), saturationThreshold=satThreshold)
## isotopicDistr.S130423_05 isotopicDistr.S130423_07 isotopicDistr.S130423_09
##                     TRUE                     TRUE                     TRUE
## isotopicDistr.S130423_11 isotopicDistr.S130423_13 isotopicDistr.S130423_06
##                     TRUE                     TRUE                    FALSE
## isotopicDistr.S130423_08 isotopicDistr.S130423_10 isotopicDistr.S130423_12
##                    FALSE                    FALSE                    FALSE
## isotopicDistr.S130423_14
##                    FALSE
## run requantification and rescaling
req <- requantify(poi, method="sum", saturationThreshold=satThreshold,
                  onlyCommonIsotopes=FALSE)
top3 <- rescaleForTop3(poi, req, satThreshold, onlyForSaturatedRuns=TRUE)
top3all <- rescaleForTop3(poi, req, satThreshold, onlyForSaturatedRuns=FALSE)

intensity table

tab <- rbind(exprs(poi), exprs(req), exprs(top3), exprs(top3all))
rownames(tab) <- c("initial", "sum", "top3", "top3all")
knitr::kable(tab)
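|         | S130423_05 | S130423_07 | S130423_09 | S130423_11 | S130423_13 | S130423_06 | S130423_08 | S130423_10 | S130423_12 | S130423_14 |
|:--------|-----------:|-----------:|-----------:|-----------:|-----------:|-----------:|-----------:|-----------:|-----------:|-----------:|
| initial |   49781.00 |   44995.00 |   35202.00 |   35481.00 |   28236.00 |   204262.0 |  146102.00 |  125707.00 |  111389.00 |   84267.00 |
| sum     |    9116.00 |    8328.00 |    5874.00 |    5670.00 |    5440.00 |    51423.0 |   38875.00 |   29824.00 |   26120.00 |   19942.00 |
| top3    |   49781.00 |   44995.00 |   35202.00 |   35481.00 |   28236.00 |   130797.1 |   98880.57 |   75858.89 |   66437.57 |   50723.51 |
| top3all |   44431.54 |   40590.82 |   28629.98 |   27635.68 |   26514.66 |   250636.6 |  189477.44 |  145362.70 |  127309.34 |   97197.66 |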

plot intensities

plot(NA, xlim=c(14, 18), ylim=c(-3, 3),
     xlab="intensity sample B", ylab="A/B")
abline(h=c(1, 0, -2), col="#808080")
col <- c("#e41a1c", "#377eb8", "#4daf4a", "#984ea3")
points(log2(tab[, b]), log2(tab[, a]/tab[, b]), col=col, pch=20)
legend("topright", legend=rownames(tab), col=col, pch=20)

unnamed-chunk-3-1

intensity ratios

ratios <- rbind(exprs(poi)[, a]/exprs(poi)[, b],
                exprs(req)[, a]/exprs(req)[, b],
                exprs(top3)[, a]/exprs(top3)[, b],
                exprs(top3all)[, a]/exprs(top3all)[, b])
rownames(ratios) <- c("initial", "sum", "top3", "top3all")
knitr::kable(ratios)
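|         | S130423_05 | S130423_07 | S130423_09 | S130423_11 | S130423_13 |
|:--------|-----------:|-----------:|-----------:|-----------:|-----------:|
| initial |  0.2437115 |  0.3079698 |  0.2800321 |  0.3185324 |  0.3350778 |
| sum     |  0.1772748 |  0.2142251 |  0.1969555 |  0.2170750 |  0.2727911 |
| top3    |  0.3805972 |  0.4550439 |  0.4640458 |  0.5340502 |  0.5566650 |
| top3all |  0.1772748 |  0.2142251 |  0.1969555 |  0.2170750 |  0.2727911 |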

detailed walkthrough for onlyForSaturatedRuns=TRUE

## exprs before requantification
eBefore <- exprs(poi)
eBefore
##             S130423_05 S130423_07 S130423_09 S130423_11 S130423_13
## AGMVAGVIVNR      49781      44995      35202      35481      28236
##             S130423_06 S130423_08 S130423_10 S130423_12 S130423_14
## AGMVAGVIVNR     204262     146102     125707     111389      84267
## exprs after requantification
eAfter <- exprs(req)
eAfter
##             S130423_05 S130423_07 S130423_09 S130423_11 S130423_13
## AGMVAGVIVNR       9116       8328       5874       5670       5440
##             S130423_06 S130423_08 S130423_10 S130423_12 S130423_14
## AGMVAGVIVNR      51423      38875      29824      26120      19942
## grep isotopic information
isotop <- as.matrix(iso)
isotop
##             isotopicDistr.S130423_05
## AGMVAGVIVNR "1_0:1378;1_1:608;1_2:273;2_0:25649;2_1:15016;2_2:5707;2_3:1150"
##             isotopicDistr.S130423_07
## AGMVAGVIVNR "1_0:1246;1_1:728;1_2:255;2_0:25150;2_1:11517;2_2:4869;2_3:1230"
##             isotopicDistr.S130423_09
## AGMVAGVIVNR "1_0:857;1_1:357;1_2:180;2_0:19434;2_1:9894;2_2:3891;2_3:589"
##             isotopicDistr.S130423_11
## AGMVAGVIVNR "1_0:930;1_1:365;2_0:19231;2_1:10580;2_2:4375"
##             isotopicDistr.S130423_13
## AGMVAGVIVNR "1_0:759;1_1:400;1_2:147;2_0:14761;2_1:8035;2_2:3321;2_3:813"
##             isotopicDistr.S130423_06
## AGMVAGVIVNR "1_0:10708;1_1:4943;1_2:1390;1_3:341;2_0:88884;2_1:63955;2_2:27550;2_3:6491"
##             isotopicDistr.S130423_08
## AGMVAGVIVNR "1_0:7132;1_1:3480;1_2:760;1_3:184;2_0:61927;2_1:45300;2_2:21154;2_3:5227;2_4:938"
##             isotopicDistr.S130423_10
## AGMVAGVIVNR "1_0:5054;1_1:2233;1_2:644;1_3:139;2_0:55808;2_1:40075;2_2:17012;2_3:4064;2_4:678"
##             isotopicDistr.S130423_12
## AGMVAGVIVNR "1_0:4741;1_1:2000;1_2:746;1_3:155;2_0:50171;2_1:35098;2_2:14078;2_3:3730;2_4:670"
##             isotopicDistr.S130423_14
## AGMVAGVIVNR "1_0:3451;1_1:1723;1_2:500;1_3:154;2_0:39062;2_1:25263;2_2:11036;2_3:3078"
## if we want to handle only saturated runs we have to know which ones are
## unsaturated (this code block is skipped for onlyForSaturatedRuns=FALSE)
unsat <- t(apply(isotop, 1, function(x)synapter:::.runsUnsaturated(t(synapter:::.isotopicDistr2matrix(x)), saturationThreshold=satThreshold)))
unsat
##             isotopicDistr.S130423_05 isotopicDistr.S130423_07
## AGMVAGVIVNR                     TRUE                     TRUE
##             isotopicDistr.S130423_09 isotopicDistr.S130423_11
## AGMVAGVIVNR                     TRUE                     TRUE
##             isotopicDistr.S130423_13 isotopicDistr.S130423_06
## AGMVAGVIVNR                     TRUE                    FALSE
##             isotopicDistr.S130423_08 isotopicDistr.S130423_10
## AGMVAGVIVNR                    FALSE                    FALSE
##             isotopicDistr.S130423_12 isotopicDistr.S130423_14
## AGMVAGVIVNR                    FALSE                    FALSE
## replace requantified values of unsaturated runs with their original
## intensities
eAfter[unsat] <- eBefore[unsat]
eAfter
##             S130423_05 S130423_07 S130423_09 S130423_11 S130423_13
## AGMVAGVIVNR      49781      44995      35202      35481      28236
##             S130423_06 S130423_08 S130423_10 S130423_12 S130423_14
## AGMVAGVIVNR      51423      38875      29824      26120      19942

:warning: Maybe it is wrong to include the unsaturated runs for proportion calculation here?!

## calculate proportions
prop <- eAfter/rowSums(eAfter, na.rm=TRUE)
prop
##             S130423_05 S130423_07 S130423_09 S130423_11 S130423_13
## AGMVAGVIVNR   0.138327  0.1250281 0.09781621 0.09859147  0.0784597
##             S130423_06 S130423_08 S130423_10 S130423_12 S130423_14
## AGMVAGVIVNR  0.1428897  0.1080224  0.0828723 0.07257995 0.05541307
## calculate correction factor
cf <- eBefore/prop
cf
##             S130423_05 S130423_07 S130423_09 S130423_11 S130423_13
## AGMVAGVIVNR     359879     359879     359879     359879     359879
##             S130423_06 S130423_08 S130423_10 S130423_12 S130423_14
## AGMVAGVIVNR    1429508    1352516    1516876    1534708    1520706
cfm <- rowMeans(cf, na.rm=TRUE)
cfm
## AGMVAGVIVNR
##    915370.9
## calculate new intensities
eNew <- cfm * prop
eNew
##             S130423_05 S130423_07 S130423_09 S130423_11 S130423_13
## AGMVAGVIVNR   126620.5   114447.1   89538.11   90247.76   71819.73
##             S130423_06 S130423_08 S130423_10 S130423_12 S130423_14
## AGMVAGVIVNR   130797.1   98880.57   75858.89   66437.57   50723.51
## replace unsaturated runs with original values
## (this code block is skipped for onlyForSaturatedRuns=FALSE)
eNew[unsat] <- eBefore[unsat]
eNew
##             S130423_05 S130423_07 S130423_09 S130423_11 S130423_13
## AGMVAGVIVNR      49781      44995      35202      35481      28236
##             S130423_06 S130423_08 S130423_10 S130423_12 S130423_14
## AGMVAGVIVNR   130797.1   98880.57   75858.89   66437.57   50723.51

detailed walkthrough for onlyForSaturatedRuns=FALSE

## exprs before requantification
eBefore <- exprs(poi)
eBefore
##             S130423_05 S130423_07 S130423_09 S130423_11 S130423_13
## AGMVAGVIVNR      49781      44995      35202      35481      28236
##             S130423_06 S130423_08 S130423_10 S130423_12 S130423_14
## AGMVAGVIVNR     204262     146102     125707     111389      84267
## exprs after requantification
eAfter <- exprs(req)
eAfter
##             S130423_05 S130423_07 S130423_09 S130423_11 S130423_13
## AGMVAGVIVNR       9116       8328       5874       5670       5440
##             S130423_06 S130423_08 S130423_10 S130423_12 S130423_14
## AGMVAGVIVNR      51423      38875      29824      26120      19942
## calculate proportions
prop <- eAfter/rowSums(eAfter, na.rm=TRUE)
prop
##             S130423_05 S130423_07 S130423_09 S130423_11 S130423_13
## AGMVAGVIVNR 0.04544095 0.04151297  0.0292804 0.02826351 0.02711702
##             S130423_06 S130423_08 S130423_10 S130423_12 S130423_14
## AGMVAGVIVNR  0.2563306   0.193782  0.1486651  0.1302016 0.09940582
## calculate correction factor
cf <- eBefore/prop
cf
##             S130423_05 S130423_07 S130423_09 S130423_11 S130423_13
## AGMVAGVIVNR    1095510    1083878    1202238    1255364    1041265
##             S130423_06 S130423_08 S130423_10 S130423_12 S130423_14
## AGMVAGVIVNR   796869.3   753950.2   845571.8   855511.9   847706.9
cfm <- rowMeans(cf, na.rm=TRUE)
cfm
## AGMVAGVIVNR
##    977786.4
## calculate new intensities
eNew <- cfm * prop
eNew
##             S130423_05 S130423_07 S130423_09 S130423_11 S130423_13
## AGMVAGVIVNR   44431.54   40590.82   28629.98   27635.68   26514.66
##             S130423_06 S130423_08 S130423_10 S130423_12 S130423_14
## AGMVAGVIVNR   250636.6   189477.4   145362.7   127309.3   97197.66

pavel-shliaha commented 8 years ago

Hey Sebastian, I think you pinpointed the problem yourself here. It is indeed incorrect to include saturated runs in the calculation of the correction factor.

I believe you got confused: onlyForSaturatedRuns refers to whether you have to requantify all peptides or only those above the saturation threshold; the correction factor should always be computed on unsaturated peptides only. This is analogous to the isotopic correction we perform in the theoretical methods.
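To make the suggestion concrete, here is a small base R sketch for AGMVAGVIVNR, using the intensities from the tables earlier in this thread. It is only one possible reading of "compute the correction factor on unsaturated runs only" and is not taken from the synapter source; the variable names are chosen for illustration.

```r
# intensities of AGMVAGVIVNR copied from the tables in this thread
eBefore <- c(49781, 44995, 35202, 35481, 28236,          # samples "a"
             204262, 146102, 125707, 111389, 84267)      # samples "b"
eSum    <- c(9116, 8328, 5874, 5670, 5440,               # after method="sum"
             51423, 38875, 29824, 26120, 19942)
unsat   <- rep(c(TRUE, FALSE), each = 5)                 # "a" unsaturated, "b" saturated

# proportions of the requantified ("sum") intensities across all runs
prop <- eSum / sum(eSum)

# correction factor estimated from the unsaturated runs only (assumption)
cfm <- mean(eBefore[unsat] / prop[unsat])

# rescale the saturated runs; unsaturated runs keep their original intensity
eNew <- ifelse(unsat, eBefore, cfm * prop)

ratioInitial <- eBefore[1:5] / eBefore[6:10]
ratioNew     <- eNew[1:5] / eNew[6:10]
rbind(initial = ratioInitial, rescaled = ratioNew)
```

With this variant the saturated runs are scaled up relative to their measured values, so the A/B ratios end up below the initial ones instead of above them.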

pavel-shliaha commented 8 years ago

Hey Sebastian,

please execute this code and tell me what you see

library(synapter)

combMSNSet <- readRDS ("Y://RAW/pvs22//_QTOF_DATA_data3//synapter2paper//kuharev2015//synapter2//output//UDMSE//refcombMSNSet.RDS")
satThresholdIon <- 3e4

satCorrected <- sapply (c ("sat", "sum", "reference"),
                        function (x) NULL)

satCorrected[[1]] <- combMSNSet

satCorrected[[2]] <- requantify (combMSNSet, method = "sum", 
                                 saturationThreshold= satThresholdIon,
                                 onlyCommonIsotopes=FALSE)

satCorrected[[3]]    <- requantify (combMSNSet, method = "reference", 
                                    saturationThreshold= satThresholdIon)

# function to find requantified row
findDifferentquant <- function (MSnSet1, MSnSet2){
  selA <- c () 
  for (i in 1:nrow (MSnSet1))  if (any (exprs(MSnSet1)[i, ] != exprs(MSnSet2)[i, ], na.rm = TRUE)) selA <- c (selA, i)
  return (selA)
}

diffSum <-  findDifferentquant(satCorrected[[1]], satCorrected[[2]])
diffRef <-  findDifferentquant(satCorrected[[1]], satCorrected[[3]])

length (diffSum)
length (diffRef)

I see

length (diffSum)
[1] 3069

and

length (diffRef)
[1] 3067

sgibb commented 8 years ago

@pavel-shliaha I got the same output. Now I finally understand what you meant when you said that "FFQELR" and "ESNEITIIINPYRETVCFSVEPVK" are requantified by "sum" but not by "reference": the intensities with and without reference requantification are identical.

I looked into "FFQELR". Just as reminder the isotopic distribution:

## fetch peptide of interest
poi <- combMSNset[grep("^FFQELR$", featureNames(combMSNset)),]
## grep isotopic distribution
iso <- fData(poi)[, grep("isotopicDistr", fvarLabels(combMSNset))]

m <- synapter:::.isotopicDistr2matrix(iso)
m
#                            1_0   1_1  1_2 1_3 1_4  2_0  2_1 2_2 2_3
# isotopicDistr.S130423_05   632   307   NA  NA  NA 6099 2445 696  NA
# isotopicDistr.S130423_07    NA    NA   NA  NA  NA   NA   NA  NA  NA
# isotopicDistr.S130423_09 37909 16706 2711 501 174 8502 3463 677 125
# isotopicDistr.S130423_11    NA    NA   NA  NA  NA   NA   NA  NA  NA
# isotopicDistr.S130423_13    NA    NA   NA  NA  NA   NA   NA  NA  NA
# isotopicDistr.S130423_06    NA    NA   NA  NA  NA   NA   NA  NA  NA
# isotopicDistr.S130423_08    NA    NA   NA  NA  NA 5538 1675  NA  NA
# isotopicDistr.S130423_10    NA    NA   NA  NA  NA 4842 1439 484  NA
# isotopicDistr.S130423_12    NA    NA   NA  NA  NA 4128 1625 371  NA
# isotopicDistr.S130423_14    NA    NA   NA  NA  NA 3386 1290 349  NA

It is a special case: all runs but one (S130423_09) are unsaturated. The algorithm looks for the best reference run, which is the one with the most unsaturated isotopes: S130423_09 (8 unsaturated and 1 saturated). Subsequently the new intensity values are calculated for run S130423_09 only (all other runs are unsaturated and not touched at all). The correction factor between the unsaturated isotopes of S130423_09 and the reference (also S130423_09) is 1, so the saturated value is replaced by an identical value. The requantification works as follows: new value = saturated value * reference correction factor, where reference correction factor = mean(unsaturated isotopes / unsaturated reference isotopes). So in this special case it is expected that the requantified intensities are identical to the original ones. requantify (poi, method = "sum", saturationThreshold= 3e4, onlyCommonIsotopes=FALSE) just removes isotope S130423_09:1_0 (that's why it differs slightly).
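The special case can be reproduced with a few lines of base R. This is a sketch of the calculation as described in this comment, not the package code; it assumes the saturated isotope is replaced by the corresponding reference isotope scaled by the correction factor (which is the same as "saturated value * correction factor" here, because the run is its own reference).

```r
# isotope intensities of run S130423_09 for FFQELR, copied from the matrix above
run <- c("1_0" = 37909, "1_1" = 16706, "1_2" = 2711, "1_3" = 501, "1_4" = 174,
         "2_0" = 8502,  "2_1" = 3463,  "2_2" = 677,  "2_3" = 125)
ref <- run                    # the only saturated run is also the reference run
satThreshold <- 3e4
unsat <- run < satThreshold   # only 1_0 is above the threshold

# correction factor: mean ratio over the unsaturated isotopes
cf <- mean(run[unsat] / ref[unsat])

# replace the saturated isotope by the scaled reference intensity
requant <- run
requant[!unsat] <- ref[!unsat] * cf

cf
identical(requant, run)
```

Because every ratio in the mean is x/x, the correction factor is exactly 1 and the requantified isotope intensities equal the original ones.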

sgibb commented 8 years ago

For the ESNEITIIINPYRETVCFSVEPVK the only saturated run is also the reference run (same situation as for FFQELR above).

I hope the onlyForSaturatedRuns problem is fixed now: rescaletop3

pavel-shliaha commented 8 years ago

There is a problem with the theoretical method correction for some peptides, e.g. ILFDYSK. I have run the code below and the resulting correction is presented in the table underneath.

library (synapter)
library (MSnbase)

combMSNSet <- readRDS ("Y://RAW/pvs22//_QTOF_DATA_data3//synapter2paper//kuharev2015//synapter2//output//UDMSE//refcombMSNSet.RDS")

satThresholdIon <- 3e4
satCorrected <- sapply (c ("sat", "th.mean", "th.median", "th.weighted.mean"),
                        function (x) NULL)

satCorrected[[1]] <- combMSNSet

satCorrected[[2]] <- requantify (combMSNSet, method = "th.mean",
                                 saturationThreshold= satThresholdIon,
                                 requantifyAll=FALSE)

satCorrected[[3]] <- requantify (combMSNSet, method = "th.median",
                                 saturationThreshold= satThresholdIon,
                                 requantifyAll=FALSE)

satCorrected[[4]] <- requantify (combMSNSet, method = "th.weighted.mean",
                                 saturationThreshold= satThresholdIon,
                                 requantifyAll=FALSE)

xx <- rbind (exprs (satCorrected[[1]])[featureNames (satCorrected[[2]]) == "ILFDYSK", ],
             exprs (satCorrected[[2]])[featureNames (satCorrected[[2]]) == "ILFDYSK", ],
             exprs (satCorrected[[3]])[featureNames (satCorrected[[2]]) == "ILFDYSK", ],
             exprs (satCorrected[[4]])[featureNames (satCorrected[[2]]) == "ILFDYSK", ])

row.names(xx) <- names (satCorrected)

image

could you please produce for "ILFDYSK" a detailed procedure of what is happening here, like you did for previous peptides when we had a problem?

pavel-shliaha commented 8 years ago

sorry yet another issue

sum method

library (synapter)
library (MSnbase)

combMSNSet <- readRDS ("Y://RAW/pvs22//_QTOF_DATA_data3//synapter2paper//kuharev2015//synapter2//output//UDMSE//refcombMSNSet.RDS")
satThresholdIon <- 3e4
satCorrected <- sapply (c ("sat", "sum"), function (x) NULL)

satCorrected[[1]] <- combMSNSet

satCorrected[[2]] <- requantify (combMSNSet, method = "sum",
                                 saturationThreshold= satThresholdIon,
                                 onlyCommonIsotopes=TRUE)

exprs (satCorrected[[2]])[5, ]

all are NA! so requantification FAILED!

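poi <- combMSNSet[5,]
iso <- fData(poi)[, grep("isotopicDistr", fvarLabels(combMSNSet))]
m <- synapter:::.isotopicDistr2matrix(iso)
## keep runs with at least one recorded isotope
m <- m[apply (m, 1, function (x) any (!is.na (x))), ]
## keep isotopes present in all remaining runs
m <- m[, apply (m, 2, function (x) all (!is.na (x)))]
## keep isotopes below the saturation threshold in all remaining runs
m <- m[, apply (m, 2, function (x) all (x < satThresholdIon))]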

m

image

but this shows requantification is possible

sgibb commented 8 years ago

The `th.` problem:

combMSNset <- readRDS("refCombMSNSet.RDS")

## fetch peptide of interest
poi <- combMSNset[grep("^ILFDYSK$", featureNames(combMSNset)),]
## grep isotopic distribution
iso <- fData(poi)[, grep("isotopicDistr", fvarLabels(combMSNset))]

x <- synapter:::.isotopicDistr2matrix(iso)
saturationThreshold <- 3e4
unsat <- synapter:::.isUnsaturatedIsotope(x, saturationThreshold=saturationThreshold)
unsat
#                          1_0  1_1  1_2   2_0   2_1
#isotopicDistr.S130423_05   NA   NA   NA FALSE FALSE
#isotopicDistr.S130423_07   NA   NA   NA FALSE FALSE
#isotopicDistr.S130423_09   NA   NA   NA FALSE FALSE
#isotopicDistr.S130423_11   NA   NA   NA FALSE FALSE
#isotopicDistr.S130423_13   NA   NA   NA FALSE FALSE
#isotopicDistr.S130423_06 TRUE TRUE TRUE FALSE FALSE
#isotopicDistr.S130423_08 TRUE TRUE   NA FALSE FALSE
#isotopicDistr.S130423_10 TRUE TRUE   NA FALSE  TRUE
#isotopicDistr.S130423_12   NA   NA   NA FALSE  TRUE
#isotopicDistr.S130423_14   NA   NA   NA FALSE  TRUE

Above we see the first problem: There are not any isotopes below the saturation threshold for the first 5 runs. So we can't predict anything here (explains the NA/0.0 in the first five columns of your table).

x
#                           1_0  1_1 1_2   2_0   2_1
#isotopicDistr.S130423_05    NA   NA  NA 70214 40605
#isotopicDistr.S130423_07    NA   NA  NA 64751 41339
#isotopicDistr.S130423_09    NA   NA  NA 66143 34858
#isotopicDistr.S130423_11    NA   NA  NA 54542 30211
#isotopicDistr.S130423_13    NA   NA  NA 47453 30213
#isotopicDistr.S130423_06 12833 5303 955 63133 33950
#isotopicDistr.S130423_08  9116 4281  NA 53849 31806
#isotopicDistr.S130423_10  9077 3418  NA 50046 24914
#isotopicDistr.S130423_12    NA   NA  NA 40965 20147
#isotopicDistr.S130423_14    NA   NA  NA 31636 19936

x <- x * unsat
#                           1_0  1_1 1_2 2_0   2_1
#isotopicDistr.S130423_05    NA   NA  NA   0     0
#isotopicDistr.S130423_07    NA   NA  NA   0     0
#isotopicDistr.S130423_09    NA   NA  NA   0     0
#isotopicDistr.S130423_11    NA   NA  NA   0     0
#isotopicDistr.S130423_13    NA   NA  NA   0     0
#isotopicDistr.S130423_06 12833 5303 955   0     0
#isotopicDistr.S130423_08  9116 4281  NA   0     0
#isotopicDistr.S130423_10  9077 3418  NA   0 24914
#isotopicDistr.S130423_12    NA   NA  NA   0 20147
#isotopicDistr.S130423_14    NA   NA  NA   0 19936

And here we see the second problem. Almost all doubly charged ions are saturated (and not used for the prediction); that's why our predicted intensities for S130423_06 and S130423_08 are very low. The runs with unsaturated doubly charged ions S130423_10 and S130423_14 yield higher intensities.
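Both problems can be checked without synapter from the isotope matrix printed above. The following base R sketch rebuilds that matrix by hand and flags the runs that have no unsaturated isotope at all, and the runs that still have an unsaturated doubly charged isotope. It only assumes that "unsaturated" means an intensity below the threshold; the variable names are chosen for illustration.

```r
runs <- paste0("S130423_", c("05", "07", "09", "11", "13",
                             "06", "08", "10", "12", "14"))
# isotope intensities of ILFDYSK, copied from the matrix above
x <- matrix(c(
     NA,   NA,  NA, 70214, 40605,
     NA,   NA,  NA, 64751, 41339,
     NA,   NA,  NA, 66143, 34858,
     NA,   NA,  NA, 54542, 30211,
     NA,   NA,  NA, 47453, 30213,
  12833, 5303, 955, 63133, 33950,
   9116, 4281,  NA, 53849, 31806,
   9077, 3418,  NA, 50046, 24914,
     NA,   NA,  NA, 40965, 20147,
     NA,   NA,  NA, 31636, 19936),
  nrow = 10, byrow = TRUE,
  dimnames = list(runs, c("1_0", "1_1", "1_2", "2_0", "2_1")))

unsat <- x < 3e4              # NA stays NA for isotopes that were not observed

# first problem: runs without a single unsaturated isotope
noUnsat <- rowSums(unsat, na.rm = TRUE) == 0
names(which(noUnsat))

# second problem: only a few runs keep an unsaturated doubly charged isotope
charge2 <- rowSums(unsat[, c("2_0", "2_1")], na.rm = TRUE) > 0
names(which(charge2))
```

The first vector lists the five runs for which nothing can be predicted; the second lists the runs in which the 2_1 isotope is still below the threshold.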

The "sum" problem:

The isotopic matrix of the fifth peptide (LAQANGWGVMVSHR) is:

                           2_0   2_1  2_2  2_3  2_4   3_0   3_1   3_2   3_3   3_4
isotopicDistr.S130423_05 24557 19414 8161 3387 1424 66080 59557 33660 21962  9124
isotopicDistr.S130423_07 26498 21050 9577 3939 1922 68076 58367 42972 24453 13588
isotopicDistr.S130423_09    NA    NA   NA   NA   NA    NA    NA    NA    NA    NA
isotopicDistr.S130423_11 16756 16158 5719 2899 1341 56830 53333 30881 16859  7490
isotopicDistr.S130423_13 17117 12276 5233 2072   NA 48460 41008 30369 13261  8023
isotopicDistr.S130423_06 24851 17418 9016   NA   NA 65222 56007 37009    NA    NA
isotopicDistr.S130423_08 19034 13436 6568   NA   NA 52526 44629 32459    NA    NA
isotopicDistr.S130423_10 14319 11793 5010   NA   NA 55761 47528 25765    NA    NA
isotopicDistr.S130423_12 15707 13055 5352   NA   NA 49521 45595 26457    NA    NA
isotopicDistr.S130423_14 17540 12890 6435   NA   NA 44465 43195 29950    NA    NA

As you see the third run isotopicDistr.S130423_09 is completely missing. In the current definition of onlyCommonIsotopes=TRUE that means there is not any isotope present in all runs (because S130423_09 has no isotopes at all).

pavel-shliaha commented 8 years ago

For the sum method, can you please not consider runs for which we don't have a peptide identity, i.e. all the isotopes are NA? Simply ignore the line or convert to 0. This is the final fix I need to finish the paper. For the theoretical methods we need to think about what to do in instances like the one above, where requantification is not possible.

lgatto commented 8 years ago

Simply ignore the line or convert to 0

Converting to 0 is not advisable. If a line is ignored, this would need to be recorded somewhere, or reported to the user at the very least. The best is to keep all features, but set those that don't return any value to NA. It is then just a matter of calling filterNA to remove them afterwards.
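A toy base R illustration of why 0-imputation is risky. The intensities are made up, and the last step is only a plain-matrix stand-in for what filterNA does on an MSnSet (assuming its default of dropping every feature that contains an NA).

```r
# made-up intensities: pepA could not be requantified in the third run
e <- rbind(pepA = c(100, 110,  NA),
           pepB = c(200, 210, 190))

# keeping NA: the missing value is simply left out of the summary
meanNA <- rowMeans(e, na.rm = TRUE)

# 0-imputation: the missing value is treated as a real measurement of 0
e0 <- e
e0[is.na(e0)] <- 0
meanZero <- rowMeans(e0)

rbind(meanNA, meanZero)

# features with missing values can still be removed explicitly later on
complete <- e[rowSums(is.na(e)) == 0, , drop = FALSE]
rownames(complete)
```

With NA the decision to drop or keep pepA stays visible and reversible; with 0 the mean of pepA is silently pulled down.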

pavel-shliaha commented 8 years ago

sorry guys did not mean to tell you how to code

lgatto commented 8 years ago

sorry guys did not mean to tell you how to code

No worries - I just wanted to make sure we stay away from wild 0-imputation.

sgibb commented 8 years ago

Ok, now missing runs (runs without any recorded intensity value) are ignored for requantify(..., method="sum", onlyCommonIsotopes=TRUE):

combMSNset <- readRDS("refCombMSNSet.RDS")

## fetch peptide of interest
poi <- combMSNset[5,]
## grep isotopic distribution
iso <- fData(poi)[, grep("isotopicDistr", fvarLabels(combMSNset))]

# NONE
exprs(poi)
# FALSE
exprs(requantify(poi, method="sum", saturationThreshold=3e4, onlyCommon=FALSE))
# TRUE
exprs(requantify(poi, method="sum", saturationThreshold=3e4, onlyCommon=TRUE))
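|       | S130423_05 | S130423_07 | S130423_09 | S130423_11 | S130423_13 | S130423_06 | S130423_08 | S130423_10 | S130423_12 | S130423_14 |
|:------|-----------:|-----------:|-----------:|-----------:|-----------:|-----------:|-----------:|-----------:|-----------:|-----------:|
| NONE  |     247326 |     270442 |         NA |     208266 |     177819 |     209523 |     168652 |     160176 |     155687 |     154475 |
| FALSE |      88029 |     101027 |         NA |      67222 |      57982 |      51285 |      39038 |      31122 |      34114 |      36865 |
| TRUE  |      52132 |      57125 |         NA |      38633 |      34626 |      51285 |      39038 |      31122 |      34114 |      36865 |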
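The FALSE and TRUE rows above can be reproduced from the isotope matrix of LAQANGWGVMVSHR shown earlier with a short base R sketch. The helper `sumUnsaturated` is hypothetical and not the synapter implementation; it assumes that an isotope is dropped for all runs as soon as it is above the threshold in one run, and that runs without any recorded isotope are ignored when looking for common isotopes.

```r
runs <- paste0("S130423_", c("05", "07", "09", "11", "13",
                             "06", "08", "10", "12", "14"))
m <- matrix(c(
  24557, 19414, 8161, 3387, 1424, 66080, 59557, 33660, 21962,  9124,
  26498, 21050, 9577, 3939, 1922, 68076, 58367, 42972, 24453, 13588,
     NA,    NA,   NA,   NA,   NA,    NA,    NA,    NA,    NA,    NA,
  16756, 16158, 5719, 2899, 1341, 56830, 53333, 30881, 16859,  7490,
  17117, 12276, 5233, 2072,   NA, 48460, 41008, 30369, 13261,  8023,
  24851, 17418, 9016,   NA,   NA, 65222, 56007, 37009,    NA,    NA,
  19034, 13436, 6568,   NA,   NA, 52526, 44629, 32459,    NA,    NA,
  14319, 11793, 5010,   NA,   NA, 55761, 47528, 25765,    NA,    NA,
  15707, 13055, 5352,   NA,   NA, 49521, 45595, 26457,    NA,    NA,
  17540, 12890, 6435,   NA,   NA, 44465, 43195, 29950,    NA,    NA),
  nrow = 10, byrow = TRUE,
  dimnames = list(runs, c(paste0("2_", 0:4), paste0("3_", 0:4))))

# hypothetical helper, written only to mirror the behaviour described here
sumUnsaturated <- function(m, satThreshold, onlyCommon) {
  missingRun <- rowSums(!is.na(m)) == 0        # runs without any isotope
  obs <- m[!missingRun, , drop = FALSE]
  # drop isotopes that are saturated in at least one observed run
  keep <- colSums(obs > satThreshold, na.rm = TRUE) == 0
  # optionally keep only isotopes seen in every observed run
  if (onlyCommon) keep <- keep & colSums(is.na(obs)) == 0
  res <- rowSums(m[, keep, drop = FALSE], na.rm = TRUE)
  res[missingRun] <- NA                        # missing runs stay NA
  res
}

resFalse <- sumUnsaturated(m, 3e4, onlyCommon = FALSE)
resTrue  <- sumUnsaturated(m, 3e4, onlyCommon = TRUE)
rbind("FALSE" = resFalse, "TRUE" = resTrue)
```

With onlyCommon=TRUE only the isotopes 2_0, 2_1 and 2_2 remain, which is why the "b" samples are identical in both rows.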
pavel-shliaha commented 8 years ago

Hey Sebastian. Problems again. Please have a look.

satThresholdIon <- 3e4
combMSNSet2  <- readRDS ("Y://RAW/pvs22//_QTOF_DATA_data3//synapter2paper//kuharev2015//synapter2//output//UDMSE//combMSNSet.RDS")
combMSNSet2 <- requantify (combMSNSet2, method = "sum", 
                           saturationThreshold= satThresholdIon, 
                           onlyCommonIsotopes=FALSE)

getting the following message:

Error in seq.default(from = 1L, to = nall, by = 2L) : wrong sign in 'by' argument

already tried restarting R session and reinstalling synapter. Can you reproduce that?

sgibb commented 8 years ago

I have never seen this kind of error before. It seems that the peptide YATALAK (row 3412) has only NA values (except precursor.mhp.S130423_05: 737.4174). I don't know why this happens.

Nevertheless it was a bug that the functions could not handle entries without any non-NA value. That is fixed now.

@pavel-shliaha: Why didn't we notice this before? Where does YATALAK come from?
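For illustration, this is how the reported message can arise in plain R when an entry has no values left. What `nall` counts inside synapter is not shown in this thread, so the value below is purely an assumption, and the guard is a generic pattern rather than the actual fix.

```r
nall <- 0L   # assumption: nothing left for an entry that is NA everywhere

# seq() refuses to count from 1 down to 0 with a positive step
msg <- tryCatch(seq(from = 1L, to = nall, by = 2L),
                error = function(e) conditionMessage(e))
msg

# generic guard: return an empty index instead of calling seq() on nothing
idx <- if (nall > 0L) seq(from = 1L, to = nall, by = 2L) else integer(0)
length(idx)
```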

pavel-shliaha commented 8 years ago

Let's keep this issue open for now.

pavel-shliaha commented 7 years ago

requantify (readRDS("Y://RAW/pvs22//_QTOF_DATA_data3//synapter2paper//kuharev2015//synapter2_intensity//output//UDMSE//refcombMSNSetNS.RDS"), 
            method = "th.mean", 
            saturationThreshold= 3e4)

returns

Error in Mod(z$values) : non-numeric argument to function

could you please have a look

sgibb commented 7 years ago

I have currently no access to prot-filesrv1 (networkmanager-strongswan plugin seems to not accept/send the password). Can you send me the refcombMSNSetNS.RDS via e-mail/dropbox/google drive?

pavel-shliaha commented 7 years ago

shared the file with you via google drive

sgibb commented 7 years ago

Sorry, but I can't reproduce the error. It just works for me. Could you try again and call traceback() directly after the error? The output of sessionInfo() could also be helpful.
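In an interactive session, calling traceback() right after the error is enough. If the call runs inside a script, the call stack can be captured with a calling handler instead. The sketch below uses a stand-in function `failing()` (hypothetical; replace it with the real requantify(...) call) so that it runs without the data file.

```r
# stand-in for the failing call; replace with the real requantify(...) call
failing <- function() stop("non-numeric argument to function")

calls <- NULL
res <- tryCatch(
  withCallingHandlers(
    failing(),
    # record the call stack at the moment the error is signalled
    error = function(e) calls <<- sys.calls()),
  error = function(e) e)

conditionMessage(res)   # the error message
calls                   # the recorded call stack, similar to traceback()

# R version, platform and package versions of the current session
info <- sessionInfo()
info
```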

pavel-shliaha commented 7 years ago

@sgibb I started having trouble with the theoretical method as soon as I updated synapter. I have shared the file with you through google drive.

testMSNSet <- readRDS ("MSnbaseProblemRequant.RDS")

satThresholdIon <- 3e4

requantify (testMSNSet, method = "th.mean", saturationThreshold= satThresholdIon)

error:

Error in Mod(z$values) : non-numeric argument to function

pavel-shliaha commented 7 years ago

Sorry guys, now it works after restarting the R session (but I had also restarted before posting). I am not sure, but perhaps this intermittent error has something to do with the BRAIN package.