Closed by pavel-shliaha 7 years ago
Sorry for being unresponsive for such a long time.
I need some clarifications:
most important point: THINK ABOUT DRIFT IN INTENSITY IN THE EXACT SAME WAY YOU THINK ABOUT DRIFT IN RETENTION TIME
To answer your questions:
2) Average peak cluster intensity is the average intensity within a given period of time, e.g. you can find that all peptides at timepoint 10.5-11 min have an average ratio of 0.9 to the master. This can happen if there was something wrong with the spray at this particular part of the chromatogram. This is conceptually identical to the drift in retention time, and I think it is very easy to implement because you can completely reuse the retention time correction functionality, but plot intensity_master/intensity_quantitation_run against the retention time of the master.
1) As with retention time correction, when creating the master you need to adjust intensities before adding new identifications to the master.
3) You should not correct intensities before detector saturation correction, but after; hence I suggest we simply create a new column and store this information there.
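The idea in point 2 can be sketched with plain base R (a toy illustration, not synapter code): simulate a smooth spray drift, fit a loess model of the log2 master/run intensity ratio against retention time, and multiply the run intensities by the predicted ratio, exactly mirroring the retention time correction:

```r
## Toy sketch: correct a systematic intensity drift along retention time
## the same way a retention time drift is corrected.
set.seed(1)
rt <- sort(runif(500, 0, 120))              # retention time [min]
master <- rexp(500, rate = 1e-4)            # "true" intensities in the master
drift <- 1 + 0.3 * sin(rt / 20)             # smooth spray drift along the run
run <- master / drift                       # drifted quantitation run

## fit log2(master/run) vs retention time, then undo the drift
fit <- loess(log2(master / run) ~ rt, span = 0.25)
corrected <- run * 2^predict(fit, rt)

## after correction the ratio to the master is ~1 everywhere
mean(abs(log2(master / corrected)))
```

This is exactly the "plot intensity_master/intensity_quantitation_run against retention time" approach: whatever the loess predicts on the log2 scale is multiplied back onto the run.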
Currently we don't store any peptide intensity information in the master object (just `precursor.inten`). The master is created from the final peptide files. The following intensity columns would be available: `peptide.MatchedProductsSumInten`, `peptide.RelIntensity`, `precursor.inten`, `precursor.calcInten`.
Is `peptide.MatchedProductsSumInten` the correct column? Or do we need to import the PEP3D files as well?
Will that work at all (especially for this dataset)? The Kuharev dataset contains 5 % E. coli and 30 % yeast in sample A, and 20 % E. coli and 15 % yeast in sample B. If we select sample A as master and B as slave (quant) run, we expect an intensity ratio of 1/4 for E. coli and 2/1 for yeast. In the worst case we have the same number of proteins identified for both, and lowess would pick an average of the two (1.25) and destroy the whole intensity information. Could that be?
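The expected ratios follow directly from the mixing proportions of the design; a quick sanity check (percentages as stated above, with the remainder being human):

```r
## Mixing proportions (%) in the Kuharev design
A <- c(ecoli = 5, yeast = 30, human = 65)
B <- c(ecoli = 20, yeast = 15, human = 65)

A / B   # expected master/slave ratios with A as master, B as slave
##  ecoli: 5/20 = 0.25, yeast: 30/15 = 2, human: 65/65 = 1
```

A single lowess curve fitted through this three-population mixture would land somewhere between the species ratios, which is precisely the concern.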
Here you can see an example plot using S130423_05 (master; A) and S130423_14 (slave; B), using `precursor.inten` as intensity (`peptide.MatchedProductsSumInten` doesn't change this pattern). The red line is the lowess fit; the other lines are the theoretical ratios:
Did I understand everything wrong? Or do we have to ensure that only technical replicates (A and A; B and B) are normalised against each other?
Exactly; normalisation will only work if the majority of the proteins are at a ratio of 1:1. This is not true for samples A and B, hence they should not be normalised against each other.
So the master should also be created separately for A and B if this functionality is to be used.
Sorry, I made a copy-and-paste error (the LOWESS line in the image above is retention time vs. the diff(retention time) model). Here is the correct image:
As we discussed in our Skype chat, it could work for the A and B samples if the majority of the intensity values have a master/slave ratio around 1. In this image you can see that 65 % human could be enough.
Nevertheless, after I have implemented this feature we should have a look at two different workflows:
Just for completeness the same model for sample A (S130423_05) vs sample A (S130423_07):
The same samples after applying the correction:
BTW: What happens around retention time = 145 min?
Thanks! Will play around with this over the weekend. Wanted to ask you to log2 the ratios but you've done it already, I presume?
Also, around 145 min you probably see bad spray.
> Thanks! Will play around with this over the weekend.
It is not ready yet. Currently it just corrects the intensities during master creation. There is no correction for ident/quant. I'll drop you a note when you can start testing.
I am wondering where to do the ident/quant intensity correction. It should be possible to run it as an additional step after the whole synapter workflow because we store the `precursor.inten` and `precursor.retT` columns from the identification run and add the `Counts` column from the quantitation run. But is that reasonable? @pavel-shliaha Do you have any suggestions?
> Wanted to ask you to log2 ratios but you've done it already I presume?
Already done.
There is a problem for the user: the current master creation is a black box. The user doesn't have any chance to see a plot showing the influence of the `span`/`maxDeltaRt` arguments (the above plots are handmade).
Do we need more plots?
Do we need `spanRet` and `spanInten` arguments to use a different `span` for the two `loess` models?
> I am wondering where to do the ident/quant intensity correction. It should be possible to run it as an additional step after the whole synapter workflow because we store the `precursor.inten` and `precursor.retT` columns from the identification run and add the `Counts` column from the quantitation run. But is that reasonable? @pavel-shliaha Do you have any suggestions?
During master creation, intensity should be corrected automatically, BUT during master-to-quant identification transfer I would prefer to just create a column which a user can then use in MSnbase, since that allows correction after detector saturation correction.
I understand, but this column could be created at any time, or? I am just not sure when it is reasonable (e.g. create the column after `findEMRTs`, or don't create this column in synapter at all but on the final MSnSet, because we have all the necessary information in the `fData`).
Currently I think the easiest would be to do this at the MSnSet level (at least for testing and playing around).
I think this should be created when an explicit command is called, after merging the commonly identified peptides (just like modelling retention time; it will allow us to keep track of the pipeline).
Now a basic version of intensity correction is implemented.
Just run synapter as usual. Before you transform the synapter object into an `MSnSet`, call `modelIntensity`. This will add a new column `intensityCorrectionFactor` to the `MatchedEMRTs` `data.frame`. After `as(x, "MSnSet")` you can access this column via `fData(msnset)$intensityCorrectionFactor`.
After combining multiple `MSnSet` objects and applying saturation correction, just run `correctIntensity(msnset)` to multiply the intensity values by the correction factor, e.g.:
```r
# synapter workflow
# ...
modelIntensity(synapterObj)
msnset <- as(synapterObj, "MSnSet")
## do stuff with msnset
correctedMSnSet <- correctIntensities(msnset)
```
I am somewhat puzzled whether it is correct to multiply or divide by the correction factor. According to the loess model and the input variables it should be multiplication (but in my test cases it seems to make the results worse, and division looks more correct). I am currently too tired to follow this (I am on night shift this week). @pavel-shliaha If you see that division is correct, I am very sorry, but it is just a change of one sign (`/` vs `*`) in `MSnbase-extension.R`: `.correctIntensity`.
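One way to settle the `*` vs `/` question is a toy check (a sketch, not the synapter code): with the ratio defined as `log2(ident / quant)`, multiplying the quant intensities by `2^predict(...)` must recover the ident intensities.

```r
## Toy sanity check for the sign of the correction.
quant <- c(100, 200, 400)
ident <- 2 * quant                       # ident run systematically 2x higher
corrFactor <- 2^log2(ident / quant)      # what the loess model would predict
all.equal(quant * corrFactor, ident)     # TRUE, so '*' is the right sign
```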
There is still a lot of stuff missing:

- a plot showing the influence of the `span` argument (could use the `plotLoess` function, but this has to be rewritten to be more generalized)
- a separate `span` argument; currently the same `span` is set for both the retention time and the intensity model

If your tests are promising I will add the documentation and we can discuss the other points.
P.S.: Please note that you have to regenerate the master as well (there the intensity model is applied without additional user input, using the retention time `span`).
The current calculation for the master is:

```r
mergedPeptideData$intenRatio <- log2(mergedPeptideData$precursor.inten.master /
                                     mergedPeptideData$precursor.inten.slave)
intModel <- loessModel(mergedPeptideData$precursor.retT.slave,
                       mergedPeptideData$intenRatio, span = span)
slaves[[i]]$IdentPeptideData$precursor.inten <-
  slaves[[i]]$IdentPeptideData$precursor.inten *
  2^(predict(intModel, slaves[[i]]$IdentPeptideData$precursor.retT))
```
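`loessModel` is internal to synapter; for readers following along, a minimal stand-in built on base `stats::loess` (an assumption about its interface, not the actual implementation) could look like:

```r
## Hypothetical stand-in for synapter's internal loessModel() helper:
## fit y (the log2 intensity ratio) against x (retention time).
loessModel <- function(x, y, span) {
  stats::loess(y ~ x, span = span, degree = 1, na.action = stats::na.exclude)
}

## usage: predict the log2 ratio at any retention time
m <- loessModel(1:100, log2(seq(1, 2, length.out = 100)), span = 0.75)
predict(m, 50)
```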
For quant vs. master (ident):

```r
ratio <- log2(.self$MergedFeatures$precursor.inten.ident /
              .self$MergedFeatures$precursor.inten.quant)
intModel <- loessModel(.self$MergedFeatures$precursor.retT.ident, ratio, span = span)
.self$MatchedEMRTs$intensityCorrectionFactor <-
  2^(predict(intModel, .self$MatchedEMRTs$precursor.retT))
```

The corrected intensity is then `Counts * intensityCorrectionFactor`.
Cool, is there a function to visualise this (like the figures shown above)?
Hey guys, it works! I get lower CVs when the LOWESS correction is used than when I normalise post run. However, synapter complains that `correctIntensities` does not exist; instead I used:

```r
exprs(combMSNSet) <- exprs(combMSNSet) *
  (2^fData(combMSNSet)[, grep("intensityCorrectionFactor", colnames(fData(combMSNSet)))])
```
@sgibb could you please have a look at that?
The name of the function is `correctIntensity` (note the `y` instead of the `ies` at the end).
While I am happy to hear that it seems to work, I guess the effect is due to the master intensity correction, because your approach should produce a lot of wrong values: the `intensityCorrectionFactor` column already contains the intensity correction factor (meaning it is already `2^x`). The correct way of doing it:

```r
i <- grep("intensityCorrectionFactor", fvarLabels(msnset))
exprs(msnset) <- exprs(msnset) * as.matrix(fData(msnset)[, i, drop = FALSE])
```
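The pitfall in the earlier snippet is the extra `2^`: the stored column is already on the linear scale, so exponentiating it again applies the correction twice. A toy illustration:

```r
logratio <- 1           # loess prediction, on the log2 scale
corr <- 2^logratio      # what intensityCorrectionFactor stores (linear scale)
counts <- 100

counts * corr           # correct: 100 * 2   = 200
counts * 2^corr         # wrong:   100 * 2^2 = 400
```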
As stated above, I am not completely sure about the `*` operator (maybe we have to use `/`). I am going to add plotting functions. These will show wrongly used `*` operators.
So, there is a new function `plotIntensity` (it is nearly identical to `plotRt`):
I used `span = 0.05` (the default for the rt alignment) but it seems too flexible.
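The effect of `span` can be explored with plain `stats::loess` (a generic sketch with made-up data, not synapter output): a very small span chases the noise, while a larger span recovers the underlying trend.

```r
set.seed(42)
x <- sort(runif(300, 0, 120))            # retention time [min]
trend <- sin(x / 20)                     # true smooth drift
y <- trend + rnorm(300, sd = 0.3)        # observed log2 ratios with noise

flexible <- loess(y ~ x, span = 0.05)    # overfits the noise
smoother <- loess(y ~ x, span = 0.5)     # captures the trend

## residual sd against the true trend: the larger span should win
sd(predict(flexible) - trend)
sd(predict(smoother) - trend)
```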
(The raw/corrected (red/blue) plots are not available in synapter and were created from the `MergedFeatures` `data.frame`; and `correctIntensity` correctly uses `*` instead of `/`.)
Getting the following error:

```r
synapterAnalysis <- readRDS("Y:/RAW/pvs22/_QTOF_DATA_data3/synapter2paper/kuharev2015/synapter2_intensity/output/UDMSE//S130423_05//synapterAnalysis.RDS")
modelIntensity(synapterAnalysis)
## Error in model.frame.default(formula = y ~ x) : invalid type (NULL) for variable 'y'
```

Can you please have a look?
14 days ago I decided to calculate and store the intensity ratio `intenRatio` during the merging process (in the same manner as `deltaRt` is stored: https://github.com/lgatto/synapter/commit/6e128f1b381361f89bf0eb5daf5c30bd6e46cb1f). So it seems that your `synapterAnalysis` was created before that and `intenRatio` is indeed missing. Please try to recreate `synapterAnalysis` with the latest synapter.
Before we create the final manuscript we have to ensure that we are using the most recent synapter version for all scripts and figures!
`plotIntensity()` works with the new synapter version after recreating the synapter objects. But there are two problems:
1) When I run `plotIntensity(synapterAnalysis, what = "model")` I get the following error:

```
Error in plot.window(...) : need finite 'ylim' values
In addition: Warning messages:
1: In min(x) : no non-missing arguments to min; returning Inf
2: In max(x) : no non-missing arguments to max; returning -Inf
```
There is a plot with the model above, so I suppose it is an issue of this particular analysis. @sgibb I have shared the file with you through Google Drive.
2) Is it possible to increase the transparency of the points in the plots? Shown below is what I got, and I think the plot is not very clear since all the points are on top of each other.
I think 1) is a similar or the same problem as you described in https://github.com/lgatto/synapter/issues/39#issuecomment-248502081 and solved in https://github.com/lgatto/synapter/issues/39#issuecomment-248504276.
Regarding the colour of the points: currently we use an alpha (transparency) of 0.5 (1 means no transparency, 0 no colour). IMHO we should not decrease alpha (increase transparency) because it makes the colours very blurry. But the user can set the colour:

```r
alpha <- c(1.0, 0.5, 0.1)
for (a in alpha) {
  plotRt(s, what = "data", col = rgb(red = 1, green = 0, blue = 0, alpha = a))
  title(paste("Alpha:", a))
}
```
Apparently a lot of label-free algorithms these days have a normalisation functionality for intensity throughout the run, since there is a systematic drift of intensities at different timepoints of the acquisition: for example, at 10 min of the quant run all peptides may be 0.9 or 1.2 times the intensity in the identification run.
To counter it I suggest an approach similar to the one in ISOQuant (see below):
I think we should do the following:
1) During master construction, peptides being added to the master should be automatically intensity adjusted, i.e. when a peptide is added to the master not only its retention time but also its intensity is adjusted according to the master acquisition.
2) During identification transfer from master, a separate column is created with a value to multiply by to correct for intensity. Then AFTER saturation correction a user can multiply the exprs() values by the values in the correction column.
I think it is important to do that, since it will decrease the CVs of the quantitation and so increase synapter's attractiveness to potential users.