Open lgatto opened 6 years ago
Re QuantitationParams
class. In xcms
I am using parameter classes to switch between analysis methods (such as CentWaveParam
, ObiwarpParam
etc). I think that works nicely, especially because I can put these parameter classes into the process history - so it is possible to exactly reproduce the analysis based on the result object alone. What was an overkill is that I implemented getter and setter methods for all slots of each parameter class - would never ever do that again.
Re OnDiskMSnExp
as a result object. As you know I am using that in xcms
. I find that useful, because it allows, although you have the processed/quantified feature data, to access the original raw data at any point - and the OnDiskMSnExp
is only very lightweight anyway. So I like the idea.
I think it would be good to keep MSnbase
and xcms
similar, hence maybe having QuantitationParams
for both is good - especially that MS1 quantitation will be using xcms
in the future.
I will into your XCMSnExp
again and see to what extend it can be used or adapted.
@jotsetung - a few questions following up from the discussion above.
In xcms
, features are extracted from an OnDiskMSnExp
to create an XCMSnExp
that contains the additional MsFeatureData
environment. What it the method/function that creates and XCMSnExp
from an OnDiskMSnExp
? It would be nice to keep some consistency.
What would you think of me reusing the XCMSnExp
for my quantitation purpose? If so, would you consider renaming it something like FeatureMSnExp
or QuantMSnExp
. Or I could create that class that inherits from XCMSnExp
without adding anything and go from there. I know there will be some cyclic dependencies issues, but it's good to think about sharing infrastructure early on.
Re parameters above, I saw that you have the Params
/GenericParam
classes, and a whole range of specialisations thereof for CentWave
, MatchedFilter
, ... My idea was to have a QuantitationParam
and then specialisations for LFQ, TMT, iTRAQ, spectral counting, ... Do you think it would be worth having some link, such as mine inheriting from GenericParam
? This would make label-free quantitation using xcms
more coherent - I envision something like
qx <- quantify(x, method = "CentWaveParam")
Problem is that the quantification in xcms is a multi-step approach:
1) chromatographic peak detection (within each sample): findChromPeak
.
2) alignment (retention time adjustment between samples): adjustRtime
3) correspondence (grouping of chromatographic peaks across samples): groupChromPeaks
.
Already step 1) returns an XCMSnExp
object that contains the chromatographic peak table in the MsFeatureData
. (optional) Step 2) does add adjusted retention times to the MsFeatureData
and step 3) groups peaks across samples and does the quantification (adds the feature definitions, which is in essence just a DataFrame
with indices linking to the individual chromatographic peaks belonging to the same feature).
The quantification matrix can then be extracted using the featureValues
method that builds the matrix on-the-fly by extracting the corresponding quantifications from the chromatographic peak table.
The idea of having a single quantify
method to do all of this might be tricky. For applying quantification using the yams
package from the Hansen group it would however work (might be something I would like to implement anyway) as they're not doing any alignment etc.
Now re quantify(x, param = CentWaveParam())
, this would do an identification of chromatographic peaks within each sample - but there is no way yet to generate a matrix
-like quantification matrix as the peaks have still to be matched across samples (by groupChromPeaks
). Problem is really that xcms
is designed as a multi-step approach.
To me it would make sense to apply a quantification method once the features have been defined (i.e. the 2d areas on the rt-m/z space). This can be done with a combination of findChromPeaks
and groupChromPeaks
. An alternative possibility would be to define yet another parameter class that holds several Param
classes. e.g. an XCMSParam
class with a parameter class for peak detection, one (optional) for alignment and one for correspondence. quantify
with such a parameter class would then apply all the necessary steps sequentially and return the object with the quantification data (not necessarily an XCMSnExp
object, see below).
Re re-using the XCMSnExp
for you purpose depends on what you want to have. I think you want to have an object with a two-dimensional quantification matrix - that's not in XCMSnExp
. What if you would define a similar object (just with a two-dimensional quantification matrix) and quantify
would turn an XCMSnExp
into your object (dropping all the information about individual chromatographic peaks etc)?
The idea of having a single quantify method to do all of this might be tricky.
That wasn't my intention. Now that I understand better, I would see something like
xcms
: OnDiskMSnExp
-(step1)-(step2)-(step3)->
XCMSnExp
MSnbase
: OnDiskMSnExp
-quantify->
SomethingMSnExp
(that basically does the same as now, but keeping quantitation and raw data together, at least temporarily).MSnbase
using xcms
: to be determined, but probably 3 steps with intermediates, and possibly with step 3 being quantify
, or a ParamsList
, or 1 quantify
step using YamsParam
, leading to SomethingMSnExp
Re re-using the XCMSnExp for you purpose depends on what you want to have. I think you want to have an object with a two-dimensional quantification matrix - that's not in
XCMSnExp
.
No, the 2-dimensional object is an MSnSet
, which is here to stay. I want an OnDiskMSnExp
that also contains a matrix
with quantified features in SomethingMSnExp
. Then, one can look and the quantified features and raw data easily and, when ready, proceed with only quantitative data using as(mySomethingMSnExp, "MSnSet")
.
I hope this clarifies what I am looking for. I think these needs are essentially what you have.
Looks good to me! I think the ParamList
containing the parameter classes for the 3 steps is a not too bad idea (could be called xcmsParam
). And the YamsParam
that does the quantification in one go. I like the idea behind yams and that it takes a different approach than xcms. Only caveat is that the data should be nicely aligned for yams.
Also for me it might be helpful to have then a as(XCMSnExp, "MSnSet")
to get rid of the raw data in the end. Good that we begin to merge metabolomics and proteomics workflows/classes etc. The next thing would then be the identification of the features - for an initial identification based on m/z I've started something over at https://github.com/EuracBiomedicalResearch/CompoundDb . It's a simple database for compounds.
Also for me it might be helpful to have then a
as(XCMSnExp, "MSnSet")
to get rid of the raw data in the end.
That was implied in my as(SomeethingMSnExp, "MSnSet")
- as SomethingMSnExp
would essentially be an XCMSnExp
, i.e. an OnDiskMSnExp
with a feature/quantitation environment.
If this looks good an feasible, my next concrete question is:
XCMSnExp
to MSnbase
and possibly renaming it FeatureMSnExp
, QuantMSnExp
, ... and use it as XCMSnExp
in xcms
. I can't do it the other way due to circular dependencies.Or do you see another possibility?
Same for the Params
class, but that's more of a detail now.
Good that we begin to merge metabolomics and proteomics workflows/classes etc
Yes!
I really like the idea to merge metabolomics and proteomics workflows and to use a QuantitationParam
class for the quantitation settings.
I want an
OnDiskMSnExp
that also contains amatrix
with quantified features inSomethingMSnExp
While it would be great to link quantitation data to raw data I am against the MonsterMSnExp
that contains just everything. IMHO it works against our idea we discussed at EuroBioc2017 that we want to develop an universal infrastructure to link quant data to raw data to sequence data to genomic data etc. I think it would be better to keep small specialized objects ((OnDisk)MSnExp
, MSnSet
).
BTW: Do we want to start a repository for this linkage stuff to discuss goal/problems from time to time? (Starting point could be the document you wrote at EuroBioc2017.)
What about moving XCMSnExp to MSnbase and possibly renaming it FeatureMSnExp, QuantMSnExp, ... and use it as XCMSnExp in xcms. I can't do it the other way due to circular dependencies.
In principle there shouldn't be a big problem moving the object - there will be stuff that you have to implement in MSnbase
(e.g. how to add or access the matrix with quantification data) and stuff that I would like to keep in xcms
(all the stuff related to adjusted retention times etc). We would have to define beforehand what has to be moved (depends on what you need in MSnbase
). One thing regarding the name: the idea was that you have the MSnExp
object for mass spec proteomics and the XCMSnExp
the extension for LC/GC-MS (hence XC
+ MSnExp
).
No problem moving the Param
and GenericParam
classes.
Re @sgibb about the monster object: for me it was key to have both the link to the raw and quantified data in the same object. I really need to get back to exactly the same data on which the quantification was performed to check if the e.g. identified peaks are well aligned across samples etc. Having that in different objects would make it very hard (this is how it was in the old xcms
and the main reason why I wanted to change and update it).
@sgibb
I am against the
MonsterMSnExp
In principle, I agree. But at this point, given that xcms
has taken that route (for very sensible reasons) and has something that suites my immediate needs, I think it is a reasonable path for MSnbase
too. It could be also be done with two objects, making sure they have compatible dimensions. But at this stage, having matching metabolomics and proteomics pipelines is a desirable feature.
The long-term goals, as discussed at EuroBioc2017, is to integrate raw data, quantitation, protein database, peptide sequences, ... and I don't want to rush this. The r-bioc-ms-challenges repo can be used for discussions.
I am open for discussion though. I will think about keeping MSnSet
and OnDiskMSnExp
separate and maintain the dimensions compatible, which isn't the case now as the letter stores all MS levels.
We would have to define beforehand what has to be moved (depends on what you need in
MSnbase
).
I would have thought the class and accessors would suffice, and all the xcms
specific processing would stay, of course.
Re naming, I got the reason of the XC
, but it is confusing due to xcms
. Would it be possible to name it something more general in MSnbase
and use it as XCMSnExp
in xcms
? Something along the lines of
name1 <- name2 <- ...
for functions. XCMSnExp
wouldn't be exported from MSnbase
, only from xcms
.
Is it possible to have the same object with two different names? Also here, I don't have any strong feeling regarding the name of the object - XCMSnExp
is very hard to pronounce anyway.
Why I'm not euphoric about moving the object is that, as it is now, it is hardly of any use for you - because you don't need the adjusted retention times, the matrix of identified chromatographic peaks and the DataFrame
with feature definitions. What you would need is a matrix with quantitation data, which the object at present does not have. Wouldn't it be easier to implement a new object? Coercing from an XCMSnExp
to that would then be easy, because the matrix
returned by featureValues,XCMSnExp
would be the quantification matrix.
If you're already thinking that you will need the object to implement the label free identification in MSnbase
- for that you would need to import then all the analysis methods from xcms
and you'll end up in circular dependencies. I think that should be implemented in xcms
.
Is XCMSnExp
not an OnDiskMSnExp
with the MsFeatureData
environment (and a list for the processing history)?
In you case, the environment contains adjusted retention times, the matrix of identified chromatographic peaks and the DataFrame
with feature definitions - accessed with the dedicates accessors; in my case that environment would contain a matrix accessed by exprs
.
Basically, I would do
setClass("FeatureMSnExp",
slots = c(
.processHistory = "list", ## optional
msFeatureData = "MsFeatureData"
),
contains = c("OnDiskMSnExp"))
setMethod("[", "FeatureMSnExp",
## also subset object@msFeatureData$exprs
)
setMethod("exprs", "FeatureMSnExp",
function(object) object@msFeatureData$exprs)
setMethod("show", "FeatureMSnExp",
## just say that there's something in the msFeatureData slot,
## possibly say what it is, then call show,OnDiskMSnExp
)
## all the other accessors would be inherited form OnDiskMSnExp
For a later tighter integration, I think it would be good that XCMSnExp
and FeatureMSnExp
wouldn't be completely independent. Let's sleep over it and think about it again. I can work around this for now anyway.
Ah, and here comes the trouble:
I have a [
method implemented that we would have to adopt:
#' @description The \code{[} method allows to subset a \code{\link{XCMSnExp}}
#' object by spectra. Be aware that the \code{[} method removes all
#' preprocessing results, except adjusted retention times if
#' \code{keepAdjustedRtime = TRUE} is passed to the method.
#'
#' @param x For \code{[} and \code{[[}: an \code{\link{XCMSnExp}} object.
#'
#' @param i For \code{[}: \code{numeric} or \code{logical} vector specifying to
#' which spectra the data set should be reduced.
#' For \code{[[}: a single integer or character.
#'
#' @param j For \code{[} and \code{[[}: not supported.
#'
#' @param drop For \code{[} and \code{[[}: not supported.
#'
#' @rdname XCMSnExp-filter-methods
setMethod("[", "XCMSnExp", function(x, i, j, ..., drop = TRUE) {
if (!missing(j))
stop("subsetting by columns ('j') not supported")
if (missing(i))
return(x)
else if (!(is.numeric(i) | is.logical(i)))
stop("'i' has to be either numeric or logical")
## Check if we have keepAdjustedRtime as an additional parameter
## in ...
keepAdjustedRtime <- list(...)$ke
if (is.null(keepAdjustedRtime))
keepAdjustedRtime <- FALSE
if (hasFeatures(x) | hasChromPeaks(x)) {
suppressMessages(
x <- dropFeatureDefinitions(x, keepAdjustedRtime =
keepAdjustedRtime))
suppressMessages(
x <- dropChromPeaks(x, keepAdjustedRtime =
keepAdjustedRtime))
warning("Removed preprocessing results")
}
if (hasAdjustedRtime(x)) {
if (keepAdjustedRtime) {
## Subset the adjusted rtime
new_adj <- rtime(x, adjusted = TRUE)[i]
newFd <- new("MsFeatureData")
newFd@.xData <- .copy_env(x@msFeatureData)
adjustedRtime(newFd) <-
unname(split(new_adj, f = fromFile(x)[i]))
lockEnvironment(newFd, bindings = TRUE)
x@msFeatureData <- newFd
} else {
suppressMessages(x <- dropAdjustedRtime(x))
}
}
callNextMethod()
})
so, if there is a exprs
subset that, and then call all of the other subsettings above.
same is true for the filter...
methods. For XCMSnExp
they are all overwriting the OnDiskMSnExp
implementation to ensure that the data remains consistent. To me it would make sense to keep these methods in xcms
or eventually override the ones exported from MSnExp
. All this stuff that I have to do to keep the adjusted retention times, identified chromatographic peaks and feature definitions aligned and consistent.
Actually, what if we keep the XCMSnExp
that extends your FeatureMSnExp
object? That way we could keep the same methods and keep the logic (i.e. to subset the exprs
and/or all the xcms stuff) separate?
Here's what I was thinking of: we would have a top class in MSnbase
setClass("FeatureMSnExp",
slots = c(
.processHistory = "list", ## optional
msFeatureData = "MsFeatureData"
),
representation = "VIRTUAL",
contains = c("OnDiskMSnExp"))
## possibly a show,FeatureMSnExp method
## common infrastructure inherited from OnDiskMSnExp
and in MSnbase
setClass("MSnExpSet", contains = "FeatureMSnExp") ## horrible name
setMethod("[", "MSnExpSet", ...) ## accessing exprs
## all the specific proteomics processing methods/functions
and in xcms
setClass("XCMSnExp", contains = "FeatureMSnExp")
setMethod("[", "XCMSnExp", ...) ## accessing other elements
## all the specific metabolomics processing methods/functions
This allows us to keep a link between the MSnbase
and xcms
specific implementations of FeatureMSnExp
, yet these two having different data in the MsFeatureData
slot and different accessors. I am just unsure yet to what extend this will be useful. But I don't think it will make anything more difficult.
Let's see tomorrow if this still makes sense...
Looks good to me. So, you're moving the MsFeatureData
to MSnbase
, right?
So, you're moving the
MsFeatureData
toMSnbase
, right?
From the above, I rather meant FeatureMSnExp
(a renamed virutal XCMSnExp
) and MsFeatureData
, and we can implement our respective XCMSnExp
(with all it already has) and SomethingMSnExp
(with exprs
).
I won't have time to further test/think about this today, so no hurry.
I am going to wait before doing this, that I have the new interface with QuantitationParam
in place.
Given the new
OnDiskMSnExp
infrastructure, it is time to upgrade thequantify
method. I am considering the following signature (only focusing on the most important arguments for now):with the following behaviour:
Another possibility would be to have a
QuantitationParams
class (with essentially the arguments shown above) and pre-defined instances for the most used methods. At this point it feels overkill, but could be a way to more easily extend quantification parametrisation in the future. This could by the way also be useful forxcms
.Another point I am considering is to allow to either returns an
MSnSet
containing the feature of the MS level defined as argument toquantify
(current behaviour), or to return a newOnDiskMSnExp
and the quantification matrix (for all feature) would be stored in amatrix
in theassayData
slot, similar to whatxcms
does in theXCMSnExp
object.@sgibb @jotsetung - any comments or suggestions?