Closed cbroeckl closed 2 years ago
One more comment. I was using combineSpectra from msnbase, not xcms. I am assuming they share code though!
Hm, the problem is that both MSnbase
and Spectra
define a combineSpectra
function - that actually work slightly different... to be really on the safe side to use the more powerful function from the Spectra
package you could call it directly with Spectra::combineSpectra
- otherwise it will depend on the order you load the packages which one will be used.
Next question, with the tutorial you mean this one, right? Just to be really sure which code to look for and where to add fixes or additional features... (again, I would prefer to do this in the Spectra
package, which is also used in the LC-MS/MS xcms
vignette)
@jorainer - my apologies. i realized i was using two different functions late in the game. i had been using msnbase, observed the discrepancy between the performance of sum and mean on my data, then went to try to make a reproducible example from the Vignette. I received error messages trying to use the Spectra::combineSpectra function having parsed data with MSnBase, so i will need to back up to do this more correctly.
you did identify the correct vignette though. When you get to this code snippet: ex_spectrum <- combineSpectra(ex_spectra, FUN = combinePeaks, ppm = 20, peaks = "intersect", minProp = 0.8, intensityFun = median, mzFun = median, f = ex_spectra$peak_id)
just replace 'intensityFun = median' with 'intensityFun = mean' or 'intensityFun = sum'
More later, but the mean and sum from Spectra::combineSpectra do look slightly different in their relative abundances. maybe this is just rounding? I will have to see whether the larger differences that i observed on my own data are observed using the Spectra::combineSpectra function.
Also - question does remain as to whether there are functions for subtracting a spectrum from another. Alternatively, is it possible to align spectra without combining them? I could then take the @mz and @intensity data from each, do whatever math i want, and recreate a spectra-class object. This way i would still be using the Spectra alignment tools, at least.
@jorainer - i am a bit stuck. I can't seem to figure out how to get Spectra::combineSpectra to work using the filtering i am using, though i can get the MSnbase::combineSpectra to work. I use the SWATH demo data as an example:
> ## swath demo data to demonstrate issue
> swath_file <- system.file("TripleTOF-SWATH",
+ "PestMix1_SWATH.mzML",
+ package = "msdata")
> swath_data <- MSnbase::readMSData(swath_file, mode = "onDisk")
> swath_data <- Spectra::filterRt(swath_data, rt = c(310, 312))
> table(Spectra::msLevel(swath_data))
1 2
2 18
>
> ms2 <- Spectra::filterMsLevel(swath_data, 2)
> ms2
MSn experiment data ("OnDiskMSnExp")
Object size in memory: 0.04 Mb
- - - Spectra data - - -
MS level(s): 2
Number of spectra: 18
MSn retention times: 5:10 - 5:12 minutes
- - - Processing information - - -
Data loaded [Fri Jul 30 13:49:22 2021]
Filter: select retention time [310-312] and MS level(s), 1 2 [Fri Jul 30 13:49:22 2021]
Filter: select MS level(s) 2. [Fri Jul 30 13:49:22 2021]
MSnbase version: 2.18.0
- - - Meta data - - -
phenoData
rowNames: PestMix1_SWATH.mzML
varLabels: sampleNames
varMetadata: labelDescription
Loaded from:
PestMix1_SWATH.mzML
protocolData: none
featureData
featureNames: F1.S3100 F1.S3101 ... F1.S3119 (18 total)
fvarLabels: fileIdx spIdx ... spectrum (35 total)
fvarMetadata: labelDescription
experimentData: use 'experimentData(object)'
>
> ## the combineSpectra function from the Spectra package doesn't work, apparently
> ms2.cmb <- Spectra::combineSpectra(ms2)
Error in Spectra::combineSpectra(ms2) :
length of 'f' and 'p' have to match length of 'x'
>
> ## defining f and p factor vectors generates a new error message - not sure
> ## what this signifies though.
> ms2.cmb <- Spectra::combineSpectra(ms2[1:18],
+ f = as.factor(rep("A", 18)),
+ p = as.factor(rep("A", 18))
+ )
Error in (function (classes, fdef, mtable) :
unable to find an inherited method for function ‘isReadOnly’ for signature ‘"character"’
>
>
> ## this is what i see when applying the MSnbase::combineSpectra - same as i observed
> ## for my internal data - vastly different spectra, when i would expect different
> ## (by a factor of nSpectra) y-axis values only.
> ms2.cmb.mean <- MSnbase::combineSpectra(ms2, ppm = 25, intensityFun = mean)
> ms2.cmb.sum <- MSnbase::combineSpectra(ms2, ppm = 25, intensityFun = sum)
>
> pl.mean <- plot(ms2.cmb.mean[[1]], plot = FALSE) + ggtitle("mean intensity")
> pl.sum <- plot(ms2.cmb.sum[[1]], plot = FALSE) + ggtitle("sum intensity")
> gridExtra::grid.arrange(pl.sum, pl.mean, nrow = 2)
I do realize that this may be better reported at MSnbase as it stands. However, I do want to figure out whether i see different behavior using the Spectra package, and would appreciate some guidance. i am sure this is a relatively simple issue. Thanks. Corey
Thanks for the use case @cbroeckl ! I'll try to follow it and fix the errors below. Firstly, you can not apply the functions from the Spectra
package if you read the data with the MSnbase
readMSData
function. You should either import them directly using Spectra
or use, as in the xcms
LC-MS/MS vignette, return.type = "Spectra"
in the e.g. chromPeakSpectra
function from xcms
. That way you have the data as a Spectra
object and you can use the Spectra::combineSpectra
- if you have the data imported with MSnbase
and have it as an MSnExp
object or MSpectra
, you can only use the MSnbase::combineSpectra
function. If possible, I would use in future wherever possible the Spectra
objects, as we have more powerful functions for them - and will most likely not add or update any functionality to MSnbase
(unless its a bug fix).
That said, I try to run your workflow above with Spectra
:
library(Spectra)
swath_file <- system.file("TripleTOF-SWATH", "PestMix1_SWATH.mzML", package = "msdata")
swath_data <- Spectra(swath_file)
swath_data <- filterRt(swath_data, rt = c(310, 312))
ms2 <- filterMsLevel(swath_data)
ms2
MSn data (Spectra) with 20 spectra in a MsBackendMzR backend:
msLevel rtime scanIndex
<integer> <numeric> <integer>
1 2 310.070 3100
2 2 310.167 3101
... ... ... ...
19 2 311.870 3118
20 2 311.967 3119
... 33 more variables/columns.
file(s):
PestMix1_SWATH.mzML
Processing:
Filter: select retention time [310..312] on MS level(s) 2 1 [Mon Aug 2 12:40:57 2021]
Filter: select MS level(s) [Mon Aug 2 12:41:21 2021]
OK, and now let's use combineSpectra
grouping all peaks from all spectra with each other that haven an m/z difference smaller 20ppm. We use either sum
or mean
to aggregate the intensities.
ms2_sum <- combineSpectra(ms2, intensityFun = sum, ppm = 20)
ms2_mean <- combineSpectra(ms2, intensityFun = mean, ppm = 20)
## Plot the results
par(mfrow = c(2, 1), mar = c(4, 4.3, 1, 0.5))
plotSpectra(ms2_sum)
grid()
plotSpectra(ms2_mean)
grid()
to me that looks correct, all the main peaks seem to be there and the sum has higher intensities than the mean.
Now, regarding your question with the difference of intensities... that should be doable. With intensityFun
we can actually define any function we want to aggregate the intensities of the peaks from different spectra that are matched (based on their m/z). It's just important that a single value is returned for each grouped peak.
Just using diff
would not work because of that reason:
## Subset to only 2 spectra:
ms2_2 <- ms2[1:2]
ms2_2diff <- combineSpectra(ms2_2, intensityFun = diff, ppm = 20)
Backend of the input object is read-only, will change that to an 'MsBackendDataFrame'
Error in vapply(X = X, FUN = FUN, FUN.VALUE = NA_real_, ..., USE.NAMES = USE.NAMES) :
values must be length 1,
but FUN(X[[1]]) result is length 0
We could maybe create our own function that subtracts the intensities from each other and returns NA
if there is a single intensity value (i.e. peak only found in one spectrum) or there are more than 2 intensity values (e.g. if two peaks from one spectrum and one or more from the other where grouped).
mydiff <- function(x) {
if (length(x) == 2)
x[2] - x[1]
else NA_real_
}
ms2_2diff <- combineSpectra(ms2_2, intensityFun = mydiff, ppm = 20)
plotSpectra(ms2_2diff, ylim = range(intensity(ms2_2diff)[[1L]], na.rm = TRUE))
Note: there seems to be a bug in plotSpectra
that does not expect negative intensity values - that's why we need to specify the ylim
ourselfs.
You see, in theory combineSpectra
should be flexible enough to do what you suggest. Also, maybe have a look at the documentation of ?combinePeaks
which is the function used internally to do the actual combining.
@jorainer
Many thanks - this is a great series of responses. A few replies:
1) I know, that's tricky. I'll try to update the vignettes (like for xcms
) whenever new functionality becomes available. Most development efforts go now into Spectra
and alike but in the end I try to integrate as much as possible with the old packages as possible (or have a way to convert old data objects into the new data structures to allow usage of the new code).
2) if something is not working we could implement alternative functions too - we would just need a nice use case.
3) yes, the sum and mean spectra should identical, but only if the individual spectra are also identical. For the example above it seems that some peaks where present in most spectra, but some not, thus the mean and sum resulted in differences. What the function does is pretty simple: it gets all peaks from all spectra and orders them by m/z. Then it groups all peaks (based on m/z similarity considering the provided tolerance
and ppm
) and aggregates the m/z and intensity values in each such defined peak group. So, there is no pairwise calculation in there, so the result should be correct (I believe).
Follow up on (3). So if we have three spectra, and each spectra has m/z = 195.088, the signal intensity for it is calculated as the sum of the signal intensities / three. However, if the 195.088 is present in only two of the three spectra, then the sum is calculated and divided by two, as there were only two measured values. Accurate?
I had been thinking of the divisor for the mean as always being the number of original spectra. I think you are treating the absent mass/intensity as a missing value to be ignored, while i was thinking about it as a 'zero' value intensity at that mass. If we treat missing values as zeros, then the divisor is always the number of spectra.
I am not sure whether one is more appropriate than the other - i imagine that it depends on the context, to some extent.
correct, the way it is implemented we only take the mean on the present values and if in one or more spectra the peak was not present this is not included in the calculation. We could think of an alternative implementation if this would be important for you. I will however definitely update the documentation to explain that better. Thanks Corey!
Thanks @jorainer - i think it is fine as is for now - if i need the mean calcluated otherwise i can always write the function to divide the sum by nSpectra for each mass. I just misunderstood the calculations.
I have been playing with some DIA data, trying to reconstruct filtered MS/MS spectra. I walked through the MS/MS DDA tutorial, and have played with some of my own data.
Question 1: Is there a way to subtract spectra? i.e. is there a function to subtract the signal from MS/MS spectrum A from MS/MS spectrum B? Alternatively, take a ratio?
Question 2: The combineSpectra function does not seem to behaving as i would expect. In particular the intensityFun aggregation. I noticed this clearly on some in-house data, but rather than try to share that, you can you use the tutorial data. Once you get to the combineSpectra function for the 5 DDA spectra:
ex_spectrum.sum <- combineSpectra(ex_spectra, FUN = combinePeaks, ppm = 20, peaks = "intersect", minProp = 0.8, intensityFun = sum, mzFun = median, f = ex_spectra$peak_id)
ex_spectrum.mean <- combineSpectra(ex_spectra, FUN = combinePeaks, ppm = 20, peaks = "intersect", minProp = 0.8, intensityFun = mean, mzFun = median, f = ex_spectra$peak_id)
In theory, these two spectra, sum and mean, should be identical except for some scaling factor based on n = 5. However, they are not. There are small differences in the relative abundances.
Look at the ratio of the two highest-mass ions, for example.
It is considerably worse in my in-house data, which is using waters SONAR - many more spectra are being combined here:
top is 'sum' and bottom is 'mean'
Any idea why this is the case?
Thanks! Corey