sneumann / xcms

This is the git repository matching the Bioconductor package xcms: LC/MS and GC/MS Data Analysis
Other
178 stars 81 forks source link

[discrepancy] correlate() vs compareChromatograms() #592

Closed Adafede closed 1 year ago

Adafede commented 2 years ago

Hi,

For some context, I am trying to compare peak shapes coming from different detectors in order to group them.

I started with https://bioconductor.org/packages/3.15/bioc/vignettes/xcms/inst/doc/xcms-lcms-ms.html#32_Reconstruction_of_MS2_spectra, which offers the nice correlate function. It is what I am looking for. As I ran it, it suggested using the new compareChromatogramsfunction, I did it but found some discrepancies without really understanding where they come from. I looked at the related work done on EIC grouping by @jorainer but still have no idea where the origin of the problem is, probably a slight difference in XChromatograms and MChromatograms I do not get.

For reproducing what I am talking about:

library(xcms)
library(Spectra)
## ----load-swath-data, message = FALSE-----------------------------------------
swath_file <- system.file("TripleTOF-SWATH",
                          "PestMix1_SWATH.mzML",
                          package = "msdata")

swath_data <- readMSData(swath_file, mode = "onDisk")

## ----echo = FALSE, message = FALSE--------------------------------------------
swath_data <- filterRt(swath_data, rt = c(200, 600))

## ----swath-table-mslevel------------------------------------------------------
table(msLevel(swath_data))

## ----fdata-isolationwindow----------------------------------------------------
head(fData(swath_data)[, c("isolationWindowTargetMZ",
                           "isolationWindowLowerOffset",
                           "isolationWindowUpperOffset",
                           "msLevel", "retentionTime")])

head(isolationWindowLowerMz(swath_data))
head(isolationWindowUpperMz(swath_data))

## -----------------------------------------------------------------------------
table(isolationWindowTargetMz(swath_data))

## ----find-chrom-peaks-ms1, message = FALSE------------------------------------
cwp <- CentWaveParam(snthresh = 5, noise = 100, ppm = 10,
                     peakwidth = c(3, 30))
swath_data <- findChromPeaks(swath_data, param = cwp)

## ----find-chrom-peaks-ms2, message = FALSE------------------------------------
cwp <- CentWaveParam(snthresh = 3, noise = 10, ppm = 10,
                     peakwidth = c(3, 30))
swath_data <- findChromPeaksIsolationWindow(swath_data, param = cwp)

## -----------------------------------------------------------------------------
chromPeakData(swath_data)

## -----------------------------------------------------------------------------
table(chromPeakData(swath_data)$isolationWindow)

## ----fena-extract-peak--------------------------------------------------------
fenamiphos_mz <- 304.113077
fenamiphos_ms1_peak <- chromPeaks(swath_data, mz = fenamiphos_mz, ppm = 2)
fenamiphos_ms1_peak

## ----fena-identify-ms2--------------------------------------------------------
keep <- chromPeakData(swath_data)$isolationWindowLowerMz < fenamiphos_mz &
  chromPeakData(swath_data)$isolationWindowUpperMz > fenamiphos_mz

## ----fena-check-rt------------------------------------------------------------
keep <- keep &
  chromPeaks(swath_data)[, "rtmin"] < fenamiphos_ms1_peak[, "rt"] &
  chromPeaks(swath_data)[, "rtmax"] > fenamiphos_ms1_peak[, "rt"]

fenamiphos_ms2_peak <- chromPeaks(swath_data)[which(keep), ]

## ----fena-eic-extract, warning = FALSE----------------------------------------
rtr <- fenamiphos_ms1_peak[, c("rtmin", "rtmax")]
mzr <- fenamiphos_ms1_peak[, c("mzmin", "mzmax")]
fenamiphos_ms1_chr <- chromatogram(swath_data, rt = rtr, mz = mzr)

rtr <- fenamiphos_ms2_peak[, c("rtmin", "rtmax")]
mzr <- fenamiphos_ms2_peak[, c("mzmin", "mzmax")]
fenamiphos_ms2_chr <- chromatogram(
  filterIsolationWindow(swath_data, mz = fenamiphos_mz),
  rt = rtr, mz = mzr, msLevel = 2L)

## ----fena-eic-plot, fig.width = 10, fig.height = 5, fig.cap = "Extracted ion chromatograms for Fenamiphos from MS1 (blue) and potentially related signal in MS2 (grey)."----
plot(rtime(fenamiphos_ms1_chr[1, 1]),
     intensity(fenamiphos_ms1_chr[1, 1]),
     xlab = "retention time [s]", ylab = "intensity", pch = 16,
     ylim = c(0, 5000), col = "blue", type = "b", lwd = 2)
#' Add data from all MS2 peaks
tmp <- lapply(fenamiphos_ms2_chr@.Data,
              function(z) points(rtime(z), intensity(z),
                                 col = "#00000080",
                                 type = "b", pch = 16))

## ----fena-cor-----------------------------------------------------------------
## original example from vignette
correlate(fenamiphos_ms2_chr[1, 1],
          fenamiphos_ms1_chr[1, 1], align = "approx")
# [1] 0.9997871
# Warning message:
#   '.local' is deprecated.
# Use 'compareChromatograms' instead.
# See help("Deprecated") 

## just check we are using correlate from xcms
xcms::correlate(fenamiphos_ms2_chr[1, 1],
          fenamiphos_ms1_chr[1, 1], align = "approx")
# [1] 0.9997871
# Warning message:
#   '.local' is deprecated.
# Use 'compareChromatograms' instead.
# See help("Deprecated") 

## OK, let's use compareChromatograms instead, as suggested
compareChromatograms(fenamiphos_ms2_chr[1, 1],
                fenamiphos_ms1_chr[1, 1], align = "approx")
# [1] 0.7019566

## again check if function could be present in different packages
ProtGenerics::compareChromatograms(fenamiphos_ms2_chr[1, 1],
                     fenamiphos_ms1_chr[1, 1], align = "approx")
# [1] 0.7019566

## no diff
MSnbase::compareChromatograms(fenamiphos_ms2_chr[1, 1],
                                   fenamiphos_ms1_chr[1, 1], align = "approx")
# [1] 0.7019566

## quick and dirty trial to try to understand
## building dummy peak a
a <-
  data.frame(
    "rtime" = c(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20),
    "intensity" = c(0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 34, 21, 13, 8, 5, 3, 2, 1, 1, 0)
  )

## building dummy peak b
b <-
  data.frame(
    "rtime" = c(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20),
    "intensity" = c(0, 1.5, 1.7, 2.1, 3.6, 5.2, 8.3, 13.7, 21.1, 34.2, 55.7, 64.8, 32.5, 59.7, 21, 8, 3, 2, 1, 1, 0)
  )

## faking chromatogram class
fake_fenamiphos_ms2_chr <- fenamiphos_ms2_chr
fake_fenamiphos_ms2_chr[1, 1]@rtime <- a$rtime
fake_fenamiphos_ms2_chr[1, 1]@intensity <- a$intensity

## originally started by doing it on 'fenamiphos_ms1_chr', but it is a
## XChromatograms class, which is more restrictive than 'M' and does not allow doing it
fake_fenamiphos_ms1_chr <- fenamiphos_ms2_chr
fake_fenamiphos_ms1_chr[1, 1]@rtime <- b$rtime
fake_fenamiphos_ms1_chr[1, 1]@intensity <- b$intensity

## doing the comparison as before...this time always getting the same result.
## can maybe help finding the hickup?

## correlate
xcms::correlate(fake_fenamiphos_ms2_chr[1, 1],
                fake_fenamiphos_ms1_chr[1, 1], align = "approx")
# [1] 0.8342013
# Warning message:
#   '.local' is deprecated.
# Use 'compareChromatograms' instead.
# See help("Deprecated") 

## no diff
ProtGenerics::compareChromatograms(fake_fenamiphos_ms2_chr[1, 1],
                                   fake_fenamiphos_ms1_chr[1, 1], align = "approx")
# [1] 0.8342013

## no diff
MSnbase::compareChromatograms(fake_fenamiphos_ms2_chr[1, 1],
                              fake_fenamiphos_ms1_chr[1, 1], align = "approx")
# [1] 0.8342013

sessionInfo()
# R version 4.1.2 (2021-11-01)
# Platform: x86_64-apple-darwin17.0 (64-bit)
# Running under: macOS Monterey 12.2
# 
# Matrix products: default
# LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib
# 
# locale:
#   [1] fr_CH.UTF-8/fr_CH.UTF-8/fr_CH.UTF-8/C/fr_CH.UTF-8/fr_CH.UTF-8
# 
# attached base packages:
#   [1] stats4    stats     graphics  grDevices utils     datasets  methods   base     
# 
# other attached packages:
#   [1] Spectra_1.4.1       xcms_3.16.1         MSnbase_2.20.1      ProtGenerics_1.26.0 S4Vectors_0.32.3   
# [6] mzR_2.28.0          Rcpp_1.0.7          Biobase_2.54.0      BiocGenerics_0.40.0 BiocParallel_1.28.3
# 
# loaded via a namespace (and not attached):
#   [1] MatrixGenerics_1.6.0        vsn_3.62.0                  foreach_1.5.1              
# [4] assertthat_0.2.1            BiocManager_1.30.16         affy_1.72.0                
# [7] GenomeInfoDbData_1.2.7      robustbase_0.93-9           impute_1.68.0              
# [10] pillar_1.6.4                lattice_0.20-45             glue_1.6.0                 
# [13] limma_3.50.0                digest_0.6.29               GenomicRanges_1.46.1       
# [16] RColorBrewer_1.1-2          XVector_0.34.0              colorspace_2.0-2           
# [19] preprocessCore_1.56.0       Matrix_1.4-0                plyr_1.8.6                 
# [22] MALDIquant_1.21             XML_3.99-0.8                pkgconfig_2.0.3            
# [25] zlibbioc_1.40.0             purrr_0.3.4                 scales_1.1.1               
# [28] RANN_2.6.1                  affyio_1.64.0               tibble_3.1.6               
# [31] generics_0.1.1              IRanges_2.28.0              ggplot2_3.3.5              
# [34] ellipsis_0.3.2              SummarizedExperiment_1.24.0 MassSpecWavelet_1.60.0     
# [37] magrittr_2.0.1              crayon_1.4.2                fs_1.5.2                   
# [40] ncdf4_1.19                  fansi_0.5.0                 doParallel_1.0.16          
# [43] MASS_7.3-54                 MsFeatures_1.2.0            tools_4.1.2                
# [46] lifecycle_1.0.1             matrixStats_0.61.0          munsell_0.5.0              
# [49] cluster_2.1.2               DelayedArray_0.20.0         pcaMethods_1.86.0          
# [52] compiler_4.1.2              GenomeInfoDb_1.30.0         mzID_1.32.0                
# [55] rlang_0.4.12                grid_4.1.2                  RCurl_1.98-1.5             
# [58] iterators_1.0.13            MsCoreUtils_1.6.0           bitops_1.0-7               
# [61] gtable_0.3.0                codetools_0.2-18            DBI_1.1.2                  
# [64] R6_2.5.1                    dplyr_1.0.7                 utf8_1.2.2                 
# [67] clue_0.3-60                 parallel_4.1.2              vctrs_0.3.8                
# [70] DEoptimR_1.0-9              tidyselect_1.1.1  

I am happy to help further in case!

Best holiday wishes,

jorainer commented 2 years ago

The compareChromatograms function has different parameters than the correlate function, thus, you can not use it in exactly the same way. To get the same results you need to call:

compareChromatograms(fenamiphos_ms2_chr[1, 1],
                fenamiphos_ms1_chr[1, 1], ALIGNFUNARGS = list(method = "approx"))

So, compareChromatograms performs first a alignment of the chromatograms using the function alignRt and then calculates the similarity. The alignment function can be specified with parameter ALIGNFUN (defaults to alignRt, see ?alignRt for details), parameters to this function can be passed with parameter ALIGNFUNARGS. The similarity calculation function can be defined with parameter FUN (defaults to cor), parameters to this function can be defined with parameter FUNARGS.

This might seem more complicated but it is also more flexible, because (similar to the compareSpectra function) it allows you now to define different ways to align/match the chromatograms and to calculate the similarity.