sneumann / xcms

This is the git repository matching the Bioconductor package xcms: LC/MS and GC/MS Data Analysis
Other
186 stars 80 forks source link

Different amount of peaks with old and new interface? #402

Open rickhelmus opened 5 years ago

rickhelmus commented 5 years ago

Hi XCMS team,

I finally got around trying the and improved XCMS3 interface :-) One thing I quickly noticed is that with some data files I get significantly different amounts of peaks. A small example is shown below.

library(xcms)

# remotes::install_github("rickhelmus/patRoonData")
files <- list.files(patRoonData::exampleDataPath(), full.names = TRUE)

xs <- xcmsSet(files, method = "centWave")

MSData <- MSnbase::readMSData(files, msLevel. = 1L, mode = "onDisk")
xdata <- findChromPeaks(MSData, CentWaveParam())
xdata2 <- findChromPeaks(MSData, CentWaveParam(), return.type = "xcmsSet")

The old xcmsSet function returns less than half of the peaks compared to the new interface (401 vs 911). As far as I could tell peak picking defaults remained the same, perhaps it's related to the different binning functions? In the end I am just curious if this is a bug or feature? :-)

Outputs:

> xs
An "xcmsSet" object with 6 samples

Time range: 14-599.8 seconds (0.2-10 minutes)
Mass range: 84.9595-299.1494 m/z
Peaks: 401 (about 67 per sample)
Peak Groups: 0 
Sample classes: extdata 

Feature detection:
 o Peak picking performed on MS1.
Profile settings: method = bin
                  step = 0.1

Memory usage: 0.251 MB
> xdata
MSn experiment data ("XCMSnExp")
Object size in memory: 3.88 Mb
- - - Spectra data - - -
 MS level(s): 1 
 Number of spectra: 14167 
 MSn retention times: 0:0 - 9:60 minutes
- - - Processing information - - -
Data loaded [Fri Aug 16 11:56:53 2019] 
Filter: select MS level(s) 1 [Fri Aug 16 11:56:53 2019] 
 MSnbase version: 2.10.1 
- - - Meta data  - - -
phenoData
  rowNames: solvent-1.mzML solvent-2.mzML ... standard-3.mzML (6 total)
  varLabels: sampleNames
  varMetadata: labelDescription
Loaded from:
  [1] solvent-1.mzML...  [6] standard-3.mzML
  Use 'fileNames(.)' to see all files.
protocolData: none
featureData
  featureNames: F1.S0001 F1.S0002 ... F6.S2363 (14167 total)
  fvarLabels: fileIdx spIdx ... spectrum (33 total)
  fvarMetadata: labelDescription
experimentData: use 'experimentData(object)'
- - - xcms preprocessing - - -
Chromatographic peak detection:
 method: centWave 
 911 peaks identified in 6 samples.
 On average 152 chromatographic peaks per sample.
> xdata2
An "xcmsSet" object with 6 samples

Time range: 11.1-599.8 seconds (0.2-10 minutes)
Mass range: 91.0541-299.1494 m/z
Peaks: 911 (about 152 per sample)
Peak Groups: 0 
Sample classes: solvent-1.mzML, solvent-2.mzML, solvent-3.mzML, standard-1.mzML, standard-2.mzML, standard-3.mzML 

Feature detection:
 o Peak picking performed on MS1.
 o Scan range limited to  1 - 2667 
Profile settings: method = bin
                  step = 0.1

Memory usage: 2.31 MB

I am using R 3.6.1 with fully updated Bioc/R packages.

Cheers, Rick

jorainer commented 5 years ago

That is indeed unexpected! I will have a look into it.

sneumann commented 5 years ago

Hi, I was able to reproduce the numbers in the example. I didn't find an explanation. Yours, Steffen

jorainer commented 5 years ago

Getting closer to the explanation: reading just one file and comparing the content of the data. The xcmsSet call reads each file with xcmsRaw and performs peak detection on that.

xr <- xcmsRaw(files[1])
msd1 <- readMSData(files[1], msLevel = 1L, mode = "onDisk")

## Comparing the number of spectra:
length(xr@scantime)
2358
length(rtime(msd1))
2629

Apparently, the xcmsRaw has fewer scans/spectra than what is available when reading the data with readMSData. In fact, xcmsRaw has an option dropEmptyScans = TRUE which removes spectra without any peaks (because this can not be represented by the xcmsRaw data structure).

To verify that the difference in spectra is due to that:

length(msd1) - sum(peaksCount(msd1) == 0)
2358

I will next check how this affected the peak detection, but I guess the explanation might be in this difference.

jorainer commented 5 years ago

In fact, when the empty spectra are removed from the new object the results of centWave are identical:

tmp <- msd1[peaksCount(msd1) > 0]
res_new <- findChromPeaks(tmp, CentWaveParam())
res_old <- findPeaks.centWave(xr)

all.equal(res_old[, "mz"], unname(chromPeaks(res_new)[, "mz"]))
TRUE
all.equal(res_old[, "into"], unname(chromPeaks(res_new)[, "into"]))
TRUE