sneumann / xcms

This is the git repository matching the Bioconductor package xcms: LC/MS and GC/MS Data Analysis

findChromPeaks finds duplicate peaks #695

Open Pascallio opened 11 months ago

Pascallio commented 11 months ago

Hi,

I've been following the tutorial on Bioconductor: link, but after peak picking I noticed that findChromPeaks returns duplicate peaks:


library(xcms)

## Get the full path to the CDF files
cdfs <- dir(system.file("cdf", package = "faahKO"), full.names = TRUE,
            recursive = TRUE)[c(1, 2, 5, 6, 7, 8, 11, 12)]

## Create a phenodata data.frame
pd <- data.frame(sample_name = sub(basename(cdfs), pattern = ".CDF",
                                   replacement = "", fixed = TRUE),
                 sample_group = c(rep("KO", 4), rep("WT", 4)),
                 stringsAsFactors = FALSE)

# Read the raw data
raw_data <- readMSData(files = cdfs, pdata = new("NAnnotatedDataFrame", pd),
                       mode = "onDisk")

# Filter for a smaller subset
raw_data <- filterRt(raw_data, c(2500, 3500))

# Set parameters for Peak Picking
cwp <- CentWaveParam(peakwidth = c(20, 80), noise = 5000,
                     prefilter = c(6, 5000))

# Perform peak picking and save results
data <- findChromPeaks(raw_data, param = cwp)

# Retrieve peak data as a data.frame
peaks <- as.data.frame(chromPeaks(data))

# Find unique combinations of mass, retention time and sample number
uniqueComb <- paste(peaks$mz, peaks$rt, peaks$sample)

# Find duplicates
isDuplicated <- duplicated(uniqueComb)

# Get the keys (mz, rt, sample) that occur more than once
duplicates <- uniqueComb[isDuplicated]

# Get all peaks whose key is among the duplicated keys
duplicatePeaks <- peaks[uniqueComb %in% duplicates, ]

# Print the duplicate peaks
duplicatePeaks

Here's a sample of the output: image

Interestingly, while the m/z, rt and into values are equal, the intb values are not. I've tested this on the same version as on Bioconductor (3.22), and also on the GitHub version (3.99.5).
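To make the "same m/z, rt and into, different intb" observation easier to check, here is a minimal base R sketch that flags every member of a duplicate group and reports which of the remaining columns vary within a group. The peak table is a toy data frame with made-up values (an assumption for illustration, not actual xcms output); with real data, `peaks` would be `as.data.frame(chromPeaks(data))`.

```r
# Toy peak table: values are hypothetical and NOT taken from xcms output.
peaks <- data.frame(
  mz     = c(300.1, 300.1, 410.2, 512.3),
  rt     = c(2600.5, 2600.5, 2800.0, 3100.7),
  into   = c(1.5e5, 1.5e5, 9.0e4, 2.2e5),
  intb   = c(1.4e5, 1.2e5, 8.5e4, 2.1e5),
  sample = c(1, 1, 1, 2)
)

key_cols <- c("mz", "rt", "sample")

# Flag every member of a duplicate group, not only the second and later rows.
is_dup <- duplicated(peaks[, key_cols]) |
  duplicated(peaks[, key_cols], fromLast = TRUE)
dup_peaks <- peaks[is_dup, ]

# For each non-key column, check whether it varies within any duplicate group.
group <- interaction(dup_peaks[, key_cols], drop = TRUE)
other_cols <- setdiff(names(dup_peaks), key_cols)
varies <- vapply(other_cols, function(col) {
  any(tapply(dup_peaks[[col]], group, function(v) length(unique(v)) > 1))
}, logical(1))

varies
```

Like the `paste()` key above, this relies on exact equality of the numeric values, so it only catches peaks that are reported with identical m/z and retention time.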

Best, Pascal

jorainer commented 11 months ago

Hi, yes, that is a known issue with centWave; it happens for some data sets. However, I have not figured out where in the code this actually happens. What I usually do is run refineChromPeaks with MergeNeighboringPeaksParam after peak detection, which removes/fuses duplicated peaks. Maybe also have a look at the xcmsTutorials for a description.
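A rough sketch of that workaround, assuming the xcms package is installed and that the input is the object returned by findChromPeaks in the example above. The helper name `merge_duplicate_peaks` and the `expandRt = 4` value are hypothetical choices for illustration, not recommended settings; the other arguments of MergeNeighboringPeaksParam are left at their defaults.

```r
# Hypothetical helper (not part of xcms): merge neighboring/duplicated peaks
# and report how many exact (mz, rt, sample) duplicates exist before and after.
merge_duplicate_peaks <- function(x, expandRt = 4) {
  count_dups <- function(obj) {
    pks <- xcms::chromPeaks(obj)
    sum(duplicated(pks[, c("mz", "rt", "sample"), drop = FALSE]))
  }
  before <- count_dups(x)

  # expandRt is an illustrative value; tune it for your own data.
  param <- xcms::MergeNeighboringPeaksParam(expandRt = expandRt)
  x <- xcms::refineChromPeaks(x, param = param)

  message("exact duplicates before: ", before, ", after: ", count_dups(x))
  x
}

# Usage, assuming `data` is the result of findChromPeaks() from the post above:
# data <- merge_duplicate_peaks(data)
```

Counting duplicates before and after is only a sanity check; whether every duplicate is merged will depend on the data and on the chosen parameters.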