sneumann / xcms

This is the git repository matching the Bioconductor package xcms: LC/MS and GC/MS Data Analysis

Ability to merge peak detection results #387

Open zizekw opened 5 years ago

zizekw commented 5 years ago

Following the discussion in #315, and the request there to open a separate feature request.

Without relying on grouping + fillChromPeaks(), a simple, integrated way to run a targeted second pass of peak detection for missed peaks, and to merge those results with the initial object (containing the peaks from the first pass), would be a useful addition to the xcms ecosystem.

Perhaps for the CentWave algorithm, this could build on the existing functionality for pre-defining ROIs? An example use case for our research group: if CentWave identifies 95%+ of the high-quality peaks, this functionality would let a graduate student go back and, at their discretion, reincorporate features that were clearly missed. (Fine-tuning the CentWave parameters got us most of the way, but not 100%!)
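A minimal sketch of the merge step described above, assuming both passes produce plain numeric matrices with a subset of the columns xcms uses in `chromPeaks` (`mz`, `mzmin`, `mzmax`, `rt`, `rtmin`, `rtmax`, `into`, `sample`). The helper `merge_peak_passes` and its duplicate rule (same sample plus overlapping m/z and rt windows) are hypothetical choices for illustration, not existing xcms behaviour:

```r
## Hypothetical helper (not part of xcms): append second-pass peaks to the
## first-pass peaks, skipping any second-pass peak that overlaps an existing
## peak in the same sample in both its m/z window and its rt window.
merge_peak_passes <- function(first, second) {
    overlaps_existing <- function(i) {
        same_sample <- first[, "sample"] == second[i, "sample"]
        mz_overlap <- first[, "mzmin"] <= second[i, "mzmax"] &
            first[, "mzmax"] >= second[i, "mzmin"]
        rt_overlap <- first[, "rtmin"] <= second[i, "rtmax"] &
            first[, "rtmax"] >= second[i, "rtmin"]
        any(same_sample & mz_overlap & rt_overlap)
    }
    keep <- !vapply(seq_len(nrow(second)), overlaps_existing, logical(1))
    rbind(first, second[keep, , drop = FALSE])
}

## Toy data: one first-pass peak, and a second pass that re-finds it
## plus one genuinely missed peak.
cols <- c("mz", "mzmin", "mzmax", "rt", "rtmin", "rtmax", "into", "sample")
first <- matrix(c(141.14, 141.138, 141.148, 300, 295, 305, 1e5, 1),
                nrow = 1, dimnames = list(NULL, cols))
second <- rbind(
    c(141.14, 141.139, 141.147, 301, 297, 306, 9e4, 1),  # overlaps first pass
    c(166.17, 166.166, 166.176, 420, 415, 425, 2e3, 1)   # missed in first pass
)
colnames(second) <- cols
merged <- merge_peak_passes(first, second)
```

Here `merged` keeps the original peak and adds only the peak at m/z 166.17. Whether overlapping second-pass peaks should be dropped, kept, or replace the first-pass peak is a design decision the feature would have to settle.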

Hope that is clear/makes sense. My developer skills are weak/non-existent but please let me know how I can contribute if needed!

zizekw commented 4 years ago

Just linking to #432, since being able to merge XChromatograms objects would solve both issues. I'll take another crack at this as well to see if I can get a solution working.

jorainer commented 4 years ago

This does sound interesting (we are now running into similar issues ourselves). In theory, one would only need to add new rows to the chromPeaks matrix (and the chromPeakData DataFrame). I'm just not yet sure where these new rows should come from. One possibility would be to define them manually.
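A minimal sketch of "adding new rows" while keeping the peak matrix and its per-peak annotation aligned. It uses a base `matrix` and `data.frame` as stand-ins for xcms's `chromPeaks` matrix and `chromPeakData` `DataFrame`; the helper `add_manual_peaks`, the `is_manual` flag, and the `CP0001`-style row naming are assumptions for illustration:

```r
## Hypothetical helper (not part of xcms): append manually defined peaks to a
## chromPeaks-like matrix and add matching rows to the per-peak annotation
## table, so both tables keep the same row names in the same order.
## Assumes existing rows are named CP0001, CP0002, ... and numbering continues.
add_manual_peaks <- function(peaks, peak_data, new_peaks) {
    stopifnot(identical(colnames(peaks), colnames(new_peaks)),
              nrow(peaks) == nrow(peak_data))
    n_old <- nrow(peaks)
    n_new <- nrow(new_peaks)
    new_ids <- sprintf("CP%04d", n_old + seq_len(n_new))
    rownames(new_peaks) <- new_ids
    new_data <- data.frame(is_manual = rep(TRUE, n_new), row.names = new_ids)
    list(peaks = rbind(peaks, new_peaks),
         peak_data = rbind(peak_data, new_data))
}

## Toy data: one detected peak plus one manually defined peak.
cols <- c("mz", "mzmin", "mzmax", "rt", "rtmin", "rtmax", "into", "sample")
peaks <- matrix(c(184.077, 184.072, 184.082, 250, 245, 256, 5e4, 2),
                nrow = 1, dimnames = list("CP0001", cols))
peak_data <- data.frame(is_manual = FALSE, row.names = "CP0001")
manual <- matrix(c(141.143, 141.139, 141.148, 310, 304, 316, 800, 2),
                 nrow = 1, dimnames = list(NULL, cols))
res <- add_manual_peaks(peaks, peak_data, manual)
```

Flagging manually added rows (much like filled-in peaks are flagged) would let later steps tell them apart from peaks the algorithm detected.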

What would be a typical use case that you have?

zizekw commented 4 years ago

Sorry for the late response! Yes exactly, that's essentially what we're doing but in a roundabout way.

Our ideal workflow looks roughly like this:

```r
library(xcms)      # also attaches MSnbase: readMSData(), filterMz(), chromatogram()
library(magrittr)  # %>%

## mzML_profile_data (file paths) and params$CentWave (a CentWaveParam)
## are defined elsewhere in our script.
rawData <- readMSData(files = mzML_profile_data, mode = "onDisk")

## m/z window (min, max) for each targeted compound
mz_ranges <- list(
    c(141.1385, 141.1485),
    c(166.1659, 166.1759),
    c(184.0718, 184.0818)
)

## Collect one XChromatograms object per m/z window in a list
XChromatograms_list <- lapply(mz_ranges, function(mzr) {
    rawData %>%
        filterMz(mz = mzr) %>%
        chromatogram() %>%
        findChromPeaks(param = params$CentWave)
})
# Or alternatively, if you save the XChromatograms objects separately, calling something like the following:
# XChromatograms_big <- c(XChromatograms1, XChromatograms2, XChromatograms3)
```

The idea is that we create an XChromatograms object for each specific compound across all our data files. We're just not sure what the optimal way to merge everything back together is. Ideally, we'd also like to run this in parallel, but I'm not quite sure how to do that on Windows.
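One way to parallelise the per-compound loop on Windows is a socket (PSOCK) cluster from base R's `parallel` package: each worker is a separate R process, so it does not depend on forking, which Windows lacks. This is a minimal sketch; the helper `run_per_range` is hypothetical, and the toy worker function only stands in for the xcms steps:

```r
library(parallel)

## Hypothetical helper: apply `fun` to each m/z window on a PSOCK cluster.
## PSOCK workers are fresh R processes, so this also works on Windows.
run_per_range <- function(mz_ranges, fun, workers = 2L) {
    cl <- makeCluster(workers, type = "PSOCK")
    on.exit(stopCluster(cl))  # always shut the workers down
    parLapply(cl, mz_ranges, fun)
}

mz_ranges <- list(c(141.1385, 141.1485), c(166.1659, 166.1759))
## Toy worker: midpoint of each m/z window (stand-in for the real xcms steps)
res <- run_per_range(mz_ranges, function(r) mean(r), workers = 2L)
```

For the real workflow, the workers would also need `library(xcms)` and the `rawData` and `params` objects (for example via `clusterEvalQ()` and `clusterExport()`). Alternatively, BiocParallel's `SnowParam()` provides the same process-based parallelism through the `BPPARAM` argument that functions such as `findChromPeaks()` accept; `MulticoreParam()` relies on forking and is not supported on Windows.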

By the way, for context: we're doing this because, in select cases, we found this general approach works best for reliably identifying low-intensity peaks in our targeted TOF data.

What do you think? Does this seem like a suitable approach?