sneumann / xcms

This is the git repository matching the Bioconductor package xcms: LC/MS and GC/MS Data Analysis

Implement a modified centWave method #136

Open jorainer opened 7 years ago

jorainer commented 7 years ago

Implement a modified centWave method that fixes issue #135, i.e. one for which the values reported in column "into" are actually the intensities integrated over the chromatographic peak area (defined by the columns "mzmin", "mzmax", "rtmin" and "rtmax").
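The following minimal Python sketch makes the intended definition of "into" concrete. It is not xcms code, which is written in R and C. It sums the intensities of centroids that fall inside a peak's mzmin/mzmax/rtmin/rtmax box and contrasts that with summing over a wider ROI mz range. The Point class and the integrate_box function are hypothetical. The real centWave may also scale the sum by the retention time step, and this sketch omits that.

```python
from dataclasses import dataclass


@dataclass
class Point:
    """One centroided data point (hypothetical structure for illustration)."""
    rt: float
    mz: float
    intensity: float


def integrate_box(points, mzmin, mzmax, rtmin, rtmax):
    """Sum intensities of points inside the closed [mzmin, mzmax] x [rtmin, rtmax] box."""
    return sum(p.intensity for p in points
               if mzmin <= p.mz <= mzmax and rtmin <= p.rt <= rtmax)


# Hypothetical ROI: three points belong to the peak, and one unrelated signal
# lies inside the ROI's mz range but outside the peak's mz range.
points = [
    Point(10.0, 200.001, 100.0),
    Point(11.0, 200.002, 300.0),
    Point(12.0, 200.001, 150.0),
    Point(11.0, 200.010, 500.0),
]

peak_into = integrate_box(points, 200.001, 200.002, 10.0, 12.0)  # peak area only
roi_into = integrate_box(points, 200.000, 200.010, 10.0, 12.0)   # whole ROI mz range
print(peak_into, roi_into)  # 550.0 1050.0
```

Restricting the sum to the peak's own mz range prevents unrelated signal within the ROI from inflating "into", which is the discrepancy reported in #135.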

jorainer commented 7 years ago

The modified centWave function is identical to the original function except for two changes:

1) The mz range of each peak is calculated only from mz values with a measured intensity. For some mz-rt pairs no intensity is measured, but the getEIC and getMZ C functions still return these pairs (with an intensity of 0). Ignoring them prevents mz ranges that span from 0 to the maximum mz (0 being the mz reported for the aforementioned rt-mz pairs without measured signal).
2) The intensities for each peak are reloaded before the signal is integrated. This ensures that "into" corresponds to the signal integrated over the peak area only. The original version used the whole mz range of the ROI (see issue #135).
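Change 1) can be illustrated with a small Python sketch. It assumes that a peak's data points arrive as parallel lists of mz values and intensities, and that points without measured signal carry an mz of 0 and an intensity of 0, as described for getEIC and getMZ. The function mz_range is a hypothetical helper, not part of xcms.

```python
def mz_range(mzs, intensities, ignore_zero=True):
    """Return (mzmin, mzmax) of a peak's data points, or None if no point is kept.

    With ignore_zero=True, points without measured signal (intensity 0), which
    are reported with mz 0, are excluded before taking min and max.
    """
    kept = [mz for mz, i in zip(mzs, intensities) if not ignore_zero or i > 0]
    if not kept:
        return None
    return min(kept), max(kept)


# Hypothetical peak: the first and last scans have no measured signal.
mzs = [0.0, 300.1, 300.12, 0.0]
intensities = [0.0, 50.0, 80.0, 0.0]

print(mz_range(mzs, intensities, ignore_zero=False))  # (0.0, 300.12)
print(mz_range(mzs, intensities))                     # (300.1, 300.12)
```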

Currently, one can switch between the two centWave implementations with options(originalCentWave = FALSE), which selects the modified version.

jorainer commented 7 years ago

Exhaustive tests comparing the two versions are in dontrun_exhaustive_original_new_centWave_comparison in the file runit.do_findChromPeaks_centWave.R. They run centWave on four files, ko15.CDF, MM14.mzML and MM8.mzML (all from msdata) plus one of my own, varying the centWave settings for each file. The summary:

The biggest differences were found for one of my own files with the settings CentWaveParam(ppm = 40, peakwidth = c(1, 60)). The original centWave identified 5746 peaks, the modified one 7273. All 5746 peaks identified by the original centWave were also identified by the modified centWave. The correlation of the "into" values was very high (R > 0.98; see the plot below).

[Figure: into-correlation]
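One way to run this kind of comparison is sketched below in Python. The sketch assumes each peak is a dict with the columns mzmin, mzmax, rtmin, rtmax and into. It treats an original peak as "also identified" when some modified peak overlaps it in both mz and rt, then computes the Pearson correlation of the matched into values. The overlap criterion, the toy values and all helper names are assumptions. The actual test in runit.do_findChromPeaks_centWave.R may match peaks differently.

```python
import math


def overlaps(a_min, a_max, b_min, b_max):
    """True if the closed intervals [a_min, a_max] and [b_min, b_max] overlap."""
    return a_min <= b_max and b_min <= a_max


def match_peaks(original, modified):
    """For each original peak, the index of the first modified peak overlapping
    it in both mz and rt, or None if there is no such peak."""
    matches = []
    for p in original:
        hit = None
        for j, q in enumerate(modified):
            if (overlaps(p["mzmin"], p["mzmax"], q["mzmin"], q["mzmax"])
                    and overlaps(p["rtmin"], p["rtmax"], q["rtmin"], q["rtmax"])):
                hit = j
                break
        matches.append(hit)
    return matches


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Toy peak tables (hypothetical values, not data from this issue).
original = [
    dict(mzmin=100.00, mzmax=100.01, rtmin=10, rtmax=20, into=1000.0),
    dict(mzmin=200.00, mzmax=200.01, rtmin=30, rtmax=40, into=5000.0),
    dict(mzmin=300.00, mzmax=300.01, rtmin=50, rtmax=60, into=2000.0),
]
modified = [
    dict(mzmin=100.00, mzmax=100.005, rtmin=11, rtmax=19, into=900.0),
    dict(mzmin=150.00, mzmax=150.01, rtmin=5, rtmax=8, into=300.0),  # extra peak
    dict(mzmin=200.00, mzmax=200.008, rtmin=31, rtmax=39, into=4700.0),
    dict(mzmin=300.002, mzmax=300.01, rtmin=50, rtmax=58, into=1800.0),
]

matches = match_peaks(original, modified)
all_found = all(m is not None for m in matches)
pairs = [(p["into"], modified[m]["into"])
         for p, m in zip(original, matches) if m is not None]
r = pearson([a for a, _ in pairs], [b for _, b in pairs])
print(matches, all_found)  # [0, 2, 3] True
```

A first-overlap match is the simplest choice. With many closely spaced peaks, a stricter rule (for example, requiring the apex rt of one peak to lie inside the other's rt range) would reduce ambiguous matches.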

jorainer commented 7 years ago

While the modified centWave looks OK for standard centWave peak detection (it actually identifies more peaks), it fails for centWave with predicted isotope ROIs.

Specifically, do_findChromPeaks_addPredIsoROIs, which performs the chromatographic peak detection in ROIs corresponding to predicted isotopes of the initially identified peaks, fails to report any peaks. More precisely, it identifies many more peaks, but all of them are removed because their mz width is 0.

In a test run on ko15.CDF, 7497 isotope ROIs are evaluated. The original centWave identifies 234 peaks in these and removes 103 of them because they have an mz width of 0. The modified version identifies 1723 peaks, but all of them are removed because they have an mz width of 0.
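The filter in question can be sketched as follows, under the assumption that it simply drops peaks whose mzmax - mzmin equals 0. The example also shows one hypothetical way such peaks can arise. If only a single scan in an isotope ROI carries signal, computing the mz range from measured points alone gives mzmin == mzmax. Including the zero-intensity points, which are reported with mz 0, gives a non-zero width instead. This is an illustration under stated assumptions, not a diagnosis of do_findChromPeaks_addPredIsoROIs. All function names are hypothetical.

```python
def measured_mz_bounds(mzs, intensities):
    """(mzmin, mzmax) using only points with measured (non-zero) intensity."""
    measured = [mz for mz, i in zip(mzs, intensities) if i > 0]
    return min(measured), max(measured)


def all_mz_bounds(mzs):
    """(mzmin, mzmax) including zero-intensity points reported with mz 0."""
    return min(mzs), max(mzs)


def drop_zero_width(peaks):
    """Keep (mzmin, mzmax) pairs with mzmax > mzmin; also return how many were removed."""
    kept = [(lo, hi) for lo, hi in peaks if hi - lo > 0]
    return kept, len(peaks) - len(kept)


# Hypothetical isotope ROI: only the middle scan has measured signal.
mzs = [0.0, 250.05, 0.0]
intensities = [0.0, 40.0, 0.0]

kept_all, removed_all = drop_zero_width([all_mz_bounds(mzs)])
kept_measured, removed_measured = drop_zero_width([measured_mz_bounds(mzs, intensities)])
print(removed_all, removed_measured)  # 0 1
```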

@sneumann @Treutler, is this step that removes peaks with an mz width of 0 really required?