sneumann / xcms

This is the git repository matching the Bioconductor package xcms: LC/MS and GC/MS Data Analysis

Error in do_findChromPeaks_addPredIsoROIs: Error in scanrange #545

Closed mwgard closed 3 years ago

mwgard commented 3 years ago

I consistently get an error when running findChromPeaks-centWaveWithPredIsoROIs, as described below. It appears to be triggered by setting the argument fitgauss = TRUE. The example below uses a single LC-HRAM-MS2 mzML data file, and I can reproduce the error with any of my data files. The first peak-finding pass with centWave runs fine, but the second pass on the isotope ROIs fails. I don't strictly need fitgauss = TRUE, but I have typically used it with previous versions of xcms (and with other peak-finding algorithms). I do not get this error when I just run findChromPeaks-centWave. My guess is that the ROIs end up outside the retention time window of the mzML file because of poorly fitted Gaussian peak shapes. I'll move forward with fitgauss = FALSE, but wanted to let you know about the issue. Thanks

xcms_raw <- MSnbase::readMSData(files = sample_DT$data_filename[1], mode = "onDisk")
xcms_fList <- xcms::findChromPeaks(
    xcms_raw, 
    param = xcms::CentWavePredIsoParam(ppm = 10, prefilter = c(3, 250000),
                                       peakwidth = c(6, 45), maxCharge = 1,
                                       maxIso = 3, mzIntervalExtension = TRUE,
                                       polarity = 'positive', snthresh = 50,
                                       mzdiff = -0.001, fitgauss = FALSE,
                                       noise = 50000, verboseColumns = TRUE,
                                       firstBaselineCheck = TRUE),
    BPPARAM = SerialParam())

Detecting mass traces at 10 ppm ... OK
Detecting chromatographic peaks in 2322 regions of interest ... OK: 1722 found.
Detecting chromatographic peaks in 1866 regions of interest ... OK: 395 found.

xcms_fList <- xcms::findChromPeaks(
    xcms_raw, 
    param = xcms::CentWavePredIsoParam(ppm = 10, prefilter = c(3, 250000),
                                       peakwidth = c(6, 45), maxCharge = 1,
                                       maxIso = 3, mzIntervalExtension = TRUE,
                                       polarity = 'positive', snthresh = 50,
                                       mzdiff = -0.001, fitgauss = TRUE,
                                       noise = 50000, verboseColumns = TRUE,
                                       firstBaselineCheck = TRUE),
    BPPARAM = SerialParam())

Detecting mass traces at 10 ppm ... OK
Detecting chromatographic peaks in 2322 regions of interest ... OK: 1017 found.
Error in do_findChromPeaks_addPredIsoROIs(mz = mz, int = int, scantime = scantime, : Error in scanrange

xcms_fList <- xcms::findChromPeaks(
    xcms_raw, 
    param = xcms::CentWaveParam(ppm = 10, prefilter = c(3, 250000),
                                peakwidth = c(6, 45), snthresh = 50, 
                                mzdiff = -0.001, fitgauss = T,
                                noise = 50000, verboseColumns = TRUE,
                                firstBaselineCheck = TRUE),
    BPPARAM = SerialParam())

Detecting mass traces at 10 ppm ... OK
Detecting chromatographic peaks in 2322 regions of interest ... OK: 1017 found.

sessionInfo()

R version 4.0.3 (2020-10-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18363)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] tools stats4 parallel stats graphics grDevices utils datasets methods base

other attached packages:
[1] data.table_1.13.6 readxl_1.3.1 xcms_3.12.0 MSnbase_2.15.7 ProtGenerics_1.22.0
[6] S4Vectors_0.28.1 mzR_2.24.1 Rcpp_1.0.5 BiocParallel_1.24.1 Biobase_2.50.0
[11] BiocGenerics_0.36.0

loaded via a namespace (and not attached):
[1] lattice_0.20-41 digest_0.6.27 foreach_1.5.1 R6_2.5.0
[5] GenomeInfoDb_1.26.2 cellranger_1.1.0 plyr_1.8.6 mzID_1.28.0
[9] ggplot2_3.3.3 pillar_1.4.7 zlibbioc_1.36.0 rlang_0.4.10
[13] rstudioapi_0.13 Matrix_1.2-18 preprocessCore_1.52.0 RCurl_1.98-1.2
[17] munsell_0.5.0 DelayedArray_0.16.0 compiler_4.0.3 pkgconfig_2.0.3
[21] pcaMethods_1.82.0 tidyselect_1.1.0 SummarizedExperiment_1.20.0 tibble_3.0.4
[25] GenomeInfoDbData_1.2.4 RANN_2.6.1 IRanges_2.24.1 codetools_0.2-16
[29] matrixStats_0.57.0 XML_3.99-0.5 crayon_1.3.4 dplyr_1.0.2
[33] MASS_7.3-53 bitops_1.0-6 grid_4.0.3 MassSpecWavelet_1.56.0
[37] gtable_0.3.0 lifecycle_0.2.0 affy_1.68.0 magrittr_2.0.1
[41] MsCoreUtils_1.2.0 scales_1.1.1 ncdf4_1.17 impute_1.64.0
[45] XVector_0.30.0 affyio_1.60.0 doParallel_1.0.16 limma_3.46.0
[49] robustbase_0.93-7 ellipsis_0.3.1 generics_0.1.0 vctrs_0.3.6
[53] RColorBrewer_1.1-2 iterators_1.0.13 glue_1.4.2 DEoptimR_1.0-8
[57] purrr_0.3.4 MatrixGenerics_1.2.0 colorspace_2.0-0 BiocManager_1.30.10
[61] vsn_3.58.0 GenomicRanges_1.42.0 MALDIquant_1.19.3

sneumann commented 3 years ago

Hi, thanks for reporting. Are you able to reproduce this with either the faahKO or mtbls2 data packages? That makes debugging much easier. Yours, Steffen

mwgard commented 3 years ago

I tried it out using the first datafile in the mtbls2 data package and receive the same error when fitgauss = T.

library(xcms)
library(data.table)
library(Risa)
library(mtbls2)

ISAmtbls2 <- readISAtab(find.package("mtbls2"))
a.filename <- ISAmtbls2["assay.filenames"][[1]]
msfiles <- getAssayRawDataFilenames(ISAmtbls2@assay.tabs[[1]], full.path = TRUE)[,1]
adf <- getAnnotatedDataFrameAssay(ISAmtbls2, assay.filename = a.filename)

xcms_raw <- MSnbase::readMSData(files = msfiles[1],
                                pdata = new("NAnnotatedDataFrame", pData(adf)[1,]),
                                mode = "onDisk")
# fitgauss = F works
xcms_fList <- xcms::findChromPeaks(
    xcms_raw, 
    param = xcms::CentWavePredIsoParam(ppm = 25, prefilter = c(3, 100),
                                       peakwidth = c(20, 50), maxCharge = 1,
                                       maxIso = 3, mzIntervalExtension = TRUE,
                                       polarity = 'positive', snthresh = 10,
                                       mzdiff = -0.001, fitgauss = F,
                                       noise = 0, verboseColumns = TRUE,
                                       firstBaselineCheck = TRUE),
    BPPARAM = SerialParam())

Detecting mass traces at 25 ppm ... OK
Detecting chromatographic peaks in 2686 regions of interest ... OK: 1580 found.
Detecting chromatographic peaks in 4196 regions of interest ... OK: 204 found.

# fitgauss = T fails
xcms_fList <- xcms::findChromPeaks(
    xcms_raw, 
    param = xcms::CentWavePredIsoParam(ppm = 25, prefilter = c(3, 100),
                                       peakwidth = c(20, 50), maxCharge = 1,
                                       maxIso = 3, mzIntervalExtension = TRUE,
                                       polarity = 'positive', snthresh = 10,
                                       mzdiff = -0.001, fitgauss = T,
                                       noise = 0, verboseColumns = TRUE,
                                       firstBaselineCheck = TRUE),
    BPPARAM = SerialParam())

Detecting mass traces at 25 ppm ... OK
Detecting chromatographic peaks in 2686 regions of interest ... OK: 1580 found.
Error in do_findChromPeaks_addPredIsoROIs(mz = mz, int = int, scantime = scantime, : Error in scanrange
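
Until a fix is available, the workaround mentioned earlier in this thread (falling back to fitgauss = FALSE when fitgauss = TRUE fails) could be automated roughly as follows. This is only a sketch: `detect_with_fallback` and `mock_detect` are hypothetical names, not part of xcms, and the mock stands in for the real peak-detection call so the snippet runs without xcms or any data files.

```r
# Hypothetical helper: try peak detection with fitgauss = TRUE and fall back
# to fitgauss = FALSE if the first attempt raises an error.
# `detect` is any function that accepts a single logical `fitgauss` argument.
detect_with_fallback <- function(detect) {
    tryCatch(
        list(result = detect(fitgauss = TRUE), fitgauss = TRUE),
        error = function(e) {
            message("fitgauss = TRUE failed (", conditionMessage(e),
                    "); retrying with fitgauss = FALSE")
            list(result = detect(fitgauss = FALSE), fitgauss = FALSE)
        }
    )
}

# Stand-in for the real call (assumption): fails only when fitgauss is TRUE,
# mimicking the behaviour reported in this issue.
mock_detect <- function(fitgauss) {
    if (fitgauss) stop("Error in scanrange")
    "peaks"
}

res <- detect_with_fallback(mock_detect)
```

With real data, `detect` could be a small wrapper around the `xcms::findChromPeaks` call shown above that passes `fitgauss` through to `CentWavePredIsoParam`. The returned `fitgauss` element records which setting actually produced the result.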

sneumann commented 3 years ago

Great. Ok, not great. But definitely reproducible then. Thanks, Yours, Steffen

jorainer commented 3 years ago

I'll look into it.

jorainer commented 3 years ago

So far I was able to track that error down to the C++ getEIC function. What puzzles me is how this can be related to fitgauss.

stanstrup commented 3 years ago

"So far I was able to track that error down to the C++ getEIC function. What puzzles me is how this can be related to fitgauss."

It re-extracts the EIC to do the fit?

jorainer commented 3 years ago

Actually, it re-extracts the EIC for the possible isotopes. I think the problem is that the scan indices somehow get messed up.

jorainer commented 3 years ago

I guess it has to do with the chrom peak boundaries that are different with fitgauss = TRUE and fitgauss = FALSE.
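
To illustrate how out-of-range peak boundaries could surface as "Error in scanrange" during EIC re-extraction, here is a toy EIC function in plain R. It is not the xcms getEIC implementation: the function name `get_eic`, its arguments, the exact validity check, and the choice of taking the maximum intensity per scan are all assumptions made for this sketch.

```r
# Toy EIC extraction. `mz` and `int` hold the data points of all scans
# concatenated; `vals_per_scan` gives the number of data points in each scan.
get_eic <- function(mz, int, vals_per_scan, mzrange, scanrange) {
    n_scans <- length(vals_per_scan)
    # Assumed validity check, modelled on the error message in this issue.
    if (scanrange[1] < 1 || scanrange[2] > n_scans ||
        scanrange[1] > scanrange[2])
        stop("Error in scanrange")
    # Scan index of every data point.
    scan_idx <- rep(seq_len(n_scans), vals_per_scan)
    # Maximum intensity within the m/z window for each requested scan.
    vapply(seq(scanrange[1], scanrange[2]), function(s) {
        keep <- scan_idx == s & mz >= mzrange[1] & mz <= mzrange[2]
        if (any(keep)) max(int[keep]) else 0
    }, numeric(1))
}

# Three scans with two data points each.
mz  <- c(100.0, 101.0, 100.0, 101.0, 100.0, 101.0)
int <- c(10, 1, 20, 2, 30, 3)
vps <- c(2, 2, 2)

# Valid scan range: one intensity per scan.
eic_ok <- get_eic(mz, int, vps, mzrange = c(100.9, 101.1), scanrange = c(1, 3))

# Upper boundary past the last scan, as shifted peak boundaries might produce.
eic_bad <- tryCatch(
    get_eic(mz, int, vps, mzrange = c(100.9, 101.1), scanrange = c(2, 5)),
    error = function(e) conditionMessage(e))
```

If the peak boundaries from the first centWave pass differ between fitgauss = TRUE and fitgauss = FALSE, a scan range derived from them may pass this kind of check in one case and fail it in the other.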

jorainer commented 3 years ago

I've now fixed it by skipping the second centWave run on ROIs with illegal boundaries (i.e. scan ranges). @mwgard, you can install the version with the fix using:

Sys.setenv(R_REMOTES_NO_ERRORS_FROM_WARNINGS="true")
BiocManager::install("sneumann/xcms", ref = "RELEASE_3_12")

I'm closing the issue now. Feel free to re-open if the error persists.
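
The idea behind the fix described above, skipping ROIs whose scan range is illegal before the second centWave run, can be sketched in plain R. This is not the actual patch: the ROI table, its column names (`scmin`, `scmax`), and the scan count are made up for illustration.

```r
# Hypothetical ROI table: scmin/scmax are the first and last scan index
# of each region of interest.
rois <- data.frame(
    mzmin = c(100.00, 200.00, 300.00, 400.00),
    mzmax = c(100.01, 200.01, 300.01, 400.01),
    scmin = c(5, -3, 90, 40),
    scmax = c(20, 10, 130, 30)
)
n_scans <- 120  # number of scans in the (hypothetical) file

# Keep only ROIs whose scan range lies inside the file and is correctly ordered.
valid <- !is.na(rois$scmin) & !is.na(rois$scmax) &
    rois$scmin >= 1 & rois$scmax <= n_scans & rois$scmin <= rois$scmax
rois_valid <- rois[valid, , drop = FALSE]
```

Here only the first ROI survives: the second starts before scan 1, the third ends after the last scan, and the fourth has its boundaries reversed.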

mwgard commented 3 years ago

Thanks!