sneumann / xcms

This is the git repository matching the Bioconductor package xcms: LC/MS and GC/MS Data Analysis

findChromPeaks issue (bug?) #584

Open BaylorSci opened 2 years ago

BaylorSci commented 2 years ago

I recently updated to R version 4.1.1 and updated XCMS and its associated packages to 3.14.1. With Waters MSe data, I had previously used findChromPeaks successfully for both MS1 and MS2 with add=TRUE (script outlined below).

table(msLevel(raw_data1a))
     1      2
125773 104078

exp_data=findChromPeaks(raw_data1a, param=cwp, msLevel=1)

exp_data=findChromPeaks(raw_data1a, param=cwp, add=TRUE, msLevel=2)

However, when running with msLevel=2, the following error message is returned: Error in .local(object, param, ...) : unused argument (add = TRUE)

I also tried starting directly with msLevel=2L, which returns the following error: Error in h(simpleError(msg, call)) : error in evaluating the argument 'X' in selecting a method for function 'bplapply': No MS level 2 spectra present

This makes me suspect that, if there is a bug, it may not be in XCMS?

I really should know better than to give in to updates!

sneumann commented 2 years ago

Hi, could you provide your sessionInfo()? If you upgraded R, how did you upgrade the installed R packages, or did you install everything fresh? My hunch is that there might be a package version mismatch in one of the dependencies. Yours, Steffen

sneumann commented 2 years ago

Oh, and one more thing would help: please run traceback() right after the error occurs, to get some idea of where exactly it happens. Yours, Steffen

BaylorSci commented 2 years ago

I installed everything fresh when I updated. sessionInfo() outputs the following:

R version 4.1.1 (2021-08-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)

Matrix products: default

locale: [1] LC_COLLATE=English_United States.1252 [2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 [4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages: [1] stats4 stats graphics grDevices utils datasets [7] methods base

other attached packages: [1] xcms_3.16.0 MSnbase_2.20.0
[3] ProtGenerics_1.26.0 mzR_2.28.0
[5] Rcpp_1.0.7 BiocParallel_1.28.0
[7] SummarizedExperiment_1.24.0 Biobase_2.54.0
[9] GenomicRanges_1.46.0 GenomeInfoDb_1.30.0
[11] IRanges_2.28.0 S4Vectors_0.32.0
[13] BiocGenerics_0.40.0 MatrixGenerics_1.6.0
[15] matrixStats_0.61.0 dplyr_1.0.7

loaded via a namespace (and not attached): [1] vsn_3.62.0 foreach_1.5.1
[3] BiocManager_1.30.16 affy_1.72.0
[5] GenomeInfoDbData_1.2.7 robustbase_0.93-9
[7] impute_1.68.0 pillar_1.6.4
[9] lattice_0.20-45 glue_1.4.2
[11] limma_3.50.0 digest_0.6.28
[13] RColorBrewer_1.1-2 XVector_0.34.0
[15] colorspace_2.0-2 preprocessCore_1.56.0
[17] Matrix_1.3-4 plyr_1.8.6
[19] MALDIquant_1.20 XML_3.99-0.8
[21] pkgconfig_2.0.3 zlibbioc_1.40.0
[23] purrr_0.3.4 scales_1.1.1
[25] snow_0.4-4 RANN_2.6.1
[27] affyio_1.64.0 tibble_3.1.5
[29] generics_0.1.1 ggplot2_3.3.5
[31] ellipsis_0.3.2 MassSpecWavelet_1.60.0
[33] magrittr_2.0.1 crayon_1.4.1
[35] ncdf4_1.17 fansi_0.5.0
[37] doParallel_1.0.16 MASS_7.3-54
[39] MsFeatures_1.2.0 tools_4.1.1
[41] lifecycle_1.0.1 munsell_0.5.0
[43] cluster_2.1.2 DelayedArray_0.20.0
[45] pcaMethods_1.86.0 compiler_4.1.1
[47] mzID_1.32.0 rlang_0.4.11
[49] grid_4.1.1 RCurl_1.98-1.5
[51] iterators_1.0.13 MsCoreUtils_1.6.0
[53] bitops_1.0-7 gtable_0.3.0
[55] codetools_0.2-18 DBI_1.1.1
[57] R6_2.5.1 utf8_1.2.2
[59] clue_0.3-60 parallel_4.1.1
[61] vctrs_0.3.8 DEoptimR_1.0-9
[63] tidyselect_1.1.1

When I run the following script:
exp_data=findChromPeaks(raw_data1a, param=cwp, msLevel=2, add=TRUE)
it returns: Error in .local(object, param, ...) : unused argument (add = TRUE)

Running traceback() right afterwards returns: No traceback available

I'm a bit stumped.

sneumann commented 2 years ago

Hm, I can't reproduce this here yet. We have an add = TRUE call in the tests https://github.com/sneumann/xcms/blob/RELEASE_3_14/tests/testthat/test_methods-XCMSnExp.R#L1272 and that happily works here with 3.16.0: http://bioconductor.org/checkResults/release/bioc-LATEST/xcms/

Could you run R CMD check on the package http://bioconductor.org/packages/release/bioc/html/xcms.html ? Yours, Steffen

BaylorSci commented 2 years ago

Hi Steffen, so I ran R CMD check on the xcms package, and it returned the following:

Warning: S3 methods ‘plot.xcmsEIC’, ‘split.xcmsSet’, ‘c.xcmsSet’, ‘c.XCMSnExp’, ‘split.xcmsRaw’ were declared in NAMESPACE but not found Warning in setup_ns_exports(path, export_all, export_imports) : Objects listed as exports, but not present in namespace: etg, medianFilter, plotQC, retexp, specNoise, specPeaks, SSgauss, msn2xcmsRaw, verify.mzQuantML, xcmsRaw, xcmsSet, xcmsFragments, phenoDataFromPaths, binYonX, breaks_on_binSize, breaks_on_nBins, do_findChromPeaks_centWave, do_findChromPeaks_massifquant, do_findChromPeaks_matchedFilter, do_findPeaks_MSW, do_findChromPeaks_centWaveWithPredIsoROIs, do_findChromPeaks_addPredIsoROIs, imputeLinInterpol, useOriginalCode, do_groupChromPeaks_density, do_groupPeaks_mzClust, do_groupChromPeaks_nearest, do_adjustRtime_peakGroups, processHistoryTypes, adjustRtimePeakGroups, plotAdjustedRtime, highlightChromPeaks, plotChromPeaks, plotChromPeakImage, isCalibrated, plotMsData, applyAdjustedRtime, filterFeatureDefinitions, peaksWithMatchedFilter, peaksWithCentWave, rla, rowRla, featureSummary, overlappingFeatures, fixedMz, fixedRt, exportMetaboAnalyst, imputeRowMin, imputeRowMinRand, chromPeakSpectra, featureSpectra, featureChromatograms, hasFill [... truncated] Warning in normalizePath(path.expand(path), winslash, mustWork) :

I installed the package from Bioconductor using the usual BiocManager::install("xcms"), and this has always worked previously.

BaylorSci commented 2 years ago

More verbose output: -- R CMD build ---------------------------------------------------------------------------- √ checking for file '...\R\win-library\4.1\xcms/DESCRIPTION' ...

sneumann commented 2 years ago

Can you run R CMD check xcms_3.16.0.tar.gz? The output above did not yet run the tests covering the add = TRUE case that we need. The warnings are not dramatic; that output is unfortunately a bit noisy. Yours, Steffen

BaylorSci commented 2 years ago

-- R CMD check ----------------------------------------------------------------------------

-- R CMD check results --------------------------------------------------- xcms 3.16.0 ---- Duration: 12m 37s

checking for hidden files and directories ... NOTE Found the following hidden files and directories: .BBSoptions These were most likely included in error. See section 'Package structure' in the 'Writing R Extensions' manual.

checking installed package size ... NOTE installed size is 15.0Mb sub-directories of 1Mb or more: R 3.1Mb doc 9.5Mb libs 1.0Mb

checking DESCRIPTION meta-information ... NOTE License components with restrictions not permitted: GPL (>= 2) + file LICENSE

checking dependencies in R code ... NOTE Unexported objects imported by ':::' calls: 'MALDIquant:::.localMaxima' 'MSnbase:::.MSnExpReqFvarLabels' 'MSnbase:::.plotXIC' 'MSnbase:::.vertical_sub_layout' 'MSnbase:::formatFileSpectrumNames' See the note in ?::: about the use of this operator. There are ::: calls to the package's namespace in its code. A package almost never needs to use ::: for its own objects: '.copy_env' '.getChromPeakData' '.get_closest_index' '.spectra_for_peaks' '.split_by_file2' '.validChromPeaksMatrix' 'MSW.cwt' 'MSW.getLocalMaximumCWT' 'MSW.getRidge' 'descendMin' 'descendMinTol' 'estimateChromNoise' 'getLocalNoiseEstimate' 'na.flatfill' 'patternVsRowScore'

checking R code for possible problems ... NOTE .xcmsFragments.plotTree: no visible global function definition for 'edgemode<-' .xcmsFragments.plotTree: no visible global function definition for 'addEdge' buildAnalysisSummary: no visible global function definition for 'newXMLNode' buildAssayList : : no visible global function definition for 'newXMLNode' buildAssayList: no visible global function definition for 'newXMLNode' buildAuditCollection: no visible global function definition for 'newXMLNode' buildCVlist: no visible global function definition for 'newXMLNode' buildCVlist: no visible global function definition for 'addChildren' buildCvParams : : no visible global function definition for 'newXMLNode' buildDataProcessingList: no visible global function definition for 'newXMLNode' buildFeatureList : : no visible global function definition for 'newXMLNode' buildInputFiles : : no visible global function definition for 'newXMLNode' buildInputFiles: no visible global function definition for 'newXMLNode' buildMzq: no visible global function definition for 'xmlTree' buildSmallMoleculeList : : no visible global function definition for 'newXMLNode' buildSmallMoleculeList: no visible global function definition for 'newXMLNode' buildSoftwareList: no visible global function definition for 'newXMLNode' buildStudyVariableList : : no visible global function definition for 'newXMLNode' buildStudyVariableList : : : no visible global function definition for 'newXMLNode' buildStudyVariableList: no visible global function definition for 'newXMLNode' chromPeakSpectra: no visible global function definition for 'List' featureSpectra: no visible global function definition for 'List' plotQC: no visible global function definition for 'sampleNames' running: multiple local function definitions for 'funct' with different formal arguments verify.mzQuantML: no visible global function definition for 'xmlTreeParse' verify.mzQuantML: no visible global function definition for 
'xmlInternalTreeParse' verify.mzQuantML: no visible global function definition for 'xmlSchemaValidate' xcmsClusterApply: no visible global function definition for 'checkCluster' xcmsClusterApply : submit: no visible global function definition for 'sendCall' xcmsClusterApply: no visible global function definition for 'recvOneResult' xcmsClusterApply: no visible global function definition for 'checkForRemoteErrors' xcmsPapply: no visible global function definition for 'mpi.comm.size' xcmsPapply: no visible global function definition for 'mpi.spawn.Rslaves' xcmsPapply: no visible global function definition for 'mpi.comm.rank' xcmsPapply : papply_int_slavefunction: no visible global function definition for 'mpi.send.Robj' xcmsPapply : papply_int_slavefunction: no visible global function definition for 'mpi.recv.Robj' xcmsPapply : papply_int_slavefunction: no visible global function definition for 'mpi.any.source' xcmsPapply : papply_int_slavefunction: no visible global function definition for 'mpi.any.tag' xcmsPapply : papply_int_slavefunction: no visible global function definition for 'mpi.get.sourcetag' xcmsPapply: no visible global function definition for 'mpi.bcast.Robj2slave' xcmsPapply: no visible global function definition for 'mpi.bcast.cmd' xcmsPapply: no visible global function definition for 'mpi.recv.Robj' xcmsPapply: no visible global function definition for 'mpi.any.source' xcmsPapply: no visible global function definition for 'mpi.any.tag' xcmsPapply: no visible global function definition for 'mpi.get.sourcetag' xcmsPapply: no visible global function definition for 'mpi.send.Robj' xcmsParallelSetup: no visible global function definition for 'mpi.spawn.Rslaves' xcmsParallelSetup: no visible global function definition for 'mpi.comm.size' xcmsParallelSetup: no visible global function definition for 'mpi.comm.rank' xcmsParallelSetup: no visible global function definition for 'makeCluster' [,XChromatograms-ANY-ANY-ANY: no visible global function definition 
for 'pData<-' plotSurf,xcmsRaw: no visible global function definition for 'rgl.clear' plotSurf,xcmsRaw: no visible global function definition for 'rgl.surface' plotSurf,xcmsRaw: no visible global function definition for 'rgl.points' plotSurf,xcmsRaw: no visible global function definition for 'rgl.bbox' plotTree,xcmsFragments: no visible global function definition for 'edgemode<-' plotTree,xcmsFragments: no visible global function definition for 'addEdge' refineChromPeaks,XCMSnExp-FilterIntensityParam: no visible binding for global variable 'value' write.cdf,xcmsRaw: no visible global function definition for 'ncdim_def' write.cdf,xcmsRaw: no visible global function definition for 'ncvar_def' write.cdf,xcmsRaw: no visible global function definition for 'nc_create' write.cdf,xcmsRaw: no visible global function definition for 'ncvar_put' write.cdf,xcmsRaw: no visible global function definition for 'ncatt_put' write.cdf,xcmsRaw: no visible global function definition for 'nc_close' write.mzQuantML,xcmsSet: no visible global function definition for 'saveXML' write.mzdata,xcmsRaw: no visible global function definition for 'base64encode' Undefined global functions or variables: List addChildren addEdge base64encode checkCluster checkForRemoteErrors edgemode<- makeCluster mpi.any.source mpi.any.tag mpi.bcast.Robj2slave mpi.bcast.cmd mpi.comm.rank mpi.comm.size mpi.get.sourcetag mpi.recv.Robj mpi.send.Robj mpi.spawn.Rslaves nc_close nc_create ncatt_put ncdim_def ncvar_def ncvar_put newXMLNode pData<- recvOneResult rgl.bbox rgl.clear rgl.points rgl.surface sampleNames saveXML sendCall value xmlInternalTreeParse xmlSchemaValidate xmlTree xmlTreeParse

checking compiled code ... NOTE Note: information on .o files for x64 is not available Warning in read_symbols_from_dll(so, rarch) : this requires 'objdump.exe' to be on the PATH Warning in read_symbols_from_dll(so, rarch) : this requires 'objdump.exe' to be on the PATH

See 'Writing portable packages' in the 'Writing R Extensions' manual.

0 errors √ | 0 warnings √ | 6 notes x

sneumann commented 2 years ago

Great, that's exactly what we needed. So in general everything works, since the test suite passes (√ Running 'testthat.R' (4m 23.4s)). Next, your script needs debugging: can you post a reproducible example that triggers the error, using publicly available data from e.g. the faahKO or mtbls2 packages?

BaylorSci commented 2 years ago

Just triple-checking: don't both of the suggested data packages contain only MS level 1 data, while the error I am reporting concerns adding MS level 2 peaks with add=TRUE?

sneumann commented 2 years ago

You're correct. Can you reproduce it with this one: msnfile <- system.file("microtofq/MSMSpos20_6.mzML", package = "msdata"), or with any other file of your choice that we can download somewhere? Yours, Steffen

BaylorSci commented 2 years ago

Certainly. Using the file you suggested, I chose the default parameters, with the following script:

library(xcms)

cwp=CentWaveParam()

files= system.file("microtofq/MSMSpos20_6.mzML", package = "msdata")

pd1=data.frame(sample_name=sub(basename(files), pattern=".mzML", replacement = "", fixed=TRUE), sample_group=c(rep("Check", 1)), stringsAsFactors=FALSE)

raw_data1a=readMSData(files=files, pdata=new("NAnnotatedDataFrame", pd1), mode="onDisk")

exp_data=findChromPeaks(raw_data1a, param=cwp, msLevel=1)

exp_data=findChromPeaks(raw_data1a, param=cwp, add=TRUE, msLevel=2)

This returns:
Error in .local(object, param, ...) : unused argument (add = TRUE)

traceback:
> traceback()
2: findChromPeaks(raw_data1a, param = cwp, add = TRUE, msLevel = 2)
1: findChromPeaks(raw_data1a, param = cwp, add = TRUE, msLevel = 2)

Switching this around, I can pick peaks for msLevel=2 first, but when I then try to add MS1 peaks to the OnDiskMSnExp, the same add=TRUE error comes up.

Outlined below is the session info:

R version 4.1.1 (2021-08-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)

Matrix products: default

locale: [1] LC_COLLATE=English_United States.1252 [2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 [4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages: [1] stats4 stats graphics grDevices utils datasets methods
[8] base

other attached packages: [1] beepr_1.3 xcms_3.16.0
[3] MSnbase_2.20.0 ProtGenerics_1.26.0
[5] mzR_2.28.0 Rcpp_1.0.7
[7] BiocParallel_1.28.0 SummarizedExperiment_1.24.0
[9] Biobase_2.54.0 GenomicRanges_1.46.0
[11] GenomeInfoDb_1.30.0 IRanges_2.28.0
[13] S4Vectors_0.32.0 BiocGenerics_0.40.0
[15] MatrixGenerics_1.6.0 matrixStats_0.61.0
[17] dplyr_1.0.7

loaded via a namespace (and not attached): [1] vsn_3.62.0 foreach_1.5.1 BiocManager_1.30.16
[4] affy_1.72.0 GenomeInfoDbData_1.2.7 robustbase_0.93-9
[7] impute_1.68.0 pillar_1.6.4 lattice_0.20-45
[10] glue_1.4.2 limma_3.50.0 digest_0.6.28
[13] RColorBrewer_1.1-2 XVector_0.34.0 colorspace_2.0-2
[16] preprocessCore_1.56.0 Matrix_1.3-4 plyr_1.8.6
[19] MALDIquant_1.20 XML_3.99-0.8 pkgconfig_2.0.3
[22] zlibbioc_1.40.0 purrr_0.3.4 scales_1.1.1
[25] snow_0.4-4 RANN_2.6.1 affyio_1.64.0
[28] tibble_3.1.5 generics_0.1.1 ggplot2_3.3.5
[31] ellipsis_0.3.2 MassSpecWavelet_1.60.0 magrittr_2.0.1
[34] crayon_1.4.2 ncdf4_1.17 fansi_0.5.0
[37] doParallel_1.0.16 MASS_7.3-54 MsFeatures_1.2.0
[40] tools_4.1.1 lifecycle_1.0.1 stringr_1.4.0
[43] munsell_0.5.0 cluster_2.1.2 DelayedArray_0.20.0
[46] pcaMethods_1.86.0 compiler_4.1.1 mzID_1.32.0
[49] rlang_0.4.12 grid_4.1.1 RCurl_1.98-1.5
[52] iterators_1.0.13 MsCoreUtils_1.6.0 bitops_1.0-7
[55] gtable_0.3.0 codetools_0.2-18 DBI_1.1.1
[58] R6_2.5.1 utf8_1.2.2 clue_0.3-60
[61] stringi_1.7.5 parallel_4.1.1 vctrs_0.3.8
[64] audio_0.1-8 DEoptimR_1.0-9 tidyselect_1.1.1

jorainer commented 2 years ago

OK, I'll have a look into that.

jorainer commented 2 years ago

Note that you are calling findChromPeaks twice on the same object, i.e. the OnDiskMSnExp object with the raw data. The error you are getting (which is indeed very cryptic) comes from the findChromPeaks,OnDiskMSnExp method, which does not have an add parameter, whereas the findChromPeaks,XCMSnExp method does.

To fix your problem, I would suggest changing your workflow so that findChromPeaks with add = TRUE is called on the result object returned by the first findChromPeaks call. Something along these lines:

library(xcms)

## fls: your mzML files; cwp: your CentWaveParam
data <- readMSData(fls, mode = "onDisk")

xdata <- findChromPeaks(data, param = cwp)
xdata <- findChromPeaks(xdata, param = cwp, msLevel = 2L, add = TRUE)

Note that the second findChromPeaks call is done on xdata (an XCMSnExp object holding the results of the first chromatographic peak detection run), not on data (an OnDiskMSnExp object that just represents the raw MS data, without any chromatographic peak results).
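To make the dispatch difference concrete, here is a minimal self-contained sketch using the msdata file suggested earlier in this thread, with default CentWaveParam() settings as an assumption (not tuned values). Which findChromPeaks method runs depends on the class of the first argument: only the object returned by the first call is an XCMSnExp, so only that one accepts add = TRUE.

```r
library(xcms)    # also attaches MSnbase (readMSData, msLevel, ...)
library(msdata)  # provides the example mzML file used in this thread

fl  <- system.file("microtofq/MSMSpos20_6.mzML", package = "msdata")
raw <- readMSData(fl, mode = "onDisk")  # OnDiskMSnExp: raw data only
cwp <- CentWaveParam()                  # default settings, not optimized

## First run: the OnDiskMSnExp method is dispatched and returns an XCMSnExp.
xdata <- findChromPeaks(raw, param = cwp, msLevel = 1L)

## Second run on the result: the XCMSnExp method accepts add = TRUE and
## appends the MS2 peaks to the already detected MS1 peaks.
xdata <- findChromPeaks(xdata, param = cwp, msLevel = 2L, add = TRUE)

## Passing `raw` together with add = TRUE would instead fail with
## "unused argument (add = TRUE)", because `raw` is still an OnDiskMSnExp.
is(raw, "XCMSnExp")    # FALSE
is(xdata, "XCMSnExp")  # TRUE

## Number of detected peaks per MS level
table(chromPeakData(xdata)$ms_level)
```

Because XCMSnExp extends OnDiskMSnExp, the XCMSnExp object still carries the raw data, which is why it can be passed back into findChromPeaks for the MS2 run.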

BaylorSci commented 2 years ago

OK, that makes sense. I ran this successfully with the suggested mzML file. When I then ran it on my own files, first exp_data=findChromPeaks(raw_data1a, param=cwp, msLevel=1) followed by exp_data=findChromPeaks(exp_data, param=cwp, msLevel=2L, add=TRUE), it returned a new error message: Error in h(simpleError(msg, call)) : error in evaluating the argument 'X' in selecting a method for function 'bplapply': No MS level 2 spectra present.

This is odd, since the originally imported OnDiskMSnExp contains spectra of both MS level 1 and MS level 2.

The traceback is as follows:
13: h(simpleError(msg, call))
12: .handleSimpleError(function (cond) .Internal(C_tryCatchHelper(addr, 1L, cond)), "No MS level 2 spectra present.", base::quote(NULL))
11: stop("No MS level ", msLevel., " spectra present.", call. = FALSE)
10: (function (i, fd, x, to_class) { a <- new(to_class) slot(procd, "files", check = FALSE) <- x@processingData@files[i] slot(a, "processingData", check = FALSE) <- procd slot(a, "featureData", check = FALSE) <- extractROWS(fd, which(fd$msLevel %in% msLevel.)) if (!nrow(a@featureData)) stop("No MS level ", msLevel., " spectra present.", call. = FALSE) a@featureData$fileIdx <- 1L slot(a, "experimentData", check = FALSE) <- expd slot(a, "spectraProcessingQueue", check = FALSE) <- x@spectraProcessingQueue slot(a, "phenoData", check = FALSE) <- x@phenoData[i, , drop = FALSE] a })(dots[[1L]][[6L]], dots[[2L]][[6L]], x = new("XCMSnExp", .processHistory = list( new("XProcessHistory", param = new("CentWaveParam", ppm = 25, peakwidth = c(20, 50), snthresh = 10, prefilter = c(3, 100), mzCenterFun = "wMean", integrate = 1L, mzdiff = -0.001, fitgauss = FALSE, noise = 0, verboseColumns = FALSE, roiList = list(), firstBaselineCheck = TRUE, roiScales = numeric(0), ...
9: mapply(seq_along(fileNames(x)), fdl, FUN = create_object, MoreArgs = list(x = x, to_class = to_class))
8: .split_by_file2(object, msLevel. = msLevel)
7: bplapply(.split_by_file2(object, msLevel. = msLevel), FUN = findChromPeaks_OnDiskMSnExp, method = "centWave", param = param, BPPARAM = BPPARAM)
6: .local(object, param, ...)
5: (new("MethodDefinition", .Data = function (object, param, ...) { .local <- function (object, param, BPPARAM = bpparam(), return.type = "XCMSnExp", msLevel = 1L) { return.type <- match.arg(return.type, c("XCMSnExp", "list", "xcmsSet")) startDate <- date() if (length(msLevel) > 1) stop("Currently only peak detection in a single MS level is ", "supported", call. = FALSE) centroided <- all(centroided(object)[msLevel(object) %in% msLevel]) if (is.na(centroided)) { idx <- which(msLevel(object) %in% msLevel) idx <- idx[ceiling(length(idx)/3)] suppressWarnings(centroided <- isCentroided(object[[idx]])) } if (is.na(centroided) || !centroided) warning("Your data appears to be not centroided! CentWave", ...
4: do.call(meth, args = list(object = object, param = param, BPPARAM = BPPARAM, return.type = return.type, msLevel = msLevel))
3: .local(object, param, ...)
2: findChromPeaks(exp_data, param = cwp, msLevel = 2L, add = TRUE)
1: findChromPeaks(exp_data, param = cwp, msLevel = 2L, add = TRUE)

I know the cwp parameters are not optimized, but the error message throws me off a bit by saying that no MS level 2 spectra are present.

jorainer commented 2 years ago

This is indeed strange. What is the output of

table(msLevel(raw_data1a))

on your data? Could it be that you have some files in your data set with only MS1 data?
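The overall table(msLevel(raw_data1a)) can show plenty of MS2 spectra even when an individual file has none. The traceback above appears to stop while building the per-file object for dots[[1L]][[6L]], which suggests (but does not prove) that the sixth file is the one without MS2 spectra. A per-file cross-table answers the question directly. The sketch below assumes msdata's microtofq/MM14.mzML as a stand-in for an MS1-only file next to the MS/MS file used earlier in this thread; files_without_level() is a hypothetical helper, not an xcms or MSnbase function.

```r
library(MSnbase)  # readMSData(), msLevel(), fromFile(), fileNames(), filterFile()
library(msdata)   # example mzML files

## Hypothetical helper: indices of the files in an OnDiskMSnExp that contain
## no spectra of MS level `level`. The factor levels make sure every file is
## represented, even one that contributes no spectra at all.
files_without_level <- function(x, level = 2L) {
    per_file <- split(msLevel(x),
                      factor(fromFile(x), levels = seq_along(fileNames(x))))
    which(!vapply(per_file, function(lvl) any(lvl == level), logical(1)))
}

fls <- c(system.file("microtofq/MM14.mzML", package = "msdata"),        # assumed MS1-only
         system.file("microtofq/MSMSpos20_6.mzML", package = "msdata"))
raw <- readMSData(fls, mode = "onDisk")

## MS level (rows) by file index (columns): a 0 in row "2" flags a file
## that would trigger "No MS level 2 spectra present".
table(msLevel(raw), fromFile(raw))

bad <- files_without_level(raw, 2L)
bad

## Assuming those files are not needed for MS2 peak detection, one option is
## to drop them before running findChromPeaks(..., msLevel = 2L).
if (length(bad))
    raw <- filterFile(raw, file = setdiff(seq_along(fileNames(raw)), bad))
```

Dropping files changes the sample set for every later step, so in a real workflow it may be preferable to fix or re-convert the affected raw files instead.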