rietho / IPO

A Tool for automated Optimization of XCMS Parameters
http://bioconductor.org/packages/IPO/
Other
34 stars 20 forks source link

Error with calcPPS function #50

Closed lauzikaite closed 6 years ago

lauzikaite commented 6 years ago

I'm trying to optimize xcmsSet parameters for a large study by taking subsets of the data (10 - 20 mzML files) as representatives of different LC-MS modes and running them separately. optimizeXcmsSet() does work with some of the data subsets just fine. But with some datasets, function stops after a few DoE with an error:

Error in peaks(xset)[, c("mz", "rt", "sample", "into", "mzmin", "mzmax", : subscript out of bounds Calls: system.time ... xcmsSetExperimentsCluster -> sapply -> lapply -> FUN -> calcPPS

I tried running the code with the same mzMLs on two systems:

R version 3.4.0 (2017-04-21) Platform: x86_64-w64-mingw32/x64 (64-bit) Running under: Windows >= 8 x64 (build 9200)

R version 3.4.0 (2017-04-21) Platform: x86_64-pc-linux-gnu (64-bit) Running under: SUSE Linux Enterprise Server 12 SP1

Code below is for running on x86_64-pc-linux-gnu cluster:

ppparam <- getDefaultXcmsSetStartingParams('centWave')
ppparam$min_peakwidth <- c(1,5) 
ppparam$max_peakwidth <- c(5,10)
ppparam$ppm <-c(20,25) 
ppparam$noise <- c(300, 600) 
ppparam$mzdiff <- c(-0.001, 0.010) 
ppparam$prefilter <- 8  
ppparam$value_of_prefilter <- 6000 
ppparam$integrate <- 2 # 
ppparam$mzCenterFun <- 'wMean' 
ppparam$snthresh <- 10 

opp <- optimizeXcmsSet(files = files,
                         params = ppparam,
                         BPPARAM = MulticoreParam(workers = 10),
                         nSlaves = 1,
                         subdir = output_dir)

Have you encountered such issue before? I don't understand why this issue occurs, since function works with the same, but fewer files (e.g., 2 files, 5 files or 8 files out of 10 files). Furthermore, these mzML files are processed succesfully with xcmsSet() and are not corrupted in any way.

Thank you.

lauzikaite commented 6 years ago

I have just noticed that this error only occurs if files have no peaks detected with the specified parameters. Below is the output for optimizeXcmsSet() with 5 parameters to optimize for 15 files. First DoE was performed succesfully, but the second DoE crashed:

starting new DoE with: min_peakwidth: c(3, 7) max_peakwidth: c(7.5, 12.5) ppm: c(22.5, 27.5) mzdiff: c(-0.0065, 0.0045) noise: c(150, 450) snthresh: 10 prefilter: 8 value_of_prefilter: 6000 mzCenterFun: wMean integrate: 2 fitgauss: FALSE verbose.columns: FALSE

1

...

22 Detecting mass traces at 27.5 ppm ... OK Detecting chromatographic peaks in 345 regions of interest ... FAIL: none found! Detecting mass traces at 27.5 ppm ... OK Detecting chromatographic peaks in 374 regions of interest ... FAIL: none found! Detecting mass traces at 27.5 ppm ... OK Detecting chromatographic peaks in 383 regions of interest ... FAIL: none found! Detecting mass traces at 27.5 ppm ... OK Detecting chromatographic peaks in 389 regions of interest ... FAIL: none found! Detecting mass traces at 27.5 ppm ... OK Detecting chromatographic peaks in 376 regions of interest ... FAIL: none found! Detecting mass traces at 27.5 ppm ... OK Detecting chromatographic peaks in 363 regions of interest ... FAIL: none found! Detecting mass traces at 27.5 ppm ... OK Detecting chromatographic peaks in 399 regions of interest ... FAIL: none found! Detecting mass traces at 27.5 ppm ... OK Detecting chromatographic peaks in 390 regions of interest ... FAIL: none found! Detecting mass traces at 27.5 ppm ... OK Detecting chromatographic peaks in 381 regions of interest ... FAIL: none found! Detecting mass traces at 27.5 ppm ... OK Detecting chromatographic peaks in 369 regions of interest ... FAIL: none found! Detecting mass traces at 27.5 ppm ... OK Detecting chromatographic peaks in 389 regions of interest ... FAIL: none found! Detecting mass traces at 27.5 ppm ... OK Detecting chromatographic peaks in 382 regions of interest ... FAIL: none found! Detecting mass traces at 27.5 ppm ... OK Detecting chromatographic peaks in 348 regions of interest ... OK: 1 found. Detecting mass traces at 27.5 ppm ... OK Detecting chromatographic peaks in 434 regions of interest ... FAIL: none found! Detecting mass traces at 27.5 ppm ... OK Detecting chromatographic peaks in 427 regions of interest ... FAIL: none found! Error in peaks(xset)[, c("mz", "rt", "sample", "into", "mzmin", "mzmax", : subscript out of bounds Calls: system.time ... xcmsSetExperimentsCluster -> sapply -> lapply -> FUN -> calcPPS

rietho commented 6 years ago

Hi! Thank you for reporting your error. Sorry also for the late response, as I was out of office. Can you also please report your session sessionInfo(), so that I know the used package versions. Then I'll look into it. Can you also share the data with me or otherwise reproduce the error with public available data?

lauzikaite commented 6 years ago

Hi!

Thank you for your reply.

Data is located on Dropbox. Same error has been produced with multiple datasets from my study and I uploaded just ten QC samples here.

sessionInfo()

R version 3.4.0 (2017-04-21) Platform: x86_64-pc-linux-gnu (64-bit) Running under: SUSE Linux Enterprise Server 12 SP1

Matrix products: default BLAS: /apps/R/3.4.0/lib64/R/lib/libRblas.so LAPACK: /apps/R/3.4.0/lib64/R/lib/libRlapack.so

locale: [1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8 [5] LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8 [7] LC_PAPER=en_GB.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C

attached base packages: [1] parallel stats graphics grDevices utils datasets methods [8] base

other attached packages: [1] IPO_1.2.0 CAMERA_1.32.0 rsm_2.8 [4] xcms_1.52.0 MSnbase_2.2.0 ProtGenerics_1.8.0 [7] mzR_2.10.0 Rcpp_0.12.11 BiocParallel_1.10.1 [10] Biobase_2.36.2 BiocGenerics_0.22.0

loaded via a namespace (and not attached): [1] vsn_3.44.0 splines_3.4.0 foreach_1.4.3 [4] Formula_1.2-1 affy_1.54.0 stats4_3.4.0 [7] latticeExtra_0.6-28 RBGL_1.52.0 impute_1.50.1 [10] backports_1.1.0 lattice_0.20-35 limma_3.32.2 [13] digest_0.6.12 RColorBrewer_1.1-2 checkmate_1.8.2 [16] colorspace_1.3-2 sandwich_2.3-4 htmltools_0.3.6 [19] preprocessCore_1.38.1 Matrix_1.2-10 plyr_1.8.4 [22] MALDIquant_1.16.2 XML_3.98-1.9 zlibbioc_1.22.0 [25] xtable_1.8-2 mvtnorm_1.0-6 scales_0.4.1 [28] RANN_2.5.1 affyio_1.46.0 lsmeans_2.26-3 [31] htmlTable_1.9 tibble_1.3.3 IRanges_2.10.2 [34] ggplot2_2.2.1 TH.data_1.0-8 nnet_7.3-12 [37] lazyeval_0.2.0 MassSpecWavelet_1.42.0 survival_2.41-3 [40] magrittr_1.5 estimability_1.2 doParallel_1.0.10 [43] nlme_3.1-131 MASS_7.3-47 foreign_0.8-69 [46] graph_1.54.0 BiocInstaller_1.26.0 tools_3.4.0 [49] data.table_1.10.4 multcomp_1.4-6 stringr_1.2.0 [52] S4Vectors_0.14.3 munsell_0.4.3 cluster_2.0.6 [55] pcaMethods_1.68.0 compiler_3.4.0 mzID_1.14.0 [58] rlang_0.1.1 grid_3.4.0 iterators_1.0.8 [61] htmlwidgets_0.8 igraph_1.0.1 base64enc_0.1-3 [64] gtable_0.2.0 codetools_0.2-15 multtest_2.32.0 [67] gridExtra_2.2.1 zoo_1.8-0 knitr_1.16 [70] Hmisc_4.0-3 stringi_1.1.5 rpart_4.1-11 [73] acepack_1.4.1 coda_0.19-1

I do use a modified plotContours() funtion to prevent jpeg generation, since the Linux cluster does not support graphical display. It shouldn't have any effect on the overall process though.

source("/.../plotContours_1.R")
tmpfun <- get("plotContours", envir = asNamespace("IPO"))
environment(plotContours_1) <- environment(tmpfun)
attributes(plotContours_1) <- attributes(tmpfun)
assignInNamespace("plotContours", plotContours_1, ns="IPO")

plotContours_1.R looks like that:

plotContours_1 <- function(model, maximum_slice, plot_name = NULL) {
  print('Modified plotContours') # To check whether v2 was re-assigned to the package
  plots <- c()
  for(i in 1:(length(maximum_slice)-1)) {
    for(j in (i+1):length(maximum_slice)) {
      plots <- c(plots, as.formula(paste("~ x", i, "* x", j, sep="")))
    } 
  }

  # determine number of plot rows and column on single device
  plot_rows <- round(sqrt(length(plots)))
  plot_cols <- 
    if(plot_rows==1){
      length(plots)
    } else {
      ceiling(sqrt(length(plots)))
    }

  # save as jpeg, if plot_name is given
  if(!is.null(plot_name)) {
    plot_name = paste(plot_name, ".jpg", sep="")
    # jpeg(plot_name, width=4*plot_cols, height=2*plot_rows+2, 
    #      units="in", res=c(200,200))
  } # otherwise plot on device

  # op <- par(mfrow = c(plot_rows, plot_cols), oma = c(0,0,0,0))  
  # contour.lm is called
  # contour(model, plots, image = TRUE, at = maximum_slice)

  if (!is.null(plot_name)) {
    # dev.off()
  }
  # par(op)
}
rietho commented 6 years ago

Thank you for the data and the sessionInfo().

The issue seems to arise in the rare case, when there is only one (or just few) peak(s) found. In this case the xcmsSet-object does not hold a sample-argument, which is expected by IPO. This might be a bug in the xcms-package, which I'm not sure yet. I'll have to have a deeper look into it to find out.

rietho commented 6 years ago

So the problem was in the xcms-package (see sneumann/xcms#220). An update to the newest xcms3 branch from github should resolve the problem (see also sneumann/xcms#220). However I'll add a workaround in IPO to also work with older xcms-versions.

lauzikaite commented 6 years ago

Thank you for looking into this, I will try running IPO with newest xcms3 branch.

rietho commented 6 years ago

According to my tests the develop-branch (see 2abc39edae7d7572e5369957a59dccd9715946ab) should now handle this issue correctly also for older xcms-versions.

lauzikaite commented 6 years ago

Brilliant, the develop-branch works just fine with the old xcms. xcms3 with the IPO Bioconductor-version works too.

Also, my suggestion for a new version is to provide an option to disable jpeg generation, which other users might find useful too :)

Thanks for your help!

rietho commented 6 years ago

Thank you very much for testing. The bug-fixes are on the way to Bioconductor as well.

Regarding the jpeg generateion: Are you aware of setting subdir = NULL to plot to the graphics device? Or are you asking for an option to completely disable plotting?

lauzikaite commented 6 years ago

Yes, an option to completely disable plotting would be useful for those that are using linux clusters, which do not support graphical display.

rietho commented 6 years ago

Thanks for the suggestion. I guess there's no reason against providing an option to disable plotting.