sneumann / xcms

This is the git repository matching the Bioconductor package xcms: LC/MS and GC/MS Data Analysis

Bioparallel errors in fillChromPeaks #267

Closed lee-t closed 6 years ago

lee-t commented 6 years ago

Hi, I'm trying to debug a script that processes lipid samples. After peak picking and two rounds of grouping and retention time correction, the fillChromPeaks step fails on an Ubuntu machine running MulticoreParam.

x_filled <- fillChromPeaks(x_2density, BPPARAM = MulticoreParam())
Requesting 25528 missing peaks from QE005389.mzXML ... 
Error: BiocParallel errors
  element index: 1, 2, 3, 4, 5, 6, ...
  first error: result would be too long a vector
In addition: Warning message:
stop worker failed:
  'clear_cluster' receive data failed:
  reached elapsed time limit 

Running it with SerialParam() also fails.

x_filled <- fillChromPeaks(x_2density, BPPARAM = SerialParam())
Requesting 19871 missing peaks from QE005376.mzXML ... 
Error in 1:(scanrange[1] - 1) : result would be too long a vector

This seems inconsistent, since it does work on a Windows PC with SnowParam().

jorainer commented 6 years ago

The error message points towards a memory problem. Could you please provide the output of sessionInfo on Windows and Linux to check what versions of R, xcms, mzR and MSnbase you are using? Also, what's the size of the memory you have available on Windows and on Linux?

lee-t commented 6 years ago

Windows: 192GB physical memory

> memory.size()
[1] 3414.02
> gc()
            used   (Mb) gc trigger   (Mb)  max used   (Mb)
Ncells   6610205  353.1   14442815  771.4  14442815  771.4
Vcells 180772895 1379.2  520945268 3974.5 565049686 4311.0
> sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] parallel  tools     stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] snow_0.4-2          rsm_2.9             CAMERA_1.34.0       xcms_3.0.0         
 [5] MSnbase_2.4.2       ProtGenerics_1.10.0 mzR_2.12.0          Rcpp_0.12.15       
 [9] BiocParallel_1.12.0 Biobase_2.38.0      BiocGenerics_0.24.0

loaded via a namespace (and not attached):
 [1] lattice_0.20-35        digest_0.6.15          foreach_1.4.4          plyr_1.8.4            
 [5] backports_1.1.2        acepack_1.4.1          mzID_1.16.0            stats4_3.4.3          
 [9] ggplot2_2.2.1          BiocInstaller_1.28.0   pillar_1.2.1           zlibbioc_1.24.0       
[13] rlang_0.2.0            lazyeval_0.2.1         rstudioapi_0.7         data.table_1.10.4-3   
[17] S4Vectors_0.16.0       rpart_4.1-13           Matrix_1.2-12          checkmate_1.8.5       
[21] preprocessCore_1.40.0  splines_3.4.3          stringr_1.3.0          foreign_0.8-69        
[25] htmlwidgets_1.0        igraph_1.2.1           munsell_0.4.3          compiler_3.4.3        
[29] pkgconfig_2.0.1        base64enc_0.1-3        multtest_2.34.0        pcaMethods_1.70.0     
[33] htmltools_0.3.6        nnet_7.3-12            tibble_1.4.2           gridExtra_2.3         
[37] htmlTable_1.11.2       RANN_2.5.1             Hmisc_4.1-1            IRanges_2.12.0        
[41] codetools_0.2-15       XML_3.98-1.10          MASS_7.3-49            grid_3.4.3            
[45] MassSpecWavelet_1.44.0 RBGL_1.54.0            gtable_0.2.0           affy_1.56.0           
[49] magrittr_1.5           scales_0.5.0           graph_1.56.0           stringi_1.1.6         
[53] impute_1.52.0          affyio_1.48.0          doParallel_1.0.11      limma_3.34.9          
[57] latticeExtra_0.6-28    Formula_1.2-2          RColorBrewer_1.1-2     iterators_1.0.9       
[61] LOBSTAHS_1.4.0         survival_2.41-3        colorspace_1.3-2       cluster_2.0.6         
[65] vsn_3.46.0             MALDIquant_1.17        knitr_1.20   

Linux: 64GB physical memory

> gc()
           used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells  4097561 218.9    6861544 366.5  6861544 366.5
Vcells 14170591 108.2   32215028 245.8 54292152 414.3
> sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.4 LTS

Matrix products: default
BLAS: /usr/lib/libblas/libblas.so.3.6.0
LAPACK: /usr/lib/lapack/liblapack.so.3.6.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8   
 [6] LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  tools     stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] shiny_1.0.5         snow_0.4-2          rsm_2.9             CAMERA_1.34.0       xcms_3.0.0          MSnbase_2.4.1      
 [7] ProtGenerics_1.10.0 mzR_2.12.0          Rcpp_0.12.14        BiocParallel_1.12.0 Biobase_2.38.0      BiocGenerics_0.24.0

loaded via a namespace (and not attached):
 [1] vsn_3.46.0             tidyr_0.7.2            splines_3.4.3          foreach_1.4.4          Formula_1.2-2         
 [6] assertthat_0.2.0       affy_1.56.0            stats4_3.4.3           latticeExtra_0.6-28    RBGL_1.54.0           
[11] impute_1.52.0          backports_1.1.2        lattice_0.20-35        glue_1.2.0             limma_3.34.4          
[16] digest_0.6.13          RColorBrewer_1.1-2     checkmate_1.8.5        colorspace_1.3-2       httpuv_1.3.5          
[21] htmltools_0.3.6        preprocessCore_1.40.0  Matrix_1.2-11          plyr_1.8.4             MALDIquant_1.17       
[26] XML_3.98-1.9           pkgconfig_2.0.1        zlibbioc_1.24.0        xtable_1.8-2           purrr_0.2.4           
[31] scales_0.5.0           RANN_2.5.1             affyio_1.48.0          htmlTable_1.11.0       tibble_1.3.4          
[36] IRanges_2.12.0         ggplot2_2.2.1          nnet_7.3-12            lazyeval_0.2.1         MassSpecWavelet_1.44.0
[41] mime_0.5               survival_2.41-3        magrittr_1.5           doParallel_1.0.11      MASS_7.3-49           
[46] foreign_0.8-69         graph_1.56.0           BiocInstaller_1.28.0   data.table_1.10.4-3    stringr_1.2.0         
[51] S4Vectors_0.16.0       munsell_0.4.3          cluster_2.0.6          bindrcpp_0.2           pcaMethods_1.70.0     
[56] compiler_3.4.3         mzID_1.16.0            rlang_0.1.4            grid_3.4.3             iterators_1.0.9       
[61] rstudioapi_0.7         htmlwidgets_0.9        igraph_1.1.2           base64enc_0.1-3        gtable_0.2.0          
[66] codetools_0.2-15       multtest_2.34.0        R6_2.2.2               gridExtra_2.3          knitr_1.17            
[71] dplyr_0.7.4            bindr_0.1              Hmisc_4.1-0            stringi_1.1.6          rpart_4.1-13          
[76] acepack_1.4.1         

jorainer commented 6 years ago

Could you please update the xcms and MSnbase packages on both Linux and Windows? We recently fixed a memory problem in xcms, and it could well be that this also fixes your problem.

emmagraham commented 6 years ago

Hi, I have the exact same error. I am running XCMS on mzML files centroided using ProteoWizard's msconvert.

This is my code:

raw_data <- readMSData(files = our_files, pdata = new("NAnnotatedDataFrame", meta_data),
                       mode = "onDisk") 
cwp <- CentWaveParam(peakwidth = c(30, 80), noise = 1000)
xdata <- findChromPeaks(raw_data, param = cwp)
xdata <- adjustRtime(xdata, param = ObiwarpParam(gapInit = 2.86,
                                                 gapExtend = 2.268))
pdp <- PeakDensityParam(sampleGroups = xdata$sample_group,
                        minFraction = 0.1, 
                        bw = 0.25,
                        minSamples = 1)
xdata <- groupChromPeaks(xdata, param = pdp)
xdata <- fillChromPeaks(xdata)

This is the error I get:

Requesting 3983 missing peaks from QT_170404_43.mzML ... got 3945.
Requesting 4020 missing peaks from QT_170404_46.mzML ... 
Error: BiocParallel errors
  element index: 2, 3, 4, 5
  first error: result would be too long a vector
In addition: Warning message:
stop worker failed:
  'clear_cluster' receive data failed:
  reached elapsed time limit 

I get the same error when explicitly calling SnowParam() and MulticoreParam(). When using SerialParam(), I get the following error:

Requesting 3952 missing peaks from QT_170404_15.mzML ... got 3893.
Requesting 4028 missing peaks from QT_170404_16.mzML ... got 3968.
Requesting 4001 missing peaks from QT_170404_17.mzML ... Error in 1:(scanrange[1] - 1) : result would be too long a vector
In addition: Warning messages:
1: In min(x) : no non-missing arguments to min; returning Inf
2: In max(x) : no non-missing arguments to max; returning -Inf
sessionInfo()
R version 3.5.0 (2018-04-23)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: OS X El Capitan 10.11.6

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib

locale:
[1] en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] xcms_3.2.0           MSnbase_2.6.0        ProtGenerics_1.12.0  mzR_2.14.0           Rcpp_0.12.16        
 [6] BiocParallel_1.14.1  Biobase_2.40.0       BiocGenerics_0.26.0  reshape2_1.4.3       XML_3.98-1.11       
[11] BiocInstaller_1.30.0 norm_1.0-9.5        

loaded via a namespace (and not attached):
 [1] RColorBrewer_1.1-2     compiler_3.5.0         pillar_1.2.2           plyr_1.8.4             iterators_1.0.9       
 [6] tools_3.5.0            zlibbioc_1.26.0        digest_0.6.15          MALDIquant_1.17        tibble_1.4.2          
[11] preprocessCore_1.42.0  gtable_0.2.0           lattice_0.20-35        rlang_0.2.0            Matrix_1.2-14         
[16] foreach_1.4.4          yaml_2.1.19            stringr_1.3.0          IRanges_2.14.6         S4Vectors_0.18.1      
[21] multtest_2.36.0        stats4_3.5.0           grid_3.5.0             impute_1.54.0          survival_2.42-3       
[26] RANN_2.5.1             limma_3.36.1           ggplot2_2.2.1          magrittr_1.5           splines_3.5.0         
[31] scales_0.5.0           pcaMethods_1.72.0      codetools_0.2-15       MASS_7.3-50            MassSpecWavelet_1.46.0
[36] mzID_1.18.0            colorspace_1.3-2       stringi_1.2.2          affy_1.58.0            doParallel_1.0.11     
[41] lazyeval_0.2.1         munsell_0.4.3          vsn_3.48.0             affyio_1.50.0 
gc()
           used  (Mb) gc trigger  (Mb) limit (Mb)  max used  (Mb)
Ncells  4948588 264.3    8881278 474.4         NA   8881278 474.4
Vcells 35158500 268.3  106410195 811.9      32768 106409897 811.9

Could it be a memory problem? I don't have easy access to a Windows PC, unfortunately, so I would love to figure out a fix for OSX.

jorainer commented 6 years ago

Thank you @emmagraham for your detailed error description! I will have a look at it.

jorainer commented 6 years ago

@emmagraham, could you please do some tests for me? 1) Before you run fillChromPeaks, could you please run any(is.na(chromPeaks(xdata))) and post the result? 2) Could you please install the latest xcms (using devtools::install_github("sneumann/xcms", ref = "master")) and test with that? This will hopefully help narrow down where the error comes from.

emmagraham commented 6 years ago

@jotsetung Thanks for your help! I installed the latest version of XCMS from your repo, as you suggested. Here is my session info:

sessionInfo()
R version 3.5.0 (2018-04-23)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: OS X El Capitan 10.11.6

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib

locale:
[1] en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] xcms_3.3.1           devtools_1.13.5      MSnbase_2.6.0        ProtGenerics_1.12.0  mzR_2.14.0          
 [6] Rcpp_0.12.16         BiocParallel_1.14.1  Biobase_2.40.0       BiocGenerics_0.26.0  reshape2_1.4.3      
[11] XML_3.98-1.11        BiocInstaller_1.30.0 norm_1.0-9.5        

loaded via a namespace (and not attached):
 [1] splines_3.5.0          lattice_0.20-35        colorspace_1.3-2       snow_0.4-2             stats4_3.5.0          
 [6] yaml_2.1.19            vsn_3.48.0             survival_2.42-3        rlang_0.2.0            pillar_1.2.2          
[11] withr_2.1.2            affy_1.58.0            RColorBrewer_1.1-2     affyio_1.50.0          foreach_1.4.4         
[16] plyr_1.8.4             mzID_1.18.0            stringr_1.3.0          zlibbioc_1.26.0        munsell_0.4.3         
[21] pcaMethods_1.72.0      gtable_0.2.0           codetools_0.2-15       memoise_1.1.0          knitr_1.20            
[26] IRanges_2.14.6         doParallel_1.0.11      curl_3.2               MassSpecWavelet_1.46.0 preprocessCore_1.42.0 
[31] scales_0.5.0           limma_3.36.1           S4Vectors_0.18.1       RANN_2.5.1             impute_1.54.0         
[36] ggplot2_2.2.1          digest_0.6.15          stringi_1.2.2          grid_3.5.0             tools_3.5.0           
[41] magrittr_1.5           lazyeval_0.2.1         tibble_1.4.2           MASS_7.3-50            Matrix_1.2-14         
[46] httr_1.3.1             iterators_1.0.9        R6_2.2.2               MALDIquant_1.17        multtest_2.36.0       
[51] compiler_3.5.0         git2r_0.21.0      

test 1

any(is.na(chromPeaks(xdata)))
[1] FALSE

emmagraham commented 6 years ago

I also tried using the fillChromPeaks function, and got the following error:

xdata <- fillChromPeaks(xdata, BPPARAM = p)
Requesting 3983 missing peaks from QT_170404_43.mzML ... got 3945.
Requesting 4020 missing peaks from QT_170404_46.mzML ... 
Error: BiocParallel errors
  element index: 2, 3, 4, 5
  first error: 'scanrange' does not contain finite values
In addition: Warning message:
stop worker failed:
  'clear_cluster' receive data failed:
  reached elapsed time limit 

jorainer commented 6 years ago

Thanks, now we're getting closer. I think the problem is that you have peaks with retention times that are outside the retention time range for certain files. I think I fixed this now. Could you please install xcms again (from github) and retry?
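
To illustrate how a peak window outside a file's retention time range could lead to the messages quoted earlier in this thread, here is a minimal base R sketch. It is not the xcms implementation; the variable names (`rtime`, `rtrange`, `idx`) and all values are made up for illustration, and only the name `scanrange` is taken from the error message.

```r
# Hypothetical retention times of one file and a peak window beyond them.
rtime <- c(10.5, 11.0, 11.5, 12.0)
rtrange <- c(20, 25)

# No scan falls inside the window, so the index vector is empty.
idx <- which(rtime >= rtrange[1] & rtime <= rtrange[2])

# range() on an empty vector warns ("no non-missing arguments to min/max")
# and returns non-finite values; the warnings are suppressed here.
scanrange <- suppressWarnings(range(idx))

# Building a sequence up to a non-finite bound fails; capture the message.
msg <- tryCatch(
    {
        1:(scanrange[1] - 1)
        "no error"
    },
    error = function(e) conditionMessage(e)
)
print(scanrange)
print(msg)
```

If the assumption holds, the two suppressed warnings correspond to the min/max warnings reported with SerialParam(), and the captured message corresponds to the "too long a vector" error.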

emmagraham commented 6 years ago

Sorry for the delay - with the latest version of XCMS from github, I still get the exact same error.

So, I tried swapping out my input mzML data for centroided mzData files from another experiment (with the exact same parameters, code, etc.), and everything ran without errors.

This made me suspicious that something may be wrong with my input files, so I ran my code for XCMS v1.48.0 (provided below) on the mzML files I am currently trying to analyze, and was able to run XCMS without errors. My code for v1.48.0:

xset <- xcmsSet(our_files,
                method = "centWave",
                ppm = 15,
                peakwidth = c(3, 35.75),
                mzdiff = 0.00325,
                prefilter = c(3, 100),
                noise = 0,
                snthresh = 2.8)

# group peaks together
xset <- group(xset)
# retention time correction
xset2 <- retcor(xset,
                method = "obiwarp",
                profStep = 1,
                gapInit = 2.86,
                gapExtend = 2.268)
# group again
xset2 <- group(xset2,
               bw = 0.25,
               mzwid = 0.02122,
               minfrac = 0.1,
               minsamp = 1)
xset3 <- fillPeaks(xset2)
gt <- xcms::groups(xset3)
intensity_matrix <- groupval(xset3, "medret", "into")

However, I do get more than 50 warnings, all about specific features being out of RT range. An example of a warning:

In .local(object, ...) :
  getPeaks: Peak  m/z:107.068016052246-107.069328308105,  RT:2.5255-12.37250000002is out of retention time range for this sample (/Users/emmagraham/Desktop/Masters/Metabolomics project/metabolomics_repo/Controls/Files_pos/QT_170328_15.mzML), using zero intensity value.

This provides additional evidence that the RT range is where the current version is getting tripped up.

Session info:

> sessionInfo()
R version 3.4.4 (2018-03-15)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: OS X El Capitan 10.11.2

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] xcms_1.48.0         Biobase_2.32.0      ProtGenerics_1.4.0  BiocGenerics_0.18.0
[5] mzR_2.6.3           Rcpp_0.12.16        readr_1.1.1         XML_3.98-1.10      

loaded via a namespace (and not attached):
 [1] bindr_0.1.1        magrittr_1.5       hms_0.4.2          lattice_0.20-35    R6_2.2.2          
 [6] rlang_0.2.0        dplyr_0.7.4        tools_3.4.4        grid_3.4.4         yaml_2.1.18       
[11] assertthat_0.2.0   tibble_1.4.2       bindrcpp_0.2.2     RColorBrewer_1.1-2 codetools_0.2-15  
[16] glue_1.2.0         compiler_3.4.4     pillar_1.2.1       pkgconfig_2.0.1 

Some info about my current files that I may not have mentioned earlier: they were converted from Agilent .d files to mzML files using msconvert. Through msconvert, the files were centroided (using the peak_picking = TRUE argument) and compressed using zlib. Have there been any changes in how the new version of XCMS deals with RT being out of the scan range in mzML files?
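
One way to check which features are affected by the out-of-range warnings described above is to compare each feature's retention time window with the range recorded in each file. The sketch below uses plain base R with made-up numbers; `features_outside_rt` is a hypothetical helper, not an xcms function.

```r
# Hypothetical helper (not part of xcms): for each file, return the indices
# of features whose retention time window lies completely outside the
# range of retention times recorded in that file.
features_outside_rt <- function(feat_rt, file_rts) {
    # feat_rt:  matrix with columns "rtmin" and "rtmax", one row per feature
    # file_rts: list of numeric vectors, one vector of retention times per file
    lapply(file_rts, function(rt) {
        lims <- range(rt)
        which(feat_rt[, "rtmax"] < lims[1] | feat_rt[, "rtmin"] > lims[2])
    })
}

# Made-up values for illustration only.
feat_rt <- cbind(rtmin = c(2.5, 40, 300), rtmax = c(12.4, 55, 320))
file_rts <- list(a = seq(15, 350, by = 0.5), b = seq(1, 250, by = 0.5))

outside <- features_outside_rt(feat_rt, file_rts)
print(outside)
```

With an XCMSnExp object, the inputs could presumably be taken from the rtmin and rtmax columns of featureDefinitions(xdata) and from rtime(xdata, bySample = TRUE); treat that mapping as an assumption to verify against the installed xcms version.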

jorainer commented 6 years ago

Thanks for testing. Sorry that it didn't work out (I was sure it would). Would it be possible for you to break the failing experiment down to, say, 2 files and share these with me? This would enable me to debug and fix the problem locally.

emmagraham commented 6 years ago

Thanks! I've sent an email to your EURAC account.

jorainer commented 6 years ago

@emmagraham , can you please install again the most recent xcms version from github and retry? Note, you will have to restart R after installing xcms.

devtools::install_github("sneumann/xcms", ref = "master")

emmagraham commented 6 years ago

This fixed the error. Thank you! Out of curiosity, what was the problem?



jorainer commented 6 years ago

I added an additional check that ensures the retention time ranges are within the boundaries of the file. Somehow NA values seem to have slipped through. Thanks for testing!

@lee-t, could you also test the new version when you get a chance? I guess this might also fix your problem, and then we could close this issue.
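
A check of the kind described above could look roughly like the following. This is a sketch only, not the code that was committed to xcms; `clamp_rt_range` and its arguments are hypothetical names.

```r
# Hypothetical guard (not the actual xcms fix): restrict a requested
# retention time window to the range covered by a file, and signal with
# NULL when nothing can be integrated.
clamp_rt_range <- function(rtrange, file_rt) {
    lims <- range(file_rt)
    # Reject NA/Inf bounds and windows with no overlap with the file.
    if (!all(is.finite(rtrange)) ||
        rtrange[2] < lims[1] || rtrange[1] > lims[2])
        return(NULL)
    c(max(rtrange[1], lims[1]), min(rtrange[2], lims[2]))
}

# Made-up retention times for one file.
file_rt <- seq(10, 100, by = 0.5)

partly_inside <- clamp_rt_range(c(2.5, 12.4), file_rt)  # trimmed to the file
outside <- clamp_rt_range(c(200, 300), file_rt)         # no overlap
with_na <- clamp_rt_range(c(NA, 5), file_rt)            # non-finite bound
print(partly_inside)
```

Returning NULL lets the caller skip the peak (or report a zero intensity, as the old fillPeaks did) instead of passing non-finite values on to code that builds scan index sequences.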

lee-t commented 6 years ago

fillChromPeaks() has been working fine, without the memory errors and even in parallel, on the latest master branch of xcms. Feel free to close.