Open · CLUES-Emory opened this issue 1 week ago
Thanks for the details and the `sessionInfo()` output. Could you please also provide the code that you used to set up the parallel processing and how you called `findChromPeaks()`? And also, just to confirm, you're already using the new `MsExperiment`/`XcmsExperiment` objects, right (not the older `OnDiskMSnExp`/`XCMSnExp`)?
Thanks! Yes, we used the new MsExperiment objects. In fact, switching to the MsExperiment objects may be linked to this issue. When we originally built our workflow using the OnDiskMSnExp objects, we were able to use multiple cores (we saw performance improvements until we reached 20 cores).
Parallel processing was set up as follows:

```r
library(BiocParallel)
register(bpstart(MulticoreParam(8)))
```
Files were read in as follows:

```r
library(MsExperiment)
ms_data <- readMsExperiment(spectraFiles = mzML_files)
```
Peak detection was performed with the following parameters and code:

```r
library(xcms)

# Step 1: XCMS peak detection parameters
xcms_params <- list()
xcms_params$cwp_ppm <- 5
xcms_params$cwp_peakwidth <- c(3, 20)
xcms_params$cwp_snthr <- 5
xcms_params$cwp_mzdiff <- -0.001
xcms_params$cwp_noise <- 20000
xcms_params$cwp_prefilter <- c(5, 20000)
xcms_params$cwp_mzCenterFun <- "wMean"
xcms_params$cwp_integrate <- 1
xcms_params$cwp_fitgauss <- FALSE
xcms_params$cwp_extendLengthMSW <- TRUE

# Step 1: peak detection
# Define CentWave parameters
cwp <- CentWaveParam(
    ppm = xcms_params$cwp_ppm,
    peakwidth = xcms_params$cwp_peakwidth,
    snthresh = xcms_params$cwp_snthr,
    mzdiff = xcms_params$cwp_mzdiff,
    noise = xcms_params$cwp_noise,
    prefilter = xcms_params$cwp_prefilter,
    mzCenterFun = xcms_params$cwp_mzCenterFun,
    integrate = xcms_params$cwp_integrate,
    fitgauss = xcms_params$cwp_fitgauss,
    extendLengthMSW = xcms_params$cwp_extendLengthMSW)

t1 <- Sys.time()
# Detect peaks using cwp
step_1_res <- findChromPeaks(ms_data, param = cwp)
Sys.time() - t1
```
We've also tried specifying `BPPARAM` directly in the `findChromPeaks()` call, but saw no difference, e.g.:

```r
step_1_res <- findChromPeaks(ms_data, param = cwp, BPPARAM = MulticoreParam(8))
```
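One thing that may be worth checking: in xcms 4, `findChromPeaks()` on an `MsExperiment` also takes a `chunkSize` argument (documented in `?findChromPeaks`; its default is, as far as we know, `2L`) that sets how many files are loaded into memory and processed at a time, with parallel processing happening only within each chunk. The base-R sketch below is a simplified, hypothetical model of that behaviour, not xcms code: `workers_per_chunk()` is an illustrative helper, and `parallel::mclapply()` stands in for BiocParallel's `MulticoreParam`. For each chunk it counts how many distinct worker process IDs actually ran a task.

```r
library(parallel)

# Hypothetical helper (not part of xcms): a simplified model of chunk-wise
# processing. Files are split into chunks of `chunk_size`; only the files in
# the current chunk are handed to the parallel workers, so at most
# min(chunk_size, n_cores) processes can be busy at once.
# mclapply() forks processes, so this runs on macOS/Linux but not Windows.
workers_per_chunk <- function(n_files, chunk_size, n_cores) {
  files <- seq_len(n_files)
  chunks <- split(files, ceiling(files / chunk_size))
  vapply(chunks, function(chunk) {
    # Each task reports the process ID of the worker that ran it
    pids <- mclapply(chunk, function(f) Sys.getpid(), mc.cores = n_cores)
    length(unique(unlist(pids)))
  }, integer(1))
}

# 10 files with a chunk size of 2: every chunk uses 2 workers,
# whether 4 or 8 cores are registered
workers_per_chunk(10, chunk_size = 2, n_cores = 4)
workers_per_chunk(10, chunk_size = 2, n_cores = 8)

# A chunk size matching the worker count lets all 8 workers run
# on the first chunk (the last chunk holds the remaining 2 files)
workers_per_chunk(10, chunk_size = 8, n_cores = 8)
```

If xcms behaves like this model, passing a larger chunk size together with the parallel setup, e.g. `findChromPeaks(ms_data, param = cwp, chunkSize = 8L, BPPARAM = MulticoreParam(8))`, would be expected to keep all eight workers busy, at the cost of holding eight files in memory at once.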
Hello, I've noticed an interesting result when trying to use multiple cores to process `MsExperiment` objects with `findChromPeaks()`. Regardless of the number of cores I register (using `register(bpstart(MulticoreParam(num_cores)))`), only two cores appear to be used for processing. I've replicated this on both a Mac M1 (using RStudio) and a Linux cluster. The number of cores used during parallel processing is shown in the screenshots below. I've also timed how long it takes to process 10 files with 4 and 8 registered cores on both the Mac and the Linux cluster, and the times are essentially the same. Both systems had 10 cores available.
| System | 4 cores | 8 cores |
|---|---|---|
| Mac M1 | 3.83 min | 3.81 min |
| Linux cluster | 5.83 min | 5.88 min |
[Screenshot: cores in use with 4 cores registered]
[Screenshot: cores in use with 8 cores registered]
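As a rough check on the timings in the table, the speedup from doubling the registered workers is the 4-core run time divided by the 8-core run time; ideal linear scaling would give 2. A short R calculation using only the reported numbers (the `timings` data frame is just a convenient container introduced here):

```r
# Wall-clock times (minutes) reported for processing 10 files
timings <- data.frame(
  system = c("Mac M1", "Linux cluster"),
  cores_4 = c(3.83, 5.83),
  cores_8 = c(3.81, 5.88)
)

# Speedup from going from 4 to 8 registered workers (2 would be ideal)
timings$speedup <- timings$cores_4 / timings$cores_8
# Parallel efficiency of the doubling: 1 = perfect scaling, 0.5 = no gain
timings$efficiency <- timings$speedup / 2

print(timings, digits = 3)
```

Both speedups are within about 1% of 1, i.e. the four extra workers contribute essentially nothing, which is consistent with only two workers being busy at any time.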
Thank you in advance!
Session info is below.
```
R version 4.4.0 (2024-04-24)
Platform: x86_64-pc-linux-gnu
Running under: Rocky Linux 8.10 (Green Obsidian)

Matrix products: default
BLAS:   /apps/R/4.4.0/lib64/R/lib/libRblas.so
LAPACK: /apps/R/4.4.0/lib64/R/lib/libRlapack.so;  LAPACK version 3.12.0

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

time zone: America/New_York
tzcode source: system (glibc)

attached base packages:
[1] stats4 parallel splines stats graphics grDevices utils
[8] datasets methods base

other attached packages:
[1] tibble_3.2.1 WaveICA_0.1.0 data.table_1.15.4
[4] msentropy_0.1.4 MsBackendMsp_1.8.0 Spectra_1.14.1
[7] S4Vectors_0.42.1 BiocGenerics_0.50.0 RAMClustR_1.3.0
[10] writexl_1.5.0 xMSanalyzer_2.0.6.1 WGCNA_1.72-5
[13] fastcluster_1.2.6 dynamicTreeCut_1.63-1 sva_3.52.0
[16] genefilter_1.86.0 mgcv_1.9-1 nlme_3.1-164
[19] doSNOW_1.0.20 RCurl_1.98-1.16 limma_3.60.4
[22] R2HTML_2.3.4 XML_3.99-0.17 apLCMS_6.6.9
[25] ROCS_1.3 poibin_1.5 ROCR_1.0-11
[28] randomForest_4.7-1.1 e1071_1.7-14 gbm_2.2.2
[31] snow_0.4-4 doParallel_1.0.17 iterators_1.0.14
[34] foreach_1.5.2 mzR_2.38.0 Rcpp_1.0.12
[37] rgl_1.3.1 MASS_7.3-60.2 gridExtra_2.3
[40] ggplot2_3.5.1 readxl_1.4.3 microbenchmark_1.4.10
[43] RColorBrewer_1.1-3 dplyr_1.1.4 MsExperiment_1.6.0
[46] ProtGenerics_1.36.0 xcms_4.2.2 BiocParallel_1.38.0

loaded via a namespace (and not attached):
[1] bitops_1.0-8 cellranger_1.1.0
[3] preprocessCore_1.66.0 pROC_1.18.5
[5] rpart_4.1.23 lifecycle_1.0.4
[7] edgeR_4.2.1 lattice_0.22-6
[9] MultiAssayExperiment_1.30.3 backports_1.4.1
[11] magrittr_2.0.3 rmarkdown_2.26
[13] Hmisc_5.1-3 plsdepot_0.2.0
[15] MsCoreUtils_1.16.1 DBI_1.2.2
[17] abind_1.4-5 zlibbioc_1.50.0
[19] GenomicRanges_1.56.1 purrr_1.0.2
[21] AnnotationFilter_1.28.0 JADE_2.0-4
[23] nnet_7.3-19 GenomeInfoDbData_1.2.12
[25] IRanges_2.38.1 MSnbase_2.30.1
[27] annotate_1.82.0 ncdf4_1.22
[29] codetools_0.2-20 DelayedArray_0.30.1
[31] tidyselect_1.2.1 UCSC.utils_1.0.0
[33] matrixStats_1.3.0 base64enc_0.1-3
[35] jsonlite_1.8.8 Formula_1.2-5
[37] survival_3.5-8 tools_4.4.0
[39] progress_1.2.3 glue_1.7.0
[41] SparseArray_1.4.8 xfun_0.43
[43] MatrixGenerics_1.16.0 ggfortify_0.4.17
[45] GenomeInfoDb_1.40.1 withr_3.0.0
[47] BiocManager_1.30.22 fastmap_1.1.1
[49] fansi_1.0.6 digest_0.6.35
[51] R6_2.5.1 colorspace_2.1-0
[53] GO.db_3.19.1 RSQLite_2.3.7
[55] waveslim_1.8.5 utf8_1.2.4
[57] tidyr_1.3.1 generics_0.1.3
[59] corpcor_1.6.10 class_7.3-22
[61] prettyunits_1.2.0 PSMatch_1.8.0
[63] httr_1.4.7 htmlwidgets_1.6.4
[65] S4Arrays_1.4.1 scatterplot3d_0.3-44
[67] pkgconfig_2.0.3 gtable_0.3.5
[69] blob_1.2.4 impute_1.78.0
[71] MassSpecWavelet_1.70.0 XVector_0.44.0
[73] htmltools_0.5.8.1 MALDIquant_1.22.2
[75] clue_0.3-65 scales_1.3.0
[77] Biobase_2.64.0 png_0.1-8
[79] knitr_1.46 MetaboCoreUtils_1.12.0
[81] rstudioapi_0.16.0 reshape2_1.4.4
[83] checkmate_2.3.2 proxy_0.4-27
[85] cachem_1.0.8 stringr_1.5.1
[87] foreign_0.8-86 AnnotationDbi_1.66.0
[89] mzID_1.42.0 vsn_3.72.0
[91] pillar_1.9.0 grid_4.4.0
[93] vctrs_0.6.5 MsFeatures_1.12.0
[95] pcaMethods_1.96.0 xtable_1.8-4
[97] cluster_2.1.6 htmlTable_2.4.3
[99] evaluate_0.23 cli_3.6.2
[101] locfit_1.5-9.10 compiler_4.4.0
[103] rlang_1.1.3 crayon_1.5.2
[105] fdrtool_1.2.17 multitaper_1.0-17
[107] QFeatures_1.14.2 affy_1.82.0
[109] plyr_1.8.9 fs_1.6.4
[111] stringi_1.8.3 munsell_0.5.1
[113] Biostrings_2.72.1 lazyeval_0.2.2
[115] Matrix_1.7-0 hms_1.1.3
[117] bit64_4.0.5 KEGGREST_1.44.1
[119] statmod_1.5.0 SummarizedExperiment_1.34.0
[121] igraph_2.0.3 memoise_2.0.1
[123] affyio_1.74.0 bit_4.0.5
```