sneumann / xcms

This is the git repository matching the Bioconductor package xcms: LC/MS and GC/MS Data Analysis

Running featureSpectra meets the issue from BiocParallel #602

Open MuyaoXi9271 opened 2 years ago

MuyaoXi9271 commented 2 years ago

Thanks in advance for taking care of this issue.

I got the error "Error: BiocParallel errors 1 remote errors, element index: 3 1 unevaluated and other errors first remote error: cannot open the connection" when I ran the code `filteredMs2Spectra_NEG <- featureSpectra(xcms_p_NEG_g_r_g_fill, return.type = "MSpectra")`. However, the error did not occur after restarting R or rebooting my computer, and I got this result: image

I also described the error in the BiocParallel issue tracker (https://github.com/Bioconductor/BiocParallel/issues/177#issuecomment-1055541439) and was advised to run BiocParallel::register(BiocParallel::SerialParam(), default = TRUE). After that the error no longer occurred, but I did not get the same result: image
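For readers who want to try the suggested workaround, here is a minimal sketch of switching the default backend to serial evaluation and checking that the switch took effect. It assumes only that BiocParallel is installed; the variable names `previous` and `is_serial` are illustrative and not from the thread.

```r
library(BiocParallel)

# Remember which backend was the default before the change.
previous <- bpparam()

# Make serial evaluation the default, as advised in the BiocParallel thread.
register(SerialParam(), default = TRUE)

# bpparam() returns the current default backend; it should now be serial.
is_serial <- is(bpparam(), "SerialParam")

# Restore the earlier default once the serial run is finished.
register(previous, default = TRUE)
```

Any function that relies on the registered default backend should run in the main R process while the SerialParam is registered.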

I also tried to dig into the contents of both outputs as shown below

image

I guess something goes wrong in the step res <- bpmapply(ms2_mspectrum_for_peaks_from_file, x, pks, MoreArgs = list(method = method), SIMPLIFY = FALSE, USE.NAMES = FALSE, BPPARAM = BPPARAM) inside the function ms2_mspectrum_for_all_peaks.

I am looking forward to any reply.

Bests, Muyao

jorainer commented 2 years ago

Please provide also the output from sessionInfo()

jorainer commented 2 years ago

Note that the problem reported in https://github.com/Bioconductor/BiocParallel/issues/177#issuecomment-1055586237 does not apply here, since we're opening the file in each parallel process.

To reproduce and better understand the error, can you please run these two code blocks and evaluate the results?

register(SnowParam(3))
A <- featureSpectra(xcms_p_NEG_g_r_g_fill, return.type = "MSpectra")
length(A)
length(unique(mcols(A)$feature_id))

register(SerialParam())
B <- featureSpectra(xcms_p_NEG_g_r_g_fill, return.type = "MSpectra")
length(B)
length(unique(mcols(B)$feature_id))

I did not see any difference for some of my data sets. One possible reason I could imagine is that the "cannot open the connection" error does not refer to opening the file, but to opening the connection from the main process to one of the worker processes. Maybe you used register(bpstart(SnowParam(3))) (i.e. you used bpstart to initialize the processes), the R session had been open for a long time, and one of the processes got closed by the operating system?
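One way to test the stale-worker hypothesis is to start a fresh set of workers, confirm that they are reachable, and shut them down again after use. The following is only a sketch under that assumption: it uses the BiocParallel functions bpstart(), bpisup(), bplapply() and bpstop(); the worker count of 2 and the toy job are arbitrary choices for illustration.

```r
library(BiocParallel)

# Create a fresh socket-based backend with 2 workers (number chosen arbitrarily).
param <- SnowParam(workers = 2)

# Start the worker processes explicitly and confirm they are reachable.
bpstart(param)
workers_up <- bpisup(param)

# Run a trivial job to verify that the main process can talk to the workers.
res <- bplapply(1:4, function(i) i * 2, BPPARAM = param)

# Shut the workers down again so no stale processes linger in a long session.
bpstop(param)
```

If the trivial job succeeds with fresh workers while the original call fails in a long-running session, that would point to the connection to the workers rather than to the data files.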

In order to better understand what happened, please run the two code blocks above and compare the number of features and spectra you find in both and also report the output of your sessionInfo().
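Beyond comparing counts, the two results can be checked for features that appear in only one of them. Below is a minimal base R sketch; `compare_feature_ids` is a hypothetical helper, and the toy vectors stand in for mcols(A)$feature_id and mcols(B)$feature_id.

```r
# Hypothetical helper: compare the feature IDs returned by two runs.
compare_feature_ids <- function(ids_a, ids_b) {
  list(
    # TRUE if both runs returned the same number of spectra.
    same_length = length(ids_a) == length(ids_b),
    # Feature IDs found in only one of the two runs.
    only_in_a = setdiff(unique(ids_a), unique(ids_b)),
    only_in_b = setdiff(unique(ids_b), unique(ids_a))
  )
}

# Toy input standing in for mcols(A)$feature_id and mcols(B)$feature_id.
ids_parallel <- c("FT001", "FT001", "FT002", "FT003")
ids_serial <- c("FT001", "FT002", "FT002", "FT003")

cmp <- compare_feature_ids(ids_parallel, ids_serial)
```

With the real objects, the call would be compare_feature_ids(mcols(A)$feature_id, mcols(B)$feature_id); empty only_in_a and only_in_b elements mean both runs cover the same features.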

MuyaoXi9271 commented 2 years ago

Hi Johannes,

Thanks for your recommendations. I got the same result from both of the code blocks above.

> length(A)
[1] 177199
> length(unique(mcols(A)$feature_id))
[1] 747
> register(SerialParam())
> B <- featureSpectra(xcms_p_NEG_g_r_g_fill, return.type = "MSpectra")
> length(B)
[1] 177199
> length(unique(mcols(B)$feature_id))
[1] 747

The output of my sessionInfo() is:

R version 4.1.2 (2021-11-01)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19044)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods base

other attached packages:
[1] xcms_3.17.1 MSnbase_2.20.4 ProtGenerics_1.27.2 S4Vectors_0.32.3
[5] mzR_2.28.0 Rcpp_1.0.8 Biobase_2.54.0 BiocGenerics_0.40.0
[9] BiocParallel_1.28.3

loaded via a namespace (and not attached):
[1] lattice_0.20-45 snow_0.4-4 assertthat_0.2.1
[4] digest_0.6.29 foreach_1.5.2 utf8_1.2.2
[7] R6_2.5.1 GenomeInfoDb_1.30.1 plyr_1.8.6
[10] mzID_1.32.0 ggplot2_3.3.5 pillar_1.7.0
[13] zlibbioc_1.40.0 rlang_1.0.2 Matrix_1.4-0
[16] preprocessCore_1.56.0 RCurl_1.98-1.6 munsell_0.5.0
[19] DelayedArray_0.20.0 compiler_4.1.2 MsFeatures_1.2.0
[22] pkgconfig_2.0.3 pcaMethods_1.86.0 tidyselect_1.1.2
[25] SummarizedExperiment_1.24.0 GenomeInfoDbData_1.2.7 tibble_3.1.6
[28] RANN_2.6.1 IRanges_2.28.0 codetools_0.2-18
[31] matrixStats_0.61.0 XML_3.99-0.9 fansi_1.0.2
[34] crayon_1.5.0 dplyr_1.0.8 bitops_1.0-7
[37] MASS_7.3-55 MassSpecWavelet_1.60.0 grid_4.1.2
[40] gtable_0.3.0 lifecycle_1.0.1 affy_1.72.0
[43] DBI_1.1.2 magrittr_2.0.2 MsCoreUtils_1.7.4
[46] scales_1.1.1 ncdf4_1.19 cli_3.2.0
[49] impute_1.68.0 XVector_0.34.0 affyio_1.64.0
[52] doParallel_1.0.17 limma_3.50.1 robustbase_0.93-9
[55] ellipsis_0.3.2 generics_0.1.2 vctrs_0.3.8
[58] RColorBrewer_1.1-2 iterators_1.0.14 tools_4.1.2
[61] glue_1.6.2 DEoptimR_1.0-10 purrr_0.3.4
[64] MatrixGenerics_1.6.0 parallel_4.1.2 clue_0.3-60
[67] colorspace_2.0-3 cluster_2.1.2 BiocManager_1.30.16
[70] vsn_3.62.0 GenomicRanges_1.46.1 MALDIquant_1.21

I am confused about why I got a different result before when running the same code featureSpectra(xcms_p_NEG_g_r_g_fill, return.type = "MSpectra"). It is really weird.

Bests, Muyao

jorainer commented 2 years ago

Maybe you already had the variable filteredMs2Spectra_NEG in your R session (from a previous analysis)? I don't see any way the results could differ depending on whether you run featureSpectra with or without parallel processing.

MuyaoXi9271 commented 2 years ago

Hi Johannes,

I am quite sure there is no variable filteredMs2Spectra_NEG in my environment. I actually ran

BPPARAM_capped <- switch(Sys.info()["sysname"],
    Windows = SnowParam(max(1, min(4, detectCores() - 1)), progressbar = TRUE),
    MulticoreParam(max(1, min(4, detectCores() - 1)), progressbar = TRUE))

and

xcms_p_NEG_g_r_g_fill <- fillChromPeaks(xcms_p_NEG_g_r_g, params$FillChromPeaksParam, BPPARAM = BPPARAM_capped)

to get the gap-filled MSpectra, and then ran

filteredMs2Spectra_NEG <- featureSpectra(xcms_p_NEG_g_r_g_fill, return.type = "MSpectra")

directly.

So does the problem come from the gap-filling step?

Bests, Muyao

jorainer commented 2 years ago

Hm, for the gap-filling too, I cannot explain why the result should differ between parallel and serial processing mode. Note that gap filling will not have an influence on the returned MSpectra. Gap filling will only fill missing values in the quantified feature intensity matrix, but will not affect MS1 or MS2 spectra data.