sneumann / xcms

This is the git repository matching the Bioconductor package xcms: LC/MS and GC/MS Data Analysis

connection Error in Biocparallel for chromatogram function #627

Closed tobifuh closed 4 months ago

tobifuh commented 2 years ago

We are using RStudio on a Windows machine and following the example at https://bioconductor.org/packages/release/bioc/vignettes/xcms/inst/doc/xcms.html. We successfully loaded 59 mzML files from an LC-MS run, but the step bpis <- chromatogram(raw_data, aggregationFun = "max") triggers the following error:

bpis <- chromatogram(raw_data, aggregationFun = "max")
Stop worker failed with the error: wrong args for environment subassignment
Error: BiocParallel errors
  8 remote errors, element index: 2, 10, 14, 19, 25, 29, ...
  27 unevaluated and other errors
  first remote error:
Error in socketConnection(port = port, server = TRUE, blocking = TRUE, : cannot open the connection

tobifuh commented 2 years ago

Update: with a smaller subset of just 4 files, it works after executing the command bpis <- chromatogram(raw_data, aggregationFun = "max") a few times. For some reason the error is not reliably reproducible...

jorainer commented 2 years ago

On Windows, a new R process is started for each parallel job. The error above suggests that somehow one of these R processes was closed or was no longer accessible from the main process. I would suggest restricting the number of parallel processes to at most the number of (physical!) CPUs - 1. You can set that using:

register(bpstart(SnowParam(3)))  ## To start 3 processes for parallel processing

Also, always keep an eye on memory. It doesn't help to run too many parallel processes with limited memory (this might actually have caused the first error: one of the processes might have been killed because the system was running out of memory).
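
A minimal sketch of the "physical CPUs - 1" rule of thumb, using only the parallel package that ships with R. The variable names physical_cores and n_workers are illustrative assumptions, and detectCores(logical = FALSE) may return NA or fall back to logical cores on some platforms, so the result is best treated as an upper bound to check against available memory:

```r
library(parallel)

# Number of physical cores; detectCores() can return NA when it cannot tell,
# and logical = FALSE is not honoured on every platform.
physical_cores <- detectCores(logical = FALSE)
if (is.na(physical_cores)) physical_cores <- 1L

# Leave one core for the main R session, but always keep at least one worker.
n_workers <- max(1L, as.integer(physical_cores) - 1L)
n_workers

# With BiocParallel loaded, the value could then be used as in the reply above:
# register(bpstart(SnowParam(n_workers)))
```

On a machine with 10 physical cores, as reported later in the thread, this would give at most 9 workers; fewer may still be needed if each worker holds a lot of data in memory.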

tobifuh commented 2 years ago

Thanks for the quick reply. We have 10 physical cores / 20 logical cores, and even with SnowParam(3) we trigger this error:

Error: BiocParallel errors
  1 remote errors, element index: 4
  2 unevaluated and other errors
  first remote error:
Error in socketConnection(port = port, server = TRUE, blocking = TRUE, : cannot open the connection

tobifuh commented 2 years ago

Memory is not even close to being used up.

jorainer commented 2 years ago

Hm, then maybe there might be some other error which is not reported properly due to the parallel processing. Could you please try the same call but after disabling parallel processing with

register(SerialParam())

(and not enabling it with the register(SnowParam(3)) again).

this is just to exclude that there's some other error possibly related to one file...

tobifuh commented 2 years ago

OK, that definitely helped. Something is wrong with the parallel handling of the jobs; serial processing is even faster and doesn't crash at all.

jorainer commented 2 years ago

Yes I know - parallel processing on Windows is tricky. On unix the process is simply forked and can even share the memory. On Windows the main process and the child processes need to communicate through sockets - and I guess that's where the problem lies. Somehow either a socket was not available or the connection was closed.

chuyaowang commented 2 years ago

I have a similar problem when going through the xcms tutorial and when processing my own data. Whenever parallel processing is involved, there is a (high) chance the code will not run. On one computer I run R in WSL2, and on another I run R in Windows 11. I have a macbook but it only has 8G memory, so I haven't tried to do any data processing there. Is there no solution to make it more stable?

jorainer commented 2 years ago

The core is stable. I would not like to impose any parallel processing options on the user. IMHO the user needs to configure the parallel processing setup that best fits his/her computational resources. This is highly dependent on the data set, the operating system, the CPU and the memory which makes it impossible to define the best setting.

drewszabo commented 1 year ago

I had a similar problem aligning features, and I am only able to fix it by completely disabling parallel analysis. I tried using 2 native Windows PCs, each with relatively high capabilities: 1) 11th Gen i7, 8 cores/16 threads, 64 GB RAM, and 2) Xeon-W, 8 cores/16 threads, 32 GB RAM. I have 47 analysis files with approx. 1.5 million features extracted for NTA. The compute is not hitting either system very hard and I have the overhead to run in parallel - it just won't work.

jorainer commented 1 year ago

Can you please provide an output of your sessionInfo() and what settings you used for the alignment?

drewszabo commented 1 year ago

Sure! Thanks for your help.

> sessionInfo()
R version 4.2.2 (2022-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19043)

Matrix products: default

locale:
[1] LC_COLLATE=English_Australia.utf8  LC_CTYPE=English_Australia.utf8    LC_MONETARY=English_Australia.utf8 LC_NUMERIC=C                       LC_TIME=English_Australia.utf8    

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] xcms_3.18.0         MSnbase_2.22.0      ProtGenerics_1.28.0 mzR_2.30.0          Rcpp_1.0.9          Biobase_2.56.0      BiocParallel_1.30.4 S4Vectors_0.34.0   
 [9] BiocGenerics_0.42.0 patRoon_2.1.0      

loaded via a namespace (and not attached):
 [1] MatrixGenerics_1.8.1        vsn_3.64.0                  foreach_1.5.2               shiny_1.7.3                 Rdpack_2.4                  assertthat_0.2.1           
 [7] BiocManager_1.30.19         affy_1.74.0                 GenomeInfoDbData_1.2.9      robustbase_0.95-0           impute_1.70.0               pillar_1.8.1               
[13] backports_1.4.1             lattice_0.20-45             glue_1.6.2                  limma_3.52.4                digest_0.6.30               XVector_0.36.0             
[19] GenomicRanges_1.48.0        RColorBrewer_1.1-3          promises_1.2.0.1            rbibutils_2.2.9             checkmate_2.1.0             colorspace_2.0-3           
[25] Matrix_1.5-1                htmltools_0.5.3             httpuv_1.6.6                preprocessCore_1.58.0       plyr_1.8.7                  MALDIquant_1.21            
[31] XML_3.99-0.12               pkgconfig_2.0.3             zlibbioc_1.42.0             xtable_1.8-4                scales_1.2.1                RANN_2.6.1                 
[37] affyio_1.66.0               later_1.3.0                 tibble_3.1.8                generics_0.1.3              IRanges_2.30.1              ggplot2_3.4.0              
[43] ellipsis_0.3.2              withr_2.5.0                 SummarizedExperiment_1.26.1 MassSpecWavelet_1.62.0      cli_3.4.1                   magrittr_2.0.3             
[49] mime_0.12                   ncdf4_1.19                  fansi_1.0.3                 doParallel_1.0.17           MASS_7.3-58.1               MsFeatures_1.4.0           
[55] tools_4.2.2                 data.table_1.14.4           matrixStats_0.62.0          lifecycle_1.0.3             munsell_0.5.0               DelayedArray_0.22.0        
[61] cluster_2.1.4               GenomeInfoDb_1.32.4         pcaMethods_1.88.0           compiler_4.2.2              mzID_1.34.0                 rlang_1.0.6                
[67] RCurl_1.98-1.9              grid_4.2.2                  iterators_1.0.14            rstudioapi_0.14             MsCoreUtils_1.8.0           bitops_1.0-7               
[73] gtable_0.3.1                codetools_0.2-18            DBI_1.1.3                   R6_2.5.1                    dplyr_1.0.10                fastmap_1.1.0              
[79] utf8_1.2.2                  clue_0.3-62                 parallel_4.2.2              vctrs_0.5.0                 DEoptimR_1.0-11             tidyselect_1.2.0    

I converted Thermo .raw DDA (centroid) data with MSConvert to .mzML format. Then, I followed the basic vignette for XCMS on the Bioconductor website (https://bioconductor.org/packages/devel/bioc/vignettes/xcms/inst/doc/xcms.html). After listing the directories for the mzML files and creating the corresponding dataframe, I used:

raw_data <- readMSData(files = cdfs, pdata = new("NAnnotatedDataFrame", pd),
                       mode = "onDisk")

Then, when I tried to use the chromatogram() function to extract the chromatograms, I got the same error that the original poster had:

Stop worker failed with the error: wrong args for environment subassignment
Error: BiocParallel errors
  3 remote errors, element index: 6, 16, 25
  19 unevaluated and other errors
  first remote error:
Error in socketConnection(port = port, server = TRUE, blocking = TRUE, : cannot open the connection

jorainer commented 1 year ago

Thanks for the information. Can you try the following:

Do you get the same error without parallel processing? Call chromatogram using BPPARAM = SerialParam() (please also add all other parameters/settings you used in the chromatogram call):

chrs <- chromatogram(raw_data, BPPARAM = SerialParam())

Initialize the parallel processing setup only once and start it (this will cause all processes to be initiated). Note that this way you should not provide the parallel processing setup with the BPPARAM parameter. All functions in xcms will use the default parallel processing setup returned by bpparam(). Using bpstart ensures that the workers (on Windows, separate R processes) are created. Any subsequent parallel processing function will use these and (AFAIK) not initiate new ones each time.

register(bpstart(SnowParam(4)))
chrs <- chromatogram(raw_data)

drewszabo commented 1 year ago

That's right. So using default parallel parameters, the error is returned.

> rtr <- c(100, 1500)
> mzr <- c(100, 1500)
> chrs <- chromatogram(raw_data, mz = mzr, rt = rtr)
Stop worker failed with the error: wrong args for environment subassignment
Error: BiocParallel errors
  4 remote errors, element index: 13, 17, 22, 26
  25 unevaluated and other errors
  first remote error:
Error in socketConnection(port = port, server = TRUE, blocking = TRUE, : cannot open the connection

And I don't think the error persists without parallel processing. This code may very well complete, but it will take some time. It has been running for almost 10 hours now and has not completed yet.

> chrs <- chromatogram(raw_data, mz = mzr, rt = rtr, BPPARAM = SerialParam())

The next test also failed, using SnowParam(4):

> register(bpstart(SnowParam(4)))
> chrs <- chromatogram(raw_data, mz = mzr, rt = rtr)
Stop worker failed with the error: wrong args for environment subassignment
Error: BiocParallel errors
  1 remote errors, element index: 2
  41 unevaluated and other errors
  first remote error:
Error in socketConnection(port = port, server = TRUE, blocking = TRUE, : cannot open the connection

The problem is that I am leaving a lot of compute performance on the table by disabling parallel processing. Scaling up to larger projects will be difficult for me.

jorainer commented 1 year ago

I'm not familiar with parallel processing on Windows. Could this also be a firewall issue? Following the discussion here a bit, maybe it helps to provide the hostname of the master node:

register(bpstart(SnowParam(4, manager.hostname = nsl(Sys.info()["nodename"]))))

Also, maybe try what happens with a real simple parallel task (independent of xcms):

library(BiocParallel)
register(bpstart(SnowParam(4)))
res <- bplapply(1:10000, sqrt)

does that run without problems?

drewszabo commented 1 year ago

First I tested the simple parallel task, but I had to increase the size of the list to 1,000,000 to check the performance properly. It ran 4 instances with no problem. Each instance was drawing approx. 7.1% CPU, so still at the lower end of the chip's capability. There were a few warnings, but I don't know if they are relevant.

> library(BiocParallel)
> register(bpstart(SnowParam(4)))
> res <- bplapply(1:1000000, sqrt)
Warning messages:
1: In serialize(data, node$con) :
  'package:stats' may not be available when loading
2: In serialize(data, node$con) :
  'package:stats' may not be available when loading
3: In serialize(data, node$con) :
  'package:stats' may not be available when loading
4: In serialize(data, node$con) :
  'package:stats' may not be available when loading

Then, after trying to provide the hostname, the same error seems to occur. My system would not recognise the nsl() function so that was removed.

> register(bpstart(SnowParam(4, manager.hostname = Sys.info()["nodename"])))
> chrs <- chromatogram(raw_data, mz = mzr, rt = rtr)
Stop worker failed with the error: wrong args for environment subassignment
Error: BiocParallel errors
  0 remote errors, element index: 
  43 unevaluated and other errors
  first remote error:

sneumann commented 1 year ago

So maybe take that to the BioC community? The BioC Support site https://bioconductor.org/help/ might be a good place. Yours, Steffen

drewszabo commented 1 year ago

Great, will do! Thanks again for your help

sneumann commented 1 year ago

Great, and please do report back here if solved, that'll help others having similar issues. Yours, Steffen

drewszabo commented 1 year ago

Hmm, it seems their support page is not well maintained. There are already posts from users encountering this problem, across multiple packages, with very few suggestions from the devs or community. The only suggestions seem to be to limit parallel processing or maybe rolling back the version.

Can I ask @sneumann & @jorainer, what was the latest stable version that you know of for Windows?

https://support.bioconductor.org/p/9135776/ https://support.bioconductor.org/p/9141727/ https://support.bioconductor.org/p/122494/ https://support.bioconductor.org/p/105014/

DrRuiLi commented 5 months ago

I ran into this problem in a similar setting and traced it to the source code. In brief, it is caused by BPPARAM not being passed on to a nested call. The call chain is:

xcms::chromatogram() >> MSnbase::chromatogram() >> MSnbase:::.extractMultipleChromatograms()

In the function MSnbase:::.extractMultipleChromatograms(), the parallel processing is actually performed by the following code:

suppressWarnings(res <- bpmapply(subs_by_file, match(fileNames(subs), 
    fns), FUN = function(cur_sample, cur_file, rtm, mzm, 
    aggFun) {
    sps <- spectra(cur_sample)
    rts <- rtime(cur_sample)
    cur_res <- vector("list", nrow(rtm))
    for (i in 1:nrow(rtm)) {
      in_rt <- rts >= rtm[i, 1] & rts <= rtm[i, 2]
      if (!any(in_rt)) {
        cur_res[[i]] <- MSnbase::Chromatogram(filterMz = mzm[i, 
          ], fromFile = as.integer(cur_file), aggregationFun = aggFun)
        next
      }
      cur_sps <- lapply(sps[in_rt], function(spct, filter_mz, 
        aggFun) {
        spct <- filterMz(spct, filter_mz)
        if (!spct@peaksCount) 
          return(c(NA_real_, NA_real_, missingValue, 
            NA_real_))
        c(range(spct@mz, na.rm = TRUE, finite = TRUE), 
          do.call(aggFun, list(spct@intensity, na.rm = TRUE)), 
          spct@msLevel)
      }, filter_mz = mzm[i, ], aggFun = aggFun)
      allVals <- unlist(cur_sps, use.names = FALSE)
      int_idx <- seq(3, length(allVals), by = 4)
      mslevel_idx <- seq(4, length(allVals), by = 4)
      ints <- allVals[int_idx]
      names(ints) <- names(cur_sps)
      mz_range <- mzm[i, ]
      if (!all(is.na(ints))) 
        mz_range <- range(allVals[-c(int_idx, mslevel_idx)], 
          na.rm = TRUE, finite = TRUE)
      cur_res[[i]] <- MSnbase::Chromatogram(rtime = rts[in_rt], 
        intensity = ints, mz = mz_range, filterMz = mzm[i, 
          ], fromFile = as.integer(cur_file), aggregationFun = aggFun, 
        msLevel = as.integer(unique(allVals[mslevel_idx])))
    }
    cur_res
  }, MoreArgs = list(rtm = rt, mzm = mz, aggFun = aggregationFun), 
    BPPARAM = BPPARAM, SIMPLIFY = FALSE))

The call sps <- spectra(cur_sample) runs with the default bpparam(), generally SnowParam(), NO MATTER WHICH BPPARAM WAS PASSED IN. It therefore opens new parallel processing inside an already parallel job, and this leads to the error:

> getMethod(spectra,signature = "OnDiskMSnExp")
Method Definition:

function (object, ...) 
{
    .local <- function (object, BPPARAM = bpparam()) 
    {
        return(spectrapply(object, BPPARAM = BPPARAM))
    }
    .local(object, ...)
}
<bytecode: 0x000002a824e344f0>
<environment: namespace:MSnbase>

Signatures:
        object        
target  "OnDiskMSnExp"
defined "OnDiskMSnExp"
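
The pattern in the method definition above can be illustrated with a toy sketch in plain R. All names here (default_backend, inner_step, outer_not_forwarding, outer_forwarding) are hypothetical stand-ins and not MSnbase or BiocParallel functions; the sketch only shows why an argument that is accepted but not forwarded has no effect on a nested call that has its own default:

```r
# Hypothetical stand-in for bpparam(): the globally registered default backend.
default_backend <- function() "parallel"

# Stand-in for spectra(): uses the default unless a backend is passed explicitly.
inner_step <- function(x, backend = default_backend()) backend

# The problematic pattern: 'backend' is accepted but never forwarded.
outer_not_forwarding <- function(x, backend = default_backend()) {
  inner_step(x)
}

# Forwarding the argument makes the caller's choice reach the nested call.
outer_forwarding <- function(x, backend = default_backend()) {
  inner_step(x, backend = backend)
}

outer_not_forwarding(1, backend = "serial")  # "parallel"
outer_forwarding(1, backend = "serial")      # "serial"
```

In the toy version, changing the global default (the analogue of calling register(SerialParam())) is the only way to influence the nested call when the argument is not forwarded.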

For example, this code:

xcms.fdf <- featureDefinitions(xcms.xcms)
mzr <- xcms.fdf[,c("mzmin","mzmax")]%>%as.matrix()
rtr <-  xcms.fdf[,c("rtmin","rtmax")]%>%as.matrix()
xcms.split <- sapply(seq_along(filepaths(xcms.xcms)),
                     function(x) filterFile(xcms.xcms,x))
xcms.chrom <- bplapply(xcms.split,function(x,mzr,rtr){
  #register(SerialParam())
  y <-NA
  try(y <- xcms::chromatogram(x,
                            mz = mzr[1:100,],
                            rt = rtr[1:100,],
                            aggregationFun = "max",
                            BPPARAM = SerialParam())
  )
  return(y)
},rtr=rtr,mzr = mzr,BPPARAM = SnowParam(workers = 4,
                      progressbar = T))

leads to BiocParallel errors:

Error : BiocParallel errors
  1 remote errors, element index: 1
  0 unevaluated and other errors
  first remote error:
Error in socketConnection(port = port, server = TRUE, blocking = TRUE, : cannot open the connection

But if register(SerialParam()) is called inside the function that runs in parallel, it is processed successfully:

xcms.chrom <- bplapply(xcms.split,function(x,mzr,rtr){
  register(SerialParam())
  y <-NA
  try(y <- xcms::chromatogram(x,
                            mz = mzr[1:100,],
                            rt = rtr[1:100,],
                            aggregationFun = "max",
                            BPPARAM = SerialParam())
  )
  return(y)
},rtr=rtr,mzr = mzr,BPPARAM = SnowParam(workers = 4,
                      progressbar = T))

You can split the xcms object by file and run the files in parallel manually, adding register(SerialParam()) inside the function that runs in parallel. I hope this is helpful, and maybe it is also useful for the maintainers to fix this problem. @sneumann @jorainer

jorainer commented 5 months ago

Good catch @WallFacerLR ! The current workaround would be, as you suggest to disable parallel processing by default (with register(SerialParam())) and then pass BPPARAM = MulticoreParam(4) or similar to the chromatogram() call. We'll also add a fix for this.

DrRuiLi commented 5 months ago

Thanks for solving this problem efficiently!

jorainer commented 4 months ago

I'm closing this issue now. Feel free to re-open if needed.