Error in sendMaster(try(lapply(X = S, FUN = FUN, ...), silent = TRUE))

Preve92 commented 5 years ago

Bug Description Dear all,

First, thank you very much for your work! QIIME2 allowed me to learn a lot!

When running the command with 96 samples (7117864 reads) as input on a server having 40 threads and 128GB of memory.

qiime dada2 denoise-paired \
    --i-demultiplexed-seqs data/interim/trimmed-demux-seqs.qza \
    --p-trunc-len-f 260 \
    --p-trunc-len-r 230 \
    --p-trim-left-f 5 \
    --p-trim-left-r 5 \
    --p-min-fold-parent-over-abundance 1.5 \
    --p-n-threads 0 \
    --o-table data/interim/dada2_table.qza \
    --o-representative-sequences data/interim/dada2_rep-seqs.qza \
    --o-denoising-stats data/interim/dada2_denoising-stats.qza \
    --verbose 2>&1 | tee logs/dada2.log

I am encountering the error

Error in sendMaster(try(lapply(X = S, FUN = FUN, ...), silent = TRUE)) : write error, closing pipe to the master

I tested the following with both QIIME2 versions 2019.4 and 2019.7 obtaining the same error. The complete log (dada2.log) is produced with v. 2019.7.

Comments

Trying a bit of debugging, I suspect the issue is related to mclapply as reported here. mclapply is used in the dada2 function filterAndTrim for parallelising the job. As you can see in the log file, after closing the pipe to master, the function return Error in names(answer) <- names1 : 'names' attribute [96] must be the same length as the vector [93]. Hence, I suspect mclapply manage to process 93 out of 96 samples and then it runs out of memory. However, when monitoring the process with htop I see the memory usage going up to 35-40 GB, and then the process halt. So I am not sure mclapply running out of memory is the actual issue: there's plenty of free memory.
I then tried to run the same command with half data set (48 samples), and it works without issue. Same using only 10 threads instead of all 40 (fail as well with 20) for the full data set. Running the full data set outside of QIIME2, in R according to the dada2 tutorial also works without issues. So I really can't figure out why it should not work in QIIME2. At the end, I am calling exactly the same command with the same parameters.

References

thermokarst commented 5 years ago

I wonder if you are still running out of available memory --- can you run ulimit -a and return the results here? Also, in the future, please try to utilize the issue templates that are presented to you when opening an issue, using those helps us a great deal, and allows us to get a reply to you as quickly as possible. Thanks!

thermokarst commented 5 years ago

A quick google search turned up this issue: https://github.com/benjjneb/dada2/issues/273

Looks like this is likely a case of running out of available memory on the host machine. Not sure why you aren't seeing this outside of QIIME 2, though - are you sure you're comparing the exact same commands and data?

benjjneb commented 5 years ago

Unfortunately the recommendation here would be to use the workaround of reducing the number of threads.

Unlike the other multithreaded functions in dada2 which are implemented directly in C++, the filtering step uses mclapply, which relies on forking for multithreading. Forking copies the whole R version to each thread, so the forked filtering step can use up a lot of memory when there are a large number of threads.

Preve92 commented 5 years ago

Dear @thermokarst

Thank you for your quick reply! And apologies for not using the template; I will remember to use it for any future issue. Here is the output of ulimit -a

core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 515025
max locked memory       (kbytes, -l) 16384
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 515025
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

I am quite sure I am comparing the same data and command in R: here is the R command I ran

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("dada2", version = "3.9")
library("dada2")
path <- "data/exported/uncompressed/raw"
fnFs <- sort(list.files(path, pattern="_R1_001.fastq", full.names = TRUE))
fnRs <- sort(list.files(path, pattern="_R2_001.fastq", full.names = TRUE))
sample.names <- sapply(strsplit(basename(fnFs), "_"), `[`, 1)
plotQualityProfile(fnFs[1:2])
filtFs <- file.path(path, "filtered", paste0(sample.names, "_F_filt.fastq.gz"))
filtRs <- file.path(path, "filtered", paste0(sample.names, "_R_filt.fastq.gz"))
names(filtFs) <- sample.names
names(filtRs) <- sample.names
out <- filterAndTrim(fnFs, filtFs, fnRs, filtRs, truncLen=c(260,230), trimLeft=c(5,5), maxN=0, maxEE=c(2,2), truncQ=2, rm.phix=TRUE, compress=TRUE, multithread=TRUE)
errF <- learnErrors(filtFs, multithread=TRUE)
errR <- learnErrors(filtRs, multithread=TRUE)
plotErrors(errF, nominalQ=TRUE)
dadaFs <- dada(filtFs, err=errF, multithread=TRUE)
dadaRs <- dada(filtRs, err=errR, multithread=TRUE)
mergers <- mergePairs(dadaFs, filtFs, dadaRs, filtRs, verbose=TRUE)
seqtab <- makeSequenceTable(mergers)
seqtab2 <- seqtab[,nchar(colnames(seqtab)) > 370]
seqtab.nochim <- removeBimeraDenovo(seqtab2, method="consensus", multithread=TRUE, verbose=TRUE)
getN <- function(x) sum(getUniques(x))
track <- cbind(out, sapply(dadaFs, getN), sapply(dadaRs, getN), sapply(mergers, getN), rowSums(seqtab.nochim))
colnames(track) <- c("input", "filtered", "denoisedF", "denoisedR", "merged", "nonchim")
rownames(track) <- sample.name

The main difference is that I am working on already unarchived fastq file (i.e. not fastq.gz)

Preve92 commented 5 years ago

Dear @benjjneb Thank for your help! I understand it would take a lot of memory, but why does it work in R then? And why I don't see the memory usage growing above 35-40GB, close to the upper limit of 128GB?

benjjneb commented 5 years ago

Thank for your help! I understand it would take a lot of memory, but why does it work in R then? And why I don't see the memory usage growing above 35-40GB, close to the upper limit of 128GB?

Fair point, I don't know why it is working in R then.

thermokarst commented 5 years ago

BiocManager::install("dada2", version = "3.9")
library("dada2")

This looks like the install instructions for DADA2 1.12, can you confirm what version you are running in standalone R?

Preve92 commented 5 years ago

Dear @thermokarst It is indeed dada2 v. 1.12. Does this version use a different filterAndTrim function?

thermokarst commented 5 years ago

It is indeed dada2 v. 1.12

I think this comparison is confounded by the use of two different versions. q2-dada2 currently uses dada2 1.10.

Preve92 commented 5 years ago

I will test with dada2 v 1.10 and report! Thank you very much for your help! (:

RJ333 commented 5 years ago

Hey everone, I also ran into this issue with

> seqTableNoChimera <- removeBimeraDenovo(seqTable2, method="per-sample",
+                                                multithread=TRUE)
Error in sendMaster(try(lapply(X = S, FUN = FUN, ...), silent = TRUE)) :
  write error, closing pipe to the master
Error in sendMaster(try(lapply(X = S, FUN = FUN, ...), silent = TRUE)) :
  write error, closing pipe to the master
Error in sendMaster(try(lapply(X = S, FUN = FUN, ...), silent = TRUE)) :
  write error, closing pipe to the master
Error in sendMaster(try(lapply(X = S, FUN = FUN, ...), silent = TRUE)) :
  write error, closing pipe to the master

Reducing it to a lower number of cores, in my cases multithread = 20 solved the issue. I work on a cloud were many cores and not so much memory and storage is available.

benjjneb commented 5 years ago

@RJ333 Can you clarify what version of the dada2 R package you used to get that error?

Reducing it to a lower number of cores, in my cases multithread = 20 solved the issue. I work on a cloud were many cores and not so much memory and storage is available.

That is the situation where specifying the number of cores rather than detecting and using all detected makes sense. Would be nice to fail more gracefully though.

RJ333 commented 5 years ago

Hey Benjamin,

just wanted to add that I ran the code last night on 4 virtual machines with the limit of 20 cores with a full MiSeq run on each VM, no errors (The above was from the test run, where I only used 5 samples).

this is my sessionInfo():

> sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-conda_cos6-linux-gnu (64-bit)
Running under: Ubuntu 18.04.3 LTS

Matrix products: default
BLAS/LAPACK: /home/ubuntu/miniconda3/envs/dada2_tnt/lib/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8
 [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8
 [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C
[10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] ggplot2_3.2.1 dada2_1.10.0  Rcpp_1.0.2

loaded via a namespace (and not attached):
 [1] plyr_1.8.4                  RColorBrewer_1.1-2
 [3] pillar_1.4.2                compiler_3.5.1
 [5] GenomeInfoDb_1.18.1         XVector_0.22.0
 [7] bitops_1.0-6                tools_3.5.1
 [9] zlibbioc_1.28.0             lattice_0.20-38
[11] tibble_2.1.3                gtable_0.3.0
[13] pkgconfig_2.0.2             rlang_0.4.0
[15] Matrix_1.2-17               DelayedArray_0.8.0
[17] parallel_3.5.1              GenomeInfoDbData_1.2.1
[19] withr_2.1.2                 stringr_1.4.0
[21] hwriter_1.3.2               dplyr_0.8.3
[23] Biostrings_2.50.2           S4Vectors_0.20.1
[25] IRanges_2.16.0              stats4_3.5.1
[27] grid_3.5.1                  tidyselect_0.2.5
[29] data.table_1.12.2           glue_1.3.1
[31] Biobase_2.42.0              R6_2.4.0
[33] BiocParallel_1.16.6         latticeExtra_0.6-28
[35] reshape2_1.4.3              purrr_0.3.2
[37] magrittr_1.5                Rsamtools_1.34.0
[39] scales_1.0.0                matrixStats_0.54.0
[41] BiocGenerics_0.28.0         GenomicRanges_1.34.0
[43] GenomicAlignments_1.18.1    ShortRead_1.40.0
[45] assertthat_0.2.1            SummarizedExperiment_1.12.0
[47] colorspace_1.4-1            labeling_0.3
[49] stringi_1.4.3               RCurl_1.95-4.12
[51] RcppParallel_4.4.3          lazyeval_0.2.2
[53] munsell_0.5.0               crayon_1.3.4

qiime2 / q2-dada2

Error in sendMaster(try(lapply(X = S, FUN = FUN, ...), silent = TRUE)) #119