kdkorthauer / dmrseq

R package for Inference of differentially methylated regions (DMRs) from bisulfite sequencing
MIT License

Parallel error #12

Closed by scottgigante 6 years ago

scottgigante commented 6 years ago

I'm getting an error related to the parallelization of dmrseq.

> bs <- BSseq(...)
> bs
An object of type 'BSseq' with
  19240982 methylation loci
  4 samples
has not been smoothed
All assays are in-memory
> pData(bs)$strain <- as.factor(c("b6xcast", "b6xcast", "castxb6", "castxb6"))
> pData(bs)$parent <- as.factor(c("mat", "pat", "pat", "mat"))
> loci.idx <- which(rowSums(getCoverage(bs, type="Cov")==0) == 0)
> bs <- bs[loci.idx]
> dmr <- dmrseq(bs,
                testCovariate="parent",
                adjustCovariate="strain")

I get this as output.

Assuming the test covariate parent is a factor.
Condition: pat vs mat
Adjusting for covariate: strain
Parallelizing using 6 workers/cores (backend: BiocParallel:MulticoreParam).

Detecting candidate regions with coefficient larger than 0.1 in magnitude.
...Chromosome 1: Error: 'bplapply' receive data failed:
  error reading from connection
Execution halted

I tried on another machine and got this error instead.

Error in serialize(data, node$con, xdr = FALSE) :
  error writing to connection
Calls: which ... .send_EXEC -> <Anonymous> -> sendData.SOCK0node -> serialize
Error: failed to stop 'SOCKcluster' cluster: error writing to connection
Execution halted
Error: failed to stop 'SOCKcluster' cluster: invalid connection

The first machine has 64GB of RAM, the second 256GB.

> sessionInfo()
R version 3.5.0 (2018-04-23)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS: /stornext/System/data/apps/R/R-3.5.0/lib64/R/lib/libRblas.so
LAPACK: /stornext/System/data/apps/R/R-3.5.0/lib64/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
 [1] dmrseq_1.0.0                bsseq_1.16.1
 [3] SummarizedExperiment_1.10.1 DelayedArray_0.6.0
 [5] BiocParallel_1.14.1         matrixStats_0.53.1
 [7] Biobase_2.40.0              GenomicRanges_1.32.3
 [9] GenomeInfoDb_1.16.0         IRanges_2.14.10
[11] S4Vectors_0.18.3            BiocGenerics_0.26.0
[13] forcats_0.3.0               stringr_1.3.1
[15] dplyr_0.7.5                 purrr_0.2.5
[17] readr_1.1.1                 tidyr_0.8.1
[19] tibble_1.4.2                ggplot2_2.2.1
[21] tidyverse_1.2.1

loaded via a namespace (and not attached):
  [1] colorspace_1.4-0              XVector_0.20.0
  [3] rstudioapi_0.7                bit64_0.9-8
  [5] interactiveDisplayBase_1.18.0 AnnotationDbi_1.42.1
  [7] lubridate_1.7.4               xml2_1.2.0
  [9] splines_3.5.0                 codetools_0.2-15
 [11] R.methodsS3_1.7.1             mnormt_1.5-5
 [13] jsonlite_1.5                  Rsamtools_1.32.0
 [15] broom_0.4.4                   R.oo_1.22.0
 [17] shiny_1.1.0                   HDF5Array_1.8.0
 [19] compiler_3.5.0                httr_1.3.1
 [21] assertthat_0.2.0              Matrix_1.2-14
 [23] lazyeval_0.2.1                limma_3.36.1
 [25] cli_1.0.0                     later_0.7.3
 [27] htmltools_0.3.6               prettyunits_1.0.2
 [29] tools_3.5.0                   bindrcpp_0.2.2
 [31] gtable_0.2.0                  glue_1.2.0
 [33] GenomeInfoDbData_1.1.0        annotatr_1.6.0
 [35] reshape2_1.4.3                doRNG_1.6.6
 [37] Rcpp_0.12.17                  bumphunter_1.22.0
 [39] cellranger_1.1.0              Biostrings_2.48.0
 [41] nlme_3.1-137                  rtracklayer_1.40.2
 [43] iterators_1.0.9               DelayedMatrixStats_1.2.0
 [45] psych_1.8.4                   rvest_0.3.2
 [47] mime_0.5                      rngtools_1.3.1
 [49] gtools_3.5.0                  XML_3.98-1.11
 [51] AnnotationHub_2.12.0          zlibbioc_1.26.0
 [53] scales_0.5.0                  BSgenome_1.48.0
 [55] BiocInstaller_1.30.0          hms_0.4.2
 [57] promises_1.0.1                rhdf5_2.24.0
 [59] RColorBrewer_1.1-2            yaml_2.1.19
 [61] memoise_1.1.0                 pkgmaker_0.27
 [63] biomaRt_2.36.1                stringi_1.2.2
 [65] RSQLite_2.1.1                 foreach_1.4.6
 [67] permute_0.9-4                 GenomicFeatures_1.32.0
 [69] bibtex_0.4.2                  rlang_0.2.1
 [71] pkgconfig_2.0.1               bitops_1.0-6
 [73] lattice_0.20-35               Rhdf5lib_1.2.1
 [75] bindr_0.1.1                   GenomicAlignments_1.16.0
 [77] bit_1.1-14                    tidyselect_0.2.4
 [79] plyr_1.8.4                    magrittr_1.5
 [81] R6_2.2.2                      DBI_1.0.0
 [83] withr_2.1.2                   pillar_1.2.3
 [85] haven_1.1.1                   foreign_0.8-70
 [87] RCurl_1.95-4.10               modelr_0.1.2
 [89] crayon_1.3.4                  progress_1.1.2
 [91] locfit_1.5-9.1                grid_3.5.0
 [93] readxl_1.1.0                  data.table_1.11.4
 [95] blob_1.1.1                    digest_0.6.15
 [97] xtable_1.8-3                  httpuv_1.4.3
 [99] regioneR_1.12.0               outliers_0.14
[101] R.utils_2.6.0                 munsell_0.5.0
[103] registry_0.5

scottgigante commented 6 years ago

Sorry for the noise. I just found solutions to both of these errors in the closed issues #7 and #10.

kdkorthauer commented 6 years ago

Hi Scott,

Thanks for the report. It's possible that this is a separate issue from #7 and #10. The fix for #10 should already be included in the version of dmrseq you are using (1.0.0). And it sounds like your system has enough RAM to use 6 cores on a dataset of this size (4 samples, ~19M loci).
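One way to check whether the parallel workers themselves are the cause is to rerun serially. The following is a minimal sketch, not a confirmed fix: it assumes BiocParallel is installed, that `bs` is the BSseq object from the report, and that this version of dmrseq accepts a `BPPARAM` argument (worth confirming with `?dmrseq`). The variable name `serial_param` is just for illustration.

```r
# Sketch only: register a serial backend so no worker processes are spawned.
serial_param <- NULL
if (requireNamespace("BiocParallel", quietly = TRUE)) {
  serial_param <- BiocParallel::SerialParam()
  # Make the serial backend the default returned by bpparam().
  BiocParallel::register(serial_param)
  # Assumption: dmrseq() takes a BPPARAM argument; uncomment once `bs` exists.
  # dmr <- dmrseq(bs,
  #               testCovariate = "parent",
  #               adjustCovariate = "strain",
  #               BPPARAM = serial_param)
}
```

If the serial run completes, the failure is more likely tied to the worker connections or per-worker memory than to the data itself.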

Could you try updating all your Bioc packages (run BiocInstaller::biocValid() to determine if any are out of date) and try again? If the issue still persists, we can troubleshoot.
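A minimal sketch of that check, assuming a Bioconductor 3.7-era setup where BiocInstaller is available. `report_versions` is a hypothetical helper written for this example, not part of any package; it only reads installed versions and does not load the packages.

```r
# Hypothetical helper: installed version of each package, NA if not installed.
report_versions <- function(pkgs) {
  vapply(pkgs, function(p) {
    if (nzchar(system.file(package = p))) {
      as.character(utils::packageVersion(p))
    } else {
      NA_character_
    }
  }, character(1))
}

# Packages that appear in this thread.
pkgs <- c("dmrseq", "bsseq", "DelayedArray", "DelayedMatrixStats", "BiocParallel")
print(report_versions(pkgs))

# biocValid() signals an error when packages are out of date or too new,
# so wrap it in try() to keep the script running.
if (requireNamespace("BiocInstaller", quietly = TRUE)) {
  try(BiocInstaller::biocValid())
}
```

Comparing the printed versions against the ones expected for the Bioconductor release in use makes a mixed release/devel installation easier to spot.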

Best, Keegan

scottgigante commented 6 years ago

Hi Keegan,

Thanks for giving me the benefit of the doubt. I installed DelayedMatrixStats and all dependencies from GitHub and the problem is resolved.

Best, Scott

kdkorthauer commented 6 years ago

Hi Scott,

That's great to hear. I'm a bit concerned that you needed to install from GitHub, however. What version of DelayedMatrixStats solved the problem? Release or devel? If devel (master), did you try installing from Bioconductor 3.8 (devel) first?

Apologies for all the questions; just trying to do my best to track down potential problems!

Best, Keegan

scottgigante commented 6 years ago

Hi Keegan,

I've been trying to test this on devel, but I'm not having much luck. I've run useDevel and biocLite, and now when I run dmrseq I get the following error:

> dmr <- dmrseq(bs,
              testCovariate="parent")
Assuming the test covariate parent is a factor.
Condition: pat vs mat
Error in get(name, envir = asNamespace(pkg), inherits = FALSE) : object 'extract_block' not found
16. | get(name, envir = asNamespace(pkg), inherits = FALSE)
15. | DelayedArray:::extract_block
14. | FUN(X[[i]], ...)
13. | lapply(seq_len(nblock), function(b) { if (DelayedArray:::get_verbose_block_processing()) { message("Processing block ", b, "/", nblock, " ... ", appendLF = FALSE) ...
12. | block_APPLY(x, APPLY, MARGIN = 1, ..., sink = sink, max_block_len = max_block_len)
11. | rowblock_APPLY(x = x, APPLY = matrixStats::rowSums2, na.rm = na.rm, ...)
10. | .DelayedMatrix_block_rowSums2(x = x, rows = rows, cols = cols, na.rm = na.rm, dim. = dim.)
9. | .local(x, rows, cols, na.rm, dim., ...)
8. | DelayedMatrixStats::rowSums2(getCoverage(bs)[, pData(bs)[[testCovariate]] == lev[l]])
7. | DelayedMatrixStats::rowSums2(getCoverage(bs)[, pData(bs)[[testCovariate]] == lev[l]])
6. | eval(quote(list(...)), env)
5. | eval(quote(list(...)), env)
4. | eval(quote(list(...)), env)
3. | standardGeneric("rbind")
2. | rbind(filter, 1 * (DelayedMatrixStats::rowSums2(getCoverage(bs)[, pData(bs)[[testCovariate]] == lev[l]]) == 0))
1. | dmrseq(bs, testCovariate = "parent")

In no way do I think this is a problem with dmrseq. It could be my fault for not knowing how to use devel.

kdkorthauer commented 6 years ago

Hi @scottgigante,

Thanks for reporting this issue. It looks like the error is triggered by a failure to call a function from the DelayedArray package. After running useDevel, can you check the output of BiocInstaller::biocValid()? It's possible that some package versions are incompatible, in which case getting everything up to date should solve the issue.

Best, Keegan
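
The traceback shows DelayedMatrixStats reaching for the internal object `DelayedArray:::extract_block`, which the installed DelayedArray does not define. A quick way to confirm that kind of mismatch is sketched below using base R only. `has_internal` is a hypothetical helper for illustration; note that `asNamespace()` loads the package's namespace as a side effect.

```r
# Hypothetical helper: TRUE if `pkg` is installed and its namespace defines `name`
# (exported or internal). Returns FALSE when the package is not installed.
has_internal <- function(pkg, name) {
  nzchar(system.file(package = pkg)) &&
    exists(name, envir = asNamespace(pkg), inherits = FALSE)
}

# FALSE here would match the "object 'extract_block' not found" error above.
has_internal("DelayedArray", "extract_block")

# Versions of the two packages involved, if they are installed.
for (p in c("DelayedArray", "DelayedMatrixStats")) {
  if (nzchar(system.file(package = p))) {
    message(p, " ", as.character(utils::packageVersion(p)))
  }
}
```

If the check returns FALSE, the two packages most likely come from different Bioconductor releases, and updating them together is the thing to try.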