hansenlab / bsseq

Devel repository for bsseq

Problem in collapseBSseq #65

Closed CathG closed 6 years ago

CathG commented 6 years ago

I updated bsseq (and DelayedArray), and collapseBSseq(), which worked as expected before the update, now crashes with a connection error (Error in serialize(data, node$con) / failed to stop ‘SOCKcluster’ cluster). The crash coincides with my RAM usage reaching its maximum (64 GB).

My sessionInfo():

R version 3.4.4 (2018-03-15)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 16299)

Matrix products: default

locale:
[1] LC_COLLATE=French_France.1252  LC_CTYPE=French_France.1252    LC_MONETARY=French_France.1252 LC_NUMERIC=C                  
[5] LC_TIME=French_France.1252    

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] data.table_1.10.4-3        gtools_3.5.0               bsseq_1.15.2               SummarizedExperiment_1.8.1 DelayedArray_0.5.27       
 [6] BiocParallel_1.12.0        matrixStats_0.53.1         Biobase_2.38.0             GenomicRanges_1.30.3       GenomeInfoDb_1.14.0       
[11] IRanges_2.12.0             S4Vectors_0.16.0           BiocGenerics_0.25.3       

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.16           XVector_0.18.0         zlibbioc_1.24.0        munsell_0.4.3          colorspace_1.3-2       lattice_0.20-35       
 [7] plyr_1.8.4             tools_3.4.4            grid_3.4.4             snow_0.4-2             R.oo_1.21.0            permute_0.9-4         
[13] Matrix_1.2-14          GenomeInfoDbData_1.0.0 R.utils_2.6.0          bitops_1.0-6           RCurl_1.95-4.10        limma_3.34.9          
[19] compiler_3.4.4         R.methodsS3_1.7.1      scales_0.5.0           locfit_1.5-9.1

Thanks

PeteHaitch commented 6 years ago

Are you able to share a minimal example that reproduces this error?

CathG commented 6 years ago

Thank you for your reply, I am trying to create one (the "minimal" part is the trickiest...)

PeteHaitch commented 6 years ago

Yes, it can be tough to create a minimal example with WGBS.

My suggestion would be to start with 2 samples where each sample has data from (possibly different) small chromosomes (e.g., chr21, chr22 if human).
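A small starting point along these lines might look like the sketch below. It is only an illustration under stated assumptions: the object names (`M_small`, `Cov_small`, `gr_small`, `bs_small`), the number of loci, and the simulated counts are all made up, and for simplicity both samples share the same loci on chr21 and chr22. The bsseq part only runs if the package is installed.

```r
# Hypothetical minimal example: 2 samples, 1000 loci on each of two small chromosomes.
set.seed(123)
n_loci_per_chr <- 1000L
chrs <- c("chr21", "chr22")
n_loci <- n_loci_per_chr * length(chrs)
sample_names <- c("ech_1", "ech_2")

# Simulated coverage in 1..30; methylated counts (M) can never exceed coverage.
Cov_small <- matrix(sample(1:30, n_loci * 2L, replace = TRUE), ncol = 2L)
M_small <- matrix(rbinom(length(Cov_small), size = Cov_small, prob = 0.7),
                  ncol = 2L)

if (requireNamespace("bsseq", quietly = TRUE)) {
    gr_small <- GenomicRanges::GRanges(
        seqnames = rep(chrs, each = n_loci_per_chr),
        ranges = IRanges::IRanges(
            start = rep(seq_len(n_loci_per_chr), times = length(chrs)),
            width = 1L))
    bs_small <- bsseq::BSseq(
        M = M_small,
        Cov = Cov_small,
        gr = gr_small,
        sampleNames = sample_names)
    # Collapse both samples into a single group, as in a real analysis.
    print(bsseq::collapseBSseq(bs_small, rep("ech", 2L)))
}
```

If the error does not show up at this size, the number of loci and samples can be increased step by step until it does.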

CathG commented 6 years ago

I'm not sure about the minimal example but at least it reproduces the error:

gr <- GRanges(seqnames=rep(paste0("chr", 1:10), each=1000000), IRanges(1:10000000, width=1L), strand="*")
set.seed(123); M_test <- Cov_test <- matrix(sample(1:100, 1000000000, replace=TRUE), ncol=100)
bsseq_test <- BSseq(M=M_test, Cov=Cov_test, gr=gr, sampleNames=paste("ech", 1:100, sep="_"))
collapseBSseq(bsseq_test, rep("ech", 100))

This gives me:

Error in serialize(data, node$con) : error writing to connection
Error: failed to stop ‘SOCKcluster’ cluster: error writing to connection

(The messages were originally in French; these are my translations.)

Another error message I sometimes get is: Error in summary.connection(connection) : invalid connection
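For a sense of scale, here is a rough back-of-the-envelope estimate of the memory needed by the matrices in this example. It only counts the raw integer storage (4 bytes per element). How many copies R, DelayedArray, or the worker processes actually make is not stated in this thread, so the last figure is just an upper bound on how many full copies would fit in RAM; it also treats the reported 64 GB as 64 GiB.

```r
# Dimensions taken from the reproducible example above.
n_loci <- 1e7
n_samples <- 100
bytes_per_integer <- 4  # R stores integers in 4 bytes

# Size of one dense integer matrix (M or Cov) in GiB.
one_matrix_gib <- n_loci * n_samples * bytes_per_integer / 1024^3

# M and Cov together, if they are stored as two separate matrices.
both_matrices_gib <- 2 * one_matrix_gib

# Assumption: treat the machine's 64 GB as 64 GiB.
ram_gib <- 64
copies_until_full <- floor(ram_gib / one_matrix_gib)

cat(sprintf("one matrix: %.2f GiB\n", one_matrix_gib))
cat(sprintf("M + Cov:    %.2f GiB\n", both_matrices_gib))
cat(sprintf("full copies of one matrix that fit in RAM: %d\n",
            as.integer(copies_until_full)))
```

Each matrix is roughly 3.7 GiB, so a modest number of extra copies (for example, one per worker when data are serialized to a socket cluster) would be enough to exhaust the memory.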

PeteHaitch commented 6 years ago

You are mixing and matching release and development versions of Bioconductor packages. This is not recommended and is not supported. Specifically, you are using the development version of bsseq (v1.15.2) but should be using the release version (currently v1.14.0) with that version of R (3.4.4).

Please run BiocInstaller::biocValid(); this should confirm my diagnosis and will give you instructions to fix the problem. Also, please read https://www.bioconductor.org/install/ for details on the release and development versions of Bioconductor.
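As a quick sanity check that needs no extra packages, the Bioconductor version-numbering convention can be used: in a package version x.y.z, an even y marks a release-branch version and an odd y marks a devel-branch version. The sketch below applies that rule to some of the versions reported in the sessionInfo() above. The helper name `is_devel_version` is made up for this illustration, and the rule only applies to Bioconductor packages, not CRAN ones.

```r
# Hypothetical helper: TRUE if the middle component of x.y.z is odd (devel branch).
is_devel_version <- function(version) {
    parts <- strsplit(version, ".", fixed = TRUE)
    minor <- as.integer(vapply(parts, `[`, character(1), 2L))
    minor %% 2L == 1L
}

# Bioconductor package versions copied from the reported sessionInfo().
reported <- c(
    bsseq = "1.15.2",
    SummarizedExperiment = "1.8.1",
    DelayedArray = "0.5.27",
    BiocParallel = "1.12.0",
    GenomicRanges = "1.30.3",
    BiocGenerics = "0.25.3")

branch <- ifelse(is_devel_version(reported), "devel", "release")
print(data.frame(version = reported, branch = branch))
```

A mix of "devel" and "release" rows in the same library is the situation biocValid() is meant to flag.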

PeteHaitch commented 6 years ago

To round this out, your example worked for me using both the current release version (v1.14.0 on R 3.4.3) and the ~development version (v1.15.2 on R devel) of bsseq~ (see https://github.com/hansenlab/bsseq/issues/65#issuecomment-380501886). I tested on Linux because I don't have access to a Windows machine, but there are no errors in the current Bioconductor build reports on Windows.

I'm going to close this issue but please re-open if you are still having this problem after fixing your Bioconductor installation.

PeteHaitch commented 6 years ago

Update: I can reproduce the error with v1.15.2 and a proper installation of the Bioconductor devel branch

library(bsseq)

# Reformatted @CathG example
gr <- GRanges(
    seqnames = rep(paste0("chr", 1:10), each = 1000000), 
    IRanges(1:10000000, width = 1L), 
    strand = "*")
set.seed(123)
M_test <- Cov_test <- matrix(sample(1:100, 1000000000, replace = TRUE), 
                             ncol=100)
bsseq_test <- BSseq(
    M = M_test, 
    Cov = Cov_test, 
    gr = gr, 
    sampleNames = paste("ech", 1:100, sep = "_"))
collapseBSseq(bsseq_test, rep("ech", 100))
Error in serialize(data, node$con, xdr = FALSE) : ignoring SIGPIPE signal
Error: failed to stop ‘SOCKcluster’ cluster: error writing to connection

@CathG You should still fix your Bioconductor installation. I will fix this bug for the next release of bsseq.

Session info

``` r
> BiocInstaller::biocValid()

* sessionInfo()

R Under development (unstable) (2018-03-21 r74433)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux Server release 6.9 (Santiago)

Matrix products: default
BLAS: /jhpce/shared/jhpce/core/conda/miniconda-3/envs/svnR-devel/R/devel/lib64/R/lib/libRblas.so
LAPACK: /jhpce/shared/jhpce/core/conda/miniconda-3/envs/svnR-devel/R/devel/lib64/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices datasets  utils
[8] methods   base

other attached packages:
 [1] bsseq_1.15.2                SummarizedExperiment_1.9.16
 [3] DelayedArray_0.5.30         BiocParallel_1.13.3
 [5] matrixStats_0.53.1          Biobase_2.39.2
 [7] GenomicRanges_1.31.23       GenomeInfoDb_1.15.5
 [9] IRanges_2.13.28             S4Vectors_0.17.41
[11] BiocGenerics_0.25.3         devtools_1.13.5

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.16           BiocInstaller_1.29.6   compiler_3.5.0
 [4] plyr_1.8.4             XVector_0.19.9         R.methodsS3_1.7.1
 [7] bitops_1.0-6           R.utils_2.6.0          tools_3.5.0
[10] zlibbioc_1.25.0        digest_0.6.15          memoise_1.1.0
[13] lattice_0.20-35        Matrix_1.2-13          GenomeInfoDbData_1.1.0
[16] withr_2.1.2            knitr_1.20             gtools_3.5.0
[19] locfit_1.5-9.1         grid_3.5.0             data.table_1.10.4-3
[22] limma_3.35.14          scales_0.5.0           permute_0.9-4
[25] colorspace_1.3-2       RCurl_1.95-4.10        munsell_0.4.3
[28] R.oo_1.21.0

Library path directories:
  /users/phickey/R/x86_64-pc-linux-gnu-library/3.5
  /jhpce/shared/jhpce/core/conda/miniconda-3/envs/svnR-devel/R/devel/lib64/R/site-library

* Out-of-date packages
(LibPath is identical for every row:
 "/jhpce/shared/jhpce/core/conda/miniconda-3/envs/svnR-devel/R/devel/lib64/R/site-library")

Package                      Installed  Built    ReposVer   Repository
ade4                         "1.7-10"   "3.5.0"  "1.7-11"   "https://cran.rstudio.com/src/contrib"
arules                       "1.6-0"    "3.5.0"  "1.6-1"    "https://cran.rstudio.com/src/contrib"
BiocCheck                    "1.15.8"   "3.5.0"  "1.15.9"   "https://bioconductor.org/packages/3.7/bioc/src/contrib"
BiocInstaller                "1.29.5"   "3.5.0"  "1.29.6"   "https://bioconductor.org/packages/3.7/bioc/src/contrib"
biocViews                    "1.47.3"   "3.5.0"  "1.47.6"   "https://bioconductor.org/packages/3.7/bioc/src/contrib"
biovizBase                   "1.27.1"   "3.5.0"  "1.27.2"   "https://bioconductor.org/packages/3.7/bioc/src/contrib"
bsseq                        "1.15.1"   "3.5.0"  "1.15.2"   "https://bioconductor.org/packages/3.7/bioc/src/contrib"
DelayedArray                 "0.5.23"   "3.5.0"  "0.5.30"   "https://bioconductor.org/packages/3.7/bioc/src/contrib"
future                       "1.7.0"    "3.5.0"  "1.8.0"    "https://cran.rstudio.com/src/contrib"
GenomicScores                "1.3.21"   "3.5.0"  "1.3.24"   "https://bioconductor.org/packages/3.7/bioc/src/contrib"
grpreg                       "3.1-2"    "3.5.0"  "3.1-3"    "https://cran.rstudio.com/src/contrib"
haplo.stats                  "1.7.7"    "3.5.0"  "1.7.9"    "https://cran.rstudio.com/src/contrib"
HDF5Array                    "1.7.9"    "3.5.0"  "1.7.10"   "https://bioconductor.org/packages/3.7/bioc/src/contrib"
MafDb.gnomAD.r2.0.1.hs37d5   "3.6.0"    "3.5.0"  "3.7.0"    "https://bioconductor.org/packages/3.7/data/annotation/src/contrib"
mouse.db0                    "3.5.0"    "3.5.0"  "3.6.0"    "https://bioconductor.org/packages/3.7/data/annotation/src/contrib"
MSnbase                      "2.5.9"    "3.5.0"  "2.5.11"   "https://bioconductor.org/packages/3.7/bioc/src/contrib"
proxy                        "0.4-21"   "3.5.0"  "0.4-22"   "https://cran.rstudio.com/src/contrib"
QuasR                        "1.19.3"   "3.5.0"  "1.19.4"   "https://bioconductor.org/packages/3.7/bioc/src/contrib"
R.filesets                   "2.12.0"   "3.5.0"  "2.12.1"   "https://cran.rstudio.com/src/contrib"
Rbowtie                      "1.19.2"   "3.5.0"  "1.19.3"   "https://bioconductor.org/packages/3.7/bioc/src/contrib"
rhdf5                        "2.23.5"   "3.5.0"  "2.23.7"   "https://bioconductor.org/packages/3.7/bioc/src/contrib"
Rmpi                         "0.6-6"    "3.5.0"  "0.6-7"    "https://cran.rstudio.com/src/contrib"
rtracklayer                  "1.39.9"   "3.5.0"  "1.39.10"  "https://bioconductor.org/packages/3.7/bioc/src/contrib"
S4Vectors                    "0.17.39"  "3.5.0"  "0.17.41"  "https://bioconductor.org/packages/3.7/bioc/src/contrib"
selectr                      "0.4-0"    "3.5.0"  "0.4-1"    "https://cran.rstudio.com/src/contrib"

update with biocLite()

Error: 25 package(s) out of date
In addition: Warning message:
libraries cannot be written to
  '/jhpce/shared/jhpce/core/conda/miniconda-3/envs/svnR-devel/R/devel/lib64/R/site-library'
```

CathG commented 6 years ago

Hi, sorry for the late reply. Regarding the comment about mixing and matching versions, I plead guilty: I wanted the updated version of bsseq (and thus had to update DelayedArray) but didn't want to run the devel version of R, so I couldn't get the matching version of Bioconductor. I'll wait for the patch before upgrading everything and, for now, I'll downgrade whatever is necessary so that all versions match. Thank you very much.

PeteHaitch commented 6 years ago

The root of the problem is https://github.com/Bioconductor/DelayedArray/issues/16

@CathG I'll provide a fix to your issue in bsseq

PeteHaitch commented 6 years ago

@CathG This should now be fixed (https://github.com/hansenlab/bsseq/pull/66). At least, the example in https://github.com/hansenlab/bsseq/issues/65#issuecomment-380501886 now succeeds.

Would you mind testing this on your own machine? The new version should be available in the next few days using BiocInstaller::biocLite("bsseq") (devel branch only). Alternatively, you can install from github using devtools::install_github("hansenlab/bsseq")

CathG commented 6 years ago

@PeteHaitch Thanks for the fixes. I tried to install the fixed package on my computer (with my "old" release of R) but ran into a problem with S4Vectors. I then tried applying only your patches (redefining all the functions directly in my session) but got an "infinite recursion" error. I installed the new R release but had problems with some packages, the first being data.table, which gave a "non-zero exit status" when I tried to install it from source (as it is not yet available for the new release). I'm kind of out of ideas for now :-/ Since data.table for R 3.5 is supposed to be available on CRAN very soon, I'll probably wait until then (next week?) to do more tests.

PeteHaitch commented 6 years ago

Thanks for trying, @CathG. It can be frustrating getting the environment set up, especially during this change over period.

I did see that there are some issues with data.table with R 3.5 on Windows. So it seems we might have to wait a few days until that is fixed before we can finally resolve this problem.

Thanks for your patience!

CathG commented 6 years ago

@PeteHaitch Hi! data.table is no longer a problem, but I cannot get HDF5Array to work. It seems to install correctly but fails to load. I only get the following error:

Error : .onLoad failed in loadNamespace() for 'HDF5Array', details: call: H5Fcreate(file) error: HDF5. File accessibilty. Unable to open file.

Do you have an idea how I can solve that (I didn't find anything informative on the web)? Thank you very much

PeteHaitch commented 6 years ago

Can you isolate the issue to a minimal example? Please include the output of BiocInstaller::biocValid()

CathG commented 6 years ago

The output of BiocInstaller::biocValid():

* sessionInfo()

R version 3.5.0 (2018-04-23)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 16299)

Matrix products: default

locale:
[1] LC_COLLATE=French_France.1252 LC_CTYPE=French_France.1252 LC_MONETARY=French_France.1252 LC_NUMERIC=C LC_TIME=French_France.1252

attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods base

other attached packages:
[1] DelayedArray_0.6.0 httr_1.3.1 rhdf5_2.24.0 BiocParallel_1.13.1 IRanges_2.14.1 S4Vectors_0.18.1 BiocGenerics_0.26.0 matrixStats_0.53.1 BiocInstaller_1.30.0

loaded via a namespace (and not attached):
[1] withr_2.1.2 digest_0.6.15 R6_2.2.2 git2r_0.21.0 curl_3.2 devtools_1.13.5 Rhdf5lib_1.2.0 tools_3.5.0 compiler_3.5.0 memoise_1.1.0

Library path directories:
  C:/Program Files/R/R-3.5.0/library

* Out-of-date packages
             Package LibPath Installed Built ReposVer Repository
BiocParallel "BiocParallel" "C:/Program Files/R/R-3.5.0/library" "1.13.1" "3.5.0" "1.14.0" "https://bioconductor.org/packages/3.7/bioc/src/contrib"

update with biocLite()

Error: 1 package(s) out of date

I did try to update with biocLite() but it didn't change the status.

I just tried to install bsseq with devtools::install_github("hansenlab/bsseq"), and even just tried to load HDF5Array with library(), but I get this error (I translated it from French to English, so I hope the messages are correct):

library(HDF5Array)

Error : package or namespace load failed for ‘HDF5Array’: .onLoad failed in loadNamespace() for 'HDF5Array', details : call : H5Fcreate(file) error : HDF5. File accessibilty. Unable to open file.
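Since the failing call is H5Fcreate(file), one thing that can be checked with base R alone is whether R can create a file in the session's temporary directory, and whether that path contains non-ASCII characters. This is only a diagnostic sketch: the thread does not establish that the temporary directory is the cause, and the probe file name is arbitrary.

```r
# Try to create (and then remove) an empty file where R keeps temporary files.
probe <- tempfile(pattern = "h5probe_", fileext = ".h5")
can_create <- file.create(probe)

# Flag any character outside plain ASCII in the path (an assumption about what
# might matter; invalid encodings are flagged as well).
code_points <- utf8ToInt(enc2utf8(probe))
has_non_ascii <- any(is.na(code_points) | code_points > 127L)

unlink(probe)

cat("temporary directory:", tempdir(), "\n")
cat("file can be created:", can_create, "\n")
cat("path has non-ASCII characters:", has_non_ascii, "\n")
```

If the file cannot be created, or the path looks unusual, that information would be worth including in a support request.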

PeteHaitch commented 6 years ago

Please try:

# Removing these is probably overkill but can sometimes fix weird bugs
remove.packages(c("DelayedArray", "HDF5Array"))
# Install all packages using biocLite() 
# NOTE: updating BiocParallel as recommended by biocValid()
BiocInstaller::biocLite(c("DelayedArray", "HDF5Array", "BiocParallel", "bsseq"))
library(HDF5Array)
CathG commented 6 years ago

I get the same error :-( BiocParallel does not seem to install correctly as I still get the following message:

Old packages: 'BiocParallel', 'MASS', 'survival'

Could it be related to the error I'm getting?

PeteHaitch commented 6 years ago

Bugger :( Can you please try this:

library(BiocInstaller)
# Removing these is probably overkill but can sometimes fix weird bugs
remove.packages(c("DelayedArray", "HDF5Array", "bsseq", "BiocParallel", "rhdf5", "Rhdf5lib"))
# Install all packages using biocLite() 
biocLite(c("DelayedArray", "HDF5Array", "BiocParallel", "bsseq"))
library(HDF5Array)

and then share the full output of biocValid()?

If that still doesn't work, might I suggest we move this to http://support.bioconductor.org/? I think we need to first fix your Bioconductor installation before we can return to the issue with bsseq::collapseBSseq(). There are more people there who might be able to help fix your installation.

CathG commented 6 years ago

It still does not work :-( Output of biocValid():

* sessionInfo()

R version 3.5.0 (2018-04-23)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 16299)

Matrix products: default

locale:
[1] LC_COLLATE=French_France.1252  LC_CTYPE=French_France.1252    LC_MONETARY=French_France.1252 LC_NUMERIC=C                  
[5] LC_TIME=French_France.1252    

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] rhdf5_2.24.0         DelayedArray_0.6.0   BiocParallel_1.13.1  IRanges_2.14.1       S4Vectors_0.18.1     BiocGenerics_0.26.0 
[7] matrixStats_0.53.1   BiocInstaller_1.30.0

loaded via a namespace (and not attached):
[1] compiler_3.5.0 tools_3.5.0    Rhdf5lib_1.2.0

Library path directories:
  C:/Program Files/R/R-3.5.0/library 

* Out-of-date packages
             Package        LibPath                              Installed Built   ReposVer
BiocParallel "BiocParallel" "C:/Program Files/R/R-3.5.0/library" "1.13.1"  "3.5.0" "1.14.0"
             Repository                                              
BiocParallel "https://bioconductor.org/packages/3.7/bioc/src/contrib"

update with biocLite()

Error: 1 package(s) out of date

I'll ask on the Bioconductor support site. Hopefully it can be fixed quickly, and I'll soon be able to post that everything works with your patches so that this issue can be closed. Thanks.

PeteHaitch commented 6 years ago

Ok, thanks. Sorry this has been so frustrating!

CathG commented 6 years ago

It kind of is indeed. At least I have managed to get a TRUE output for biocValid() now, which is promising...

CathG commented 6 years ago

update: BiocParallel is not a problem anymore (the issue was with Rtools not being in the path) but the problem remains with HDF5Array
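For anyone hitting the same thing, a quick way to see whether the build tools are visible to R is to look them up on the PATH. This is a minimal sketch; the choice of `make` and `gcc` as the tools to probe is an assumption about what a source install needs, not something taken from the Rtools documentation.

```r
# Look up the build tools on the PATH; an empty string means "not found".
build_tools <- Sys.which(c("make", "gcc"))
on_path <- unname(nzchar(build_tools))

print(data.frame(
    tool = names(build_tools),
    path = unname(build_tools),
    on_path = on_path,
    row.names = NULL))
```

If either tool is missing on Windows, installing packages from source is likely to fail until the Rtools directories are added to the PATH.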

PeteHaitch commented 6 years ago

Note to self: issue with HDF5Array is being discussed at https://support.bioconductor.org/p/108548/

PeteHaitch commented 6 years ago

Any update on this @CathG? I see it looks like you're still trying to resolve your issue with HDF5Array.

CathG commented 6 years ago

Hi @PeteHaitch I'm indeed still stuck at the HDF5Array step, nothing seems to work :-(

PeteHaitch commented 6 years ago

Have you tried the nuclear option of recording names of installed packages, deleting your package directory(s), and reinstalling all using BiocInstaller::biocLite()?
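The recording step could be sketched as follows. This is only an outline under stated assumptions: the file location is arbitrary (a temporary directory here, so the list would not survive the session; a persistent path should be used in practice), and the reinstall line is left commented out so nothing is installed by accident.

```r
# Record the names of all installed packages that are not part of base R or
# the recommended set (those have a non-missing "Priority" field).
ip <- installed.packages()
user_pkgs <- unique(rownames(ip)[is.na(ip[, "Priority"])])

# Assumption: write the list somewhere that survives deleting the library.
pkg_file <- file.path(tempdir(), "installed-packages.txt")
writeLines(user_pkgs, pkg_file)

# ... delete the package directory(s) by hand, then, in a fresh R session:
to_restore <- readLines(pkg_file)
# BiocInstaller::biocLite(to_restore)
```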

CathG commented 6 years ago

@PeteHaitch I did but it didn't change the error

PeteHaitch commented 6 years ago

FWIW (and it won't happen immediately) I'm looking to make HDF5Array a suggested rather than required dependency.

CathG commented 6 years ago

@PeteHaitch Although the HDF5Array issue is not totally resolved (the loading problem was resolved with the new version of rhdf5), I was able to load the package and try your fix, and it now works fine. Thanks :-)