hansenlab / bsseq

Devel repository for bsseq
36 stars 26 forks source link

HDF5-backed objects saved with a "prefix" fail to smooth #98

Open Nick-Eagles opened 4 years ago

Nick-Eagles commented 4 years ago

Hello,

I frequently make use of the prefix argument to saveHDF5SummarizedExperiment from the HDF5Array package when working with bsseq objects. However, HDF5-backed bsseq objects saved with a prefix seem to not correctly smooth with BSmooth. Below I provide a small reproducible example which generates the issue. If you omit the prefix argument in the below call to saveHDF5SummarizedExperiment, the entire code segment works as expected.

library('bsseq')
library('HDF5Array')
library('sessioninfo')

write_dir = 'hdf5_object'

#  Load the memory-backed example bsseq object
data(BS.chr22)

#  Suppose I wish to manipulate an HDF5-backed version of the object. I
#  provide an arbitrarily named prefix; without supplying the 'prefix'
#  argument, the below smoothing code works fine!
bs_disk = saveHDF5SummarizedExperiment(BS.chr22, dir=write_dir, prefix='CpG',
                                       verbose=TRUE)

##Start writing assay 1/2 to HDF5 file:
##  hdf5_object/CpGassays.h5
##Realizing block 1/1 ... OK, writing it ... OK
##Finished writing assay 1/2 to HDF5 file:
##  hdf5_object/CpGassays.h5
##
##Start writing assay 2/2 to HDF5 file:
##  hdf5_object/CpGassays.h5
##Realizing block 1/1 ... OK, writing it ... OK
##Finished writing assay 2/2 to HDF5 file:
##  hdf5_object/CpGassays.h5

Serialize BSseq object to RDS file:
  hdf5_object/CpGse.rds

#  Attempt to smooth the HDF5-backed object 'bs_disk'
BSmooth(bs_disk, verbose=TRUE)

##An object of type 'BSseq' with
##  494728 methylation loci
##  2 samples
##has been smoothed with
##  BSmooth (ns = 70, h = 1000, maxGap = 100000000)
##Some assays are HDF5Array-backed
##Warning message:
##In BSmooth(bs_disk, verbose = TRUE) :
##  'BSseq' was not created with either read.bismark() or
##  HDF5Array::saveHDF5SummarizedExperiment(). BSmooth() is using
##  automatically created HDF5 file(s) (see ?HDF5Array::setHDF5DumpFile) to
##  store smoothing result. You will need to run
##  HDF5Array::saveHDF5SummarizedExperiment() on the returned object if you
##  wish to save the returned object.

#  Show that the on-disk object was indeed not smoothed
getMeth(bs_disk, type='smooth')

##Error in getMeth(bs_disk, type = "smooth") :
##  'type=smooth' requires the object to have been smoothed.

session_info()

##─ Session info ───────────────────────────────────────────────────────────────
## setting  value
## version  R version 4.0.2 Patched (2020-06-24 r78746)
## os       CentOS Linux 7 (Core)
## system   x86_64, linux-gnu
## ui       X11
## language (EN)
## collate  en_US.UTF-8
## ctype    en_US.UTF-8
## tz       US/Eastern
## date     2020-10-12
##
##─ Packages ───────────────────────────────────────────────────────────────────
## package              * version  date       lib source
## assertthat             0.2.1    2019-03-21 [2] CRAN (R 4.0.0)
## Biobase              * 2.48.0   2020-04-27 [2] Bioconductor
## BiocGenerics         * 0.34.0   2020-04-27 [2] Bioconductor
## BiocParallel           1.22.0   2020-04-27 [2] Bioconductor
## Biostrings             2.56.0   2020-04-27 [2] Bioconductor
## bitops                 1.0-6    2013-08-17 [2] CRAN (R 4.0.0)
## BSgenome               1.56.0   2020-04-27 [2] Bioconductor
## bsseq                * 1.24.0   2020-04-28 [2] Bioconductor
## cli                    2.0.2    2020-02-28 [2] CRAN (R 4.0.0)
## colorspace             1.4-1    2019-03-18 [2] CRAN (R 4.0.0)
## crayon                 1.3.4    2017-09-16 [2] CRAN (R 4.0.0)
## data.table             1.12.8   2019-12-09 [2] CRAN (R 4.0.0)
## DelayedArray         * 0.14.0   2020-04-27 [2] Bioconductor
## DelayedMatrixStats     1.10.0   2020-04-27 [2] Bioconductor
## fansi                  0.4.1    2020-01-08 [2] CRAN (R 4.0.0)
## GenomeInfoDb         * 1.24.0   2020-04-27 [2] Bioconductor
## GenomeInfoDbData       1.2.3    2020-05-18 [2] Bioconductor
## GenomicAlignments      1.24.0   2020-04-27 [2] Bioconductor
## GenomicRanges        * 1.40.0   2020-04-27 [2] Bioconductor
## glue                   1.4.2    2020-08-27 [1] CRAN (R 4.0.2)
## gtools                 3.8.2    2020-03-31 [2] CRAN (R 4.0.0)
## HDF5Array            * 1.16.0   2020-04-27 [2] Bioconductor
## IRanges              * 2.22.1   2020-04-28 [2] Bioconductor
## lattice                0.20-41  2020-04-02 [3] CRAN (R 4.0.2)
## lifecycle              0.2.0    2020-03-06 [2] CRAN (R 4.0.0)
## limma                  3.44.1   2020-04-28 [2] Bioconductor
## locfit                 1.5-9.4  2020-03-25 [2] CRAN (R 4.0.0)
## Matrix                 1.2-18   2019-11-27 [3] CRAN (R 4.0.2)
## matrixStats          * 0.56.0   2020-03-13 [2] CRAN (R 4.0.0)
## munsell                0.5.0    2018-06-12 [2] CRAN (R 4.0.0)
## permute                0.9-5    2019-03-12 [2] CRAN (R 4.0.0)
## R.methodsS3            1.8.0    2020-02-14 [2] CRAN (R 4.0.0)
## R.oo                   1.23.0   2019-11-03 [2] CRAN (R 4.0.0)
## R.utils                2.9.2    2019-12-08 [2] CRAN (R 4.0.0)
## R6                     2.4.1    2019-11-12 [2] CRAN (R 4.0.0)
## Rcpp                   1.0.4.6  2020-04-09 [2] CRAN (R 4.0.0)
## RCurl                  1.98-1.2 2020-04-18 [2] CRAN (R 4.0.0)
## rhdf5                * 2.32.0   2020-04-27 [2] Bioconductor
## Rhdf5lib               1.10.0   2020-04-27 [2] Bioconductor
## rlang                  0.4.7    2020-07-09 [1] CRAN (R 4.0.2)
## Rsamtools              2.4.0    2020-04-27 [2] Bioconductor
## rtracklayer            1.48.0   2020-04-27 [2] Bioconductor
## S4Vectors            * 0.26.1   2020-05-16 [2] Bioconductor
## scales                 1.1.1    2020-05-11 [2] CRAN (R 4.0.0)
## sessioninfo          * 1.1.1    2018-11-05 [2] CRAN (R 4.0.0)
## SummarizedExperiment * 1.18.1   2020-04-30 [2] Bioconductor
## withr                  2.2.0    2020-04-20 [2] CRAN (R 4.0.0)
## XML                    3.99-0.3 2020-01-20 [2] CRAN (R 4.0.0)
## XVector                0.28.0   2020-04-27 [2] Bioconductor
## zlibbioc               1.34.0   2020-04-27 [2] Bioconductor
##
##[1] /users/neagles/R/4.0
##[2] /jhpce/shared/jhpce/core/conda/miniconda3-4.6.14/envs/svnR-4.0/R/4.0/lib64/R/site-library
##[3] /jhpce/shared/jhpce/core/conda/miniconda3-4.6.14/envs/svnR-4.0/R/4.0/lib64/R/library
PeteHaitch commented 4 years ago

Thanks for the report. I have a suspicion of the cause and will try to fix before the next release (but no promises as my next week or so is pretty packed ...)

PeteHaitch commented 4 years ago

Afraid I haven't found time for this yet and with the next release so close it'll have to wait until the next cycle, sorry.