hansenlab / bsseq

Devel repository for bsseq

Reading H5 files directly #124

Open boooooogey opened 1 year ago

boooooogey commented 1 year ago

Can I provide h5 files instead of a list of methylation files to the read.bismark function or another compatible function?

When HDF5Array is used as the BACKEND, the data are saved as two files: se.rds and assays.h5. I can successfully read se.rds using readRDS() after moving assays.h5 to the current path. It would be more convenient if I could directly provide the paths of se.rds and assays.h5.

Is it currently possible to achieve this with the existing code version? If not, would it be challenging to implement?

PeteHaitch commented 1 year ago

Once you have an HDF5-backed BSseq object (i.e. you've run HDF5Array::saveHDF5SummarizedExperiment() and you have the se.rds and assays.h5 files) then you can load it back into R using HDF5Array::loadHDF5SummarizedExperiment(). There's no need for read.bismark() once you're at this point, so I don't really understand what you're trying to do.
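
For illustration, here is a minimal sketch of that save/load round trip. It assumes a small toy SummarizedExperiment in place of a real BSseq object (BSseq extends RangedSummarizedExperiment, so the same two calls should apply); the directory name and matrix values are made up for the example.

```r
library(SummarizedExperiment)
library(HDF5Array)

# Toy object standing in for a BSseq object (hypothetical data).
se <- SummarizedExperiment(assays = list(M = matrix(1:6, nrow = 3)))

# Hypothetical output directory; se.rds and assays.h5 are written into it.
dir <- file.path(tempdir(), "toy_h5_se")
saveHDF5SummarizedExperiment(se, dir = dir, replace = TRUE)

# Reload by pointing at the directory rather than at the individual files.
se2 <- loadHDF5SummarizedExperiment(dir = dir)
```

Note that loadHDF5SummarizedExperiment() takes the directory holding both files, so se.rds and assays.h5 need to stay together; there is no need to call readRDS() by hand.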

sahuno commented 1 year ago

This is a good question! Thanks @PeteHaitch for the response. As a follow-up question: if you subset an HDF5-backed bsseq object that has been loaded into an R session, do you need to manually resave it to disk before passing the modified object to BSmooth()? Here's an example after removing chromosomes Y and MT, where BSmooth() doesn't seem to recognize the modified bsseq object. How can I resolve this? Thanks!

library(bsseq)
library(BiocParallel)

# Indices of loci on the mitochondrial (MT) and Y chromosomes
chrMT_loci <- which(as.character(seqnames(bismark_bsseq)) == "MT")
chrY_loci <- which(as.character(seqnames(bismark_bsseq)) == "Y")
# Drop both sets of loci
chr_loci_rm <- c(chrMT_loci, chrY_loci)
bismark_bsseq <- bismark_bsseq[-chr_loci_rm, ]

message("\n performing bsmoothing \n")
bismark_bsseq.fit <- BSmooth(BSseq = bismark_bsseq,
                             BPPARAM = MulticoreParam(workers = 24, progressbar = TRUE),
                             verbose = TRUE)

PeteHaitch commented 1 year ago

if you subset an HDF5-backed bsseq object that has been loaded into an R session, do you need to manually resave it to disk before passing the modified object to BSmooth()?

No, that shouldn't be necessary.

Here's an example after removing chromosomes Y and MT, where BSmooth() doesn't seem to recognize the modified bsseq object. How can I resolve this?

I don't understand what you mean and the code you've pasted in doesn't show any output. If you're having a problem please post a reproducible example so we can help you.
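
When putting together such an example, it may help to first confirm that the subsetting step does what is intended. The sketch below uses a plain data frame as a stand-in for the loci of a BSseq object (the data and the "chrUn" label are hypothetical) to show one base R behaviour worth ruling out: negative indexing with an empty which() result drops every row, whereas a logical mask does not.

```r
# Toy stand-in for the loci of a BSseq object (hypothetical data).
loci <- data.frame(chr = c("1", "1", "Y", "MT", "2"),
                   pos = c(10, 20, 5, 7, 30))

# Negative indexing with which(): fine when at least one locus matches.
drop_idx <- which(loci$chr %in% c("Y", "MT"))
by_index <- loci[-drop_idx, ]

# If nothing matches, which() returns integer(0), and -integer(0) is still
# integer(0), so the subset below is empty rather than unchanged.
none_idx <- which(loci$chr %in% "chrUn")
emptied <- loci[-none_idx, ]

# A logical mask gives the intended result in both situations.
keep <- !(loci$chr %in% c("Y", "MT"))
by_mask <- loci[keep, ]
keep_all <- !(loci$chr %in% "chrUn")
untouched <- loci[keep_all, ]
```

For a BSseq object, the analogous mask could be built from as.character(seqnames(bismark_bsseq)) and applied as bismark_bsseq[keep, ]; printing dim() and seqlevelsInUse() of the result before calling BSmooth() would then be useful output to include in a report.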