Closed tkisss closed 3 years ago
Hi,
You can either use rhdf5 (https://bioconductor.org/packages/release/bioc/html/rhdf5.html) or hdf5r (https://cran.r-project.org/web/packages/hdf5r/vignettes/hdf5r.html) packages for the conversion. Here is an example using rhdf5:
library(rhdf5)
library(Matrix)
# load dense data matrices (to be integrated)
data1 = readRDS("data1.RDS") # dataset 1
data2 = readRDS("data2.RDS") # dataset 2
# convert them to sparse data matrices
data1 = as(data1, "CsparseMatrix")
data2 = as(data2, "CsparseMatrix")
# file names for the h5 files to be generated
fname1 = "data1.h5"
fname2 = "data2.h5"
hdf5_files = list(fname1,fname2)
dgCmat_list = c("data1", "data2") # names of sparse data matrices
# Initialize HDF5 files
h5closeAll()
for (i in 1:length(hdf5_files)){
h5createFile(hdf5_files[[i]])
h5createGroup(hdf5_files[[i]], "matrix")
h5write(get(dgCmat_list[i])@Dimnames[[2]], file=hdf5_files[[i]], name="matrix/barcodes") # cell barcodes
h5createGroup(hdf5_files[[i]], file.path("matrix", "features"))
h5write(get(dgCmat_list[i])@Dimnames[[1]], file=hdf5_files[[i]], name="matrix/features/name")
h5write(dim(get(dgCmat_list[i])), file=hdf5_files[[i]], name="matrix/shape")
# save x, i, p into h5 file
h5createDataset(hdf5_files[[i]],"matrix/data",dims=length(get(dgCmat_list[i])@x),storage.mode="double", chunk = 1000)
h5write(get(dgCmat_list[i])@x, file=hdf5_files[[i]], name="matrix/data", index = list(1:length(get(dgCmat_list[i])@x)))
h5createDataset(hdf5_files[[i]],"matrix/indices",dims=length(get(dgCmat_list[i])@i),storage.mode="integer", chunk = 1000)
h5write(get(dgCmat_list[i])@i, file=hdf5_files[[i]], name="matrix/indices",index = list(1:length(get(dgCmat_list[i])@i))) # already zero-indexed.
h5createDataset(hdf5_files[[i]],"matrix/indptr",dims=length(get(dgCmat_list[i])@p),storage.mode="integer", chunk = 1000)
h5write(get(dgCmat_list[i])@p, file=hdf5_files[[i]], name="matrix/indptr",index = list(1:length(get(dgCmat_list[i])@p)))
h5closeAll()
}
Hi,
Thanks for your help, and I successfully converted the genes X cells matrix to a .h5 file according to your suggestion, but I did not find a reduction of memory usage. Online iNMF took about 70GB even I input a .h5 file.
My code:
data = list(batch1=paste0(input_path, '/batch1.h5', sep=''), batch2=paste0(input_path, '/batch2.h5', sep=''))
adata = createLiger(data,remove.missing=FALSE)
adata = normalize(adata,remove.missing = FALSE)
adata = selectGenes(adata, var.thresh = 0.2, do.plot = F)
adata = scaleNotCenter(adata)
Rprof(tf <- paste(output_path, "/rprof.log",sep=''), memory.profiling=TRUE)
adata = online_iNMF(adata, k = 20, miniBatch_size = 5000, max.epochs = 5)
Rprof(NULL)
adata = quantile_norm(adata)
I used Rprof to record memory usage in R and used max(summaryRprof(tf, memory="both")$by.total$mem.total) to get the maximum memory.
I noticed the peak memory usage (Fig 4.) in your paper Iterative Renement of Cellular Identity from Single-Cell Data Using Online Learning from bioRxiv. My test results are very different from yours. I don't know where the problem is, can you give me some suggestions?
Thanks! Kang Tian
Hello Kang,
We used peakRAM package for tracking the peak RAM consumed.
mem <- peakRAM({
adata = online_iNMF(adata, k = 20, miniBatch_size = 5000, max.epochs = 5)
})
peak_mem = mem$Peak_RAM_Used_MiB
For better accuracy, you could restart the PC or start a new session on a computer cluster for this process. As for the Rprof approach, here is a discussion related to the interpretation of the output (https://stackoverflow.com/questions/58250126/interpretation-of-memory-profiling-output-of-rprof).
Hi,
I noticed that online iNMF uses about 70GB on a 100000x20000 raw cells x genes matrix. How can I convert a genes X cells matrix to a .h5 file as input of createLiger for a memory efficient implementation?
Thanks!