allisonvuong opened this issue 5 years ago.
Thanks for the suggestion. I'll look into it; it certainly seems like it would be useful. I haven't checked, but does https://github.com/Bioconductor/HDF5Array provide any support for writing sparse matrices?
Yes, I think so. HDF5Array::writeHDF5Array() seems to call BLOCK_write_to_sink() in DelayedArray, which appears to convert and write the input block by block. See: here
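A minimal sketch of that route, assuming the HDF5Array and Matrix packages are installed. The matrix size, density, file path and dataset name `"counts"` are illustrative only. writeHDF5Array() accepts a dgCMatrix and writes it in blocks whose size is governed by DelayedArray's automatic block size, so the full dense matrix should never be held in memory at once. Note that the result on disk is an ordinary dense (chunked, compressed) HDF5 dataset, not a sparse layout.

```r
library(Matrix)
library(HDF5Array)

# Small random sparse matrix standing in for a large single-cell count matrix.
set.seed(1)
m <- rsparsematrix(nrow = 1000, ncol = 200, density = 0.01)

path <- tempfile(fileext = ".h5")

# The input is processed block by block; the return value is an HDF5Matrix
# pointing at the newly written on-disk dataset.
out <- writeHDF5Array(m, filepath = path, name = "counts")

dim(out)
```

The returned HDF5Matrix can be used directly in downstream DelayedArray-aware code instead of reading the data back into memory.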
Hi Allison,
This is already implemented in DropletUtils:
```r
library(rhdf5)
library(Matrix)

.write_hdf5 <- function(path, genome, x, barcodes, gene.id, gene.symbol,
                        gene.type, version="3") {
    h5createFile(path)

    # Version 3 of the 10x format stores everything under "matrix";
    # version 2 uses a group named after the genome.
    if (version=="3") {
        group <- "matrix"
    } else {
        group <- genome
    }
    h5createGroup(path, group)
    h5write(barcodes, file=path, name=paste0(group, "/barcodes"))

    # Saving feature information.
    if (version=="3") {
        h5createGroup(path, file.path(group, "features"))
        h5write(gene.id, file=path, name=paste0(group, "/features/id"))
        h5write(gene.symbol, file=path, name=paste0(group, "/features/name"))
        h5write(rep(gene.type, length.out=length(gene.id)),
            file=path, name=paste0(group, "/features/feature_type"))
        h5write(rep(genome, length.out=length(gene.id)),
            file=path, name=paste0(group, "/features/genome"))
    } else {
        h5write(gene.id, file=path, name=paste0(group, "/genes"))
        h5write(gene.symbol, file=path, name=paste0(group, "/gene_names"))
    }

    # Saving matrix information: the compressed sparse column (CSC) slots
    # are written directly, so the matrix is never converted to dense form.
    x <- as(x, "dgCMatrix")
    h5write(x@x, file=path, name=paste0(group, "/data"))
    h5write(dim(x), file=path, name=paste0(group, "/shape"))
    h5write(x@i, file=path, name=paste0(group, "/indices")) # already zero-indexed.
    h5write(x@p, file=path, name=paste0(group, "/indptr"))
    return(NULL)
}
```
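To illustrate what the `data`, `indices` and `indptr` datasets contain, here is a small sketch using only the Matrix package. The 3 x 3 example matrix is made up, and `decode_csc()` is a hypothetical helper (not part of DropletUtils or rhdf5) that rebuilds the dense matrix from the three arrays, showing how a reader of the file would interpret them.

```r
library(Matrix)

# 3 x 3 example with four non-zero entries; sparseMatrix() returns a dgCMatrix.
x <- sparseMatrix(i = c(1, 3, 2, 3), j = c(1, 1, 2, 3),
                  x = c(10, 20, 30, 40), dims = c(3, 3))

x@x  # 10 20 30 40 : non-zero values in column order ("data")
x@i  # 0 2 1 2     : zero-based row index of each value ("indices")
x@p  # 0 2 3 4     : offset where each column starts in x@x ("indptr")

# Hypothetical helper: rebuild the dense matrix from the CSC arrays.
decode_csc <- function(data, indices, indptr, shape) {
    out <- matrix(0, nrow = shape[1], ncol = shape[2])
    for (j in seq_len(shape[2])) {
        # Values for column j sit at positions indptr[j] + 1 ... indptr[j + 1].
        rng <- seq_len(indptr[j + 1] - indptr[j]) + indptr[j]
        out[indices[rng] + 1, j] <- data[rng]  # +1 converts back to 1-based rows
    }
    out
}

decoded <- decode_csc(x@x, x@i, x@p, dim(x))
```

Because `x@i` is already zero-based and `x@p` already holds column offsets, the slots can be written out unchanged, which is why the function above needs no index arithmetic before calling h5write().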
-Mat
Hi,
Many Bioconductor packages store single-cell RNA-seq data in memory as sparse matrices. Currently, rhdf5::h5writeDataset() does not seem to support a sparse matrix as input. For smaller matrices, I can simply coerce my dgCMatrix into an ordinary dense matrix and pass that to h5write(), but for extremely large matrices I cannot, because I run out of memory. Instead, I am bringing a subset of the sparse matrix into memory, coercing it into a dense matrix, and then calling h5writeDataset() on a hyperslab.
Is it possible to support sparse matrices as input?
Best, Allison
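The block-wise workaround described in the original post can be sketched as follows, assuming rhdf5 and Matrix are installed. The matrix dimensions, the dataset name `"counts"`, the chunk shape and `block_size` are illustrative assumptions. Only one block of columns is densified at a time and written into a pre-created dense dataset through the `index` argument of h5write(), which selects the hyperslab.

```r
library(Matrix)
library(rhdf5)

set.seed(1)
m <- rsparsematrix(nrow = 500, ncol = 120, density = 0.02)

path <- tempfile(fileext = ".h5")
h5createFile(path)

block_size <- 50  # illustrative: number of columns densified per iteration

# Pre-create a full-size, chunked, compressed dense dataset.
h5createDataset(path, "counts", dims = dim(m), storage.mode = "double",
                chunk = c(nrow(m), block_size), level = 6)

for (start in seq(1, ncol(m), by = block_size)) {
    cols <- start:min(start + block_size - 1, ncol(m))
    # Densify only this block of columns.
    block <- as.matrix(m[, cols, drop = FALSE])
    # NULL in the index list means "all rows"; cols selects the target columns.
    h5write(block, file = path, name = "counts", index = list(NULL, cols))
}
h5closeAll()
```

Peak memory is then roughly one dense `nrow(m) x block_size` block. Matching the chunk shape to the block shape, as done here, means each write touches whole chunks. If a dense layout on disk is not required, writing the CSC slots directly, as in the DropletUtils function above, avoids densification entirely.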