This thing isn't an "HDF5 dataset" in the standard sense. It's an AnnData-custom structure, akin to how 10x stores their sparse matrices in their HDF5 files (but transposed). I suppose we could handle it via HDF5Array::TENxMatrix and transpose the output, assuming we can pull the relevant paths out.
The workaround is just to use use_hdf5=FALSE for the time being.
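For reference, a minimal sketch of that workaround (the file name "example.h5ad" is just a placeholder):
library(zellkonverter)
# use_hdf5=FALSE loads X into memory as an ordinary sparse matrix,
# sidestepping the HDF5-backed code path that chokes on the custom format
sce <- readH5AD("example.h5ad", use_hdf5 = FALSE)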
Alright, thanks for the feedback! I think it would be useful to have that support, since AnnData seems to encourage the use of this custom format in their write documentation.
> Workaround is just to use use_hdf5=FALSE for the time being.
I have ~~2~~ 1 problem with this: even with use_hdf5=FALSE, I actually get the exact same warning and end up with an empty sparse X matrix. I guess that's because .extract_or_skip_assay() is still running into the same problem of trying to transpose an unsupported object?

In-memory sparse reading works fine for me:
library(scRNAseq)
out <- ZeiselBrainData()
counts(out) <- as(counts(out), "dgCMatrix")
library(zellkonverter)
writeH5AD(out, file="stuff.h5")
out <- readH5AD("stuff.h5") # defaults to use_hdf5=FALSE
library(SummarizedExperiment)
class(assay(out))
## [1] "dgCMatrix"
## attr(,"package")
## [1] "Matrix"
Matrix::nnzero(assay(out))
## [1] 11349080
Also, this is with zellkonverter 1.1.3.
OK, I see that I made a stupid typo in my example 🤦‍♂️ I used krumsiek twice instead of krumsiek2 in the second case. It does indeed work fine when loading in-memory; I'll edit my comment to remove that example. Sorry for the confusion!
But I guess my first issue still stands.
This can now be done, thanks to Bioconductor/HDF5Array#34. To illustrate:
library(scRNAseq)
out <- ZeiselBrainData()
counts(out) <- as(counts(out), "dgCMatrix")
library(zellkonverter)
writeH5AD(out, file="stuff.h5")
library(HDF5Array)
out <- H5ADMatrix("stuff.h5", "X")
out
## <20006 x 3005> sparse matrix of class H5ADMatrix and type "double":
## [,1] [,2] [,3] ... [,3004] [,3005]
## [1,] 0 0 0 . 0 1
## [2,] 3 1 0 . 0 1
## [3,] 3 1 6 . 0 0
## [4,] 0 0 0 . 0 0
## [5,] 1 1 1 . 0 0
## ... . . . . . .
## [20002,] 65 97 64 . 29 120
## [20003,] 75 111 95 . 20 143
## [20004,] 158 326 209 . 36 359
## [20005,] 31 88 97 . 12 52
## [20006,] 13 14 9 . 3 13
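And since the result is a DelayedMatrix, block-processed operations run off the file without loading X into memory; a quick check on the object above:
# computed block-wise from the HDF5 file, not from an in-memory copy
head(colSums(out))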
Support should be as easy as slapping in:
} else if (hdf5_backed && any(grepl("^anndata\\..*\\.SparseDataset", class(mat)))) {
    # It's H5AD's custom format, so let's handle it with the H5ADMatrix object.
    file <- as.character(mat$file$id$name)
    name <- as.character(mat$name)
    mat <- HDF5Array::H5ADMatrix(file, name)
} else {
... inside .extract_or_skip_assay().
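With a patch along those lines, a backed read should (in principle) hand the assay back as a delayed H5ADMatrix instead of warning and returning an empty matrix; a hypothetical sketch of the resulting user-facing call:
library(zellkonverter)
library(SummarizedExperiment)
# use_hdf5=TRUE keeps X on disk rather than reading it into memory
sce <- readH5AD("stuff.h5", use_hdf5 = TRUE)
class(assay(sce))  # expected: a DelayedMatrix backed by the .h5ad file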
Nice! Added to the list.
> Support should be as easy as slapping in [the snippet above] ... inside .extract_or_skip_assay().
Just tested this on a 1.3M-cell dataset and it works perfectly. Thanks a lot!
Also, there's no need to make a special case for grepl("^h5py\\..*\\.Dataset", class(mat)): HDF5Array::H5ADMatrix() should work with any .h5ad file (with a plain or sparse matrix).
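For example, a file whose X was left dense should load through the same constructor (a sketch; "dense.h5ad" is a placeholder for an .h5ad written without sparsifying X):
library(HDF5Array)
# a dense X is stored as an ordinary HDF5 dataset, which H5ADMatrix
# handles just as it handles the sparse triplet layout
dense <- H5ADMatrix("dense.h5ad", "X")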
I'm having some trouble reading in a dataset for which the X matrix was stored as anndata._core.sparse_dataset.SparseDataset. On the Python side, I can read it as HDF5-backed and everything works fine, but when I try to read it into R with zellkonverter::readH5AD() I get a warning, which then results in an empty sparse matrix. After some digging, I noticed that this happens because the anndata._core.sparse_dataset.SparseDataset is not recognized as HDF5-backed (though it is, I think?), so .extract_or_skip_assay() tries to transpose it manually. I was wondering whether this is intentionally unsupported and, if so, whether it will be supported at some point?
I have a workaround where I load the data on the Python side and then write it back out with anndata.write_h5ad(file, force_dense=True), but I was wondering whether there is a more elegant solution? If necessary, I can provide a small reproducible example.
Thanks a lot!