theislab / zellkonverter

Conversion between scRNA-seq objects
https://theislab.github.io/zellkonverter/

Using HDF5Array when reading from H5AD? #4

Closed: LTLA closed this issue 4 years ago

LTLA commented 4 years ago

Does H5AD store the matrices as HDF5 datasets? If so, we could use HDF5Array to avoid reading any matrix values from disk during AnnData2SCE.
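One way to answer this for a given file is to list its contents. The following is a minimal sketch that assumes the rhdf5 package. It writes a toy file with a dense matrix at `X` as a stand-in for a real H5AD file, which would also contain `obs`, `var`, `layers` and so on. In the AnnData on-disk layout, a dense matrix is a single HDF5 dataset, while a sparse matrix is a group holding `data`, `indices` and `indptr` datasets.

```r
library(rhdf5)

# Toy stand-in for an H5AD file: a dense 4 x 5 matrix stored as dataset "X".
path <- tempfile(fileext = ".h5ad")
h5createFile(path)
h5write(matrix(as.double(1:20), nrow = 4), path, "X")

# List every group and dataset in the file (h5ls() recurses by default).
contents <- h5ls(path)
print(contents[, c("group", "name", "otype", "dim")])

# A dense X is listed as "H5I_DATASET"; a sparse X would be "H5I_GROUP".
x_is_dataset <- any(contents$group == "/" & contents$name == "X" &
                    contents$otype == "H5I_DATASET")
x_is_dataset
```

Only a dense dataset can be wrapped directly by `HDF5Array()`, so this check determines whether the lazy approach applies to a particular matrix.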

LTLA commented 4 years ago

This is possible:

```r
library(zellkonverter)
# Run the writeH5AD() example, which writes an H5AD file to the path `temp`
example(writeH5AD)

# Wrap the "X" dataset as an on-disk HDF5Matrix; the values stay on disk
library(HDF5Array)
HDF5Array(temp, "X")
## <20006 x 3005> matrix of class HDF5Matrix and type "double":
##             [,1]    [,2]    [,3] ... [,3004] [,3005]
##     [1,]       0       0       0   .       0       1
##     [2,]       3       1       0   .       0       1
##     [3,]       3       1       6   .       0       0
##     [4,]       0       0       0   .       0       0
##     [5,]       1       1       1   .       0       0
##      ...       .       .       .   .       .       .
## [20002,]      65      97      64   .      29     120
## [20003,]      75     111      95   .      20     143
## [20004,]     158     326     209   .      36     359
## [20005,]      31      88      97   .      12      52
## [20006,]      13      14       9   .       3      13
```

To support this, we would need to add an option to readH5AD that represents all layers as HDF5Arrays rather than as in-memory sparse matrices. We would also need a flag in AnnData2SCE to skip loading the layers, so that they can be represented afterwards in a lightweight, on-disk form. (Ideally we would avoid loading any of the matrices, but I don't think we can do without X.)
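Below is a minimal sketch of what such an option might do for a single matrix, assuming the rhdf5 and HDF5Array packages. `read_h5ad_matrix()` and its `use_hdf5` argument are hypothetical names chosen for illustration, not zellkonverter's API. A dense dataset is wrapped as an HDF5Array when `use_hdf5 = TRUE`, so its values stay on disk. Otherwise it is read into memory. Sparse, group-encoded matrices are deliberately left unhandled.

```r
library(rhdf5)
library(HDF5Array)

# Hypothetical helper, not part of zellkonverter: load one matrix from an
# H5AD file, optionally as a lazy on-disk HDF5Array.
read_h5ad_matrix <- function(file, name, use_hdf5 = FALSE) {
    contents <- h5ls(file)
    # Build full HDF5 paths such as "/X" or "/layers/counts".
    full_paths <- sub("^/+", "/", paste(contents$group, contents$name, sep = "/"))
    dataset <- sub("^/", "", name)
    otype <- contents$otype[full_paths == paste0("/", dataset)]
    if (length(otype) == 0L) {
        stop("no object named '", name, "' in ", file)
    }
    if (otype != "H5I_DATASET") {
        # Sparse matrices are stored as groups (data/indices/indptr).
        stop("'", name, "' is not a dense dataset; not handled in this sketch")
    }
    if (use_hdf5) {
        HDF5Array(file, dataset)  # lazy: the matrix values stay on disk
    } else {
        h5read(file, dataset)     # eager: the whole matrix is loaded into memory
    }
}

# Usage with a toy file holding a dense matrix at "X".
path <- tempfile(fileext = ".h5ad")
h5createFile(path)
h5write(matrix(as.double(1:20), nrow = 4), path, "X")

lazy_x <- read_h5ad_matrix(path, "X", use_hdf5 = TRUE)
eager_x <- read_h5ad_matrix(path, "X")
```

Keeping the lazy and eager paths behind one switch means callers such as AnnData2SCE would not need to know which representation they received. Sparse layers would need separate handling.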