Open joshua-gould opened 5 years ago
+1, that's pretty much my current plan.
Any update on support for backed mode using zarr? Thanks.
Is this still an active direction?
@ilan-gold I saw some recent PRs involving backed Zarr objects, is there any update on this? Thanks 😊
Support for backed sparse zarr arrays is in anndata 0.10. You can use them like:
import anndata as ad
import scanpy as sc
import zarr

# Write an example dataset to a zarr store, then reopen the store as a group
adata = sc.datasets.pbmc3k_processed().raw.to_adata()
adata.write_zarr("pbmc3k.zarr")
g = zarr.open("pbmc3k.zarr")

def read_backed(group):
    # X stays on disk as a backed sparse dataset; the other elements are read into memory
    return ad.AnnData(
        ad.experimental.sparse_dataset(group["X"]),
        **{
            k: ad.experimental.read_elem(group[k]) if k in group else {}
            for k in ["layers", "obs", "var", "obsm", "varm", "uns", "obsp", "varp"]
        }
    )

adata = read_backed(g)
adata.X
# CSRDataset: backend zarr, shape (2638, 13714), data_dtype float32
You can see more example usage in this gist: https://gist.github.com/ivirshup/c29c9fb0b5b21a9c290cf621e4e68b18
It's not quite the same as "backed mode", but is actually better, since this backed sparse array can be any matrix in the anndata object, not just X.
There are some performance issues at the moment, see:
Fantastic! Thanks for the update! Currently we have some performance bottlenecks in spatialdata when large tables are fully loaded into memory. We can experiment with this, it should work 😊
There is an implementation for zarr, but backed_mode is not supported. Since the APIs for h5py and zarr are almost identical, I think it makes sense to abstract much of the h5 functionality into base classes, with a few methods implemented by h5 and zarr subclasses. The same base classes could then easily be extended to future backends that expose APIs similar to h5py and zarr.
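To make the proposal concrete, here is a minimal sketch of what such a split could look like. It is not anndata's actual design: the class names (BackedStore, H5Store, ZarrStore, DictStore) and methods (has, read_rows) are hypothetical and chosen only for illustration. The sketch assumes that the root objects returned by h5py.File and zarr.open both support `key in group` and `group[key][start:stop]`.

```python
from abc import ABC, abstractmethod


class BackedStore(ABC):
    """Hypothetical base class: shared logic lives here, backends only open the root."""

    def __init__(self, path=None, mode="r"):
        self.path = path
        self.mode = mode
        self._root = None

    @abstractmethod
    def _open(self):
        """Return a root group exposing an h5py/zarr-style mapping API."""

    @property
    def root(self):
        # Open lazily so that constructing a store does not touch the disk.
        if self._root is None:
            self._root = self._open()
        return self._root

    # Shared functionality, written only against the common group API.
    def has(self, key):
        return key in self.root

    def read_rows(self, key, start, stop):
        # Only the requested rows are read from the backing store.
        return self.root[key][start:stop]


class H5Store(BackedStore):
    def _open(self):
        import h5py  # imported lazily so the sketch runs without h5py installed

        return h5py.File(self.path, self.mode)


class ZarrStore(BackedStore):
    def _open(self):
        import zarr  # imported lazily so the sketch runs without zarr installed

        return zarr.open(self.path, mode=self.mode)


class DictStore(BackedStore):
    """In-memory stand-in (an assumption for demonstration, not a real backend)."""

    def __init__(self, data):
        super().__init__()
        self._data = data

    def _open(self):
        return self._data


store = DictStore({"X": [[1, 2], [3, 4], [5, 6]]})
print(store.has("X"), store.read_rows("X", 0, 2))
```

With this layout, adding a new backend would only require implementing `_open`, provided the object it returns follows the same group conventions.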
On Fri, Sep 13, 2019 at 4:35 AM Philipp A. notifications@github.com wrote: