scverse / anndata

Annotated data.
http://anndata.readthedocs.io
BSD 3-Clause "New" or "Revised" License

allow backed mode for zarr format #219

Open joshua-gould opened 5 years ago

joshua-gould commented 5 years ago

There is an implementation for zarr, but backed mode is not supported. Since the h5py and zarr APIs are almost identical, I think it makes sense to abstract much of the h5 functionality into base classes, with the few backend-specific methods implemented by h5 and zarr subclasses. That design could also be extended easily to future backends whose APIs resemble h5py and zarr.
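A minimal sketch of the proposed design, assuming Python and hypothetical names throughout (`BackedStore`, `H5Store`, `ZarrStore`, `InMemoryStore` and `read_rows` are not part of anndata). Shared logic is written once against the common subset of the h5py and zarr APIs (key lookup on a group, slicing a dataset); each subclass only implements how the root group is opened. The in-memory backend exists only so the sketch runs without either library installed.

```python
from abc import ABC, abstractmethod
from typing import Any, Mapping, Sequence


class BackedStore(ABC):
    """Hypothetical base class holding backend-independent backed-mode logic."""

    def __init__(self, path: str):
        self.path = path
        self._root = None

    @abstractmethod
    def _open_root(self) -> Mapping[str, Any]:
        """Backend-specific: return the root group of the file or store."""

    @property
    def root(self) -> Mapping[str, Any]:
        # Open lazily and cache, so nothing is touched until first access.
        if self._root is None:
            self._root = self._open_root()
        return self._root

    def read_rows(self, key: str, start: int, stop: int) -> Sequence[Any]:
        # Shared logic: only the requested slice is pulled from the backend.
        return self.root[key][start:stop]


class H5Store(BackedStore):
    def _open_root(self):
        import h5py  # imported lazily so the base class has no hard dependency
        return h5py.File(self.path, "r")


class ZarrStore(BackedStore):
    def _open_root(self):
        import zarr  # imported lazily for the same reason
        return zarr.open(self.path, mode="r")


class InMemoryStore(BackedStore):
    """Hypothetical stand-in backend, used only to exercise the shared logic."""

    def __init__(self, data: Mapping[str, Sequence[Any]]):
        super().__init__(path="<memory>")
        self._data = data

    def _open_root(self):
        return self._data


store = InMemoryStore({"X": [[1, 0], [0, 2], [3, 0]]})
print(store.read_rows("X", 1, 3))  # [[0, 2], [3, 0]]
```

Adding another backend would then mean writing one more `_open_root`, as long as the new library exposes the same group/dataset indexing.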

On Fri, Sep 13, 2019 at 4:35 AM Philipp A. notifications@github.com wrote:

We do:

https://github.com/theislab/anndata/blob/92b6791c05b1e2f54dac5d6090ecc991ff6a50b4/anndata/core/anndata.py#L48-L58


ivirshup commented 5 years ago

+1, that's pretty much my current plan.

joshua-gould commented 3 years ago

Any update on support for backed mode using zarr? Thanks.

tmchartrand commented 1 year ago

Is this still an active direction?

LucaMarconato commented 8 months ago

@ilan-gold I saw some recent PRs involving backed Zarr objects, is there any update on this? Thanks 😊

ivirshup commented 8 months ago

Support for backed sparse zarr arrays is available in anndata 0.10. You can use them like this:

import scanpy as sc, anndata as ad
import zarr

# Write an example dataset to a zarr store
adata = sc.datasets.pbmc3k_processed().raw.to_adata()
adata.write_zarr("pbmc3k.zarr")

g = zarr.open("pbmc3k.zarr")

def read_backed(group):
    # Keep X on disk as a backed sparse dataset; read every other element
    # into memory, or use an empty dict if it is absent from the store
    return ad.AnnData(
        ad.experimental.sparse_dataset(group["X"]),
        **{
            k: ad.experimental.read_elem(group[k]) if k in group else {}
            for k in ["layers", "obs", "var", "obsm", "varm", "uns", "obsp", "varp"]
        }
    )

adata = read_backed(g)
adata.X
# CSRDataset: backend zarr, shape (2638, 13714), data_dtype float32

You can see more example usage in this gist: https://gist.github.com/ivirshup/c29c9fb0b5b21a9c290cf621e4e68b18

It's not quite the same as "backed mode", but it's actually better, since this backed sparse array can be any matrix in the AnnData object, not just X.
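A rough sketch of why a backed sparse array is cheap to slice wherever it sits in the object: anndata's on-disk format stores a CSR matrix as a group with `data`, `indices` and `indptr` arrays, and reading a block of rows only needs the contiguous span of `data`/`indices` that `indptr` points to. The `BackedCSR` class below is hypothetical and is not anndata's `CSRDataset`; plain Python lists stand in for the on-disk h5py/zarr arrays.

```python
from typing import List, Sequence, Tuple


class BackedCSR:
    """Hypothetical, minimal illustration of a backed CSR matrix.

    In a real store `data`, `indices` and `indptr` would be on-disk datasets
    that are only read when sliced; lists are used here so the sketch runs anywhere.
    """

    def __init__(self, data: Sequence[float], indices: Sequence[int],
                 indptr: Sequence[int], shape: Tuple[int, int]):
        self.data, self.indices, self.indptr, self.shape = data, indices, indptr, shape
        self.reads = 0  # counts stored (non-zero) entries pulled from "disk"

    def rows(self, start: int, stop: int) -> List[List[float]]:
        assert 0 <= start <= stop <= self.shape[0], "row range out of bounds"
        # indptr gives the contiguous span of data/indices holding these rows,
        # so only that span is read.
        lo, hi = self.indptr[start], self.indptr[stop]
        data = self.data[lo:hi]
        indices = self.indices[lo:hi]
        self.reads += hi - lo
        # Densify just the requested rows.
        out = []
        for r in range(start, stop):
            row = [0.0] * self.shape[1]
            for k in range(self.indptr[r] - lo, self.indptr[r + 1] - lo):
                row[indices[k]] = data[k]
            out.append(row)
        return out


# The 3x3 matrix [[1, 0, 2], [0, 0, 3], [4, 5, 0]] in CSR form
m = BackedCSR([1.0, 2.0, 3.0, 4.0, 5.0], [0, 2, 2, 0, 1], [0, 2, 3, 5], (3, 3))
print(m.rows(1, 3))  # [[0.0, 0.0, 3.0], [4.0, 5.0, 0.0]]
print(m.reads)       # 3
```

Nothing in this access pattern depends on the matrix being X, which is why the same backed representation can serve other matrix slots as well.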

ivirshup commented 8 months ago

There are some performance issues at the moment, see:

LucaMarconato commented 8 months ago

Fantastic, thanks for the update! We currently have performance bottlenecks in spatialdata when large tables are fully loaded into memory. We can experiment with this; it should work 😊