Open will-moore opened 1 year ago
@will-moore Is the store consolidated? I get a 200
on https://minio-dev.openmicroscopy.org/idr/temp_table/test_segment.zarr/tables/regions_table/.zgroup but not on https://minio-dev.openmicroscopy.org/idr/temp_table/test_segment.zarr/tables/regions_table/.zmetadata
If not, I should probably make that more obvious as an essential step in the tutorial. @ivirshup That is actually something I've been meaning to mention - we should be able to resolve as @will-moore does the keys of obsm
etc. without consolidating metadata at a slight penalty. Should we implement this?
Ah, right - that's coming back to the discussion at https://github.com/kevinyamauchi/ome-ngff-tables-prototype/pull/12 where we preferred consolidated metadata not to be part of the NGFF spec. cc @joshmoore
I mean, y'all are big users and the feature probably makes sense anyway so we probably should implement it. But the penalty is of course a bit higher since you need to open each .zarray
for each column.
By "implement it" you mean support for "keys"?
Would this need to happen in anndata
itself, or is there a way to add it in the example above?
This is code I'm adding to https://github.com/ome/ome-zarr-py/pull/256 so all NGFF python users will be able to use it.
I am not sure @will-moore but "implement it" means using the trick you use in the browser to discover columns, whether that's in this example or in general. At the moment, we iterate over the keys of the incoming obsm
object (a dict
) to read it and consolidated metadata lets you have those keys.
I seem to have got the loading of obsm
working, using the custom keys
(see below), but obsp
is beyond what I can get my head around just now...
I don't know if I'm getting close, or if this is really a lot more complex and we should resign ourselves to using consolidate_metadata if we want this remote access to work?
from typing import Callable, Union
import json
import dask.array as da
import zarr
from anndata import AnnData
from anndata._io.specs import IOSpec
from anndata.compat import H5Array, H5Group, ZarrArray, ZarrGroup
# ** requires anndata==0.9.0.rc1
from anndata.experimental import read_dispatched, read_elem
from zarr.storage import FSStore
from ome_zarr.io import parse_url
StorageType = Union[H5Array, H5Group, ZarrArray, ZarrGroup]
# From https://anndata.readthedocs.io/en/latest/tutorials/notebooks/%7Bread,write%7D_dispatched.html
def read_remote_anndata(store: FSStore, name: str) -> AnnData:
table_group = zarr.group(store=store, path=name)
def callback(
func: Callable, elem_name: str, elem: StorageType, iospec: IOSpec
) -> AnnData:
print("el", elem_name, iospec.encoding_type, elem)
if iospec.encoding_type in (
"dataframe",
"csr_matrix",
"csc_matrix",
"awkward-array",
):
# Preventing recursing inside of these types
return read_elem(elem)
elif iospec.encoding_type == "array":
return da.from_zarr(elem)
elif iospec.encoding_type == "dict" and "obsm" in elem_name: # or "obsp" in elem_name:
# load .zattrs
attrs = store.get(elem_name + "/.zattrs")
attrs = json.loads(attrs)
print("attrs", attrs)
if "keys" in attrs:
to_return = {}
for key in attrs["keys"]:
print("URL", elem_name + "/" + key)
try:
arr = da.from_zarr(store, elem_name + "/" + key)
to_return[key] = arr
except:
print("Failed " + elem_name + "/" + key)
pass
return to_return
# not handled above, call func()
rsp = func(elem)
return rsp
adata = read_dispatched(table_group, callback=callback)
return adata
url = "https://minio-dev.openmicroscopy.org/idr/temp_table/test_segment.zarr/tables/"
name = "regions_table"
store = parse_url(url, mode="r").store
anndata_obj = read_remote_anndata(store, name)
print('anndata_obj', anndata_obj)
This issue has been automatically marked as stale because it has not had recent activity. Please add a comment if you want to keep the issue open. Thank you for your contributions!
@ivirshup what do you think about this?
@ivirshup That is actually something I've been meaning to mention - we should be able to resolve as @will-moore does the keys of obsm etc. without consolidating metadata at a slight penalty. Should we implement this?
@flying-sheep I am doing this in the upcoming PR already #947
Thanks all for following up on this 👍
Should we mark that PR as Fixes #951
then?
Ah, good call. I'll add it.
Hi @will-moore as an alternative to https://github.com/scverse/anndata/pull/947 you can also try out https://github.com/scverse/anndata/tree/ig/xarray_compat (or https://github.com/scverse/anndata/pull/1247) which is a bit lighter weight, but similar with a read_backed
method. This returns a normal AnnData object but with xarray.
Thanks for the update @ilan-gold - Unfortunately I'm not looking to include this work in ome-zarr-py
now since the spec proposal isn't going ahead just now https://github.com/ome/ngff/pull/64
Hi,
I'm using @ilan-gold's nice sample code at https://anndata.readthedocs.io/en/latest/tutorials/notebooks/%7Bread,write%7D_dispatched.html to read remote anndata, which is working great when I'm serving data locally via http.
But when I'm using minio to serve the data, I'm not getting the obsp, obsm or uns parts of the AnnData object. See code sample below.
A UI view of the data is at https://deploy-preview-20--ome-ngff-validator.netlify.app/?source=https://minio-dev.openmicroscopy.org/idr/temp_table/test_segment.zarr/tables/regions_table/ (from https://github.com/ome/ome-ngff-validator/pull/20) where I'm using the extra "keys" in e.g. https://minio-dev.openmicroscopy.org/idr/temp_table/test_segment.zarr/tables/regions_table/obsm/.zattrs to load those groups:
Is there any way I can use those 'keys' to load
obsp
andobsm
data?Thanks!