ome / ome-zarr-py

Implementation of next-generation file format (NGFF) specifications for storing bioimaging data in the cloud.
https://pypi.org/project/ome-zarr

Dask arrays for HCS nodes do not `compute` to NumPy arrays #297

Closed: ziw-liu closed this issue 1 year ago

ziw-liu commented 1 year ago

The Dask arrays returned by the reader for HCS nodes do not compute to the underlying chunk type, which should be NumPy arrays. The FOV/image-level nodes behave as normal.

This is likely a bug, since the Dask documentation states:

This turns a lazy Dask collection into its in-memory equivalent. For example a Dask array turns into a NumPy array and a Dask dataframe turns into a Pandas dataframe. The entire dataset must fit into memory before calling this operation.
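As a sanity check of the documented behavior outside ome-zarr, `compute()` on an ordinary Dask array does return a NumPy array (this sketch uses plain `dask.array`, not the HCS reader path that triggers the bug):

```python
import dask.array as da
import numpy as np

# An ordinary lazy Dask array backed by NumPy chunks.
lazy = da.ones((4, 4), chunks=(2, 2))

# compute() materializes the graph into the in-memory equivalent.
result = lazy.compute()
assert isinstance(result, np.ndarray), type(result)
```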

To reproduce:

import dask.array as da
from ome_zarr.io import parse_url
from ome_zarr.reader import Reader

location = parse_url("hcs.ome.zarr")
reader = Reader(location)

data = next(reader()).data[0]
# this passes
assert isinstance(data, da.Array), type(data)

result = data.compute()
# this fails
assert not isinstance(result, da.Array), type(result)

will-moore commented 1 year ago

Thanks for the bug report and code sample. I tracked this down to a bug that is fixed in #299. Can you pip install from that branch and give it a test?

If you want to work without that fix (using the released ome-zarr-py) you'll find that you can get the numpy array simply by calling compute twice:

result = data.compute().compute()
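A slightly more defensive variant of the same stopgap is to unwrap in a bounded loop, so the code keeps working once the proper fix lands and a single `compute()` suffices. This is a hypothetical helper (the name `to_numpy` and the loop bound are not from ome-zarr-py), shown here with plain Dask arrays:

```python
import dask.array as da
import numpy as np

def to_numpy(arr, max_unwrap=5):
    # Hypothetical workaround helper: repeatedly compute() until the
    # result is no longer a Dask array, then coerce to NumPy. Mirrors
    # the "call compute twice" stopgap but tolerates zero or more
    # levels of accidental nesting (bounded to avoid infinite loops).
    for _ in range(max_unwrap):
        if not isinstance(arr, da.Array):
            break
        arr = arr.compute()
    return np.asarray(arr)
```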

ziw-liu commented 1 year ago

Thanks for the prompt investigation and fix! I will test the PR in a bit.

> If you want to work without that fix (using the released ome-zarr-py) you'll find that you can get the numpy array simply by calling compute twice:

This is the stopgap I put in our application, and it does work. Ideally the next release will include the proper fix so that it can be reverted.