ome / ome2024-ngff-challenge

Project planning and material repository for the 2024 challenge to generate 1 PB of OME-Zarr data
https://pypi.org/project/ome2024-ngff-challenge/
BSD 3-Clause "New" or "Revised" License

Add new dev2/resave.py with sharding example #3

Closed will-moore closed 2 months ago

will-moore commented 2 months ago

Sample data generated with this updated resave.py can be viewed at https://deploy-preview-36--ome-ngff-validator.netlify.app/?source=https://minio-dev.openmicroscopy.org/idr/v0.5-dev2/6001240.zarr

Also tried sharding: https://deploy-preview-36--ome-ngff-validator.netlify.app/?source=https://minio-dev.openmicroscopy.org/idr/v0.5-dev2/6001240_sharded.zarr although loading chunks from the sharded version with zarrita.js ISN'T working.

joshmoore commented 2 months ago

:100:, @will-moore. You ok to just merge this when ready?

imagesc-bot commented 2 months ago

This pull request has been mentioned on Image.sc Forum. There might be relevant details there:

https://forum.image.sc/t/ome2024-ngff-challenge/97363/17

will-moore commented 2 months ago

Currently trying to convert a tiny 3-image plate that I generated with omero-cli-zarr and it's failing with:

(zarr_v3) Williams-MacBook-Pro:dev2 wmoore$ python resave.py /Users/wmoore/Desktop/ZARR/data/plates/51.zarr/A/1/0.zarr plate_51_well.zarr
Traceback (most recent call last):
  File "/Users/wmoore/Desktop/NGFF/ome2024-ngff-challenge/dev2/resave.py", line 120, in <module>
    convert_image(read_root, ns.input_path, ns.output_path)
  File "/Users/wmoore/Desktop/NGFF/ome2024-ngff-challenge/dev2/resave.py", line 103, in convert_image
    convert_array(
  File "/Users/wmoore/Desktop/NGFF/ome2024-ngff-challenge/dev2/resave.py", line 75, in convert_array
    }).result()
ValueError: FAILED_PRECONDITION: Error opening "zarr3" driver: Mismatch in "codecs": Cannot merge zarr codec constraints [] and [{"configuration":{"clevel":5,"cname":"lz4"},"name":"blosc"}]: Mismatch in number of bytes -> bytes codecs (0 vs 1) [source locations='tensorstore/driver/zarr3/codec/codec_chain_spec.cc:422\ntensorstore/driver/zarr3/codec/codec_chain_spec.cc:468\ntensorstore/driver/zarr3/metadata.cc:527\ntensorstore/driver/zarr3/metadata.cc:527\ntensorstore/driver/zarr3/driver.cc:584\ntensorstore/driver/kvs_backed_chunk_driver.cc:1262\ntensorstore/driver/driver.cc:112'] [tensorstore_spec='{\"context\":{\"cache_pool\":{},\"data_copy_concurrency\":{},\"file_io_concurrency\":{},\"file_io_sync\":true},\"create\":true,\"driver\":\"zarr3\",\"dtype\":\"uint8\",\"kvstore\":{\"driver\":\"file\",\"path\":\"plate_51_well.zarr/0/\"},\"metadata\":{\"chunk_grid\":{\"configuration\":{\"chunk_shape\":[2,1024,1024]},\"name\":\"regular\"},\"chunk_key_encoding\":{\"name\":\"default\"},\"codecs\":[{\"configuration\":{\"clevel\":5,\"cname\":\"lz4\"},\"name\":\"blosc\"}],\"data_type\":\"uint8\",\"dimension_names\":[\"c\",\"y\",\"x\"],\"node_type\":\"array\",\"shape\":[3,1024,1344]},\"transform\":{\"input_exclusive_max\":[[3],[1024],[1344]],\"input_inclusive_min\":[0,0,0],\"input_labels\":[\"c\",\"y\",\"x\"]}}']
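
Reading the message above, the two codec lists being merged are `[]` and `[blosc]`, and the complaint is that they contain a different number of bytes -> bytes codecs (0 vs 1). A rough illustration of that comparison follows. It is not tensorstore's actual logic; the `CODEC_KINDS` table and both helper names are hand-written assumptions, and it does not say which part of resave.py supplied the empty list.

```python
# Codec kinds, following the three categories used by Zarr v3
# (array->array, array->bytes, bytes->bytes). Hand-written for illustration.
CODEC_KINDS = {
    "transpose": "array->array",
    "bytes": "array->bytes",
    "sharding_indexed": "array->bytes",
    "blosc": "bytes->bytes",
    "gzip": "bytes->bytes",
    "zstd": "bytes->bytes",
    "crc32c": "bytes->bytes",
}


def count_bytes_to_bytes(codecs):
    """Count the bytes->bytes codecs in a list of codec metadata dicts."""
    return sum(1 for c in codecs if CODEC_KINDS.get(c["name"]) == "bytes->bytes")


def describe_mismatch(constraint_a, constraint_b):
    """Return a message if the two lists disagree on bytes->bytes codecs, else None."""
    a = count_bytes_to_bytes(constraint_a)
    b = count_bytes_to_bytes(constraint_b)
    if a != b:
        return f"Mismatch in number of bytes -> bytes codecs ({a} vs {b})"
    return None


# The two codec lists quoted in the error message.
empty = []
blosc_only = [{"name": "blosc", "configuration": {"cname": "lz4", "clevel": 5}}]
message = describe_mismatch(empty, blosc_only)
print(message)
```

So the spec handed to the `zarr3` driver seems to carry two codec constraints that disagree, one of them empty.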
will-moore commented 2 months ago

@joshmoore I'm trying to read from remote but write locally. With my last commit above, I now get:

$ python resave.py zarr/v0.4/idr0062A/6001240.zarr 6001240_from_remote.zarr --input-overwrite --input-bucket=idr --input-endpoint=https://uk1s3.embassy.ebi.ac.uk --input-anon

Traceback (most recent call last):
  File "/Users/wmoore/Desktop/NGFF/ome2024-ngff-challenge/dev2/resave.py", line 167, in <module>
    read_root = zarr.open_group(store=STORES[0], zarr_format=2)
  File "/Users/wmoore/Desktop/ZARR_PYTHON/zarr-python/src/zarr/api/synchronous.py", line 175, in open_group
    sync(
  File "/Users/wmoore/Desktop/ZARR_PYTHON/zarr-python/src/zarr/sync.py", line 92, in sync
    raise return_result
  File "/Users/wmoore/Desktop/ZARR_PYTHON/zarr-python/src/zarr/sync.py", line 51, in _runner
    return await coro
  File "/Users/wmoore/Desktop/ZARR_PYTHON/zarr-python/src/zarr/api/asynchronous.py", line 523, in open_group
    return await AsyncGroup.open(store_path, zarr_format=zarr_format)
  File "/Users/wmoore/Desktop/ZARR_PYTHON/zarr-python/src/zarr/group.py", line 152, in open
    zgroup_bytes, zattrs_bytes = await asyncio.gather(
  File "/Users/wmoore/Desktop/ZARR_PYTHON/zarr-python/src/zarr/store/core.py", line 35, in get
    return await self.store.get(self.path, prototype=prototype, byte_range=byte_range)
  File "/Users/wmoore/Desktop/ZARR_PYTHON/zarr-python/src/zarr/store/remote.py", line 103, in get
    await (
  File "/Users/wmoore/opt/anaconda3/envs/zarr_v3/lib/python3.10/site-packages/s3fs/core.py", line 1128, in _cat_file
    return await _error_wrapper(_call_and_read, retries=self.retries)
  File "/Users/wmoore/opt/anaconda3/envs/zarr_v3/lib/python3.10/site-packages/s3fs/core.py", line 145, in _error_wrapper
    raise err
  File "/Users/wmoore/opt/anaconda3/envs/zarr_v3/lib/python3.10/site-packages/s3fs/core.py", line 113, in _error_wrapper
    return await func(*args, **kwargs)
  File "/Users/wmoore/opt/anaconda3/envs/zarr_v3/lib/python3.10/site-packages/s3fs/core.py", line 1115, in _call_and_read
    resp = await self._call_s3(
  File "/Users/wmoore/opt/anaconda3/envs/zarr_v3/lib/python3.10/site-packages/s3fs/core.py", line 358, in _call_s3
    await self.set_session()
  File "/Users/wmoore/opt/anaconda3/envs/zarr_v3/lib/python3.10/site-packages/s3fs/core.py", line 519, in set_session
    self.session = aiobotocore.session.AioSession(**self.kwargs)
TypeError: AioSession.__init__() got an unexpected keyword argument '//'
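
The last frame shows s3fs calling `aiobotocore.session.AioSession(**self.kwargs)`, so a key literally named `'//'` must have ended up among the filesystem's keyword arguments. How it got there is not clear from the trace. A minimal sketch of a guard that would catch this earlier, assuming the storage options are assembled as a plain dict before being handed to `s3fs.S3FileSystem`: `build_s3_options` and `check_option_names` are hypothetical helpers, while `anon` and `endpoint_url` are existing `S3FileSystem` parameters.

```python
def check_option_names(options):
    """Raise early if any option name could not be a Python keyword argument."""
    bad = [k for k in options if not (isinstance(k, str) and k.isidentifier())]
    if bad:
        raise ValueError(f"invalid storage option name(s): {bad}")


def build_s3_options(endpoint, anon):
    """Return keyword arguments intended for s3fs.S3FileSystem (hypothetical helper)."""
    options = {"anon": anon, "endpoint_url": endpoint}
    check_option_names(options)
    return options


opts = build_s3_options("https://uk1s3.embassy.ebi.ac.uk", anon=True)
print(opts)

# A malformed key like the one in the traceback is rejected before any request:
try:
    check_option_names({"//": "uk1s3.embassy.ebi.ac.uk"})
except ValueError as err:
    print(err)

# Intended use (needs s3fs installed, not executed here):
# fs = s3fs.S3FileSystem(**opts)
```

Passing the endpoint by its explicit name keeps it out of the catch-all kwargs that s3fs forwards to the session.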
joshmoore commented 2 months ago

@will-moore: can you try updating from the zarr-python v3 branch? I'm not seeing this locally.

will-moore commented 2 months ago

I updated to latest branch...

cd ZARR_PYTHON/zarr-python/
git fetch origin
git checkout origin/v3
pip freeze | grep zarr
-e git+ssh://git@github.com/ome/ome-zarr-py.git@d5b37acd6b7bb246e173b24e48183b7df59e8d61#egg=ome_zarr
-e git+ssh://git@github.com/zarr-developers/zarr-python.git@33b158974a55f1818f27dcc9a3bd2135c51450ff#egg=zarr

but still see the same error.

Also tried...

$ pip freeze | grep s3fs
s3fs==2024.6.0
$ pip install -U s3fs
Successfully installed fsspec-2024.6.1 s3fs-2024.6.1

But still seeing the same result.

joshmoore commented 2 months ago

Hmmm.... and with a fresh conda/mamba environment?

channels:
  - conda-forge
dependencies:
  - 'numpy<2'
  - tensorstore # loads dependencies
  - zarr  # loads dependencies
  - pip
  - pip:
      - "--editable=git+https://github.com/will-moore/ome-zarr-py.git@zarr_v3_support#egg=ome-zarr"
      - "--editable=git+https://github.com/zarr-developers/zarr-python.git@v3#egg=zarr"
      - 'tensorstore>=0.1.63'
will-moore commented 2 months ago
environment.yml

```
name: zarr_python_v3
channels:
  - conda-forge
dependencies:
  - 'numpy<2'
  - tensorstore # loads dependencies
  - zarr # loads dependencies
  - pip
  - pip:
      - "--editable=git+https://github.com/will-moore/ome-zarr-py.git@zarr_v3_support#egg=ome-zarr"
      - "--editable=git+https://github.com/zarr-developers/zarr-python.git@v3#egg=zarr"
      - 'tensorstore>=0.1.63'
```

conda env create -f environment.yml
...
Successfully installed MarkupSafe-2.1.5 aiobotocore-2.13.1 aiohttp-3.9.5 aioitertools-0.11.0 aiosignal-1.3.1 attrs-23.2.0 botocore-1.34.131 certifi-2024.7.4 charset-normalizer-3.3.2 click-8.1.7 cloudpickle-3.0.0 crc32c-2.4.1 dask-2024.7.0 distributed-2024.7.0 donfig-0.8.1.post1 frozenlist-1.4.1 fsspec-2024.6.1 idna-3.7 imageio-2.34.2 iniconfig-2.0.0 jinja2-3.1.4 jmespath-1.0.1 lazy-loader-0.4 locket-1.0.0 multidict-6.0.5 networkx-3.3 ome-zarr-0.9.1.dev0 packaging-24.1 partd-1.4.2 pillow-10.4.0 pluggy-1.5.0 psutil-6.0.0 pytest-8.2.2 python-dateutil-2.9.0.post0 pyyaml-6.0.1 requests-2.32.3 s3fs-2024.6.1 scikit-image-0.24.0 scipy-1.14.0 six-1.16.0 sortedcontainers-2.4.0 tblib-3.0.0 tensorstore-0.1.63 tifffile-2024.7.2 toolz-0.12.1 tornado-6.4.1 typing-extensions-4.12.2 urllib3-2.2.2 wrapt-1.16.0 yarl-1.9.4 zarr-3.0.0a1.dev29+g33b1589 zict-3.0.0 zstandard-0.22.0

$ conda activate zarr_python_v3

$ python resave.py zarr/v0.4/idr0062A/6001240.zarr 6001240_from_remote.zarr --input-overwrite --input-bucket=idr --input-endpoint=https://uk1s3.embassy.ebi.ac.uk --input-anon
Traceback (most recent call last):
  File "/Users/wmoore/Desktop/NGFF/ome2024-ngff-challenge/dev2/resave.py", line 167, in <module>
    read_root = zarr.open_group(store=STORES[0], zarr_format=2)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/wmoore/Desktop/NGFF/ome2024-ngff-challenge/dev2/src/zarr/src/zarr/api/synchronous.py", line 175, in open_group
    sync(
  File "/Users/wmoore/Desktop/NGFF/ome2024-ngff-challenge/dev2/src/zarr/src/zarr/sync.py", line 92, in sync
    raise return_result
  File "/Users/wmoore/Desktop/NGFF/ome2024-ngff-challenge/dev2/src/zarr/src/zarr/sync.py", line 51, in _runner
    return await coro
           ^^^^^^^^^^
  File "/Users/wmoore/Desktop/NGFF/ome2024-ngff-challenge/dev2/src/zarr/src/zarr/api/asynchronous.py", line 523, in open_group
    return await AsyncGroup.open(store_path, zarr_format=zarr_format)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/wmoore/Desktop/NGFF/ome2024-ngff-challenge/dev2/src/zarr/src/zarr/group.py", line 152, in open
    zgroup_bytes, zattrs_bytes = await asyncio.gather(
                                 ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/wmoore/Desktop/NGFF/ome2024-ngff-challenge/dev2/src/zarr/src/zarr/store/core.py", line 35, in get
    return await self.store.get(self.path, prototype=prototype, byte_range=byte_range)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/wmoore/Desktop/NGFF/ome2024-ngff-challenge/dev2/src/zarr/src/zarr/store/remote.py", line 103, in get
    await (
  File "/Users/wmoore/opt/anaconda3/envs/zarr_python_v3/lib/python3.12/site-packages/s3fs/core.py", line 1128, in _cat_file
    return await _error_wrapper(_call_and_read, retries=self.retries)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/wmoore/opt/anaconda3/envs/zarr_python_v3/lib/python3.12/site-packages/s3fs/core.py", line 145, in _error_wrapper
    raise err
  File "/Users/wmoore/opt/anaconda3/envs/zarr_python_v3/lib/python3.12/site-packages/s3fs/core.py", line 113, in _error_wrapper
    return await func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/wmoore/opt/anaconda3/envs/zarr_python_v3/lib/python3.12/site-packages/s3fs/core.py", line 1115, in _call_and_read
    resp = await self._call_s3(
           ^^^^^^^^^^^^^^^^^^^^
  File "/Users/wmoore/opt/anaconda3/envs/zarr_python_v3/lib/python3.12/site-packages/s3fs/core.py", line 358, in _call_s3
    await self.set_session()
  File "/Users/wmoore/opt/anaconda3/envs/zarr_python_v3/lib/python3.12/site-packages/s3fs/core.py", line 519, in set_session
    self.session = aiobotocore.session.AioSession(**self.kwargs)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: AioSession.__init__() got an unexpected keyword argument '//'
joshmoore commented 2 months ago

So with the state of this PR along with:

I can just make out the data in neuroglancer:

./resave.py 6001240.zarr output.zarr
http-server --cors

Go to https://neuroglancer-demo.appspot.com/#!%7B%22dimensions%22:%7B%22c%22:%5B1%2C%22%22%5D%2C%22z%22:%5B1%2C%22%22%5D%2C%22y%22:%5B1%2C%22%22%5D%2C%22x%22:%5B1%2C%22%22%5D%7D%2C%22position%22:%5B0.5%2C118.5%2C137.5%2C135.5%5D%2C%22crossSectionScale%22:1%2C%22projectionScale%22:512%2C%22layers%22:%5B%7B%22type%22:%22image%22%2C%22source%22:%22zarr3://http://localhost:8080/output.zarr/0%22%2C%22tab%22:%22source%22%2C%22name%22:%22OME-NGFF%22%7D%5D%2C%22layout%22:%224panel%22%7D
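
The part of that link after `#!` is a URL-encoded JSON viewer state. A minimal sketch of building such a link for another local array, with the state values copied from the link above; `neuroglancer_url` and `decode_state` are hypothetical helper names, and leaving `:` and `/` unescaped is an assumption made to match the encoding seen in that link.

```python
import json
from urllib.parse import quote, unquote

NEUROGLANCER = "https://neuroglancer-demo.appspot.com/"


def neuroglancer_url(source, name="OME-NGFF"):
    """Build a neuroglancer link whose fragment is the URL-encoded JSON state."""
    state = {
        "dimensions": {axis: [1, ""] for axis in "czyx"},
        "position": [0.5, 118.5, 137.5, 135.5],
        "crossSectionScale": 1,
        "projectionScale": 512,
        "layers": [
            {"type": "image", "source": source, "tab": "source", "name": name}
        ],
        "layout": "4panel",
    }
    compact = json.dumps(state, separators=(",", ":"))
    # Keep ':' and '/' readable, as in the link above; escape everything else.
    return NEUROGLANCER + "#!" + quote(compact, safe=":/")


def decode_state(url):
    """Recover the JSON state from a neuroglancer link."""
    return json.loads(unquote(url.split("#!", 1)[1]))


url = neuroglancer_url("zarr3://http://localhost:8080/output.zarr/0")
print(url)
```

`decode_state(url)` is handy for checking which `source` a shared link points at; port 8080 here is the one used in the link above.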


⚠️ Data is not being returned from zarr-python: https://github.com/zarr-developers/zarr-python/issues/2029

joshmoore commented 2 months ago

Note: currently each well's zarr.json needs to be updated with yq -iP '.attributes.ome.version="0.5"' zarr.json -o json to pass validation.

idr0001-2551.zarr $ find */* -maxdepth 1 -name zarr.json -exec grep version {} /dev/null \;
# hand-edited
A/1/zarr.json:      "version": "0.5",
A/2/zarr.json:      "version": "0.5",
A/3/zarr.json:      "version": "0.5",

idr0001-2551.zarr $ find */* -maxdepth 1 -name zarr.json -exec yq -iP '.attributes.ome.version="0.5"' {} -o json \;
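
If yq isn't available, roughly the same update can be done with the Python standard library. This is a minimal sketch, assuming the plate layout shown above (`<row>/<column>/zarr.json` for each well); `set_ome_version` is a hypothetical helper, not part of resave.py.

```python
import json
from pathlib import Path


def set_ome_version(plate_dir, version="0.5"):
    """Set attributes.ome.version in every <row>/<column>/zarr.json under plate_dir.

    Returns the list of files that were rewritten.
    """
    updated = []
    for path in sorted(Path(plate_dir).glob("*/*/zarr.json")):
        metadata = json.loads(path.read_text())
        # Create the intermediate objects if they are missing.
        ome = metadata.setdefault("attributes", {}).setdefault("ome", {})
        ome["version"] = version
        path.write_text(json.dumps(metadata, indent=2))
        updated.append(path)
    return updated


# Example (run from the directory containing the plate):
# for path in set_ome_version("idr0001-2551.zarr"):
#     print(path)
```

Only well-level files are touched: the glob matches exactly two directory levels, so the images inside each well keep their zarr.json unchanged.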