ome / ome2024-ngff-challenge

Project planning and material repository for the 2024 challenge to generate 1 PB of OME-Zarr data
https://pypi.org/project/ome2024-ngff-challenge/
BSD 3-Clause "New" or "Revised" License

Support bioformats2raw.layout #17

Closed joshmoore closed 1 month ago

joshmoore commented 1 month ago

This adds support for the top-level bioformats2raw.layout metadata from https://ngff.openmicroscopy.org/0.4/#bf2raw and serves as a template for any future work on supporting more collections.

cc: @dominikl @will-moore @normanrz

dominikl commented 1 month ago

Something's still not quite right.

(ngff_env) [dlindner@pilot-zarr3-dev man_test]$ ome2024-ngff-challenge asterella_gracilis_swe_stature.ome.zarr test.zarr
Traceback (most recent call last):
  File "/home/dlindner/miniconda3/envs/ngff_env/bin/ome2024-ngff-challenge", line 8, in <module>
    sys.exit(cli())
             ^^^^^
  File "/home/dlindner/miniconda3/envs/ngff_env/lib/python3.12/site-packages/ome2024_ngff_challenge/resave.py", line 611, in cli
    converted = main(ns)
                ^^^^^^^^
  File "/home/dlindner/miniconda3/envs/ngff_env/lib/python3.12/site-packages/ome2024_ngff_challenge/resave.py", line 547, in main
    convert_image(
  File "/home/dlindner/miniconda3/envs/ngff_env/lib/python3.12/site-packages/ome2024_ngff_challenge/resave.py", line 345, in convert_image
    convert_array(
  File "/home/dlindner/miniconda3/envs/ngff_env/lib/python3.12/site-packages/ome2024_ngff_challenge/resave.py", line 242, in convert_array
    write = ts.open(write_config).result()
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: Error opening "zarr3" driver: Error writing local file "test.zarr/0/0/zarr.json": Cannot create using specified "metadata" and schema: Incompatible chunk size constraints for dimension 3: read size of 1024, write size of 4138 [source locations='tensorstore/driver/zarr3/metadata.cc:899\ntensorstore/driver/zarr3/driver.cc:566\ntensorstore/driver/zarr3/driver.cc:566\ntensorstore/internal/cache/kvs_backed_cache.h:208\ntensorstore/driver/driver.cc:117'] [tensorstore_spec='{\"context\":{\"cache_pool\":{},\"data_copy_concurrency\":{},\"file_io_concurrency\":{},\"file_io_sync\":true},\"create\":true,\"driver\":\"zarr3\",\"dtype\":\"uint8\",\"kvstore\":{\"driver\":\"file\",\"path\":\"test.zarr/0/0/\"},\"metadata\":{\"chunk_grid\":{\"configuration\":{\"chunk_shape\":[1,3,1,4138,5518]},\"name\":\"regular\"},\"chunk_key_encoding\":{\"name\":\"default\"},\"codecs\":[{\"configuration\":{\"chunk_shape\":[1,1,1,1024,1024],\"codecs\":[{\"configuration\":{\"endian\":\"little\"},\"name\":\"bytes\"},{\"configuration\":{\"clevel\":5,\"cname\":\"zstd\"},\"name\":\"blosc\"}],\"index_codecs\":[{\"configuration\":{\"endian\":\"little\"},\"name\":\"bytes\"},{\"name\":\"crc32c\"}],\"index_location\":\"end\"},\"name\":\"sharding_indexed\"}],\"data_type\":\"uint8\",\"dimension_names\":[\"t\",\"c\",\"z\",\"y\",\"x\"],\"node_type\":\"array\",\"shape\":[1,3,1,4138,5518]},\"transform\":{\"input_exclusive_max\":[[1],[3],[1],[4138],[5518]],\"input_inclusive_min\":[0,0,0,0,0],\"input_labels\":[\"t\",\"c\",\"z\",\"y\",\"x\"]}}']

joshmoore commented 1 month ago

Cannot create using specified "metadata" and schema: Incompatible chunk size constraints for dimension 3: read size of 1024, write size of 4138

I'm not sure where it got 1024 from. Can you try overriding the chunk and shard sizes to see if you can get it to work? I assume we will need a better heuristic in https://github.com/ome/ome2024-ngff-challenge/blob/main/src/ome2024_ngff_challenge/resave.py#L31, if you want to give it a try.
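
For reference, the constraint in the error message can be checked by hand. The sketch below assumes the "write size" is the shard shape (the outer `chunk_shape` in the spec above) and the "read size" is the inner chunk shape of the `sharding_indexed` codec, and that each shard dimension has to be a whole multiple of the matching chunk dimension. `incompatible_dims` is a hypothetical helper written for illustration, not part of the tool.

```python
def incompatible_dims(shard_shape, chunk_shape):
    """Return the dimensions where the shard size is not a whole multiple of the chunk size."""
    return [
        dim
        for dim, (shard, chunk) in enumerate(zip(shard_shape, chunk_shape))
        if shard % chunk != 0
    ]


# Values copied from the tensorstore_spec in the traceback above.
shards = (1, 3, 1, 4138, 5518)
chunks = (1, 1, 1, 1024, 1024)

# 4138 % 1024 == 42 and 5518 % 1024 == 398, so dimensions 3 and 4 are reported.
print(incompatible_dims(shards, chunks))
```

Dimension 3 is the first one reported, which matches the dimension named in the error.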

dominikl commented 1 month ago

OK, I'll test that. In the meantime, I tried again with a different file and got a different error (though I guess the cause is similar?):

(ngff_env) [dlindner@pilot-zarr3-dev man_test]$ bioformats2raw --memo-directory memo /uod/idr/filesets/idr0154-queen-hdbr/CS16_Well31_brightfield_resized.ome.tiff v2.zarr
(ngff_env) [dlindner@pilot-zarr3-dev man_test]$ ls
memo  v2.zarr
(ngff_env) [dlindner@pilot-zarr3-dev man_test]$ ome2024-ngff-challenge v2.zarr v3.zarr
Traceback (most recent call last):
  File "/home/dlindner/miniconda3/envs/ngff_env/bin/ome2024-ngff-challenge", line 8, in <module>
    sys.exit(cli())
             ^^^^^
  File "/home/dlindner/miniconda3/envs/ngff_env/lib/python3.12/site-packages/ome2024_ngff_challenge/resave.py", line 611, in cli
    converted = main(ns)
                ^^^^^^^^
  File "/home/dlindner/miniconda3/envs/ngff_env/lib/python3.12/site-packages/ome2024_ngff_challenge/resave.py", line 547, in main
    convert_image(
  File "/home/dlindner/miniconda3/envs/ngff_env/lib/python3.12/site-packages/ome2024_ngff_challenge/resave.py", line 309, in convert_image
    ds_shards = guess_shards(ds_shape, ds_chunks)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/dlindner/miniconda3/envs/ngff_env/lib/python3.12/site-packages/ome2024_ngff_challenge/resave.py", line 42, in guess_shards
    raise ValueError(f"no shard guess: shape={shape}, chunks={chunks}")
ValueError: no shard guess: shape=(1, 3, 1, 24288, 24320), chunks=(1, 1, 1, 1024, 1024)
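
One possible direction for a shard heuristic, given as a rough sketch only: start from the chunk shape and keep doubling each dimension while the shard stays under an element budget, so every shard dimension remains a whole multiple of its chunk dimension. This is not the current `guess_shards` in resave.py. The name `guess_shards_sketch` and the `max_elements` budget are assumptions made for illustration, and the sketch also assumes that a shard does not need to divide the array shape evenly.

```python
import math


def guess_shards_sketch(shape, chunks, max_elements=64 * 1024 * 1024):
    """Grow shards from the chunk shape, keeping each dimension a multiple of its chunk."""
    # Smallest multiple of the chunk size that covers the array along each dimension.
    cover = [math.ceil(s / c) * c for s, c in zip(shape, chunks)]
    shards = list(chunks)
    grew = True
    while grew:
        grew = False
        # Visit the innermost (x, y) dimensions first.
        for dim in reversed(range(len(shape))):
            grown = min(shards[dim] * 2, cover[dim])
            # Element count of the shard if this dimension were grown.
            candidate = math.prod(shards) // shards[dim] * grown
            if grown > shards[dim] and candidate <= max_elements:
                shards[dim] = grown
                grew = True
    return tuple(shards)


# Shape and chunks taken from the "no shard guess" error above.
print(guess_shards_sketch((1, 3, 1, 24288, 24320), (1, 1, 1, 1024, 1024)))
# prints (1, 3, 1, 4096, 4096)
```

With uint8 data and the assumed budget, that shard holds 48 MiB before compression. A different budget would give a different result.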

joshmoore commented 1 month ago

So something like:

# Make the chunks the full size of the image
ome2024-ngff-challenge dom.zarr out.zarr --output-overwrite --output-chunks=1,3,1,4138,5518

works for me, but we should look into making this easier to use. (This is unrelated to this PR in particular.) Chunks of 2069 x 2759 in Y and X, i.e. half the image size in each dimension, would also work.
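
To see which Y and X chunk sizes fit when the shard is the full 4138 x 5518 image, one can list the divisors of each dimension. This is a minimal sketch, assuming the only requirement is that the chunk size divides the shard size evenly; `divisors` is a hypothetical helper, not something the tool provides.

```python
def divisors(n):
    """Return all positive divisors of n in ascending order."""
    small = [d for d in range(1, int(n**0.5) + 1) if n % d == 0]
    # Pair each small divisor with its cofactor; the set drops a repeated square root.
    return sorted(set(small + [n // d for d in small]))


print(divisors(4138))  # candidate chunk sizes in Y
print(divisors(5518))  # candidate chunk sizes in X
# prints [1, 2, 2069, 4138]
# prints [1, 2, 31, 62, 89, 178, 2759, 5518]
```

1024 appears in neither list, while 2069 and 2759 do, which is consistent with the original failure and with both workarounds.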

Note: dom.zarr was created by passing the following .fake.ini file to bioformats2raw:

sizeT=1
sizeC=3
sizeZ=1
sizeY=4138
sizeX=5518

joshmoore commented 1 month ago

Moving ahead with the release of 0.0.5, since the error message was unrelated to this PR. More issues are welcome, of course.