ome / ome2024-ngff-challenge

Project planning and material repository for the 2024 challenge to generate 1 PB of OME-Zarr data
https://pypi.org/project/ome2024-ngff-challenge/
BSD 3-Clause "New" or "Revised" License

sharding codec bytes configuration block #25

Closed will-moore closed 1 month ago

will-moore commented 1 month ago

The zarr.json I get from running ome2024-ngff-challenge is missing the configuration section of the bytes codec within the sharding codec. This prevents the image from being viewed in vizarr, which fails with the error:

TypeError: Cannot read properties of undefined (reading 'endian')

I noticed that this IS found in the sample at https://deploy-preview-36--ome-ngff-validator.netlify.app/?source=https://uk1s3.embassy.ebi.ac.uk/idr/share/ome2024-ngff-challenge/0.0.5/6001240.zarr

It's also found in the spec at https://github.com/zarr-developers/zarr-specs/blob/main/docs/v3/codecs/sharding-indexed/v1.0.rst#configuration-parameters

Manually adding this in as follows allows the data to be viewed in vizarr:

{
  "chunk_grid": {
    "configuration": { "chunk_shape": [1, 1, 1, 1024, 1024] },
    "name": "regular"
  },
  "chunk_key_encoding": { "name": "default" },
  "codecs": [
    {
      "configuration": {
        "chunk_shape": [1, 1, 1, 256, 256],
        "codecs": [
          {
            "name": "bytes",
+           "configuration": {
+             "endian": "little"
+           }
          },
          {
            "configuration": {
              "blocksize": 0,
              "clevel": 5,
              "cname": "zstd",
              "shuffle": "bitshuffle",
              "typesize": 1
            },
            "name": "blosc"
          }
        ],
        "index_codecs": [
          { "configuration": { "endian": "little" }, "name": "bytes" },
          { "name": "crc32c" }
        ]
      },
      "name": "sharding_indexed"
    }
  ],
  "data_type": "uint8",
  "dimension_names": ["t", "c", "z", "y", "x"],
  "fill_value": 0,
  "node_type": "array",
  "shape": [1, 4, 1, 140, 167],
  "zarr_format": 3
}
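
The manual edit above can also be scripted. Below is a minimal Python sketch, assuming the array metadata has the same layout as the zarr.json shown above (a `sharding_indexed` codec whose inner `codecs` list contains a `bytes` entry). The function names `add_bytes_configuration` and `patch_zarr_json` are hypothetical and are not part of ome2024-ngff-challenge.

```python
import json
from pathlib import Path


def add_bytes_configuration(metadata: dict, endian: str = "little") -> bool:
    """Add a configuration block to any bytes codec nested in a sharding codec.

    Returns True if the metadata was changed.
    """
    changed = False
    for codec in metadata.get("codecs", []):
        if codec.get("name") != "sharding_indexed":
            continue
        # Inner codecs of the sharding codec (assumed layout, as in the issue).
        inner_codecs = codec.get("configuration", {}).get("codecs", [])
        for inner in inner_codecs:
            if inner.get("name") == "bytes" and "configuration" not in inner:
                inner["configuration"] = {"endian": endian}
                changed = True
    return changed


def patch_zarr_json(path: Path) -> bool:
    """Rewrite one zarr.json in place if it needed the workaround."""
    metadata = json.loads(path.read_text())
    changed = add_bytes_configuration(metadata)
    if changed:
        path.write_text(json.dumps(metadata, indent=2))
    return changed
```

Calling `patch_zarr_json(Path("zarr.json"))` on an array's metadata file applies the same change as the diff above and leaves files that already have the block untouched.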

LDeakin commented 1 month ago

"endian" is only required for multi-byte data types where endianness is applicable, so vizarr is non-conformant in this case.

https://zarr-specs.readthedocs.io/en/latest/v3/codecs/bytes/v1.0.html#configuration-parameters
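
The rule described here can be sketched as a small check. This is a rough Python illustration, assuming a hand-written table of item sizes for some Zarr v3 core data types; the names `ITEM_SIZE`, `endian_required` and `bytes_codec_is_valid` are hypothetical and do not come from any Zarr library.

```python
# Item sizes in bytes for a selection of Zarr v3 core data types (not exhaustive).
ITEM_SIZE = {
    "bool": 1, "int8": 1, "uint8": 1,
    "int16": 2, "uint16": 2, "float16": 2,
    "int32": 4, "uint32": 4, "float32": 4,
    "int64": 8, "uint64": 8, "float64": 8,
    "complex64": 8, "complex128": 16,
}


def endian_required(data_type: str) -> bool:
    """Endianness only matters when one element spans more than one byte."""
    return ITEM_SIZE[data_type] > 1


def bytes_codec_is_valid(codec: dict, data_type: str) -> bool:
    """Accept a bytes codec without "endian" only for single-byte data types."""
    endian = codec.get("configuration", {}).get("endian")
    if endian is None:
        return not endian_required(data_type)
    return endian in ("little", "big")
```

Under this reading, `{"name": "bytes"}` is acceptable for the `uint8` array in this issue, while the same codec on a `uint16` array would need an explicit `endian`.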

will-moore commented 1 month ago

Thanks @LDeakin.

Hi @manzt - does this look right to you? Potential issue with zarrita?

I tried to create a sample image to illustrate the issue. I'm not getting the expected error in vizarr, but the image still fails to display, and applying the change to zarr.json above seems to fix it (the image becomes viewable): https://deploy-preview-36--ome-ngff-validator.netlify.app/?source=https://minio-dev.openmicroscopy.org/idr/v0.5/astronaut.zarr

Thanks

manzt commented 1 month ago

Yeah, this is a bug in zarrita. The issue arises not because endian is missing, but rather because the configuration object is omitted entirely.
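
The distinction between a missing `endian` key and a missing `configuration` object can be shown with a small analogy. The sketch below is Python, not zarrita's actual code (zarrita is a JavaScript library), and both function names are hypothetical. The strict reader fails in the same way as the reported `TypeError`, while the tolerant reader treats an absent configuration as empty.

```python
from typing import Optional


def read_endian_strict(codec: dict) -> str:
    # Assumes "configuration" is always present; raises KeyError when the
    # whole object is omitted, analogous to reading a property of undefined.
    return codec["configuration"]["endian"]


def read_endian_tolerant(codec: dict) -> Optional[str]:
    # Treat a missing (or null) configuration object the same as an empty one,
    # so a missing "endian" simply yields None.
    return (codec.get("configuration") or {}).get("endian")
```

With `{"name": "bytes"}` as input, `read_endian_strict` raises while `read_endian_tolerant` returns `None`, leaving the caller to decide whether endianness matters for the data type.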

manzt commented 1 month ago

Fix now available, just need to bump in Vizarr.

will-moore commented 1 month ago

Thanks @manzt - the sample image above is now displaying in vizarr at https://hms-dbmi.github.io/vizarr/?source=https://minio-dev.openmicroscopy.org/idr/v0.5/astronaut.zarr

Closing...