google / tensorstore

Library for reading and writing large multi-dimensional arrays.
https://google.github.io/tensorstore/

N5 attributes.json lacks n5 key for version #200

Open dchen116 opened 2 days ago

dchen116 commented 2 days ago

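N5 datasets saved by tensorstore do not include a top-level "n5" key in attributes.json recording the N5 version. As a result, zarr's N5 store fails with KeyError: 'n5' when opening the array. Below is a minimal working example of the problem.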

Commands:

$ pixi run python .\tensorstore_n5_issue.py
Traceback (most recent call last):
  File "C:\User\.pixi\envs\default\lib\site-packages\zarr\core.py", line 202, in _load_metadata_nosync
    meta_bytes = self._store[mkey]
  File "C:\User\.pixi\envs\default\lib\site-packages\zarr\n5.py", line 376, in __getitem__
    value = array_metadata_to_zarr(self._load_n5_attrs(key_new), top_level=top_level)
  File "C:\User\.pixi\envs\default\lib\site-packages\zarr\n5.py", line 657, in array_metadata_to_zarr
    array_metadata.pop("n5")
KeyError: 'n5'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\User\tensorstore_n5_issue.py", line 64, in <module>
    main(apply_fix)
  File "C:\User\tensorstore_n5_issue.py", line 60, in main
    n5_read_and_checksum_array(store_path)
  File "C:\User\tensorstore_n5_issue.py", line 35, in n5_read_and_checksum_array
    zarr.open(store=n5_store, mode='r')
  File "C:\User\.pixi\envs\default\lib\site-packages\zarr\convenience.py", line 133, in open
    return open_array(_store, mode=mode, **kwargs)
  File "C:\User\.pixi\envs\default\lib\site-packages\zarr\creation.py", line 689, in open_array
    z = Array(
  File "C:\User\.pixi\envs\default\lib\site-packages\zarr\core.py", line 170, in __init__
    self._load_metadata()
  File "C:\User\.pixi\envs\default\lib\site-packages\zarr\core.py", line 193, in _load_metadata
    self._load_metadata_nosync()
  File "C:\User\.pixi\envs\default\lib\site-packages\zarr\core.py", line 204, in _load_metadata_nosync
    raise ArrayNotFoundError(self._path) from e
zarr.errors.ArrayNotFoundError: array not found at path %r' ''
$ pixi run python .\tensorstore_n5_issue.py --fix
C:\User\AppData\Local\Temp\tmpysaajast
Added 'n5': '4.0.0' to C:\User\AppData\Local\Temp\tmpysaajast\attributes.json

tensorstore_n5_issue.py:

import numpy as np
import tempfile
import tensorstore as ts
import zarr
import os
import sys
import json

def ts_create_n5_test(n5_path):
    chunk_shape = (16, 16)
    data = np.arange(np.prod(chunk_shape)).reshape(chunk_shape)

    # Set up the basic N5 store specification
    n5_store_spec = {
        'driver': 'n5',
        'kvstore': {
            'driver': 'file',
            'path': n5_path
        },
        'metadata': {
            'dimensions': list(data.shape),
            'blockSize': list(chunk_shape),
            'dataType': data.dtype.name,
            'compression': {
                'type': 'raw'
            }
        }
    }

    n5_store = ts.open(n5_store_spec, create=True, delete_existing=True).result()
    n5_store.write(data).result()

def n5_read_and_checksum_array(store_path):
    n5_store = zarr.N5FSStore(store_path)
    zarr.open(store=n5_store, mode='r')

# Function to load and fix the attributes.json metadata
def fix_attributes_json(store_path):
    # Define the path to attributes.json
    attributes_json_path = os.path.join(store_path, "attributes.json")

    # Load the content of attributes.json
    with open(attributes_json_path, "r") as file:
        attributes_data = json.load(file)

    attributes_data["n5"] = "4.0.0"
    print(f"Added 'n5': '4.0.0' to {attributes_json_path}")

    # Write the modified data back to attributes.json
    with open(attributes_json_path, "w") as file:
        json.dump(attributes_data, file, indent=4)

def main(apply_fix):
    store_path = tempfile.mkdtemp()
    print(store_path)
    ts_create_n5_test(store_path)
    if apply_fix:
        fix_attributes_json(store_path)
    n5_read_and_checksum_array(store_path)

if __name__ == '__main__':
    apply_fix = len(sys.argv) > 1 and sys.argv[1] == "--fix"
    main(apply_fix)

pixi.toml:

[project]
authors = ["Diyi Chen <chend@janelia.hhmi.org>"]
channels = ["conda-forge"]
description = "Demonstrate issue when saving N5 datasets using Tensorstore"
name = "tensorstore-n5-issue"
platforms = ["win-64"]
version = "0.1.0"

[tasks]

[dependencies]
python = "3.10.*"
numpy = ">=2.0.1,<3"
zarr = ">=2.18.2,<3"
numcodecs = ">=0.12.1,<0.13"
fsspec = ">=2024.9.0,<2025"

[pypi-dependencies]
tensorstore = ">=0.1.64, <0.2"

laramiel commented 19 hours ago

We should add that. Note, however, that according to the Java code, the "n5" version attribute does not need to be set:

https://github.com/saalfeldlab/n5/blob/b8d92d5b25ae08c96527f831104ede732553c8e3/src/main/java/org/janelia/saalfeldlab/n5/N5Reader.java#L212

If no version is specified or the version string does not conform to the SemVer format, 0.0.0 will be returned. For incomplete versions, such as 1.2, the missing elements are filled with 0, i.e. 1.2.0 in this case.
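
To make the quoted fallback rule concrete, here is a minimal Python sketch of a reader that tolerates a missing or malformed "n5" attribute. It is not the Java implementation and not part of zarr or tensorstore. The function name `parse_n5_version` and the regular expression are hypothetical, and matching only at the start of the string is an assumption.

```python
import re

# Hypothetical pattern: major, optional minor, optional patch.
# Anchoring at the start of the string is an assumption of this sketch.
_VERSION_RE = re.compile(r"(\d+)(?:\.(\d+))?(?:\.(\d+))?")


def parse_n5_version(attributes):
    """Return (major, minor, patch) from a parsed attributes.json dict.

    Falls back to (0, 0, 0) when the "n5" key is absent or its value
    does not start with a numeric version; missing minor/patch parts
    are filled with 0.
    """
    value = attributes.get("n5")
    if not isinstance(value, str):
        return (0, 0, 0)
    match = _VERSION_RE.match(value)
    if match is None:
        return (0, 0, 0)
    # Unmatched optional groups come back as None and become 0.
    return tuple(int(part) if part is not None else 0 for part in match.groups())


if __name__ == "__main__":
    print(parse_n5_version({}))               # key missing
    print(parse_n5_version({"n5": "1.2"}))    # incomplete version
    print(parse_n5_version({"n5": "4.0.0"}))  # full version
```

A reader written this way would use `attributes.get("n5")` rather than an unconditional `pop("n5")`, so metadata written without the key would still load.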