zarr-developers / zarr-python

An implementation of chunked, compressed, N-dimensional arrays for Python.
https://zarr.readthedocs.io
MIT License

User attributes not being saved to file #2079

Closed dstansby closed 1 month ago

dstansby commented 1 month ago

Zarr version

2.18.2

Numcodecs version

0.12.1

Python Version

3.11

Operating System

macOS

Installation

conda

Description

I'm trying to save user attributes to file, but they don't seem to be saved when calling zarr.save_array.

Steps to reproduce

import numpy as np
import zarr

arr = zarr.ones((512, 512, 512), dtype=np.uint8, chunks=256)
arr.attrs["attr"] = "value"
zarr.save_array("test_attr_arr.zarr", arr)

The resulting zarr array on disk does not have a .zattrs file as I would expect.

Additional output

No response

jhamman commented 1 month ago

I think this is actually the expected behavior. Without save_array the attrs are saved.

import numpy as np
import zarr

arr = zarr.ones((512, 512, 512), dtype=np.uint8, chunks=256)
arr.attrs["attr"] = "value"
print(arr.store[".zattrs"])
# b'{\n    "attr": "value"\n}'

But save_array creates a new array: it treats the arr argument as a NumPy-like array and does not look at its attrs. So your last line is creating a new array, without the attributes, in a new store.

dstansby commented 1 month ago

🤔 so how do I save the attributes to disk?

jhamman commented 1 month ago

When you create an array, you need to associate a store (the default, as you have shown, uses a memory store). This works 👇

import numpy as np
import zarr

arr = zarr.ones((512, 512, 512), dtype=np.uint8, chunks=256, store="test_attr_arr.zarr")
arr.attrs["attr"] = "value"

jhamman commented 1 month ago

Going to take the ❤️ as permission to close this. Feel free to reopen if you run into more gotchas!

btw, you'll be able to do this all in one call in v3:

arr = zarr.ones(
    shape=(512, 512, 512),
    dtype=np.uint8,
    chunks=256,
    store="test_attr_arr.zarr",
    attributes={"attr": "value"}
)

dstansby commented 1 month ago

What I really want to do is work with an array in memory (including adding some attributes), and then save it in one go to disk. So I think it's worth at least documenting how to do that workflow somewhere?

d-v-b commented 1 month ago

I think this would be equivalent to copying the memory store to a directory store. Is there not a copy_store routine in v2? With the v2 mutable mapping API, you might be able to do local_store.update(**memory_store), although users should see something a bit more familiar.

dstansby commented 1 month ago

Yep, there is a copy_store function.

Clearly my mental model of how zarr (or at least save_array) works wasn't very good, so I might suggest some improvements to the save_array docstring explaining the difference between save_array and copy_store.

d-v-b commented 1 month ago

a few thoughts about this flow:

d-v-b commented 1 month ago

This issue made me think about array creation routines, I wrote up some ideas here: https://github.com/zarr-developers/zarr-python/issues/2083