Closed dstansby closed 1 month ago
I think this is actually the expected behavior. Without save_array, the attrs are saved.
import numpy as np
import zarr
arr = zarr.ones((512, 512, 512), dtype=np.uint8, chunks=256)
arr.attrs["attr"] = "value"
print(arr.store[".zattrs"])
# b'{\n "attr": "value"\n}'
But save_array creates a new array where the arr argument is a numpy-like thing and does not look for attrs. So your last line is creating a new array in a new store.
🤔 so how do I save the attributes to disk?
When you create an array, you need to associate a store (the default, as you have shown, uses a memory store). This works 👇
import numpy as np
import zarr
arr = zarr.ones((512, 512, 512), dtype=np.uint8, chunks=256, store="test_attr_arr.zarr")
arr.attrs["attr"] = "value"
Going to take the ❤️ as permission to close this. Feel free to reopen if you run into more gotchas!
By the way, you'll be able to do this all in one call in v3:
arr = zarr.ones(
shape=(512, 512, 512),
dtype=np.uint8,
chunks=256,
store="test_attr_arr.zarr",
attributes={"attr": "value"}
)
What I really want to do is work with an array in memory (including adding some attributes), and then save it in one go to disk. So I think it's worth at least documenting how to do that workflow somewhere?
I think this would be equivalent to copying the memory store to a directory store. Is there not a copy_store routine in v2? With the v2 mutable mapping API, you might be able to do local_store.update(**memory_store), although users should see something a bit more familiar.
Yep, there is a copy_store function.
Clearly my mental model of how zarr (or at least save_array) works wasn't very good, so I might suggest some improvements to the save_array docstring explaining the difference between save_array and copy_store.
A few thoughts about this flow:
If we could pass attrs at array creation time, then your first attempt would have worked @dstansby, because save_array forwards **kwargs to _create_array under the hood, and that's a route attrs could take, but only if _create_array took attrs, which it does not. A million years ago I had a PR to fix this, but I think with the v3 API joe showed we don't have this problem any more.
In pydantic-zarr, I implemented a from_array function that takes an array-like input (i.e., shape and dtype are required) and checks if that input has an attrs attribute, or a chunks attribute, or a filters attribute, and so on for all the zarr array attributes, to create the resulting (model) zarr array (and this attribute inference can be overridden with a concrete value). I think we should do something similar in zarr-python. This gets a bit more complicated for array-like objects that have different "syntax" for the same semantics, e.g. the dask array chunks attribute is an explicit list of chunk sizes. We might need to define this variation via protocols, and dispatch on the shape of incoming array-like objects. Work to be done for sure, but I think this is something that a lot of users would appreciate.
This issue made me think about array creation routines. I wrote up some ideas here: https://github.com/zarr-developers/zarr-python/issues/2083
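The attribute inference described above for from_array can be sketched in plain Python. This is not the pydantic-zarr implementation: the names infer_array_spec, normalize_chunks, and INFERRED_FIELDS are hypothetical, only three fields are handled, and collapsing dask-style chunks to the first block size per axis is a simplifying assumption.

```python
from types import SimpleNamespace
from typing import Any

# Hypothetical field list, for illustration only.
INFERRED_FIELDS = ("attrs", "chunks", "filters")


def normalize_chunks(chunks: Any) -> Any:
    """Collapse dask-style per-axis block lists into one size per axis."""
    if chunks is not None and all(isinstance(c, (tuple, list)) for c in chunks):
        # dask style: ((256, 256), (256, 256)) -> zarr style: (256, 256)
        return tuple(axis[0] for axis in chunks)
    return chunks


def infer_array_spec(array_like: Any, **overrides: Any) -> dict:
    """Build a metadata dict from an array-like; explicit overrides win."""
    # shape and dtype are required on the input.
    spec = {"shape": tuple(array_like.shape), "dtype": str(array_like.dtype)}
    for name in INFERRED_FIELDS:
        if name in overrides:
            spec[name] = overrides[name]
        elif hasattr(array_like, name):
            spec[name] = getattr(array_like, name)
    if "chunks" in spec:
        spec["chunks"] = normalize_chunks(spec["chunks"])
    if "attrs" in spec:
        spec["attrs"] = dict(spec["attrs"])  # copy so the input is not shared
    return spec


# A stand-in array-like with dask-style chunks and user attributes.
array_like = SimpleNamespace(
    shape=(512, 512),
    dtype="uint8",
    chunks=((256, 256), (256, 256)),
    attrs={"attr": "value"},
)
spec = infer_array_spec(array_like)
overridden = infer_array_spec(array_like, chunks=(128, 128))
print(spec)
print(overridden)
```

A protocol-based design, as suggested above, would replace the isinstance check in normalize_chunks with dispatch on the kind of array-like object received.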
Zarr version
2.18.2
Numcodecs version
0.12.1
Python Version
3.11
Operating System
macOS
Installation
conda
Description
I'm trying to save user attributes to file, but they don't seem to be saved when calling zarr.save_array.
Steps to reproduce
The resulting zarr array on disk does not have a .zattrs file as I would expect.
Additional output
No response