Open ghidalgo3 opened 4 days ago
Just wanted to say that the Zarr V3 class behaviors are not set in stone yet. If you have feedback on the V3 API and potential incompatibilities between V2, please report them issues on Zarr-python: https://github.com/zarr-developers/zarr-python/issues
My sense is that in terms of file format both zarr v2 and zarr v3 should be supported but in terms of library dependencies it would be fine to mandate that only zarr-python >= 3 is supported.
Thanks for taking the initiative here @ghidalgo3 !
I tried using Zarr-V3's Array definition instead of VirtualiZarr's ZArray definition
I was actually hoping we could use some class from zarr-python
that wasn't a fully-featured zarr.Array
, instead using something that just represented the metadata for one array. This ArrayV3Metadata
class might be what we need? Is that public API?
If VirtualiZarr needs to support both Zarr V2 and V3 stores, then this change needs to handle all the little differences in Zarr's API between 2 and 3. But if only V3 is supported, this becomes easier and replacing
ZArray
withArray
should not be too difficult.
I see two ways to do this:
Make VirtualiZarr internally agnostic to v2 vs v3. Then a ManifestArray
can represent either a v2 array or a v3 array, distinguished via its metadata. For example the attributes of a ManifestArray
might become
from zarr.metadata import ArrayMetadata
class ManifestArray:
_chunkmanifest: ChunkManifest
_zarray: ArrayMetadata
where ArrayMetadata is an ABC that needs to be a concrete ArrayV2Metadata
or ArrayV3Metadata
.
I'm not sure which of these two is the best approach.
I also completely agree with @jsignell - all of this should be supported through only a dependency on zarr>=3.0.0
, as this PR adds.
Before going further with these changes, I'd like to discuss if this direction is correct or not. Based on #17, I tried using Zarr-V3's Array definition instead of VirtualiZarr's
ZArray
definition. Keeping the names straight is hard,ZArray
is the current code andArray
iszarr
's type.The problems I ran into were:
kwargs
forArray
are different between v2 and v3 for zarr, butVirtualiZarr.ZArray
I believe is first created with the v2 properties and then transformed to v3 format whenzarr_v3_array_metadata
is called. After the replacement, anywhere VirtualiZarr creates aArray
, one must choose v2 or v3kwargs
, or again introduce something that abstracts over the versions. Unless VirtualiZarr chooses to only support Zarr V3 only, which I believe is the intention.zarr.array.metadata
because the internal zarr property names have changed. Probably there will be other code-level incompatibilities from metadata property name differences, I think #94 ran into this too.This is just from a few hours of trying this out.
If VirtualiZarr needs to support both Zarr V2 and V3 stores, then this change needs to handle all the little differences in Zarr's API between 2 and 3. But if only V3 is supported, this becomes easier and replacing
ZArray
withArray
should not be too difficult.Thoughts?
Incompatibilities:
docs/releases.rst
api.rst