Open TomAugspurger opened 5 days ago
Is it fair to say that filters is the same as array_array_codecs? Is it fair to say that compressor is the same as array_bytes_codecs?
Unfortunately, no. This came up earlier in https://github.com/zarr-developers/zarr-python/pull/1944.
There's also https://github.com/pydata/xarray/blob/1c6300c415efebac15f5ee668a3ef6419dbeab63/xarray/backends/zarr.py#L79, which accesses Codec.codec_id. I'm not sure yet about how to handle that, but right now the best option is maybe .to_dict()["name"] (or we could have .to_dict() access a codec_id)?
Probably the place to look for this functionality would be the classes that adapt the v2 compressor / filters to the v3 codecs api: https://github.com/zarr-developers/zarr-python/blob/fbd1658f1f95e0956a6ac294cf6a0b654841fb1c/src/zarr/codecs/_v2.py#L69
In v3, filters / codecs are stored as dicts, but in #2179 I switch to storing instances of numcodecs.abc.Codec, which I think would permit re-using the old object inspection code?
(xref to the issue tracking the top-level codecs / filters / compressor api): https://github.com/zarr-developers/zarr-python/issues/1943
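To make the options in this thread concrete, here is a minimal sketch of a lookup helper that tolerates the three representations mentioned so far: a plain dict, an object exposing a numcodecs-style codec_id, and an object whose .to_dict() carries a "name" entry. The function name codec_identifier and the two stand-in classes are hypothetical and exist only for illustration; neither zarr-python nor xarray provides them, and the dict keys used ("id" and "name") are assumptions about what the payloads look like.

```python
from collections.abc import Mapping
from typing import Any


def codec_identifier(codec: Any) -> str:
    """Return a short identifier for a codec, whichever form it takes.

    Hypothetical helper for illustration; not part of zarr-python or xarray.
    """
    # Case 1: a plain dict. Assumed keys: "id" (numcodecs-style config)
    # or "name" (v3-style metadata).
    if isinstance(codec, Mapping):
        for key in ("id", "name"):
            if key in codec:
                return str(codec[key])
        raise ValueError(f"codec dict has neither 'id' nor 'name': {codec!r}")

    # Case 2: an object with a numcodecs-style ``codec_id`` attribute.
    codec_id = getattr(codec, "codec_id", None)
    if codec_id is not None:
        return str(codec_id)

    # Case 3: an object whose ``to_dict()`` includes a "name" entry.
    to_dict = getattr(codec, "to_dict", None)
    if callable(to_dict):
        return str(to_dict()["name"])

    raise TypeError(f"cannot determine an identifier for {codec!r}")


class StandInV2Codec:
    """Stand-in mimicking an object that exposes ``codec_id``."""

    codec_id = "zlib"


class StandInV3Codec:
    """Stand-in mimicking an object that only exposes ``to_dict()``."""

    def to_dict(self) -> dict:
        return {"name": "gzip", "configuration": {"level": 5}}


print(codec_identifier({"id": "blosc", "clevel": 5}))  # blosc
print(codec_identifier(StandInV2Codec()))              # zlib
print(codec_identifier(StandInV3Codec()))              # gzip
```

Checking codec_id before falling back to .to_dict() keeps existing inspection code working unchanged if codecs do end up stored as numcodecs.abc.Codec instances.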
Thanks for those links. I'll try to digest them and will propose a plan that'll either be compatibility code, or (more likely?) a PR to the migration guide.
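As a rough idea of what compatibility code on the xarray side could look like, the sketch below collects compressor and filters from an array-like object into an encoding dict and treats a missing attribute as None. The helper name codec_encoding and the SimpleNamespace stand-ins are hypothetical; this is not the actual xarray implementation, and it assumes nothing about which attributes a zarr v3 array really exposes.

```python
from types import SimpleNamespace
from typing import Any


def codec_encoding(zarr_array: Any) -> dict:
    """Collect compressor / filters from an array-like object.

    Hypothetical helper; a missing attribute is reported as None so the
    same code runs against objects that lack either attribute.
    """
    return {
        "compressor": getattr(zarr_array, "compressor", None),
        "filters": getattr(zarr_array, "filters", None),
    }


# Stand-in objects; real zarr arrays are not used here.
with_both = SimpleNamespace(compressor="zlib-stand-in", filters=["delta-stand-in"])
without_either = SimpleNamespace()

print(codec_encoding(with_both))
print(codec_encoding(without_either))
```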
Zarr version
v3
Numcodecs version
n/a
Python Version
n/a
Operating System
n/a
Installation
n/a
Description
As part of getting xarray ready for zarr v3, I'm looking at how to handle the codec and filter API.
The primary / first place this is accessed is https://github.com/pydata/xarray/blob/1c6300c415efebac15f5ee668a3ef6419dbeab63/xarray/backends/zarr.py#L555-L556, which just reads the values of .filters and .compressor to place them in the DataArray.encoding. A few questions:
[...] .codecs property to the CodecPipeline ABC. This is fine for the BatchedCodecPipeline, which AFAICT is the only actual codec pipeline. Does anyone foresee an issue with that? I'm not sure why that class is abstract and loadable through the config.
Is it fair to say that filters is the same as array_array_codecs?
Is it fair to say that compressor is the same as array_bytes_codecs?
There's also https://github.com/pydata/xarray/blob/1c6300c415efebac15f5ee668a3ef6419dbeab63/xarray/backends/zarr.py#L79, which accesses Codec.codec_id. I'm not sure yet about how to handle that, but right now the best option is maybe .to_dict()["name"] (or we could have .to_dict() access a codec_id)?
Steps to reproduce
n/a
Additional output
No response