Open normanrz opened 4 months ago
Hello @normanrz! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:
There are currently no PEP 8 issues detected in this Pull Request. Cheers! :beers:
The name of the codecs is prefixed with
https://zarr.dev/numcodecs/
to avoid naming collisions in case some codecs of numcodecs get added to the Zarr spec
I am not sure about the idea of using a URL that does not actually resolve to anything useful.
pcodec is actually an "Array to Bytes" codec: https://github.com/zarr-developers/numcodecs/blob/main/numcodecs/pcodec.py
How would that fit in here?
Any thoughts about what to do with numcodecs codecs not defined in this repo, but currently used via entrypoints?
The name of the codecs is prefixed with
https://zarr.dev/numcodecs/
to avoid naming collisions in case some codecs of numcodecs get added to the Zarr specI am not sure about the idea of using a URL that does not actually resolve to anything useful.
seconding this sentiment, a URL that doesn't resolve to anything is rather confusing. I think numcodecs.<codec_name>
or numcodecs/<codec_name>
are simpler templates for a numcodecs-qualified name.
Any thoughts about what to do with numcodecs codecs not defined in this repo, but currently used via entrypoints?
Could we ask those codecs to implement Zarr codec entrypoints directly? Which codecs do you have in mind?
The challenge is that the V3 codecs are quite a bit more explicit in their typing (Array to Bytes, Bytes to Bytes, etc.) than legacy numcodecs codecs. So automatically translating an arbitrary numcodecs codec to a V3 codec is not possible.
I am thinking of https://github.com/fsspec/kerchunk/blob/main/kerchunk/codecs.py and imagecodecs. There are probably others.
The name of the codecs is prefixed with
https://zarr.dev/numcodecs/
to avoid naming collisions in case some codecs of numcodecs get added to the Zarr specI am not sure about the idea of using a URL that does not actually resolve to anything useful.
I had asked @MSanKeys963 to setup the respective redirects to the numcodecs docs. That should solve that.
pcodec is actually an "Array to Bytes" codec: https://github.com/zarr-developers/numcodecs/blob/main/numcodecs/pcodec.py
How would that fit in here?
Must have missed pcodec. I'll add it.
Attention: Patch coverage is 42.00000%
with 145 lines
in your changes missing coverage. Please review.
Project coverage is 94.31%. Comparing base (
42f89d2
) to head (b75e41e
). Report is 4 commits behind head on main.
Files with missing lines | Patch % | Lines |
---|---|---|
numcodecs/zarr3.py | 57.22% | 74 Missing :warning: |
numcodecs/tests/test_zarr3.py | 7.79% | 71 Missing :warning: |
Could you say something a bit about why this code makes sense to be in numcodecs
instead of zarr-python
? My first thoughts are that numcodecs
sees much less development/maintenance than zarr-python
at the moment, so unless there's a good reason maybe this code should live in zarr-python
?
The idea was that the zarr
package contains the "official" Zarr3 codecs and other libraries can add other codecs via the entrypoint mechanism. The zarr
package provides base classes for doing so.
numcodecs
is one of these libraries that can provide additional codecs. There is quite a bit of glue code to make the existing v2 codecs work with the new v3 base classes. This glue code is tightly coupled with the codecs itself, which is why I think it makes more sense to have it in numcodecs
rather than zarr
. If we would add a new codec to numcodecs
, we would need to make a new zarr
release to support it.
I see that it is a bit weird because zarr
itself depends on numcodecs
.
That makes sense - I'll try and give this a proper review in the next couple of days!
The Zarr v3 specification only lists a few codecs that are officially supported. However, it is desirable to expose the codecs in numcodecs for use with v3 arrays as well. This PR adds wrapper classes for numcodecs support.
The name of the codecs is prefixed with
numcodecs.
to avoid naming collisions in case some codecs of numcodecs get added to the Zarr spec. Also, there is a warning that numcodecs codecs are not officially supported and will likely not work in any other Zarr implementation.Most array-to-array ("filters") and bytes-to-bytes codecs are supported. Absent are the variable-length codecs as well as json, msgpack and pickle.
Here is an example of the persisted configuration:
Use of numcodecs in v2 arrays is not affected.
Fixes https://github.com/zarr-developers/numcodecs/issues/502