Open slevang opened 1 week ago
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 99.89%. Comparing base (
2309714
) to head (39e454a
).
🚨 Try these New Features:
Maybe we should expand the numcodecs API to support future specs?
delta_spec: Literal["auto", "none", "try_consecutive", "try_lookback"] = "auto" # ignored if <v0.3
delta_encoding_order: Optional[int] = None # ignored if >=v0.3 and spec != try_consecutive?
Or accept actual ModeSpec/DeltaSpec
objects, with sensible auto defaults?
cc @rabernat @mwlon
Maybe we should expand the numcodecs API to support future specs?
Could you say more about what specifically you mean here? I don't quite get the context for this comment.
@mwlon presumably understands the underlying changes much better than I do. But the gist is pcodec now takes a more flexible DeltaSpec
object, rather than a simple int that routes to auto/consecutive modes.
This broke the numcodecs interface on the latest release, so I'm just trying to figure out the right path to support any future additions without breaking existing numcodecs users.
Can we just bump pcodec version to >=0.3,<0.4 so we don't need to support both code paths?
You're correct: I added a new type of delta encoding, now handled by delta spec. This should be harder to break API wise in the future.
Yeah that makes a lot more sense :+1:
A key consideration here, and an important priority for numcodecs, is backwards compatibility. Ideally any data written by Zarr will be readable for a long time into the future. This means that breaking changes in the codec parameters, which would cause decoding of existing data to fail, should be avoided.
This could be relaxed if we had some sort of versioning system for codecs. But unfortunately we don't.
For any of the proposals above, I would ask these questions:
I believe these are purely API changes and feature additions, since the ChunkConfig
is only involved in setting up the encoding step. Presumably everything that's needed for decoding is self contained.
Would be good for @mwlon to confirm though. And I do wonder if the new modes (e.g. DeltaSpec.try_lookback
, ModeSpec.try_float_quant
) can be decoded by older versions of pcodec, although that's much less important.
Is the change future-proof to additional future evolution of the codec parameters?
Seems all of the config API has evolved to these spec objects, so maybe now is a good time to make the numcodecs API match fully:
level: int = 8,
delta_spec: DeltaSpec | None = None,
mode_spec: ModeSpec | None = None,
paging_spec: PagingSpec | None = None,
and the user is responsible for passing through an explicit pcodec object if they want custom behavior.
Can we still read old data written with earlier versions of the codec?
Yes! Pcodec will always be able to decode older format versions.
And I think I agree with @slevang 's last comment: I'm in favor of passing the pcodec specs straight through to keep everything simple. Presumably we can convert non-spec kwargs into these (based on type) for numcodec's API backward compatibility. @rabernat do you agree?
Presumably we can convert non-spec kwargs into these (based on type) for numcodec's API backward compatibility.
It's entirely possible Ryan and I are the only ones actually using numcodecs.pcodec
, and I've never used any options besides level
. So API backwards compat is probably not super important here :slightly_smiling_face:
and the user is responsible for passing through an explicit pcodec object if they want custom behavior
Ah nevermind I think, because numcodecs expects the codec config to be json-serializable which is a requirement of zarr. I guess we'll have to just pile on the kwargs to support additional features.
It's entirely possible Ryan and I are the only ones actually using
numcodecs.pcodec
,
Definitely not the case. We have customers with petabytes of Zarr data (now a little bit less! 🙌 ) encoded with pcodec.
Ok new approach, keep the string style args but also support delta_spec
and paging_spec
. It would be possible to add support for the try mode specs now, but wasn't straightforward to integrate with the existing tests since these are dtype specific. So I skipped it.
Compatibility for the new
delta_spec
argument and modes inpcodec.ChunkConfig
. Maintains backwards compatibility if onlydelta_encoding_order
is passed. Also adds thepaging_spec
arg.Closes #623
TODO: