Open alejoe91 opened 2 years ago
there is not a clear way to do encode and decode in memory
I think it might be enough to provide your own callback functions for reading and writing from memory instead of file. It's a standard C API that can be implemented in Cython.
@cgohlke we tried to reimplement the codec with cython following examples from others in the numcodecs library. We needed to make two additional C functions (see encoder.c and decoder.c) for in-memory compression-decompression.
Would you mind taking a look? https://github.com/AllenNeuralDynamics/wavpack_numcodecs/tree/cython/wavpack_cython
If everything looks ok, we'll extend the cython version with additional options and start preparing a PR.
Hi guys,
We successfully implemented the Cython version here: https://github.com/AllenNeuralDynamics/wavpack_numcodecs/pull/6
If you guys are ok with it, I'll start preparing a PR to numcodecs :)
A couple of comments:
wavpack library externally and to build against it. (Note that wavpack is available through apt install and homebrew.)
chocolatey (but only an old version). Therefore, we think it's probably better to ship the pre-compiled dll for 32- and 64-bit architectures.
Let us know if this approach sounds reasonable!
Cheers Alessio
@joshmoore : given the entrypoints registration system (when it works), is there any reason to want to add more compiled codecs directly into numcodecs, or should they be separate packages?
The primary trade-off I would think would be how a user is to know what package needs installing. If there's another package that is likely to be installed in which the codec could live then it's fairly straight-forward. Alternatively, the registry that's in progress should allow clients to find documentation on what package provides a given codec_id: https://alt-shivam.github.io/Codecs-Registry/
cc: @Alt-Shivam
@joshmoore : given the entrypoints registration system (when it works), is there any reason to want to add more compiled codecs directly into numcodecs, or should they be separate packages?
@martindurant sorry I'm not super familiar with the entrypoints registration. Could you explain what you mean?
An argument to setup() in a typical setup.py:
    entry_points={
        "numcodecs.codecs": [
            "grib = kerchunk.codecs:GRIBCodec",
            "fill_hdf_strings = kerchunk.codecs:FillStringsCodec",
            "FITSAscii = kerchunk.codecs:AsciiTableCodec",
            "FITSVarBintable = kerchunk.codecs:VarArrCodec",
            "record_member = kerchunk.codecs:RecordArrayMember",
        ],
    },
(this one copied from kerchunk) will make each class on the right hand side of an "=" available under the name given on the left hand side.
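To make the "name = module:attribute" format concrete, here is a minimal sketch using only the standard library's importlib.metadata.EntryPoint. The targets (collections:OrderedDict) are stand-ins chosen so the snippet runs without kerchunk installed; they are not real codecs, and the names "demo" and "demo_bad" are made up for illustration.

```python
from importlib.metadata import EntryPoint

# Stand-in entry point: a stdlib target used purely for illustration.
good = EntryPoint(
    name="demo", value="collections:OrderedDict", group="numcodecs.codecs"
)
loaded = good.load()  # imports `collections`, then looks up `OrderedDict`

# Without the colon, the whole value is treated as a module path,
# so loading fails with an import error.
bad = EntryPoint(
    name="demo_bad", value="collections.OrderedDict", group="numcodecs.codecs"
)
try:
    bad.load()
    bad_error = None
except ImportError as exc:
    bad_error = exc

print(loaded, type(bad_error).__name__)
```

The colon matters: it separates the module to import from the object to fetch inside that module.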
@alejoe91: https://entrypoints.readthedocs.io/en/latest/ allows projects to define an extension point which other projects can then implement. So, as long as you have run e.g. pip install numcodecs-wavpack, the main numcodecs library will be able to find it at runtime (https://github.com/zarr-developers/numcodecs/blob/main/numcodecs/registry.py#L10).
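As a rough illustration of runtime discovery, the sketch below lists whatever plugins are installed under the "numcodecs.codecs" group using the standard library's importlib.metadata. It is not a copy of what numcodecs/registry.py does; the helper name find_codec_entry_points is hypothetical.

```python
from importlib.metadata import entry_points

GROUP = "numcodecs.codecs"  # group name used in the thread's setup.py example


def find_codec_entry_points(group=GROUP):
    """Return {name: entry_point} for every installed plugin in `group`."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+: the result supports select(group=...)
        found = eps.select(group=group)
    else:
        # Python 3.8 / 3.9: a plain dict mapping group -> list of entry points
        found = eps.get(group, [])
    return {ep.name: ep for ep in found}


available = find_codec_entry_points()
print(sorted(available))  # names of codecs provided by installed packages
```

Calling .load() on one of the returned entry points imports the providing package only when the codec is actually needed.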
Thank you, Josh, that's more succinct and to the point than what I said :)
Thank you for the explanation. So you suggest we make our own package and then add it as an entrypoint?
One question: If we use numcodecs-wavpack for compression, and then someone else (unaware of wavpack-numcodecs) wants to access and decode our data, is there a way to prompt the message: you need to pip install numcodecs-wavpack or will it print wavpack codec not found in the registry?
Would you guys be available to discuss this over a call?
Thanks! Alessio
is there a way to prompt the message:
As things stand, you would see a simple ImportError or unknown-codec. There is a mention of an online registry resource, above, which we could (but do not yet) point the user to, and ought to provide install instructions/links for each codec.
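One way such a hint could work, sketched under assumptions: the INSTALL_HINTS table, the CodecNotFoundError class, and the lookup_codec helper are all hypothetical and not part of numcodecs today. The codec id and package name are the ones used in this thread.

```python
# Hypothetical mapping from codec id to the package that provides it.
INSTALL_HINTS = {"wavpack": "numcodecs-wavpack"}


class CodecNotFoundError(ValueError):
    """Raised when a codec id is unknown to the (hypothetical) registry."""


def lookup_codec(codec_id, registry, hints=INSTALL_HINTS):
    """Return the codec class for `codec_id`, or fail with an install hint."""
    try:
        return registry[codec_id]
    except KeyError:
        msg = f"codec {codec_id!r} not found in the registry"
        package = hints.get(codec_id)
        if package:
            msg += f"; try: pip install {package}"
        raise CodecNotFoundError(msg) from None


try:
    lookup_codec("wavpack", registry={})
except CodecNotFoundError as exc:
    message = str(exc)
    print(message)
```

In practice the hint table would have to come from somewhere maintained, such as the online registry mentioned above, rather than being hard-coded.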
As a reference, I have two plugin systems I maintain with opposite philosophies:
Hi numcodecs team!
First of all, thank you for the amazing resource that you put together!
I would like to inquire whether you would be interested in including the WavPack codec as an available numcodecs compressor. WavPack is an audio codec developed by @dbry and it has both a lossless mode (default) and also an interesting lossy mode (hybrid mode). In addition to working well for audio signals, it performs really well for any kind of timeseries and it can compress up to 1024 channels simultaneously. We use it for data from high-density electrophysiology and it gives very good compression performance.
I'm also the core developer of SpikeInterface, an open-source framework for electrophysiology analysis. We have a built-in save to zarr function, so having a codec that is specifically designed for audio-like timeseries data would be very convenient for the electrophysiology community (I'm sure also the NWB folks would like to use it once the Zarr backend is available - hopefully soon).
We have a working version of a numcodecs implementation of WavPack here: https://github.com/AllenNeuralDynamics/wavpack_numcodecs Internally, it uses the WavPack CLI to encode and decode using pipes to pass binary data between processes. The wavpack binaries for Windows, macOS, and Linux are also shipped with the package and the tests are run on all three platforms. We currently use the CLI rather than binding the wavpack C library directly because there is not a clear way to do encode and decode in memory. But we are open to suggestions!
We look forward to hearing your thoughts!
Cheers Alessio
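For readers unfamiliar with the pipe-based approach described in the original post, here is a minimal sketch of passing binary data through an external process. The real package calls the WavPack CLI; since the exact command line is not given in this thread, a Python one-liner using zlib stands in for the encoder and decoder executables. ENCODE_CMD, DECODE_CMD, and run_filter are hypothetical names.

```python
import subprocess
import sys

# Stand-in commands: a child Python process that (de)compresses stdin to
# stdout with zlib. These are NOT the WavPack CLI invocations.
ENCODE_CMD = [
    sys.executable, "-c",
    "import sys, zlib; "
    "sys.stdout.buffer.write(zlib.compress(sys.stdin.buffer.read()))",
]
DECODE_CMD = [
    sys.executable, "-c",
    "import sys, zlib; "
    "sys.stdout.buffer.write(zlib.decompress(sys.stdin.buffer.read()))",
]


def run_filter(cmd, data):
    """Send `data` to the child's stdin and return the bytes it writes to stdout."""
    proc = subprocess.run(
        cmd,
        input=data,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        check=True,  # raise CalledProcessError if the child exits non-zero
    )
    return proc.stdout


payload = bytes(range(256)) * 64
encoded = run_filter(ENCODE_CMD, payload)
decoded = run_filter(DECODE_CMD, encoded)
print(len(payload), len(encoded), decoded == payload)
```

Each call pays the cost of starting a process, which is one reason binding the C library directly and working in memory is attractive.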