Open ilan-gold opened 3 years ago
Hmmm, it seems that this is not a codec but a "filter." Does this belong in zarr.js then?
Seems to work well: https://github.com/vitessce/vitessce/pull/948/files
Can contribute if you're interested but not sure how you want to set up filters here/zarr.js
I think it makes sense to add filters to numcodecs.js (that's where they live for zarr-python, and they implement the codecs interface). However, zarr.js currently doesn't support using filters. That alone should be straightforward to add (essentially decode a chunk and then run the decoded chunk through a filter codec); however, the real issue here is the "dtype" itself.
Zarr.js only supports (numeric) dtypes that have an analogous TypedArray. There are no variably sized TypedArrays in JavaScript, so the decoded data would need to live in a JavaScript Array. Zarr.js relies on TypedArray APIs in both RawArray and NestedArray, so it would be tricky to add a dtype that currently isn't supported.
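For illustration, the filter step described above (decode the chunk, then run the decoded chunk through the filter codecs) might look roughly like the sketch below. The `FilterCodec` interface, `decodeChunk`, and the toy `addOne` filter are hypothetical names made up for this example and are not the actual zarr.js or numcodecs.js API. The sketch assumes filters are undone in the reverse of the order they were applied on write, as zarr-python does.

```typescript
// Hypothetical minimal codec shape, loosely modeled on the encode/decode pair
// that numcodecs codecs expose. NOT the real zarr.js / numcodecs.js types.
interface FilterCodec {
  encode(data: unknown): unknown;
  decode(data: unknown): unknown;
}

// Decode a stored chunk: undo the compressor first (if any), then undo the
// filters in reverse order of how they were applied when writing.
function decodeChunk(
  stored: Uint8Array,
  compressor: FilterCodec | null,
  filters: FilterCodec[],
): unknown {
  let data: unknown = compressor ? compressor.decode(stored) : stored;
  for (let i = filters.length - 1; i >= 0; i--) {
    data = filters[i].decode(data);
  }
  return data;
}

// Toy filter for illustration only: stores every byte incremented by one.
const addOne: FilterCodec = {
  encode: (d) => (d as Uint8Array).map((v) => v + 1),
  decode: (d) => (d as Uint8Array).map((v) => v - 1),
};

// Usage: a chunk written through `addOne` with no compressor.
const decoded = decodeChunk(new Uint8Array([2, 3, 4]), null, [addOne]) as Uint8Array;
```

Note that the return type is `unknown` on purpose: a string filter would hand back a plain JavaScript Array rather than a TypedArray, which is exactly the dtype problem described above.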
@manzt I'm not as familiar with this so I defer to you here. Would it make sense to create a new typed array like StringArray? Or some sort of catch-all for non-recognized types? This is definitely out of my wheelhouse, so if you want to come up with a roadmap here, I can help fill in with PRs etc.
Thank you for opening this issue. I'm looking into implementing this support for zarr.js. I had a call on this topic with the developer @gzuidhof.
I looked at some details on how this is done in the Python package numcodecs with vlenutf8 support and noticed the following things:
The input is flattened with reshape(-1), so even if you pass multidimensional arrays, the result will be 1-dimensional. Other things to note are:
Assuming the implementation approach should follow the Python numcodecs implementation, I would suggest the following roadmap:
filters support in zarr.js
We're glad to contribute to any of the subtasks.
Following up on this: is this something the developers are still interested in? I might contribute to this one.
VlenUtf8 is a common codec for string arrays, and porting it should be relatively straightforward: https://github.com/zarr-developers/numcodecs/blob/2c1aff98e965c3c4747d9881d8b8d4aad91adb3a/numcodecs/vlen.pyx#L48-L178
I'm working on doing this for Vitessce, so if you're interested let me know!
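As a rough idea of what a port could look like, here is a minimal TypeScript sketch. It assumes the byte layout I understand the linked vlen.pyx to use: a 4-byte little-endian item count, followed for each item by a 4-byte little-endian byte length and the UTF-8 bytes. The function names `encodeVlenUtf8` and `decodeVlenUtf8` are made up for this example; this is not the Vitessce or numcodecs.js code, and it does no bounds checking on malformed input.

```typescript
// Assumed layout (check against numcodecs' vlen.pyx before relying on it):
//   [uint32 LE item count] then per item [uint32 LE byte length][UTF-8 bytes]

function encodeVlenUtf8(values: string[]): Uint8Array {
  const encoder = new TextEncoder();
  const encoded = values.map((v) => encoder.encode(v));
  // 4 bytes for the header plus 4 bytes of length prefix per item.
  const total = encoded.reduce((n, b) => n + 4 + b.length, 4);
  const out = new Uint8Array(total);
  const view = new DataView(out.buffer);
  view.setUint32(0, values.length, true); // true = little-endian
  let pos = 4;
  for (const bytes of encoded) {
    view.setUint32(pos, bytes.length, true);
    pos += 4;
    out.set(bytes, pos);
    pos += bytes.length;
  }
  return out;
}

function decodeVlenUtf8(buf: Uint8Array): string[] {
  const decoder = new TextDecoder(); // defaults to UTF-8
  // Respect byteOffset in case `buf` is a view into a larger buffer.
  const view = new DataView(buf.buffer, buf.byteOffset, buf.byteLength);
  const count = view.getUint32(0, true);
  // Variable-length strings have no TypedArray, so return a plain Array.
  const result: string[] = new Array(count);
  let pos = 4;
  for (let i = 0; i < count; i++) {
    const length = view.getUint32(pos, true);
    pos += 4;
    result[i] = decoder.decode(buf.subarray(pos, pos + length));
    pos += length;
  }
  return result;
}
```

The decoder always returns a flat, 1-dimensional Array; reshaping to the chunk shape would have to happen in the caller.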