zarr-developers / VirtualiZarr

Create virtual Zarr stores from archival data files using xarray syntax
https://virtualizarr.readthedocs.io/en/latest/
Apache License 2.0
95 stars 18 forks source link

Extend refspec support to [path] entries (without offset/length) #187

Closed maresb closed 2 months ago

maresb commented 2 months ago

I'm trying to open a Zarr v2. I couldn't find a good obvious way to do this, so my attempt was:

from kerchunk.zarr import ZarrToZarr
from virtualizarr.xarray import dataset_from_kerchunk_refs

ref = ZarrToZarr(url).translate()
ds = dataset_from_kerchunk_refs(ref)

This fails here in unpacking because kerchunk produces reference entries of the form [path] instead of [path, length, offset].

I was wondering if I could work around this by stating the chunks, and sure enough it worked! (See my second commit.)

I'd love to get some feedback on this, including if there's a better way to open v2 zarrs.

In preparation for my second commit I needed to clean up some types. To make things more precise and clean, I tell a minor lie: that the [path, offset, length] lists are tuples. This was done in the spirit of duck typing, and I'd argue is more semantically accurate. (Not to mention that tuples serialize to lists in JSON.)

maresb commented 2 months ago

@TomNicholas, I think I'm ready for another round of review, whenever you are.

TomNicholas commented 2 months ago

Sorry for the slow review @maresb ! Just some very small comments here.

maresb commented 2 months ago

Sorry for the slow review @maresb ! Just some very small comments here.

No worries, no rush, and thanks so much for maintaining!!!

maresb commented 2 months ago

I think I'm all caught up with your comments.

I don't think you're asking for a change regarding the NotImplementedError on inlined data, but I'm not certain about this.

Please let me know if you'd like any further changes, or if you'd like me to rebase.

Thanks @TomNicholas!

TomNicholas commented 2 months ago

I think this looks good, thank you @maresb !