zarr-developers / VirtualiZarr

Create virtual Zarr stores from archival data files using xarray syntax
https://virtualizarr.readthedocs.io/en/latest/
Apache License 2.0

Forbid relative paths, and use file URI scheme internally? #242

Open TomNicholas opened 2 weeks ago

TomNicholas commented 2 weeks ago

We currently allow manifests to contain relative local paths, e.g. test.nc. This is (a) more fragile than an absolute path such as /test.nc, and (b) inconsistent with cloud bucket URLs, which are always absolute.

It would be more robust to ensure that paths in the Manifest are always absolute paths. (If the user wants to move their data around they can always use the .rename_paths method to explicitly adjust the paths in their manifest to point to the files' new locations.)

If all local paths are to be stored as absolute paths, then we might also want to use the file URI scheme, so that local paths are stored as file:///test.nc. That way the form of the path in the Manifest is consistent, whether it is local or remote. However, because kerchunk does not use this scheme, extra conversion steps will be needed to go between the two formats.
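To make the proposal concrete, here is a minimal sketch of the two conversion steps, assuming POSIX-style local paths. The function names local_path_to_file_uri and file_uri_to_local_path are hypothetical and are not part of VirtualiZarr or kerchunk; only the Python standard library is used.

```python
import posixpath
from urllib.parse import quote, unquote, urlparse


def local_path_to_file_uri(path: str) -> str:
    """Hypothetical helper: store an absolute local path as a file URI."""
    # Anything that already carries a scheme (s3://, file://, ...) is left alone.
    if "://" in path:
        return path
    # Relative paths are rejected, as the issue proposes.
    if not posixpath.isabs(path):
        raise ValueError(f"relative paths are not allowed: {path!r}")
    # Percent-encode special characters but keep the slashes.
    return "file://" + quote(path)


def file_uri_to_local_path(uri: str) -> str:
    """Hypothetical helper: strip the file scheme again, e.g. before writing kerchunk-style references."""
    parsed = urlparse(uri)
    if parsed.scheme != "file":
        return uri
    return unquote(parsed.path)


print(local_path_to_file_uri("/test.nc"))
print(file_uri_to_local_path("file:///test.nc"))
```

With this shape, remote URLs pass through both functions unchanged, so only local paths pay the conversion cost.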

cc @mpiannucci

mdsumner commented 2 weeks ago

Oh nice, this has caught me out and I'm glad to see it laid out this way.

mpiannucci commented 2 weeks ago

Ah, I was wondering what kerchunk does, because according to the docs fsspec supports the file:// scheme. Thanks for this.

TomNicholas commented 2 weeks ago

I haven't tried to use fsspec to read data from references that use a file:/// prefix, but the kerchunk readers return dicts containing references that use relative paths. So coercing those relative paths into absolute URIs does break some of VirtualiZarr's kerchunk roundtripping tests.
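As an illustration of the coercion being described, here is a rough sketch that rewrites relative paths in a kerchunk-style references dict into absolute file URIs. It assumes the usual reference layout, where a chunk entry is a [path, offset, length] list and other entries are inlined strings. The helper coerce_refs_to_file_uris, the base_dir argument, and the sample data are all hypothetical, not VirtualiZarr's actual code.

```python
import posixpath
from urllib.parse import quote


def coerce_refs_to_file_uris(refs: dict, base_dir: str) -> dict:
    """Hypothetical helper: make every local path in `refs` an absolute file URI."""
    out = {}
    for key, value in refs.items():
        # Chunk references are lists whose first element is the path;
        # inlined entries (plain strings) are copied through untouched.
        if isinstance(value, list) and value and isinstance(value[0], str):
            path = value[0]
            if "://" not in path:
                # Resolve relative paths against base_dir; absolute paths win.
                abs_path = posixpath.normpath(posixpath.join(base_dir, path))
                path = "file://" + quote(abs_path)
            out[key] = [path, *value[1:]]
        else:
            out[key] = value
    return out


# Made-up references of the kind a reader might return, with a relative path.
refs = {".zgroup": '{"zarr_format": 2}', "a/0.0": ["test.nc", 100, 400]}
coerced = coerce_refs_to_file_uris(refs, base_dir="/data")
print(coerced["a/0.0"])
```

In this sketch the coerced dict no longer compares equal to the original one, which is one way an exact-equality roundtrip check could start failing.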