pydata / xarray

N-D labeled arrays and datasets in Python
https://xarray.dev
Apache License 2.0

Support remote string paths for `h5netcdf` engine #8423

Open jrbourbeau opened 11 months ago

jrbourbeau commented 11 months ago

Is your feature request related to a problem?

Currently the h5netcdf engine supports opening remote files, but only as already-open file-like objects (e.g. s3fs.open(...)), not as string paths like s3://.... There are situations where I'd like to use string paths instead of open file-like objects.

Describe the solution you'd like

It would be nice if I could do something like the following:

ds = xr.open_mfdataset(
    files,    # A bunch of files like `s3://bucket/file`
    engine="h5netcdf",
    # ... other keyword arguments
    parallel=True,
    storage_options={...},    # fsspec-compatible options
)

and have my files opened prior to handing off to h5netcdf. storage_options is already supported for Zarr, so hopefully extending to h5netcdf feels natural.
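For illustration, here is a minimal sketch of what "opening the files prior to handing off to h5netcdf" could look like if done by hand today. The helper names `is_remote_path` and `open_h5netcdf_paths` are hypothetical and not part of xarray. The sketch assumes fsspec and a matching filesystem package (e.g. s3fs) are installed, and that the resulting file-like objects are accepted by the h5netcdf engine as described above.

```python
from urllib.parse import urlparse


def is_remote_path(path):
    """Return True for string paths with a URL scheme such as s3:// or https://."""
    if not isinstance(path, str):
        return False
    scheme = urlparse(path).scheme
    # Single-letter "schemes" are Windows drive letters (C:\...), not protocols.
    return len(scheme) > 1 and scheme != "file"


def open_h5netcdf_paths(paths, storage_options=None, **open_kwargs):
    """Hypothetical helper: open remote string paths with fsspec, then call xarray."""
    # Imported lazily so the path check above works without these packages.
    import fsspec
    import xarray as xr

    storage_options = storage_options or {}
    file_objs = [
        # fsspec.open returns an OpenFile; .open() yields the file-like object.
        fsspec.open(p, mode="rb", **storage_options).open() if is_remote_path(p) else p
        for p in paths
    ]
    return xr.open_mfdataset(file_objs, engine="h5netcdf", **open_kwargs)
```

The request amounts to having xarray perform this step internally whenever storage_options is passed, so callers could pass plain strings.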

Describe alternatives you've considered

No response

Additional context

No response

kmuehlbauer commented 11 months ago

@jrbourbeau At h5netcdf we've recently made a driver kwarg available (not yet released) to enable loading remote files via h5py/HDF5.

See https://github.com/h5netcdf/h5netcdf/pull/220 and https://github.com/pydata/xarray/pull/8360.

Would this already help with your use-case as a first step?

jrbourbeau commented 11 months ago

Thanks for pointing me to that @kmuehlbauer!

Based on the linked PRs, driver= definitely seems related, but I'm wondering how it compares to fsspec. fsspec handles local files, S3, GCSFS, HTTPS, etc. and allows users to forward authentication as well (e.g. AWS key and secret in the case of reading from S3). Can I do this with the new driver= functionality?

kmuehlbauer commented 11 months ago

@jrbourbeau I can't say much about that, unfortunately, since my use-cases are usually local only. So my expertise with cloud access is rather limited.

But you should be able to use authentication, and different sources as well. Maybe @zequihg50 can chime in here with some additional context?

But this will only be possible after the next h5netcdf release and some changes to xarray to allow for the additional kwargs.
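As a rough, untested sketch of how credentials might be forwarded through the driver route: it assumes the driver and driver_kwds arguments from the linked xarray PR are available, that h5py/HDF5 was built with the read-only S3 ("ros3") driver, and that this driver accepts aws_region, secret_id and secret_key (as bytes in older h5py versions). The helper names are hypothetical.

```python
def ros3_driver_kwds(aws_region, secret_id, secret_key):
    """Build keyword arguments for HDF5's read-only S3 ("ros3") driver.

    Assumption: the driver takes aws_region, secret_id and secret_key,
    and older h5py versions expect them as bytes.
    """
    return {
        "aws_region": aws_region.encode(),
        "secret_id": secret_id.encode(),
        "secret_key": secret_key.encode(),
    }


def open_with_ros3(url, **credentials):
    """Hypothetical helper: open one remote file through the ros3 driver."""
    # Imported lazily; needs xarray plus h5netcdf/h5py with ros3 support.
    import xarray as xr

    return xr.open_dataset(
        url,
        engine="h5netcdf",
        driver="ros3",
        driver_kwds=ros3_driver_kwds(**credentials),
    )
```

Unlike fsspec, this route is specific to S3-style endpoints, so it would not cover the other protocols mentioned above.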

kmuehlbauer commented 11 months ago

@jrbourbeau It might be good to merge #8360 first and add your changes on top. So this might take a little time.