jrbourbeau opened 11 months ago
@jrbourbeau At h5netcdf we've recently made a `driver` kwarg available (not yet released) to enable loading remote files via h5py/hdf5. See https://github.com/h5netcdf/h5netcdf/pull/220 and https://github.com/pydata/xarray/pull/8360. Would this already help with your use case as a first step?
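If I'm reading https://github.com/h5netcdf/h5netcdf/pull/220 correctly, usage would look roughly like the sketch below (untested; `driver="ros3"` requires an HDF5 build with the read-only S3 virtual file driver, and the URL is a placeholder):

```python
import h5netcdf

# Sketch only: the new driver kwarg is forwarded on to h5py.File, so an
# HDF5 library built with the ros3 VFD can read straight over S3/HTTPS.
with h5netcdf.File("https://example-bucket.s3.amazonaws.com/data.nc",
                   "r", driver="ros3") as f:
    print(list(f.variables))
```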
Thanks for pointing me to that @kmuehlbauer! Based on the linked PRs, `driver=` definitely seems related, but I'm wondering how it compares to `fsspec`. `fsspec` handles local files, S3, GCSFS, HTTPS, etc. and allows users to forward authentication as well (e.g. AWS key and secret in the case of reading from S3). Can I do this with the new `driver=` functionality?
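For context, the pattern that works today looks roughly like this. In this runnable sketch a local temporary file stands in for a remote store; against S3 you would instead open `s3://...` and forward credentials to `fsspec.open` (e.g. `key=` / `secret=`):

```python
import os
import tempfile

import fsspec
import numpy as np
import xarray as xr

# Create a small netCDF file to stand in for a remote object.
tmp = os.path.join(tempfile.mkdtemp(), "demo.nc")
xr.Dataset({"x": ("t", np.arange(3))}).to_netcdf(tmp, engine="h5netcdf")

# Today: open a file-like object ourselves with fsspec, then hand that
# object (not a path string) to the h5netcdf engine.
with fsspec.open(tmp, "rb") as f:
    ds = xr.open_dataset(f, engine="h5netcdf").load()

print(ds["x"].values.tolist())  # prints [0, 1, 2]
```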
@jrbourbeau I can't say much to that, unfortunately, since my use cases are usually local only, so my expertise with cloud access is rather limited. But you should be able to use authentication, and different sources as well. Maybe @zequihg50 can chime in here with some additional context? Note that this will only become possible after the next h5netcdf release and some changes to xarray to allow for the additional kwargs.
@jrbourbeau It might be good to merge #8360 first and add your changes on top. So this might take a little time.
Is your feature request related to a problem?
Currently the `h5netcdf` engine supports opening remote files, but only via already open file-like objects (e.g. `s3fs.open(...)`), not string paths like `s3://...`. There are situations where I'd like to use string paths instead of open file-like objects, e.g. when using `parallel=True` for opening lots of files, where serializing open file-like objects back and forth from a remote cluster can be slow. `parallel=True` + `storage_options` would be convenient/performant in that case.

Describe the solution you'd like
It would be nice if I could do something like the following and have my files opened prior to handing off to `h5netcdf`.
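Something along these lines, where the bucket path and credentials are placeholders and the `storage_options` kwarg for the `h5netcdf` engine is the proposed API, not something that works today:

```python
import xarray as xr

# Proposed: pass string paths plus storage_options directly; xarray would
# use fsspec to open the remote files before handing them to h5netcdf.
ds = xr.open_mfdataset(
    "s3://my-bucket/data/*.nc",                       # placeholder path
    engine="h5netcdf",
    parallel=True,
    storage_options={"key": "...", "secret": "..."},  # forwarded to s3fs
)
```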
`storage_options` is already supported for Zarr, so hopefully extending it to `h5netcdf` feels natural.

Describe alternatives you've considered
No response
Additional context
No response