Open TomNicholas opened 3 months ago
In particular we could just call cloudpathlib.AnyPath
https://cloudpathlib.drivendata.org/stable/anypath-polymorphism/
This would be really cool @TomNicholas!
Seems like it can read over s3 into xarray:
from cloudpathlib import CloudPath
import xarray as xr
cloudpath = CloudPath("s3://carbonplan-share/air_temp.nc")
ds = xr.open_dataset(cloudpath)
A little more exploration. It looks like SingleHdf5ToZarr works for both s3 and local.
from kerchunk.hdf import SingleHdf5ToZarr
import io
from cloudpathlib import CloudPath
import xarray as xr
# from cloudpathlib import AnyPath
cloudpath = CloudPath("s3://carbonplan-share/air_temp.nc")
with open(cloudpath, 'rb') as f:
    contents = f.read()
refs = SingleHdf5ToZarr(io.BytesIO(contents)).translate()
refs
Some more thoughts - one way to smooth this transition would be to replace all uses of UPath (which is based on fsspec) with cloudpathlib's AnyPath. They are both very similar - for example they both implement a .stat method, which is used in https://github.com/zarr-developers/VirtualiZarr/pull/187/files#r1678802398.
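To illustrate why the swap could be fairly mechanical, here is a minimal sketch of code that only relies on the shared .stat interface. The helper name file_size is hypothetical and not taken from VirtualiZarr. It is demonstrated with a local pathlib.Path only; that UPath and cloudpathlib paths return a result with an st_size field is an assumption here and should be checked against each library.

```python
import tempfile
from pathlib import Path


def file_size(path) -> int:
    """Return the size in bytes of any path object that exposes .stat().

    Hypothetical helper: it is duck-typed, so it does not care whether
    `path` is a pathlib.Path or another path class, as long as .stat()
    returns an object with an st_size attribute (assumed for UPath and
    cloudpathlib paths, verified here only for pathlib).
    """
    return path.stat().st_size


# Demonstrate with a local temporary file, so no cloud access is needed.
with tempfile.TemporaryDirectory() as tmp:
    local = Path(tmp) / "example.bin"
    local.write_bytes(b"\x00" * 16)
    size = file_size(local)
    print(size)
```

If both path classes honour this interface, call sites like the one linked above would not need to change when the path type does.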
The snag here is that I don't think cloudpathlib supports https...
I raised https://github.com/drivendataorg/cloudpathlib/issues/455
AFAIK the only filesystems we need to read from are local and cloud, so could we just use pathlib and cloudpathlib?
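A rough sketch of what that dispatch might look like, using only pathlib for local paths and deferring to cloudpathlib for cloud URIs. The function name to_path and the CLOUD_SCHEMES set are hypothetical, and the list of schemes is an assumption about what cloudpathlib handles. The https case raises, reflecting the snag mentioned above.

```python
from pathlib import Path
from urllib.parse import urlparse

# Assumed set of URI schemes to hand to cloudpathlib (illustrative only).
CLOUD_SCHEMES = {"s3", "gs", "az"}


def to_path(location: str):
    """Hypothetical helper: map a string to a pathlib or cloudpathlib path."""
    scheme = urlparse(location).scheme
    if scheme in CLOUD_SCHEMES:
        # Imported lazily so local-only use does not require cloudpathlib.
        from cloudpathlib import CloudPath

        return CloudPath(location)
    if scheme == "file":
        # file:///tmp/x -> /tmp/x
        return Path(urlparse(location).path)
    if len(scheme) <= 1:
        # No scheme, or a single letter that is really a Windows drive.
        return Path(location)
    # Anything else (e.g. https) is not covered by this sketch.
    raise ValueError(f"unsupported scheme {scheme!r} in {location!r}")
```

For example, to_path("/tmp/air_temp.nc") gives a plain pathlib.Path, while to_path("s3://carbonplan-share/air_temp.nc") would go through cloudpathlib. cloudpathlib.AnyPath already does this kind of dispatch, so in practice the helper might reduce to a single AnyPath call.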