Closed Mikejmnez closed 1 year ago
My memory fails on this one - you may need to pass your URL as a list of strings rather than a glob. One would think it should be simple enough to expand the glob path here if xarray doesn't.
Alternatively, since the error is specifically about passing storage_options, all of the arguments you have there could have been passed by environment variable (or fsspec config) instead, so it's worth checking whether that works. That might not be a sufficient workaround, since you don't want to ask other users of the catalog to configure their systems, but it might point to what is going wrong within xarray.
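To make the environment-variable route concrete, here is a minimal sketch. It assumes fsspec's convention of reading a JSON-encoded dict of keyword arguments from a variable named FSSPEC_<PROTOCOL>, and that this is read when fsspec is first imported, so check both against the fsspec version you have. The helper name storage_options_to_env and all values in example_options are hypothetical placeholders, not anything from this thread.

```python
import json
import os


def storage_options_to_env(protocol, storage_options):
    """Encode storage_options as one JSON-valued environment entry.

    Assumes fsspec's FSSPEC_<PROTOCOL> convention, where the value is a
    JSON dict of keyword arguments for that protocol's filesystem.
    """
    name = f"FSSPEC_{protocol.upper()}"
    return {name: json.dumps(storage_options)}


# Hypothetical placeholder values; substitute your own credentials and endpoint.
example_options = {
    "key": "<access-key>",
    "secret": "<secret-key>",
    "client_kwargs": {"endpoint_url": "https://ceph.example.org"},
}

# Set this before fsspec is first imported (assumption: the environment is
# only consulted when fsspec loads its configuration).
os.environ.update(storage_options_to_env("s3", example_options))
```

With that in place, the idea would be to open the glob without any storage_options argument and see whether the error changes.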
Thanks @martindurant. Passing a list of all entries worked. Something like:
s3_paths = 's3://bucket_name/file*'
fileset = [f"s3://{filedb}" for filedb in s3_ceph.glob(s3_paths)]
new_ds = intake.open_zarr(urlpath=fileset, storage_options=storage_options, parallel=True, consolidated=True).to_dask()
with s3_ceph being the s3fs.S3FileSystem associated with the storage_options.
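For anyone who wants the glob expansion as a reusable step, a small sketch follows. It only assumes the filesystem object has a glob method that returns paths without the protocol prefix, which is how s3fs behaved in the snippet above. The function name expand_glob_to_urls is hypothetical.

```python
def expand_glob_to_urls(fs, pattern, protocol="s3"):
    """Expand a glob on `fs` into a sorted list of fully qualified URLs.

    `fs` is any object with a glob(pattern) method, for example an
    s3fs.S3FileSystem. Paths that come back without a protocol prefix
    get one added; paths that already have it are left alone.
    """
    prefix = f"{protocol}://"
    urls = []
    for path in sorted(fs.glob(pattern)):
        urls.append(path if path.startswith(prefix) else prefix + path)
    if not urls:
        # Fail early rather than handing an empty list to the reader.
        raise FileNotFoundError(f"no stores match {pattern!r}")
    return urls


# Usage, mirroring the snippet above (not executed here):
# fileset = expand_glob_to_urls(s3_ceph, "s3://bucket_name/file*")
# new_ds = intake.open_zarr(urlpath=fileset, storage_options=storage_options,
#                           parallel=True, consolidated=True).to_dask()
```

Sorting keeps the store order deterministic between runs, and the explicit error makes a mistyped pattern easier to spot.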
I am interested in reading multiple zarr files from a .yaml file and opening them via intake (something like cat[].to_dask()). I found this link (https://discourse.pangeo.io/t/how-to-read-multiple-zarr-archives-at-once-from-s3/2564) very helpful when creating the dataset manually, but in my case I would like to do so with the intake catalog approach.

In the case of a single zarr file, I am able to create a yaml file which successfully opens a dataset. The .yaml entry looks like:
I tried passing a glob, urlpath: 's3://bucket_name/file*', along with some additional parameters for xarray (like parallel=True, consolidated=True), but that didn't work.

The behavior (the error I get) is similar to what happens when intake.open_zarr is used to open a single zarr store vs. multiple zarr stores via a URL defined as a glob. The following works:

But if s3_path is instead a glob like s3://bucket_name/file* referencing many files, I get the following:

Is there a workaround for this? As I mentioned at the beginning, I would like to incorporate the correct arguments into a .yaml file entry to simply open multiple zarr files.
The package versions I am using are: