Open rsignell-usgs opened 1 year ago
For kerchunked dataset recipes, the currently generated intake catalogs don't work because the OSN endpoint_url is not included. For example, for the NWM-2.1-grid1km-LDAS recipe, we get:
sources:
  data:
    args:
      chunks: {}
      consolidated: false
      storage_options:
        fo: Pangeo/pangeo-forge/test/pangeo-forge/staged-recipes/recipe-run-1393/NWM-2.1-grid1km-LDAS.zarr/reference.json
        remote_options:
          anon: true
        remote_protocol: s3
        skip_instance_cache: true
        target_options: {}
        target_protocol: s3
      urlpath: reference://
    description: ''
    driver: intake_xarray.xzarr.ZarrSource
but the fo doesn't work as a remote_protocol: s3 for OSN because the endpoint_url is not specified.
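To make the failure mode concrete: when the reference file is addressed over s3 and no endpoint is given, s3fs has no way of knowing the bucket lives on OSN and falls back to its default (AWS) endpoint. The following is a minimal sketch of a check for that condition on a catalog's storage_options dict. The helper names `target_protocol_of` and `lacks_s3_endpoint` are hypothetical, and the check is only a heuristic: it looks for `endpoint_url` either directly in target_options or inside its client_kwargs, and ignores other ways an endpoint might be configured (for example environment variables).

```python
def target_protocol_of(storage_options):
    """Return the protocol used to open `fo`: explicit target_protocol, else the URL scheme."""
    explicit = storage_options.get("target_protocol")
    if explicit:
        return explicit
    fo = storage_options.get("fo", "")
    return fo.split("://", 1)[0] if "://" in fo else None


def lacks_s3_endpoint(storage_options):
    """Heuristic: True when `fo` is opened over s3 but no endpoint_url is configured."""
    if target_protocol_of(storage_options) != "s3":
        return False
    target_options = storage_options.get("target_options") or {}
    client_kwargs = target_options.get("client_kwargs") or {}
    return "endpoint_url" not in target_options and "endpoint_url" not in client_kwargs


# storage_options as generated for the recipe in this issue
generated = {
    "fo": "Pangeo/pangeo-forge/test/pangeo-forge/staged-recipes/"
          "recipe-run-1393/NWM-2.1-grid1km-LDAS.zarr/reference.json",
    "remote_options": {"anon": True},
    "remote_protocol": "s3",
    "skip_instance_cache": True,
    "target_options": {},
    "target_protocol": "s3",
}
print(lacks_s3_endpoint(generated))
```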
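Two solutions:
1. Keep target_protocol: s3, but specify target_options that include endpoint_url as a client_kwarg.
2. Use target_protocol: https, and specify fo with the https path.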
These solutions both work:
Solution 1:
sources:
  data:
    driver: intake_xarray.xzarr.ZarrSource
    description: ''
    args:
      urlpath: "reference://"
      consolidated: false
      storage_options:
        target_options:
          anon: true
          client_kwargs: {'endpoint_url': 'https://ncsa.osn.xsede.org'}
        fo: 's3://Pangeo/pangeo-forge/test/pangeo-forge/staged-recipes/recipe-run-1393/NWM-2.1-grid1km-LDAS.zarr/reference.json'
        remote_options:
          anon: true
        remote_protocol: "s3"
Solution 2:
sources:
  data:
    args:
      chunks: {}
      consolidated: false
      storage_options:
        fo: 'https://ncsa.osn.xsede.org/Pangeo/pangeo-forge/test/pangeo-forge/staged-recipes/recipe-run-1393/NWM-2.1-grid1km-LDAS.zarr/reference.json'
        remote_options:
          anon: true
        remote_protocol: s3
        skip_instance_cache: true
        target_options: {}
      urlpath: reference://
    description: ''
    driver: intake_xarray.xzarr.ZarrSource
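As a stopgap for catalogs that have already been generated, the two working variants can be derived from the broken storage_options with plain dict manipulation. This is a rough sketch, not the pangeo-forge-recipes implementation: the function names `with_s3_endpoint` and `with_https_fo` are hypothetical, and the endpoint constant is simply the OSN URL quoted in this issue. To mirror the two catalogs that were reported to work, both helpers drop target_protocol and rely on the scheme in the fo URL instead.

```python
from copy import deepcopy

OSN_ENDPOINT = "https://ncsa.osn.xsede.org"  # endpoint quoted in this issue


def _bucket_path(fo):
    """Strip a leading s3:// so the bucket/key path can be re-prefixed."""
    prefix = "s3://"
    return fo[len(prefix):] if fo.startswith(prefix) else fo


def with_s3_endpoint(storage_options, endpoint_url=OSN_ENDPOINT):
    """Variant 1: keep an s3 path for `fo` and pass the endpoint via target_options."""
    opts = deepcopy(storage_options)  # leave the caller's dict untouched
    opts["fo"] = "s3://" + _bucket_path(opts["fo"])
    opts.pop("target_protocol", None)  # protocol now comes from the s3:// scheme
    opts["target_options"] = {
        "anon": True,
        "client_kwargs": {"endpoint_url": endpoint_url},
    }
    return opts


def with_https_fo(storage_options, endpoint_url=OSN_ENDPOINT):
    """Variant 2: point `fo` at the HTTPS URL of the same object."""
    opts = deepcopy(storage_options)
    opts["fo"] = endpoint_url.rstrip("/") + "/" + _bucket_path(opts["fo"])
    opts.pop("target_protocol", None)  # protocol now comes from the https:// scheme
    opts["target_options"] = {}
    return opts


# storage_options as generated for the recipe in this issue
generated = {
    "fo": "Pangeo/pangeo-forge/test/pangeo-forge/staged-recipes/"
          "recipe-run-1393/NWM-2.1-grid1km-LDAS.zarr/reference.json",
    "remote_options": {"anon": True},
    "remote_protocol": "s3",
    "skip_instance_cache": True,
    "target_options": {},
    "target_protocol": "s3",
}
print(with_s3_endpoint(generated)["fo"])
print(with_https_fo(generated)["fo"])
```

Either returned dict could then be written back into the catalog entry's storage_options. The remote_options and remote_protocol keys are left as generated, since they describe where the referenced data chunks live rather than where reference.json lives.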
The relevant code is at https://github.com/pangeo-forge/pangeo-forge-recipes/blob/master/pangeo_forge_recipes/recipes/reference_hdf_zarr.py#L77-L83
@sharkinsspatial is this something you can fix?
Thanks for spotting this issue and documenting it here, @rsignell-usgs!