intake / intake-esm

An intake plugin for parsing an Earth System Model (ESM) catalog and loading assets into xarray datasets.
https://intake-esm.readthedocs.io
Apache License 2.0

Executing script twice error: registry #599

Closed krober10nd closed 1 year ago

krober10nd commented 1 year ago

Hi all,

I was able to run this script once and it produced NetCDF files as expected. When it's run again (after deleting the NetCDF files in that directory), I receive the following error:

Python: 3.10.6, intake-esm: 2023.4.20

Any idea why this may be? I can't find any information on how to set the registry clobber to False.

Thanks

import intake

col = intake.open_esm_datastore(
    "https://storage.googleapis.com/cmip6/pangeo-cmip6.json"
)
# print(col.df['source_id'].unique())
variable_ids = ['zg','tas']
cat = col.search(
    source_id='CNRM-CM6-1-HR',
    experiment_id='ssp585',
    variable_id=variable_ids,
    table_id='Amon',
)
ds_dict = cat.to_dataset_dict()
# Save each variable to a unique file (one NetCDF per variable per dataset key)
for key, ds in ds_dict.items():
    for var in variable_ids:
        ds[var].to_netcdf(f"{var}_{key}.nc")

Essentially, it produces:

    raise ValueError(
ValueError: Name (gs) already in the registry and clobber is False

The above exception was the direct cause of the following exception:

The full traceback is:

bin/python3 download_amon_gcm_data_for_pgw.py

--> The keys in the returned dictionary of datasets are constructed as follows:
        'activity_id.institution_id.source_id.experiment_id.table_id.grid_label'
Traceback (most recent call last):
  File "/home/krober/.local/lib/python3.10/site-packages/intake_esm/source.py", line 238, in _open_dataset
    datasets = dask.compute(*datasets)
  File "/home/krober/.local/lib/python3.10/site-packages/dask/base.py", line 599, in compute
    results = schedule(dsk, keys, **kwargs)
  File "/home/krober/.local/lib/python3.10/site-packages/dask/threaded.py", line 89, in get
    results = get_async(
  File "/home/krober/.local/lib/python3.10/site-packages/dask/local.py", line 511, in get_async
    raise_exception(exc, tb)
  File "/home/krober/.local/lib/python3.10/site-packages/dask/local.py", line 319, in reraise
    raise exc
  File "/home/krober/.local/lib/python3.10/site-packages/dask/local.py", line 224, in execute_task
    result = _execute_task(task, data)
  File "/home/krober/.local/lib/python3.10/site-packages/dask/core.py", line 119, in _execute_task
    return func(*(_execute_task(a, cache) for a in args))
  File "/home/krober/.local/lib/python3.10/site-packages/dask/utils.py", line 73, in apply
    return func(*args, **kwargs)
  File "/home/krober/.local/lib/python3.10/site-packages/intake_esm/source.py", line 72, in _open_dataset
    ds = xr.open_dataset(url, **xarray_open_kwargs)
  File "/home/krober/.local/lib/python3.10/site-packages/xarray/backends/api.py", line 526, in open_dataset
    backend_ds = backend.open_dataset(
  File "/home/krober/.local/lib/python3.10/site-packages/xarray/backends/zarr.py", line 891, in open_dataset
    store = ZarrStore.open_group(
  File "/home/krober/.local/lib/python3.10/site-packages/xarray/backends/zarr.py", line 405, in open_group
    zarr_group = zarr.open_consolidated(store, **open_kwargs)
  File "/home/krober/.local/lib/python3.10/site-packages/zarr/convenience.py", line 1282, in open_consolidated
    store = normalize_store_arg(store, storage_options=kwargs.get("storage_options"), mode=mode,
  File "/home/krober/.local/lib/python3.10/site-packages/zarr/storage.py", line 181, in normalize_store_arg
    return normalize_store(store, storage_options, mode)
  File "/home/krober/.local/lib/python3.10/site-packages/zarr/storage.py", line 154, in _normalize_store_arg_v2
    return FSStore(store, mode=mode, **(storage_options or {}))
  File "/home/krober/.local/lib/python3.10/site-packages/zarr/storage.py", line 1345, in __init__
    self.map = fsspec.get_mapper(url, **{**mapper_options, **storage_options})
  File "/home/krober/.local/lib/python3.10/site-packages/fsspec/mapping.py", line 237, in get_mapper
    fs, urlpath = url_to_fs(url, **kwargs)
  File "/home/krober/.local/lib/python3.10/site-packages/fsspec/core.py", line 363, in url_to_fs
    chain = _un_chain(url, kwargs)
  File "/home/krober/.local/lib/python3.10/site-packages/fsspec/core.py", line 325, in _un_chain
    cls = get_filesystem_class(protocol)
  File "/home/krober/.local/lib/python3.10/site-packages/fsspec/registry.py", line 216, in get_filesystem_class
    register_implementation(protocol, _import_class(bit["class"]))
  File "/home/krober/.local/lib/python3.10/site-packages/fsspec/registry.py", line 48, in register_implementation
    raise ValueError(
ValueError: Name (gs) already in the registry and clobber is False

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/mnt/c/Users/kroberts/Projects/13783.101_Pax/03_Data/03_GCMs/AMON/download_amon_gcm_data_for_pgw.py", line 14, in <module>
    ds_dict = cat.to_dataset_dict()
  File "pydantic/decorator.py", line 40, in pydantic.decorator.validate_arguments.validate.wrapper_function
    import itertools
  File "pydantic/decorator.py", line 134, in pydantic.decorator.ValidatedFunction.call
    if doc:
  File "pydantic/decorator.py", line 206, in pydantic.decorator.ValidatedFunction.execute
    name, rest = obj.strip().split('(', 1)
  File "/home/krober/.local/lib/python3.10/site-packages/intake_esm/core.py", line 662, in to_dataset_dict
    raise exc
  File "/home/krober/.local/lib/python3.10/site-packages/intake_esm/core.py", line 658, in to_dataset_dict
    key, ds = task.result()
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 451, in result
    return self.__get_result()
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/usr/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/home/krober/.local/lib/python3.10/site-packages/intake_esm/core.py", line 800, in _load_source
    return key, source.to_dask()
  File "/home/krober/.local/lib/python3.10/site-packages/intake_esm/source.py", line 271, in to_dask
    self._load_metadata()
  File "/home/krober/.local/lib/python3.10/site-packages/intake/source/base.py", line 279, in _load_metadata
    self._schema = self._get_schema()
  File "/home/krober/.local/lib/python3.10/site-packages/intake_esm/source.py", line 203, in _get_schema
    self._open_dataset()
  File "/home/krober/.local/lib/python3.10/site-packages/intake_esm/source.py", line 263, in _open_dataset
    raise ESMDataSourceError(
intake_esm.source.ESMDataSourceError: Failed to load dataset with key='ScenarioMIP.CNRM-CERFACS.CNRM-CM6-1-HR.ssp585.Amon.gr'
                 You can use `cat['ScenarioMIP.CNRM-CERFACS.CNRM-CM6-1-HR.ssp585.Amon.gr'].df` to inspect the assets/files for this key.
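For what it's worth, the `ValueError` at the bottom of the chain comes from fsspec's protocol registry, which refuses to overwrite an existing entry (here `gs`) unless `clobber=True`. A minimal, stdlib-only sketch of that guard pattern (illustrative only, not fsspec's actual code; names are hypothetical):

```python
# Illustrative "clobber" registry guard, modeled on the pattern in the
# traceback above. This is NOT fsspec's implementation, just the shape of it.
registry = {}

def register_implementation(name, cls, clobber=False):
    """Register cls under name; refuse to overwrite unless clobber=True."""
    if name in registry and not clobber:
        raise ValueError(
            f"Name ({name}) already in the registry and clobber is False"
        )
    registry[name] = cls

register_implementation("gs", object)      # first registration succeeds
try:
    register_implementation("gs", object)  # re-registration raises
except ValueError as err:
    print(err)
```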
mgrover1 commented 1 year ago

Thanks for the question, @krober10nd, and I apologize for the delayed response.

You should be able to set that when you read in the dataset, using the storage_options argument!

catalog.to_dataset_dict(storage_options={'anon':True, 'clobber':False})
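Applied to the script above, that might look like the following sketch (the function wrapper is illustrative and needs network access to the public Pangeo CMIP6 bucket, hence `anon=True`):

```python
def load_cmip6_datasets(storage_options=None):
    """Sketch: re-run the search from the original script, forwarding
    storage_options through to the underlying fsspec filesystem."""
    import intake  # imported lazily so the sketch stays self-contained

    col = intake.open_esm_datastore(
        "https://storage.googleapis.com/cmip6/pangeo-cmip6.json"
    )
    cat = col.search(
        source_id='CNRM-CM6-1-HR',
        experiment_id='ssp585',
        variable_id=['zg', 'tas'],
        table_id='Amon',
    )
    # anon=True for anonymous GCS access; clobber=False as suggested above.
    return cat.to_dataset_dict(
        storage_options=storage_options or {'anon': True, 'clobber': False}
    )
```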
krober10nd commented 1 year ago

Thank you for the response! It has been working great once I got past some initial issues.