Closed rb4844 closed 4 months ago
@rb4844 - starting by running the code you supplied. One minor comment: cat_subset is more useful than print(cat_subset).
@rb4844 - when I try to run your .to_dataset_dict() code I do get a failure, but the error specifies:
Exception: 'TypeError("NetCDF4BackendEntrypoint.open_dataset() got an unexpected keyword argument \'consolidated\'")'
Are you intending to use xarray_open_kwargs={"consolidated": True}? Are we pointing at netCDF files? Isn't the consolidated metadata keyword for the zarr engine only?
I'm not sure if this is THE problem or the ONLY problem.
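As a minimal sketch of the point above: the TypeError indicates that the netCDF4 backend rejects "consolidated", so one option is to drop that key before passing the kwargs on. The names ZARR_ONLY_KEYS and netcdf_kwargs are hypothetical, introduced only for illustration, and the final call is left commented out because it needs the NCI catalog.

```python
# Keyword arguments copied from the original call.
xarray_open_kwargs = {"consolidated": True, "decode_times": True, "use_cftime": True}

# Assumption: "consolidated" is the only zarr-specific key here,
# which is what the TypeError from the netCDF4 backend suggests.
ZARR_ONLY_KEYS = {"consolidated"}

# Keep everything except the zarr-only keys.
netcdf_kwargs = {
    key: value
    for key, value in xarray_open_kwargs.items()
    if key not in ZARR_ONLY_KEYS
}

# With the real catalog this would then be passed as:
# dset_dict = cat_subset.to_dataset_dict(xarray_open_kwargs=netcdf_kwargs)
```

As noted later in the thread, removing the kwargs did not clear the ESMDataSourceError, so this addresses only the first of possibly several problems.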
Sorry, I'm not familiar with that keyword argument; I only used it because I was copying another example. Yes, pointing at netCDF files. So maybe remove or change that argument?
All good! I myself need to check the documentation very regularly. Possibly have a look at the docs that support xarray.open_mfdataset and give it a go.
Removing the xarray_open_kwargs still results in the error:
ESMDataSourceError: Failed to load dataset with key='f.ScenarioMIP.EC-Earth-Consortium.EC-Earth3.ssp370.r13i1p1f1.3hrPt.atmos.3hr.huss.gr.v20200201'
You can use cat['f.ScenarioMIP.EC-Earth-Consortium.EC-Earth3.ssp370.r13i1p1f1.3hrPt.atmos.3hr.huss.gr.v20200201'].df to inspect the assets/files for this key.
And the name here looks odd, with '3hrPt.atmos'?
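One possible reading of the odd-looking key: intake-esm builds dataset keys by joining the values of the catalog's grouping columns with dots, so '3hrPt.atmos.3hr' may simply be three adjacent column values rather than part of a path. The sketch below splits the key and pairs each part with a column name. The column names and their order are an assumption, not taken from the NCI catalog; the real list should be in the catalog JSON under aggregation_control.groupby_attrs.

```python
key = (
    "f.ScenarioMIP.EC-Earth-Consortium.EC-Earth3.ssp370."
    "r13i1p1f1.3hrPt.atmos.3hr.huss.gr.v20200201"
)

# ASSUMED column names and order, for illustration only.
# Check the catalog's aggregation_control.groupby_attrs for the real ones.
assumed_columns = [
    "file_type", "activity_id", "institution_id", "source_id",
    "experiment_id", "member_id", "frequency", "realm",
    "table_id", "variable_id", "grid_label", "version",
]

# None of the values in this key contain a dot, so a plain split is safe here.
parts = key.split(".")
labelled = dict(zip(assumed_columns, parts))

for column, value in labelled.items():
    print(f"{column:15s} {value}")
```

Under that assumption, '3hrPt' would be the frequency, 'atmos' the realm, and '3hr' the table_id, which matches the table_id="3hr" used in the search.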
I am trying to identify which CMIP6 models at NCI have 3hr data, without manually trawling through them all.
Something like this:
import intake
cmip6 = intake.open_esm_datastore("/g/data/dk92/catalog/v2/esm/cmip6-oi10/catalog.json")
values_dict = cmip6.unique()
models_list = values_dict.source_id
# but take a subset for testing
small_list = ['EC-Earth3', 'CESM2', 'GFDL-ESM2M', 'GFDL-ESM4', 'MIROC6', 'NorESM2-MM', 'CMCC-ESM2']
# variables of interest
three_hr_data = ['huss', 'tas', 'uas', 'vas', 'ps', 'pr']
# testing with one model
cat_subset = cmip6.search(
    source_id=['EC-Earth3'],
    experiment_id=["ssp370"],
    table_id="3hr",
    variable_id="huss",
    grid_label=["gn", 'gr'],
)
print(cat_subset)
dset_dict = cat_subset.to_dataset_dict(
    xarray_open_kwargs={"consolidated": True, "decode_times": True, "use_cftime": True}
)
Which then fails with this error:
ESMDataSourceError: Failed to load dataset with key='f.ScenarioMIP.EC-Earth-Consortium.EC-Earth3.ssp370.r6i1p1f1.3hrPt.atmos.3hr.huss.gr.v20200201' You can use cat['f.ScenarioMIP.EC-Earth-Consortium.EC-Earth3.ssp370.r6i1p1f1.3hrPt.atmos.3hr.huss.gr.v20200201'].df to inspect the assets/files for this key.
Why does it say '3hrPt.atmos.3hr'? That is not what I was expecting. The actual path is /g/data/oi10/replicas/CMIP6/ScenarioMIP/EC-Earth-Consortium/EC-Earth3/ssp370/r1i1p1f1/3hr/huss/gr/v20200310
Can you suggest how to get this to work in a useful way? At the end of the day, I want a list of paths similar to the above, for all the models with 3hr data in the three_hr_data list (there aren't very many). Many thanks, Roger
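If the end goal is only a list of directories, it may not be necessary to open any datasets at all. A rough sketch, assuming the catalog's dataframe (cat_subset.df) has a column of file paths: take each file's parent directory and de-duplicate. The helper name dataset_dirs and the two file names below are hypothetical; only the directory comes from the post above.

```python
from pathlib import PurePosixPath


def dataset_dirs(file_paths):
    """Return the sorted, de-duplicated parent directories of the given files."""
    return sorted({str(PurePosixPath(path).parent) for path in file_paths})


# Directory quoted in the post; the file names are made up for illustration.
base = (
    "/g/data/oi10/replicas/CMIP6/ScenarioMIP/EC-Earth-Consortium/"
    "EC-Earth3/ssp370/r1i1p1f1/3hr/huss/gr/v20200310"
)
example_files = [
    f"{base}/hypothetical_file_a.nc",
    f"{base}/hypothetical_file_b.nc",
]

for directory in dataset_dirs(example_files):
    print(directory)
```

With the real catalog this might look like dataset_dirs(cat_subset.df["path"]), where the column name "path" is an assumption to check against cat_subset.df.columns. The search itself accepts lists, so source_id=small_list and variable_id=three_hr_data could be passed in one call rather than looping over models.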