rb4844 commented 4 months ago

I am trying to identify which CMIP6 models at NCI have 3hr data, without manually trawling through them all.

Something like this:

import intake
cmip6 = intake.open_esm_datastore("/g/data/dk92/catalog/v2/esm/cmip6-oi10/catalog.json")
values_dict = cmip6.unique()
models_list = values_dict.source_id

but take a subset for testing

small_list = ['EC-Earth3', 'CESM2', 'GFDL-ESM2M', 'GFDL-ESM4', 'MIROC6', 'NorESM2-MM', 'CMCC-ESM2']

variables of interest

three_hr_data = ['huss', 'tas', 'uas', 'vas', 'ps', 'pr']

testing with one model

cat_subset = cmip6.search(
    source_id=['EC-Earth3'],
    experiment_id=["ssp370"],
    table_id="3hr",
    variable_id="huss",
    grid_label=["gn", 'gr'],
)

print(cat_subset)

dset_dict = cat_subset.to_dataset_dict(
    xarray_open_kwargs={"consolidated": True, "decode_times": True, "use_cftime": True}
)

Which then fails with this error:

ESMDataSourceError: Failed to load dataset with key='f.ScenarioMIP.EC-Earth-Consortium.EC-Earth3.ssp370.r6i1p1f1.3hrPt.atmos.3hr.huss.gr.v20200201'
You can use cat['f.ScenarioMIP.EC-Earth-Consortium.EC-Earth3.ssp370.r6i1p1f1.3hrPt.atmos.3hr.huss.gr.v20200201'].df to inspect the assets/files for this key.

Why does it say '3hrPt.atmos.3hr'? That is not what I was expecting. The actual path is /g/data/oi10/replicas/CMIP6/ScenarioMIP/EC-Earth-Consortium/EC-Earth3/ssp370/r1i1p1f1/3hr/huss/gr/v20200310

Can you suggest how to get this to work in a useful way? At the end of the day, I want a list of paths similar to the above, for all the models with 3hr data in the three_hr_data list (there aren't very many). Many thanks, Roger

Thomas-Moore-Creative commented 4 months ago

@rb4844 - starting by running the code you supplied. One minor comment:

The output from `cat_subset` is more useful than `print(cat_subset)`.


Thomas-Moore-Creative commented 4 months ago

@rb4844 - when I try to run your .to_dataset_dict() code I do get a failure, but the error specifies:

Exception: 'TypeError("NetCDF4BackendEntrypoint.open_dataset() got an unexpected keyword argument \'consolidated\'")'

Are you intending to use xarray_open_kwargs={"consolidated": True}? Are we pointing at netCDF files? Isn't the consolidated metadata keyword for the zarr engine only?

I'm not sure if this is THE problem or the ONLY problem?
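If the files are netCDF, one option is to drop the zarr-only keyword before calling `to_dataset_dict()`. A minimal sketch, assuming `consolidated` is the only offending key; the helper name `netcdf_safe_kwargs` is hypothetical and not part of intake-esm or xarray:

```python
# Keys assumed here to be meaningful only for the zarr engine (an assumption
# for illustration; extend the set if other keys raise the same TypeError).
ZARR_ONLY_KEYS = {"consolidated"}


def netcdf_safe_kwargs(kwargs):
    """Return a copy of kwargs without the zarr-only keys."""
    return {k: v for k, v in kwargs.items() if k not in ZARR_ONLY_KEYS}


original = {"consolidated": True, "decode_times": True, "use_cftime": True}
cleaned = netcdf_safe_kwargs(original)
print(cleaned)

# Intended use with the catalog subset from the first post (not run here):
# dset_dict = cat_subset.to_dataset_dict(xarray_open_kwargs=cleaned)
```

This only removes the TypeError about the unexpected keyword; it does not rule out other causes of the load failure.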

rb4844 commented 4 months ago

Sorry, not familiar with that keyword argument, only used it as copying another example. Yes, pointing at netCDF files. So maybe remove or change that argument?

Thomas-Moore-Creative commented 4 months ago

> Sorry, not familiar with that keyword argument, only used it as copying another example. Yes, pointing at netCDF files. So maybe remove or change that argument?

All good! I myself need to check the documentation very regularly. Possibly have a look at the docs that support xarray.open_mfdataset and give it a go.

rb4844 commented 4 months ago

Removing the xarray_open_kwargs still results in the error:

ESMDataSourceError: Failed to load dataset with key='f.ScenarioMIP.EC-Earth-Consortium.EC-Earth3.ssp370.r13i1p1f1.3hrPt.atmos.3hr.huss.gr.v20200201'
You can use cat['f.ScenarioMIP.EC-Earth-Consortium.EC-Earth3.ssp370.r13i1p1f1.3hrPt.atmos.3hr.huss.gr.v20200201'].df to inspect the assets/files for this key.

and the name here looks odd, with '3hrPt.atmos'?
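One likely reading of the odd-looking name: in intake-esm a dataset key is not a path but the values of the catalog's grouping columns joined with `.`, so `3hrPt.atmos.3hr` would be three separate fields sitting next to each other. A minimal sketch of that reading; the column names and their order below are an assumption inferred from the key itself, not read from the catalog's JSON:

```python
# Assumed grouping columns, in key order (an assumption; the authoritative
# list lives in the catalog JSON's aggregation settings).
ASSUMED_COLUMNS = [
    "file_type", "activity_id", "institution_id", "source_id",
    "experiment_id", "member_id", "frequency", "realm",
    "table_id", "variable_id", "grid_label", "version",
]

key = ("f.ScenarioMIP.EC-Earth-Consortium.EC-Earth3.ssp370.r13i1p1f1."
       "3hrPt.atmos.3hr.huss.gr.v20200201")

# Pair each dot-separated field with its assumed column name.
fields = dict(zip(ASSUMED_COLUMNS, key.split(".")))
for name, value in fields.items():
    print(f"{name:15s} {value}")
```

Under this assumption, `3hrPt` would be the frequency, `atmos` the realm and `3hr` the table, which is why the key does not mirror the directory layout on disk.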

Thomas-Moore-Creative commented 4 months ago

resolved - https://forum.access-hive.org.au/t/intake-esm-and-3hr-data/2259
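For the original goal (a list of directories per model rather than loaded datasets), the catalog's table can be used directly without calling `to_dataset_dict()`. A minimal sketch, assuming the search result's `.df` has `source_id` and `path` columns and that `path` points at individual files; the helper name `dirs_by_model` is hypothetical, and the sample rows reuse the directory quoted in the first post with made-up file names:

```python
import os
from collections import defaultdict


def dirs_by_model(records):
    """Map source_id -> sorted list of unique directories holding its files."""
    found = defaultdict(set)
    for row in records:
        found[row["source_id"]].add(os.path.dirname(row["path"]))
    return {model: sorted(dirs) for model, dirs in found.items()}


# Placeholder rows standing in for cat_subset.df.to_dict("records");
# the file names are made up for illustration.
base = ("/g/data/oi10/replicas/CMIP6/ScenarioMIP/EC-Earth-Consortium/"
        "EC-Earth3/ssp370/r1i1p1f1/3hr/huss/gr/v20200310")
records = [
    {"source_id": "EC-Earth3", "variable_id": "huss", "path": base + "/file_a.nc"},
    {"source_id": "EC-Earth3", "variable_id": "huss", "path": base + "/file_b.nc"},
]

result = dirs_by_model(records)
for model, dirs in result.items():
    print(model, dirs)

# Intended use on the real catalog (not run here):
# subset = cmip6.search(table_id="3hr", variable_id=three_hr_data)
# result = dirs_by_model(subset.df.to_dict("records"))
```

Working from the table avoids opening any files, so it sidesteps the loading error entirely when only the paths are needed.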

shared-climate-data-problems / CMIP-data-problems

Using Intake-ESM #5
