Closed Timh37 closed 4 months ago
The way I have the recipe set up currently should already expand these out if available.
Checking the recent logs confirms that the first iid here is indeed parsed out. Diagnosing what happens on the pangeo-forge-esgf side (gist) indicates that we can find URLs for the following subset of the iids requested here:
['CMIP6.ScenarioMIP.CSIRO-ARCCSS.ACCESS-CM2.ssp585.r1i1p1f1.day.pr.gn.v20210317',
'CMIP6.ScenarioMIP.CSIRO.ACCESS-ESM1-5.ssp585.r4i1p1f1.day.sfcWind.gn.v20210318',
'CMIP6.ScenarioMIP.CSIRO.ACCESS-ESM1-5.ssp585.r6i1p1f1.day.sfcWind.gn.v20210318',
'CMIP6.ScenarioMIP.CSIRO.ACCESS-ESM1-5.ssp585.r4i1p1f1.day.pr.gn.v20210318',
'CMIP6.ScenarioMIP.CSIRO.ACCESS-ESM1-5.ssp585.r10i1p1f1.day.pr.gn.v20210318']
but they are clearly not in the catalog... hmmm, this is strange.
I have made a dummy PR to get a bit more manageable logs for these particular iids.
Ok so the culprit here is that these particular iids are being removed, since they already exist in the 'original' catalog.
Could you double check if those stores are useable? If not we could remove them from the original catalog which should trigger a rebuild here.
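To make the filtering described above concrete, here is a minimal sketch of how requested iids could drop out when they already exist in another catalog. The function name `drop_existing_iids` and the example inputs are hypothetical and only illustrate the idea; this is not the actual recipe code.

```python
def drop_existing_iids(requested, existing):
    """Return the requested iids that are not yet in an existing catalog.

    Hypothetical helper: the order of `requested` is preserved.
    """
    existing_set = set(existing)
    return [iid for iid in requested if iid not in existing_set]


requested = [
    'CMIP6.ScenarioMIP.CSIRO-ARCCSS.ACCESS-CM2.ssp585.r1i1p1f1.day.pr.gn.v20210317',
    'CMIP6.CMIP.MRI.MRI-ESM2-0.historical.r2i1p1f1.day.pr.gn.v20190603',
]
# Assumption for illustration: only the first iid is already in the 'original' catalog.
already_in_original = [
    'CMIP6.ScenarioMIP.CSIRO-ARCCSS.ACCESS-CM2.ssp585.r1i1p1f1.day.pr.gn.v20210317',
]

to_ingest = drop_existing_iids(requested, already_in_original)
print(to_ingest)
```

With a filter like this, an iid that is present in the 'original' catalog is never ingested again, regardless of whether the existing store is complete.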
I can load these datasets from the 'original' catalog, but the datasets get filtered out because they are incomplete (at least the one I checked just now, CMIP6.ScenarioMIP.CSIRO-ARCCSS.ACCESS-CM2.ssp585.r1i1p1f1.day.pr.gn.v20210317), possibly due to #53? That may mean that even if a complete dataset for this iid were added through pangeo-forge-esgf, it would still be removed?
I just ran
import intake


def zstore_to_iid(zstore: str):
    # this is a bit whacky to account for the different way of storing old/new stores
    iid = '.'.join(zstore.replace('gs://', '').replace('.zarr', '').replace('.', '/').split('/')[-11:-1])
    if not iid.startswith('CMIP6'):
        iid = '.'.join(zstore.replace('gs://', '').replace('.zarr', '').replace('.', '/').split('/')[-10:])
    return iid


def search_iids(col_url: str):
    col = intake.open_esm_datastore(col_url)
    iids_all = [zstore_to_iid(z) for z in col.df['zstore'].tolist()]
    return [iid for iid in iids_all if iid in iids_requested]


iids_requested = [
    'CMIP6.ScenarioMIP.CSIRO-ARCCSS.ACCESS-CM2.ssp585.r1i1p1f1.day.pr.gn.v20210317',
    'CMIP6.ScenarioMIP.CSIRO.ACCESS-ESM1-5.ssp585.r4i1p1f1.day.sfcWind.gn.v20210318',
    'CMIP6.ScenarioMIP.CSIRO.ACCESS-ESM1-5.ssp585.r6i1p1f1.day.sfcWind.gn.v20210318',
    'CMIP6.ScenarioMIP.CSIRO.ACCESS-ESM1-5.ssp585.r4i1p1f1.day.pr.gn.v20210318',
    'CMIP6.ScenarioMIP.CSIRO.ACCESS-ESM1-5.ssp585.r10i1p1f1.day.pr.gn.v20210318',
    'CMIP6.CMIP.MRI.MRI-ESM2-0.historical.r2i1p1f1.day.pr.gn.v20190603',
    'CMIP6.CMIP.MRI.MRI-ESM2-0.historical.r3i1p1f1.day.pr.gn.v20190603',
    'CMIP6.CMIP.MRI.MRI-ESM2-0.historical.r4i1p1f1.day.pr.gn.v20190603',
    'CMIP6.CMIP.MRI.MRI-ESM2-0.historical.r5i1p1f1.day.pr.gn.v20190603',
]

url_dict = {
    'qc': "https://storage.googleapis.com/cmip6/cmip6-pgf-ingestion-test/catalog/catalog.json",
    'non-qc': "https://storage.googleapis.com/cmip6/cmip6-pgf-ingestion-test/catalog/catalog_noqc.json",
    'retracted': "https://storage.googleapis.com/cmip6/cmip6-pgf-ingestion-test/catalog/catalog_retracted.json",
}

iids_found = []
for catalog, url in url_dict.items():
    iids = search_iids(url)
    iids_found.extend(iids)
    print(f"Found in {catalog=}: {iids=}\n")

missing_iids = list(set(iids_requested) - set(iids_found))
print(f"\n\nStill missing {len(missing_iids)} of {len(iids_requested)}: \n{missing_iids=}")
and found that all requested datasets are available 🎉.
Some of them landed in the retracted catalog though. Let me know if you want to add newer versions.
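For readers who want to check the store-path-to-iid conversion without opening a catalog, the following is a small self-contained sketch. It uses a regular expression instead of the slicing in the script above, so it is an alternative illustration rather than the code that was run. The two example store paths are assumptions about what the old (slash-separated) and new (dot-separated, `.zarr`) layouts look like; the `12345` path component is made up.

```python
import re

# An iid has ten dot-separated facets: it starts with 'CMIP6' and ends
# with a version such as 'v20210317'.
IID_PATTERN = re.compile(r"CMIP6(?:\.[^./]+){8}\.v\d+")


def zstore_to_iid_sketch(zstore):
    """Extract the iid from a store path, or return None if none is found."""
    normalized = zstore.replace('gs://', '').rstrip('/')
    if normalized.endswith('.zarr'):
        normalized = normalized[:-len('.zarr')]
    # Old-style stores separate facets with '/', so unify on '.'.
    normalized = normalized.replace('/', '.')
    match = IID_PATTERN.search(normalized)
    return match.group(0) if match else None


# Hypothetical example paths (assumed layouts, not taken from the catalogs).
old_style = 'gs://cmip6/CMIP6/ScenarioMIP/CSIRO-ARCCSS/ACCESS-CM2/ssp585/r1i1p1f1/day/pr/gn/v20210317/'
new_style = ('gs://cmip6/cmip6-pgf-ingestion-test/zarr_stores/12345/'
             'CMIP6.CMIP.MRI.MRI-ESM2-0.historical.r2i1p1f1.day.pr.gn.v20190603.zarr')

old_iid = zstore_to_iid_sketch(old_style)
new_iid = zstore_to_iid_sketch(new_style)
print(old_iid)
print(new_iid)
```

The pattern relies on the upper-case `CMIP6` facet, so the lower-case `cmip6` bucket name is not mistaken for the start of an iid.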
List of requested iids
Description
Kind request to add a few final iids before submitting the compound flooding paper.