leap-stc / cmip6-leap-feedstock


[REQUEST]: a few more daily wind & precipitation iids #54

Closed Timh37 closed 4 months ago

Timh37 commented 12 months ago

List of requested iids

'CMIP6.ScenarioMIP.CSIRO-ARCCSS.ACCESS-CM2.ssp585.r1i1p1f1.day.pr.gn.v20210317',
'CMIP6.ScenarioMIP.CSIRO.ACCESS-ESM1-5.ssp585.r4i1p1f1.day.sfcWind.gn.v20210318',
'CMIP6.ScenarioMIP.CSIRO.ACCESS-ESM1-5.ssp585.r6i1p1f1.day.sfcWind.gn.v20210318',
'CMIP6.ScenarioMIP.CSIRO.ACCESS-ESM1-5.ssp585.r4i1p1f1.day.pr.gn.v20210318',
'CMIP6.ScenarioMIP.CSIRO.ACCESS-ESM1-5.ssp585.r10i1p1f1.day.pr.gn.v20210318',
'CMIP6.CMIP.MRI.MRI-ESM2-0.historical.r2i1p1f1.day.pr.gn.v20190603',
'CMIP6.CMIP.MRI.MRI-ESM2-0.historical.r3i1p1f1.day.pr.gn.v20190603',
'CMIP6.CMIP.MRI.MRI-ESM2-0.historical.r4i1p1f1.day.pr.gn.v20190603',
'CMIP6.CMIP.MRI.MRI-ESM2-0.historical.r5i1p1f1.day.pr.gn.v20190603',
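
For readers unfamiliar with the format, each iid above is a string of ten dot-separated facets. The sketch below splits one into named parts; the helper name `parse_iid` is hypothetical, and the facet names follow the usual CMIP6 naming rather than anything defined in this issue.

```python
from typing import Dict

# Facet names in the order they appear in a CMIP6 instance id (assumed naming).
IID_FACETS = (
    "mip_era", "activity_id", "institution_id", "source_id", "experiment_id",
    "member_id", "table_id", "variable_id", "grid_label", "version",
)


def parse_iid(iid: str) -> Dict[str, str]:
    """Split a dot-separated iid into named facets (hypothetical helper)."""
    parts = iid.split(".")
    if len(parts) != len(IID_FACETS):
        raise ValueError(
            f"expected {len(IID_FACETS)} facets, got {len(parts)}: {iid!r}"
        )
    return dict(zip(IID_FACETS, parts))


facets = parse_iid(
    "CMIP6.CMIP.MRI.MRI-ESM2-0.historical.r2i1p1f1.day.pr.gn.v20190603"
)
print(facets["source_id"], facets["variable_id"], facets["version"])
```

Splitting on `.` only works because none of the facets in these iids contain a dot themselves.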

Description

Kind request to add a few final iids before submitting the compound flooding paper.

jbusecke commented 11 months ago

The way I currently have the recipe set up, it should already expand these out if they are available.

Checking the recent logs confirms that the first iid here is indeed parsed out. Diagnosing what happens on the pangeo-forge-esgf side (gist) indicates that we can find urls for the following subset of iids requested here:

['CMIP6.ScenarioMIP.CSIRO-ARCCSS.ACCESS-CM2.ssp585.r1i1p1f1.day.pr.gn.v20210317',
 'CMIP6.ScenarioMIP.CSIRO.ACCESS-ESM1-5.ssp585.r4i1p1f1.day.sfcWind.gn.v20210318',
 'CMIP6.ScenarioMIP.CSIRO.ACCESS-ESM1-5.ssp585.r6i1p1f1.day.sfcWind.gn.v20210318',
 'CMIP6.ScenarioMIP.CSIRO.ACCESS-ESM1-5.ssp585.r4i1p1f1.day.pr.gn.v20210318',
 'CMIP6.ScenarioMIP.CSIRO.ACCESS-ESM1-5.ssp585.r10i1p1f1.day.pr.gn.v20210318']

but they are clearly not in the catalog... hmmm, this is strange.

jbusecke commented 11 months ago

I have made a dummy PR to get a bit more manageable logs for these particular iids.

jbusecke commented 11 months ago

Ok so the culprit here is that these particular iids are being removed, since they already exist in the 'original' catalog.

Updated gist.

Could you double-check whether those stores are usable? If not, we could remove them from the original catalog, which should trigger a rebuild here.
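
To illustrate the behaviour described above, here is a minimal sketch of that kind of filter. It is not the feedstock's actual code: the function name `drop_existing` and the contents of `existing` are assumptions made up for the example.

```python
from typing import Iterable, List, Set


def drop_existing(requested: Iterable[str], existing: Set[str]) -> List[str]:
    """Keep only iids not already present in `existing`, preserving order."""
    return [iid for iid in requested if iid not in existing]


requested = [
    "CMIP6.ScenarioMIP.CSIRO-ARCCSS.ACCESS-CM2.ssp585.r1i1p1f1.day.pr.gn.v20210317",
    "CMIP6.CMIP.MRI.MRI-ESM2-0.historical.r2i1p1f1.day.pr.gn.v20190603",
]

# Hypothetical: pretend the first iid is already listed in the 'original' catalog.
existing = {requested[0]}

to_ingest = drop_existing(requested, existing)
print(to_ingest)
```

A filter like this never checks whether the existing store is complete, so an incomplete store in the original catalog would still block a fresh ingestion of the same iid.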

Timh37 commented 11 months ago

I can load these datasets from the 'original' catalog, but they get filtered out because they are incomplete (at least the one I checked just now, CMIP6.ScenarioMIP.CSIRO-ARCCSS.ACCESS-CM2.ssp585.r1i1p1f1.day.pr.gn.v20210317), possibly due to #53. Does that mean that even if a complete dataset for this iid were added through pangeo-forge-esgf, it might still be removed?

jbusecke commented 4 months ago

I just ran

import intake

def zstore_to_iid(zstore: str) -> str:
    # this is a bit wacky to account for the different way of storing old/new stores
    # normalize the store path into '/'-separated parts
    parts = zstore.replace('gs://', '').replace('.zarr', '').replace('.', '/').split('/')
    # first try: the 10 iid facets are followed by one trailing path component
    iid = '.'.join(parts[-11:-1])
    if not iid.startswith('CMIP6'):
        # fallback: the 10 iid facets are the last 10 parts
        iid = '.'.join(parts[-10:])
    return iid

def search_iids(col_url: str):
    # return the iids from the module-level `iids_requested` that appear in the catalog at col_url
    col = intake.open_esm_datastore(col_url)
    iids_all = [zstore_to_iid(z) for z in col.df['zstore'].tolist()]
    return [iid for iid in iids_all if iid in iids_requested]

iids_requested = [
'CMIP6.ScenarioMIP.CSIRO-ARCCSS.ACCESS-CM2.ssp585.r1i1p1f1.day.pr.gn.v20210317',
'CMIP6.ScenarioMIP.CSIRO.ACCESS-ESM1-5.ssp585.r4i1p1f1.day.sfcWind.gn.v20210318',
'CMIP6.ScenarioMIP.CSIRO.ACCESS-ESM1-5.ssp585.r6i1p1f1.day.sfcWind.gn.v20210318',
'CMIP6.ScenarioMIP.CSIRO.ACCESS-ESM1-5.ssp585.r4i1p1f1.day.pr.gn.v20210318',
'CMIP6.ScenarioMIP.CSIRO.ACCESS-ESM1-5.ssp585.r10i1p1f1.day.pr.gn.v20210318',
'CMIP6.CMIP.MRI.MRI-ESM2-0.historical.r2i1p1f1.day.pr.gn.v20190603',
'CMIP6.CMIP.MRI.MRI-ESM2-0.historical.r3i1p1f1.day.pr.gn.v20190603',
'CMIP6.CMIP.MRI.MRI-ESM2-0.historical.r4i1p1f1.day.pr.gn.v20190603',
'CMIP6.CMIP.MRI.MRI-ESM2-0.historical.r5i1p1f1.day.pr.gn.v20190603',
]

url_dict = {
    'qc':"https://storage.googleapis.com/cmip6/cmip6-pgf-ingestion-test/catalog/catalog.json",
    'non-qc':"https://storage.googleapis.com/cmip6/cmip6-pgf-ingestion-test/catalog/catalog_noqc.json",
    'retracted':"https://storage.googleapis.com/cmip6/cmip6-pgf-ingestion-test/catalog/catalog_retracted.json"
}

iids_found = []
for catalog,url in url_dict.items():
    iids = search_iids(url)
    iids_found.extend(iids)
    print(f"Found in {catalog=}: {iids=}\n")

missing_iids = list(set(iids_requested) - set(iids_found))
print(f"\n\nStill missing {len(missing_iids)} of {len(iids_requested)}: \n{missing_iids=}")

The output shows that all requested datasets are now available 🎉.

Some of them landed in the retracted catalog though. Let me know if you want to add newer versions.
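
To see at a glance which iids ended up in which catalog, the search could be inverted into a per-iid mapping. The sketch below runs offline: the store paths and bucket name are hypothetical stand-ins for a catalog's `zstore` column, and `catalogs_by_iid` is an illustrative helper, not part of the feedstock.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def zstore_to_iid(zstore: str) -> str:
    """Recover the iid from a store path, handling both path layouts."""
    parts = (
        zstore.replace("gs://", "").replace(".zarr", "").replace(".", "/").split("/")
    )
    iid = ".".join(parts[-11:-1])
    if not iid.startswith("CMIP6"):
        iid = ".".join(parts[-10:])
    return iid


def catalogs_by_iid(
    zstores_by_catalog: Dict[str, Iterable[str]], requested: Iterable[str]
) -> Dict[str, List[str]]:
    """Map each requested iid to the catalogs whose stores contain it."""
    wanted = set(requested)
    found = defaultdict(list)
    for catalog, zstores in zstores_by_catalog.items():
        for zstore in zstores:
            iid = zstore_to_iid(zstore)
            if iid in wanted:
                found[iid].append(catalog)
    return dict(found)


# Hypothetical store paths, one per layout; real catalogs list many more.
zstores_by_catalog = {
    "qc": [
        "gs://example-bucket/stores/"
        "CMIP6.CMIP.MRI.MRI-ESM2-0.historical.r2i1p1f1.day.pr.gn.v20190603.zarr"
    ],
    "retracted": [
        "gs://example-bucket/CMIP6/ScenarioMIP/CSIRO/ACCESS-ESM1-5/"
        "ssp585/r4i1p1f1/day/pr/gn/v20210318/"
    ],
}
requested = [
    "CMIP6.CMIP.MRI.MRI-ESM2-0.historical.r2i1p1f1.day.pr.gn.v20190603",
    "CMIP6.ScenarioMIP.CSIRO.ACCESS-ESM1-5.ssp585.r4i1p1f1.day.pr.gn.v20210318",
]

result = catalogs_by_iid(zstores_by_catalog, requested)
for iid, catalogs in result.items():
    print(iid, "->", catalogs)
```

With the real catalogs, the lists of store paths would come from `col.df['zstore'].tolist()` for each URL in `url_dict`.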