leap-stc / cmip6-leap-feedstock

Apache License 2.0

Web based 'Check for dataset' #103

Open jbusecke opened 6 months ago

jbusecke commented 6 months ago

Users commonly want to check whether a particular iid (e.g. x.y.z) is in the catalog.

I currently end up spinning up a Python kernel and checking manually, but I wonder if we could offer something easier: a website with a form that accepts a single iid or a list of iids, for example:

`x.y.z`
`xx.yy.zz`
`X.Y.Z`

And returns something like this:

Found in main catalog:
x.y.z -> gs://...

Found in non-QC catalog:
xx.yy.zz

Not found in any of the pangeo catalogs:
X.Y.Z
jbusecke commented 6 months ago

A simple way to check for requested iids:

EDIT: This accounts for both filepath/prefix conventions of old and new data!

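import intake  # open_esm_datastore requires the intake-esm plugin


def zstore_to_iid(zstore: str) -> str:
    # Drop the first three '/'-separated parts and the last one,
    # then join the remaining path components with dots.
    return '.'.join(zstore.split('/')[3:-1])


iids_requested = [
    'CMIP6.CMIP.CSIRO-ARCCSS.ACCESS-CM2.historical.r4i1p1f1.3hr.pr.gn.v20210607',
]

# Catalog to search; swap the URL to check a different catalog
url = "https://storage.googleapis.com/cmip6/cmip6-pgf-ingestion-test/catalog/catalog.json"
col = intake.open_esm_datastore(url)

# Convert every store path in the catalog to an iid, then keep the requested ones
iids_all = [zstore_to_iid(z) for z in col.df['zstore'].tolist()]
iids_uploaded = [iid for iid in iids_all if iid in iids_requested]
iids_uploaded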
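
To get closer to the three-part report described in the first comment, the lookup could be split into buckets. This is only a rough sketch: `classify_iids`, the dictionary layout (iid mapped to store path), and the `gs://example-bucket/...` paths are hypothetical names made up for illustration, not anything that exists in the feedstock.

```python
from typing import Dict, Iterable, List


def classify_iids(
    requested: Iterable[str],
    main: Dict[str, str],
    non_qc: Dict[str, str],
) -> Dict[str, List[str]]:
    """Sort requested iids into 'main', 'non_qc', and 'missing' buckets.

    `main` and `non_qc` map iid -> store path (hypothetical layout).
    An iid present in both catalogs is reported under 'main' only.
    """
    report: Dict[str, List[str]] = {"main": [], "non_qc": [], "missing": []}
    for iid in requested:
        if iid in main:
            report["main"].append(f"{iid} -> {main[iid]}")
        elif iid in non_qc:
            report["non_qc"].append(iid)
        else:
            report["missing"].append(iid)
    return report


# Made-up catalog contents, for illustration only.
main_catalog = {"x.y.z": "gs://example-bucket/x/y/z/"}
non_qc_catalog = {"xx.yy.zz": "gs://example-bucket/xx/yy/zz/"}

report = classify_iids(["x.y.z", "xx.yy.zz", "X.Y.Z"], main_catalog, non_qc_catalog)

print("Found in main catalog:", *report["main"], sep="\n")
print("Found in non-QC catalog:", *report["non_qc"], sep="\n")
print("Not found in any of the pangeo catalogs:", *report["missing"], sep="\n")
```

With real catalogs, the two dictionaries could presumably be built from the snippet above, e.g. `{zstore_to_iid(z): z for z in col.df['zstore']}` for each catalog URL.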
jbusecke commented 5 months ago

For now I have added instructions on how to check manually, so that users can keep track of requests.