Open mvichi opened 3 weeks ago
Hi @mvichi thanks for using the cloud data. I just started https://github.com/leap-stc/cmip6-leap-feedstock/pull/181 as a test and will run the full thing as soon as the PR succeeds! I am very busy this week, but this should squeeze in between other tasks and is related to my work all week. So please feel free to ping me here or via email (julius@ldeo.columbia.edu) in the likely case that I forget to move on this. I am motivated to get as much data up as possible for your deadline.
Seems like we are getting only 6 datasets from the ESGF API right now. Is that useful to ingest already? Happy to rerun things a few times and hope for better availability! EDIT: This was my bad. I did not allow all member_ids. Let's see how many we get now!
Ok this looks better:
'CMIP6.HighResMIP.EC-Earth-Consortium.EC-Earth3P.hist-1950.r3i1p2f1.SImon.sivol.gn.v20190215',
'CMIP6.HighResMIP.EC-Earth-Consortium.EC-Earth3P.hist-1950.r1i1p2f1.SImon.siconc.gn.v20190314',
'CMIP6.HighResMIP.MOHC.HadGEM3-GC31-MM.hist-1950.r1i3p1f1.SImon.siconc.gn.v20190710',
'CMIP6.HighResMIP.CNRM-CERFACS.CNRM-CM6-1.hist-1950.r1i1p1f2.SImon.siconc.gn.v20190401',
'CMIP6.HighResMIP.EC-Earth-Consortium.EC-Earth3P.hist-1950.r1i1p2f1.SImon.sivol.gn.v20190314',
'CMIP6.HighResMIP.ECMWF.ECMWF-IFS-LR.hist-1950.r6i1p1f1.SImon.siconc.gn.v20181119',
'CMIP6.HighResMIP.ECMWF.ECMWF-IFS-LR.hist-1950.r8i1p1f1.SImon.siconc.gn.v20190425',
'CMIP6.HighResMIP.MOHC.HadGEM3-GC31-LL.hist-1950.r1i5p1f1.SImon.siconc.gn.v20190418',
'CMIP6.HighResMIP.ECMWF.ECMWF-IFS-HR.hist-1950.r1i1p1f1.SImon.siconc.gn.v20170915',
'CMIP6.HighResMIP.CNRM-CERFACS.CNRM-CM6-1-HR.hist-1950.r2i1p1f2.SImon.sivol.gn.v20200615',
'CMIP6.HighResMIP.ECMWF.ECMWF-IFS-HR.hist-1950.r3i1p1f1.SImon.siconc.gn.v20181119',
'CMIP6.HighResMIP.CMCC.CMCC-CM2-HR4.hist-1950.r1i1p1f1.SImon.sivol.gn.v20200917',
'CMIP6.HighResMIP.MOHC.HadGEM3-GC31-HM.hist-1950.r1i3p1f1.SImon.siconc.gn.v20190710',
'CMIP6.HighResMIP.CNRM-CERFACS.CNRM-CM6-1-HR.hist-1950.r3i1p1f2.SImon.sivol.gn.v20200615',
'CMIP6.HighResMIP.ECMWF.ECMWF-IFS-LR.hist-1950.r3i1p1f1.SImon.siconc.gn.v20181119',
'CMIP6.HighResMIP.CNRM-CERFACS.CNRM-CM6-1.hist-1950.r3i1p1f2.SImon.siconc.gn.v20200224',
'CMIP6.HighResMIP.MOHC.HadGEM3-GC31-MM.hist-1950.r1i2p1f1.SImon.siconc.gn.v20190710',
'CMIP6.HighResMIP.EC-Earth-Consortium.EC-Earth3P.hist-1950.r2i1p2f1.SImon.siconc.gn.v20190812',
'CMIP6.HighResMIP.EC-Earth-Consortium.EC-Earth3P.hist-1950.r2i1p2f1.SImon.sivol.gn.v20190812',
'CMIP6.HighResMIP.MOHC.HadGEM3-GC31-MM.hist-1950.r1i2p1f1.SImon.sivol.gn.v20190710'
These seem to be available right now. Not all that you requested, but I'll try to run these now and we can rerun later.
Seeing a few errors for unavailable files (hopefully these resolve over time), but also a bunch of successful jobs already. I'll check in again in a bit and give you a report for now.
Ok, I will need to change gears and work on something else for now, but let's continue here soon.
So I followed the instructions to check which datasets were uploaded here and got:
Found in catalog='qc': iids=['CMIP6.HighResMIP.NERC.HadGEM3-GC31-HH.hist-1950.r1i1p1f1.SImon.sivol.gn.v20210416', 'CMIP6.HighResMIP.CMCC.CMCC-CM2-HR4.hist-1950.r1i1p1f1.SImon.siconc.gn.v20200917', 'CMIP6.HighResMIP.CNRM-CERFACS.CNRM-CM6-1.hist-1950.r1i1p1f2.SImon.sivol.gn.v20190401', 'CMIP6.HighResMIP.MOHC.HadGEM3-GC31-HM.hist-1950.r1i1p1f1.SImon.siconc.gn.v20180730', 'CMIP6.HighResMIP.EC-Earth-Consortium.EC-Earth3P.hist-1950.r1i1p2f1.SImon.siconc.gn.v20190314', 'CMIP6.HighResMIP.EC-Earth-Consortium.EC-Earth3P-HR.hist-1950.r1i1p2f1.SImon.sivol.gn.v20181212', 'CMIP6.HighResMIP.MOHC.HadGEM3-GC31-HM.hist-1950.r1i1p1f1.SImon.sivol.gn.v20180730', 'CMIP6.HighResMIP.MOHC.HadGEM3-GC31-LL.hist-1950.r1i1p1f1.SImon.siconc.gn.v20170921', 'CMIP6.HighResMIP.MOHC.HadGEM3-GC31-MM.hist-1950.r1i1p1f1.SImon.siconc.gn.v20170928', 'CMIP6.HighResMIP.CMCC.CMCC-CM2-VHR4.hist-1950.r1i1p1f1.SImon.siconc.gn.v20200917', 'CMIP6.HighResMIP.EC-Earth-Consortium.EC-Earth3P-HR.hist-1950.r1i1p2f1.SImon.siconc.gn.v20181212', 'CMIP6.HighResMIP.MOHC.HadGEM3-GC31-LL.hist-1950.r1i1p1f1.SImon.sivol.gn.v20170921', 'CMIP6.HighResMIP.CMCC.CMCC-CM2-HR4.hist-1950.r1i1p1f1.SImon.sivol.gn.v20200917', 'CMIP6.HighResMIP.CNRM-CERFACS.CNRM-CM6-1.hist-1950.r1i1p1f2.SImon.siconc.gn.v20190401', 'CMIP6.HighResMIP.CNRM-CERFACS.CNRM-CM6-1-HR.hist-1950.r1i1p1f2.SImon.siconc.gn.v20190221', 'CMIP6.HighResMIP.CNRM-CERFACS.CNRM-CM6-1-HR.hist-1950.r1i1p1f2.SImon.sivol.gn.v20190221', 'CMIP6.HighResMIP.NOAA-GFDL.GFDL-CM4C192.hist-1950.r1i1p1f1.SImon.sivol.gn.v20180701', 'CMIP6.HighResMIP.NOAA-GFDL.GFDL-CM4C192.hist-1950.r1i1p1f1.SImon.siconc.gn.v20180701']
Found in catalog='non-qc': iids=['CMIP6.HighResMIP.ECMWF.ECMWF-IFS-MR.hist-1950.r1i1p1f1.SImon.sivol.gn.v20181119', 'CMIP6.HighResMIP.AWI.AWI-CM-1-1-LR.hist-1950.r1i1p1f2.SImon.sivol.gn.v20170825', 'CMIP6.HighResMIP.MPI-M.MPI-ESM1-2-HR.hist-1950.r1i1p1f1.SImon.siconc.gn.v20180606', 'CMIP6.HighResMIP.AWI.AWI-CM-1-1-HR.hist-1950.r1i1p1f2.SImon.sivol.gn.v20170825', 'CMIP6.HighResMIP.ECMWF.ECMWF-IFS-HR.hist-1950.r1i1p1f1.SImon.sivol.gn.v20170915', 'CMIP6.HighResMIP.MPI-M.MPI-ESM1-2-XR.hist-1950.r1i1p1f1.SImon.siconc.gn.v20180606', 'CMIP6.HighResMIP.ECMWF.ECMWF-IFS-MR.hist-1950.r1i1p1f1.SImon.siconc.gn.v20181119', 'CMIP6.HighResMIP.MPI-M.MPI-ESM1-2-XR.hist-1950.r1i1p1f1.SImon.sivol.gn.v20180606', 'CMIP6.HighResMIP.ECMWF.ECMWF-IFS-LR.hist-1950.r1i1p1f1.SImon.sivol.gn.v20180221', 'CMIP6.HighResMIP.ECMWF.ECMWF-IFS-LR.hist-1950.r1i1p1f1.SImon.siconc.gn.v20180221', 'CMIP6.HighResMIP.MPI-M.MPI-ESM1-2-HR.hist-1950.r1i1p1f1.SImon.sivol.gn.v20180606']
Found in catalog='retracted': iids=[]
Still missing 11 of 40:
missing_iids=['CMIP6.HighResMIP.NCAR.CESM1-CAM5-SE-HR.hist-1950.r1i1p1f1.SImon.sivol.gn.v20200810', 'CMIP6.HighResMIP.NCAR.CESM1-CAM5-SE-HR.hist-1950.r1i1p1f1.SImon.siconc.gn.v20200810', 'CMIP6.HighResMIP.EC-Earth-Consortium.EC-Earth3P.hist-1950.r1i1p2f1.SImon.sivol.gn.v20190314', 'CMIP6.HighResMIP.BCC.BCC-CSM2-HR.hist-1950.r1i1p1f1.SImon.sivol.gn.v20200921', 'CMIP6.HighResMIP.NERC.HadGEM3-GC31-HH.hist-1950.r1i1p1f1.SImon.siconc.gn.v20210416', 'CMIP6.HighResMIP.MOHC.HadGEM3-GC31-MM.hist-1950.r1i1p1f1.SImon.sivol.gn.v20170928', 'CMIP6.HighResMIP.NCAR.CESM1-CAM5-SE-LR.hist-1950.r1i1p1f1.SImon.siconc.gn.v20200812', 'CMIP6.HighResMIP.BCC.BCC-CSM2-HR.hist-1950.r1i1p1f1.SImon.siconc.gn.v20200921', 'CMIP6.HighResMIP.ECMWF.ECMWF-IFS-HR.hist-1950.r1i1p1f1.SImon.siconc.gn.v20170915', 'CMIP6.HighResMIP.NCAR.CESM1-CAM5-SE-LR.hist-1950.r1i1p1f1.SImon.sivol.gn.v20200812', 'CMIP6.HighResMIP.CMCC.CMCC-CM2-VHR4.hist-1950.r1i1p1f1.SImon.sivol.gn.v20200917']
Seems like we got 10+ uploaded and tested! There are quite a few that fail our tests (the non-qc catalog). If you or the student have some time to look into what might be wrong with those datasets (follow the instructions here to access the non-qc datasets), that would be very helpful. Perhaps we can fix the issues. For the 11 that are still missing, I would recommend that we rerun the ingestion a couple of times and see if this is just due to flaky data nodes.
Thank you, Julius, that was incredibly quick. I am travelling right now, and I'll be back to work next week. We'll report back on their status and quality asap. We appreciate very much your prompt reaction!
Running the pipeline once again just to see if we catch some more. Getting closer:
Still missing 4 of 40:
missing_iids=['CMIP6.HighResMIP.NCAR.CESM1-CAM5-SE-HR.hist-1950.r1i1p1f1.SImon.sivol.gn.v20200810', 'CMIP6.HighResMIP.BCC.BCC-CSM2-HR.hist-1950.r1i1p1f1.SImon.sivol.gn.v20200921', 'CMIP6.HighResMIP.NCAR.CESM1-CAM5-SE-LR.hist-1950.r1i1p1f1.SImon.siconc.gn.v20200812', 'CMIP6.HighResMIP.BCC.BCC-CSM2-HR.hist-1950.r1i1p1f1.SImon.siconc.gn.v20200921']
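To see at a glance which models the remaining gaps belong to, the instance IDs can be split into their facets and grouped. This is only a minimal sketch: it assumes every iid has exactly the ten dot-separated CMIP6 facets seen in the lists above, and `parse_iid` and `group_by_source` are hypothetical helper names, not part of the feedstock.

```python
from collections import defaultdict

# Facet names of a CMIP6 instance ID, in the order they appear in the iid string.
FACETS = (
    "mip_era", "activity_id", "institution_id", "source_id", "experiment_id",
    "member_id", "table_id", "variable_id", "grid_label", "version",
)


def parse_iid(iid: str) -> dict:
    """Split an instance ID into a facet dictionary (hypothetical helper)."""
    parts = iid.split(".")
    if len(parts) != len(FACETS):
        raise ValueError(f"Expected {len(FACETS)} facets, got {len(parts)}: {iid}")
    return dict(zip(FACETS, parts))


def group_by_source(iids):
    """Map each source_id (model) to the list of variable_ids found for it."""
    grouped = defaultdict(list)
    for iid in iids:
        facets = parse_iid(iid)
        grouped[facets["source_id"]].append(facets["variable_id"])
    return dict(grouped)


# The four iids still missing after the second run (copied from the output above).
missing_iids = [
    'CMIP6.HighResMIP.NCAR.CESM1-CAM5-SE-HR.hist-1950.r1i1p1f1.SImon.sivol.gn.v20200810',
    'CMIP6.HighResMIP.BCC.BCC-CSM2-HR.hist-1950.r1i1p1f1.SImon.sivol.gn.v20200921',
    'CMIP6.HighResMIP.NCAR.CESM1-CAM5-SE-LR.hist-1950.r1i1p1f1.SImon.siconc.gn.v20200812',
    'CMIP6.HighResMIP.BCC.BCC-CSM2-HR.hist-1950.r1i1p1f1.SImon.siconc.gn.v20200921',
]

for source_id, variables in sorted(group_by_source(missing_iids).items()):
    print(f"{source_id}: {', '.join(sorted(variables))}")
```

Grouping like this makes it easier to tell whether a whole model is unavailable (both variables missing) or only a single variable failed to ingest.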
Running again to see if we can get the last holdouts to ingest. @mvichi, did you get a chance to test the newly ingested data?
We tested all the available data, and most of them work, thanks! Some of them fail the xmip preprocessing and some others crash for other reasons, but the data integrity seems good. Thank you very much again for adding the data so quickly. We will make it available as a cookbook once completed. I'll share it through discourse, so that you can decide
Awesome. If you could raise issues over at xMIP I can take a look at what is going on once some time frees up!
I am also still seeing 2 missing datasets on my end:
import intake

def zstore_to_iid(zstore: str):
    # this is a bit whacky to account for the different way of storing old/new stores
    iid = '.'.join(zstore.replace('gs://','').replace('.zarr','').replace('.','/').split('/')[-11:-1])
    if not iid.startswith('CMIP6'):
        iid = '.'.join(zstore.replace('gs://','').replace('.zarr','').replace('.','/').split('/')[-10:])
    return iid

def search_iids(col_url:str):
    # open the catalog and keep only the iids that were requested (iids_requested is defined below)
    col = intake.open_esm_datastore(col_url)
    iids_all= [zstore_to_iid(z) for z in col.df['zstore'].tolist()]
    return [iid for iid in iids_all if iid in iids_requested]

iids_requested = [
'CMIP6.HighResMIP.AWI.AWI-CM-1-1-HR.hist-1950.r1i1p1f2.SImon.sivol.gn.v20170825',
'CMIP6.HighResMIP.AWI.AWI-CM-1-1-LR.hist-1950.r1i1p1f2.SImon.sivol.gn.v20170825',
'CMIP6.HighResMIP.BCC.BCC-CSM2-HR.hist-1950.r1i1p1f1.SImon.siconc.gn.v20200921',
'CMIP6.HighResMIP.BCC.BCC-CSM2-HR.hist-1950.r1i1p1f1.SImon.sivol.gn.v20200921',
'CMIP6.HighResMIP.CMCC.CMCC-CM2-HR4.hist-1950.r1i1p1f1.SImon.siconc.gn.v20200917',
'CMIP6.HighResMIP.CMCC.CMCC-CM2-HR4.hist-1950.r1i1p1f1.SImon.sivol.gn.v20200917',
'CMIP6.HighResMIP.CMCC.CMCC-CM2-VHR4.hist-1950.r1i1p1f1.SImon.siconc.gn.v20200917',
'CMIP6.HighResMIP.CMCC.CMCC-CM2-VHR4.hist-1950.r1i1p1f1.SImon.sivol.gn.v20200917',
'CMIP6.HighResMIP.CNRM-CERFACS.CNRM-CM6-1-HR.hist-1950.r1i1p1f2.SImon.siconc.gn.v20190221',
'CMIP6.HighResMIP.CNRM-CERFACS.CNRM-CM6-1-HR.hist-1950.r1i1p1f2.SImon.sivol.gn.v20190221',
'CMIP6.HighResMIP.CNRM-CERFACS.CNRM-CM6-1.hist-1950.r1i1p1f2.SImon.siconc.gn.v20190401',
'CMIP6.HighResMIP.CNRM-CERFACS.CNRM-CM6-1.hist-1950.r1i1p1f2.SImon.sivol.gn.v20190401',
'CMIP6.HighResMIP.EC-Earth-Consortium.EC-Earth3P-HR.hist-1950.r1i1p2f1.SImon.siconc.gn.v20181212',
'CMIP6.HighResMIP.EC-Earth-Consortium.EC-Earth3P-HR.hist-1950.r1i1p2f1.SImon.sivol.gn.v20181212',
'CMIP6.HighResMIP.EC-Earth-Consortium.EC-Earth3P.hist-1950.r1i1p2f1.SImon.siconc.gn.v20190314',
'CMIP6.HighResMIP.EC-Earth-Consortium.EC-Earth3P.hist-1950.r1i1p2f1.SImon.sivol.gn.v20190314',
'CMIP6.HighResMIP.ECMWF.ECMWF-IFS-HR.hist-1950.r1i1p1f1.SImon.siconc.gn.v20170915',
'CMIP6.HighResMIP.ECMWF.ECMWF-IFS-HR.hist-1950.r1i1p1f1.SImon.sivol.gn.v20170915',
'CMIP6.HighResMIP.ECMWF.ECMWF-IFS-LR.hist-1950.r1i1p1f1.SImon.siconc.gn.v20180221',
'CMIP6.HighResMIP.ECMWF.ECMWF-IFS-LR.hist-1950.r1i1p1f1.SImon.sivol.gn.v20180221',
'CMIP6.HighResMIP.ECMWF.ECMWF-IFS-MR.hist-1950.r1i1p1f1.SImon.siconc.gn.v20181119',
'CMIP6.HighResMIP.ECMWF.ECMWF-IFS-MR.hist-1950.r1i1p1f1.SImon.sivol.gn.v20181119',
'CMIP6.HighResMIP.MOHC.HadGEM3-GC31-HM.hist-1950.r1i1p1f1.SImon.siconc.gn.v20180730',
'CMIP6.HighResMIP.MOHC.HadGEM3-GC31-HM.hist-1950.r1i1p1f1.SImon.sivol.gn.v20180730',
'CMIP6.HighResMIP.MOHC.HadGEM3-GC31-LL.hist-1950.r1i1p1f1.SImon.siconc.gn.v20170921',
'CMIP6.HighResMIP.MOHC.HadGEM3-GC31-LL.hist-1950.r1i1p1f1.SImon.sivol.gn.v20170921',
'CMIP6.HighResMIP.MOHC.HadGEM3-GC31-MM.hist-1950.r1i1p1f1.SImon.siconc.gn.v20170928',
'CMIP6.HighResMIP.MOHC.HadGEM3-GC31-MM.hist-1950.r1i1p1f1.SImon.sivol.gn.v20170928',
'CMIP6.HighResMIP.MPI-M.MPI-ESM1-2-HR.hist-1950.r1i1p1f1.SImon.siconc.gn.v20180606',
'CMIP6.HighResMIP.MPI-M.MPI-ESM1-2-HR.hist-1950.r1i1p1f1.SImon.sivol.gn.v20180606',
'CMIP6.HighResMIP.MPI-M.MPI-ESM1-2-XR.hist-1950.r1i1p1f1.SImon.siconc.gn.v20180606',
'CMIP6.HighResMIP.MPI-M.MPI-ESM1-2-XR.hist-1950.r1i1p1f1.SImon.sivol.gn.v20180606',
'CMIP6.HighResMIP.NCAR.CESM1-CAM5-SE-HR.hist-1950.r1i1p1f1.SImon.siconc.gn.v20200810',
'CMIP6.HighResMIP.NCAR.CESM1-CAM5-SE-HR.hist-1950.r1i1p1f1.SImon.sivol.gn.v20200810',
'CMIP6.HighResMIP.NCAR.CESM1-CAM5-SE-LR.hist-1950.r1i1p1f1.SImon.siconc.gn.v20200812',
'CMIP6.HighResMIP.NCAR.CESM1-CAM5-SE-LR.hist-1950.r1i1p1f1.SImon.sivol.gn.v20200812',
'CMIP6.HighResMIP.NERC.HadGEM3-GC31-HH.hist-1950.r1i1p1f1.SImon.siconc.gn.v20210416',
'CMIP6.HighResMIP.NERC.HadGEM3-GC31-HH.hist-1950.r1i1p1f1.SImon.sivol.gn.v20210416',
'CMIP6.HighResMIP.NOAA-GFDL.GFDL-CM4C192.hist-1950.r1i1p1f1.SImon.siconc.gn.v20180701',
'CMIP6.HighResMIP.NOAA-GFDL.GFDL-CM4C192.hist-1950.r1i1p1f1.SImon.sivol.gn.v20180701',
]

url_dict = {
    'qc':"https://storage.googleapis.com/cmip6/cmip6-pgf-ingestion-test/catalog/catalog.json",
    'non-qc':"https://storage.googleapis.com/cmip6/cmip6-pgf-ingestion-test/catalog/catalog_noqc.json",
    'retracted':"https://storage.googleapis.com/cmip6/cmip6-pgf-ingestion-test/catalog/catalog_retracted.json"
}

iids_found = []
for catalog,url in url_dict.items():
    iids = search_iids(url)
    iids_found.extend(iids)
    print(f"Found in {catalog=}: {iids=}\n")

# anything requested but not found in any of the three catalogs is still missing
missing_iids = list(set(iids_requested) - set(iids_found))
print(f"\n\nStill missing {len(missing_iids)} of {len(iids_requested)}: \n{missing_iids=}")
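For anyone puzzled by the slicing in `zstore_to_iid`, here is a small offline illustration of the two cases it handles. The function body follows the snippet above; the two store paths are made-up examples of an "old" directory-style layout and a "new" single-name layout, not real catalog entries.

```python
def zstore_to_iid(zstore: str) -> str:
    # Same logic as in the snippet above, with the repeated split factored out.
    parts = zstore.replace('gs://', '').replace('.zarr', '').replace('.', '/').split('/')
    # Old-style stores end with a trailing slash, so the last element is empty
    # and the ten facets sit at positions -11..-2.
    iid = '.'.join(parts[-11:-1])
    if not iid.startswith('CMIP6'):
        # New-style stores have no trailing slash: the facets are the last ten elements.
        iid = '.'.join(parts[-10:])
    return iid


# Hypothetical old-style path: one directory per facet, trailing slash.
old_style = (
    "gs://cmip6/CMIP6/HighResMIP/EC-Earth-Consortium/EC-Earth3P/"
    "hist-1950/r1i1p2f1/SImon/siconc/gn/v20190314/"
)

# Hypothetical new-style path: the full iid as the store name, ending in .zarr.
new_style = (
    "gs://example-bucket/some/prefix/"
    "CMIP6.HighResMIP.EC-Earth-Consortium.EC-Earth3P.hist-1950."
    "r1i1p2f1.SImon.siconc.gn.v20190314.zarr"
)

print(zstore_to_iid(old_style))
print(zstore_to_iid(new_style))
```

Both calls should give the same iid, which is what allows the requested list to be matched against all three catalogs regardless of how a store was written.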
So I'll leave this open for now, unless you think we can close this.
In any case, please make sure to cite the original CMIP6 data sources, and if you could acknowledge our efforts here (https://zenodo.org/badge/latestdoi/618127503) too, that would help a lot. Cheers.
List of requested iids
Description
Hi guys, thanks a lot for your effort and for continuously improving the system. We recently ran an analysis on the HighResMIP output to assess the performance of sea ice simulations in the northern and southern hemispheres. The work was done with the "download model" and it was published in the two papers below (Selivanova et al., 2024a,b). An MSc student at the University of Cape Town is currently adapting the SITool (Lin et al., 2021) to work with Pangeo, and we would also like to add the assessment of the HighResMIP. The student is currently testing the system with the low-res CMIP6 models, and it would be great to add the HighResMIP. The thesis should be submitted in February 2025, but the analysis should ideally be completed before the end of 2024. Thanks in advance, Marcello
Lin, X., Massonnet, F., Fichefet, T., Vancoppenolle, M., 2021. SITool (v1.0) – a new evaluation tool for large-scale sea ice simulations: application to CMIP6 OMIP. Geoscientific Model Development 14, 6331–6354. https://doi.org/10.5194/gmd-14-6331-2021
Selivanova, J., Iovino, D., Cocetta, F., 2024a. Past and future of the Arctic sea ice in High-Resolution Model Intercomparison Project (HighResMIP) climate models. The Cryosphere 18, 2739–2763. https://doi.org/10.5194/tc-18-2739-2024
Selivanova, J., Iovino, D., Vichi, M., 2024b. Limited Benefits of Increased Spatial Resolution for Sea Ice in HighResMIP Simulations. Geophysical Research Letters 51, e2023GL107969. https://doi.org/10.1029/2023GL107969