pangeo-data / pangeo-cmip6-cloud

Documentation for Pangeo CMIP6 data stored in GCP/AWS cloud

CIL/Rhodium CMIP6 Dataset Requests #38

Closed: cisaacstern closed 2 years ago

cisaacstern commented 2 years ago

@jbusecke @kemccusker @rfofrich @delgadom @dgergel

Let's start by creating a list of DATASET_IDs which can be run with

DATASET_ID, as defined in #31 is:


When Pangeo Forge Cloud is ready, I will ping this thread with ideas for migrating this work there.

cisaacstern commented 2 years ago

If CIL/Rhodium team can provide @jbusecke with one DATASET_ID to start with, he can run the script to see if it will work for these datasets.

cisaacstern commented 2 years ago

Here is a link to the tutorial for running recipes locally:

Please let me know if anything is unclear.

delgadom commented 2 years ago

I got the script to run with the suggested DATASET_ID ""

I just grabbed a CMIP/historical sim we've worked with: This is already in the pangeo cloud store here: gs://cmip6/CMIP6/CMIP/CMCC/CMCC-CM2-SR5/historical/r1i1p1f1/day/tasmax/gn/v20200616
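The store path quoted above looks like a CMIP6 facet string with slashes in place of dots. A minimal sketch of that mapping follows. It assumes a DATASET_ID is a ten-part, dot-separated string; the helper name is hypothetical and the example ID is reconstructed from the path, not copied from the thread.

```python
# Assumption: a DATASET_ID is a dot-separated facet string and the cloud store
# mirrors it as a slash-separated path under the gs://cmip6/ prefix.
GCS_PREFIX = "gs://cmip6/"  # prefix taken from the path quoted in the thread


def dataset_id_to_gcs_path(dataset_id: str) -> str:
    """Map a dot-separated ID to a store path (hypothetical helper)."""
    facets = dataset_id.split(".")
    if len(facets) != 10:
        raise ValueError(f"expected 10 facets, got {len(facets)}: {dataset_id!r}")
    return GCS_PREFIX + "/".join(facets)


# Example ID inferred from the store path, for illustration only.
example_id = "CMIP6.CMIP.CMCC.CMCC-CM2-SR5.historical.r1i1p1f1.day.tasmax.gn.v20200616"
print(dataset_id_to_gcs_path(example_id))
```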

I don't have one of the no-anthro forcing specs handy, but this should be similar to the ones we'd want to use. When I run this, I get the following error while parsing the ESGF API response:

$ python
empty search response
Traceback (most recent call last):
  File "/Users/mikedelgado/opt/miniconda3/envs/rhodium-env/lib/python3.10/site-packages/pandas/core/indexes/", line 3621, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas/_libs/index.pyx", line 136, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 163, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 5198, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 5206, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'url'
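The traceback shows a `KeyError: 'url'` raised right after an "empty search response" message, which suggests the parser indexes a `url` field that an empty result set never has. A minimal sketch of a guard that fails with a clearer message is below. The function name and the list-of-dicts record shape are assumptions for illustration, not the script's actual code.

```python
def extract_urls(docs):
    """Return the 'url' value of each search-result record.

    `docs` stands in for the parsed search response: a list of dicts,
    one per record (hypothetical shape).
    """
    if not docs:
        # Fail early instead of letting a later lookup raise KeyError: 'url'.
        raise ValueError("search returned no records; check the query facets")
    missing = [i for i, doc in enumerate(docs) if "url" not in doc]
    if missing:
        raise KeyError(f"records at positions {missing} have no 'url' field")
    return [doc["url"] for doc in docs]
```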

As far as I can tell, ESGF returns no results for the query that this call builds:

I've just done a tiny bit of poking around... as far as I can tell the culprit is the type=File on keyword. If you change this to type=Dataset, the query does return a listing which looks right to me, but the parser then chokes on because there isn't a dataset_id in the response.
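To compare the two record types by hand, a search URL can be built with only the `type` parameter changed. The sketch below just constructs the URL and makes no network call. The endpoint, the facet names, and the dot-separated ID layout are assumptions that should be checked against the ESGF node in use.

```python
from urllib.parse import urlencode

# Assumption: one ESGF index node; substitute whichever node you query.
SEARCH_ENDPOINT = "https://esgf-node.llnl.gov/esg-search/search"

# Assumption: facet names for the first nine parts of a dot-separated ID.
# The trailing version part is left out of the query on purpose.
FACET_NAMES = [
    "mip_era", "activity_id", "institution_id", "source_id", "experiment_id",
    "variant_label", "table_id", "variable_id", "grid_label",
]


def build_search_url(dataset_id, record_type="File", limit=10):
    """Build a search URL for one ID (hypothetical helper)."""
    facets = dict(zip(FACET_NAMES, dataset_id.split(".")))
    params = {
        **facets,
        "type": record_type,  # "File" or "Dataset"
        "format": "application/solr+json",
        "limit": limit,
    }
    return SEARCH_ENDPOINT + "?" + urlencode(params)


# ID inferred from the store path in the thread, for illustration only.
example_id = "CMIP6.CMIP.CMCC.CMCC-CM2-SR5.historical.r1i1p1f1.day.tasmax.gn.v20200616"
for record_type in ("File", "Dataset"):
    print(build_search_url(example_id, record_type))
```

Opening both printed URLs in a browser shows whether the empty result is specific to `type=File`.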

Wondering if digging into this is productive or if you've already solved this issue a different way?

delgadom commented 2 years ago

huh... ok well @rfofrich just sent me a sample ID from the DAMIP experiments we're trying to use and this one worked! I'm still confused about the above example but it's not a current pain point....

Here's the DATASET_ID:

this ran the workflow locally for me (using the patch in #39) when testing using


@jbusecke does this give you enough to start with? or should we work on a full list?

delgadom commented 2 years ago

For @rfofrich (and anyone else wanting to run this package) - here's my quickstart for testing a recipe:

  1. clone this repo

  2. [until #39 is merged] delete lines 96-110 from - you should remove all of the following:

    fs_local = LocalFileSystem()
    target_dir = tempfile.TemporaryDirectory().name + ".zarr"
    target = FSSpecTarget(fs_local, target_dir)
    cache_dir = tempfile.TemporaryDirectory()
    cache_target = CacheFSSpecTarget(fs_local,
    meta_dir = tempfile.TemporaryDirectory()
    meta_store = MetadataTarget(fs_local, = target
    recipe.input_cache = cache_target
    recipe.metadata_cache = meta_store

    Also change the print from print(target_dir) to print(

  3. change the execution line so it only runs the sample "pruned" workflow:

    # recipe.to_function()()

  4. install the dependencies. one of the conda environments on pangeo-forge-recipes seems like a good place to catch 'em all. You'll also need pangeo-forge-recipes itself: pip install pangeo-forge-recipes

  5. Finally, run, passing in your DATASET_ID as a positional argument, e.g.:

rfofrich commented 2 years ago

@cisaacstern @jbusecke I think we have what we need to move forward with this. I'm attaching an Excel file with all the DAMIP models/simulations needed for the project. Each column of the sheet holds one piece of the information needed to construct a DATASET_ID for that model/ensemble member. Let me know if you have any questions or concerns, or if any simulation gives you trouble.

Attachment: CMIP6_DAMIP_hist_nat_temp.xlsx
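A minimal sketch of turning such a table into a list of DATASET_IDs is below. It assumes the sheet has been exported to CSV, that each row holds one facet per column, and that an ID is the dot-joined facets. The column names and the sample row are placeholders, not values from the attached spreadsheet.

```python
import csv
import io

# Assumption: one CSV column per ID facet, in this order (hypothetical names).
COLUMNS = [
    "mip_era", "activity_id", "institution_id", "source_id", "experiment_id",
    "member_id", "table_id", "variable_id", "grid_label", "version",
]


def rows_to_dataset_ids(csv_text):
    """Join each row's facet columns into one dot-separated ID."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [".".join(row[col].strip() for col in COLUMNS) for row in reader]


# Placeholder row for illustration only; not taken from the spreadsheet.
sample = (
    ",".join(COLUMNS) + "\n"
    "CMIP6,DAMIP,SomeInstitute,SomeModel,hist-nat,r1i1p1f1,day,tasmax,gn,v00000000\n"
)
print(rows_to_dataset_ids(sample))
```

The resulting plain-text list can be pasted into an issue or Gist, so it is readable in the browser without a download.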

cisaacstern commented 2 years ago

Thanks @rfofrich. Julius and I have some time scheduled to look at this together on Friday. We'll update you here once we've been able to make some headway.

rfofrich commented 2 years ago

Sounds great! Thank you both.

rfofrich commented 2 years ago

@cisaacstern Hello, thanks again for helping with this. Just wanted to circle back and see if there were any updates.

cisaacstern commented 2 years ago

@rfofrich, thanks for checking in and apologies for the delayed reply. @jbusecke and I have migrated this work to I realize it's a bit redundant, but just so we have everything in one place, could I ask you to open a new issue on that repository requesting we work on the list of IDs you provided in

A small point, but when you do so could you link the list of requested IDs as a GitHub Gist or similar form which is readable in-browser without download? (It will just be a bit easier to work with that way.)

I admit I'm not clear on what your preferred timeline is for this, so perhaps you could make a note of that in the new issue as well. Whether or not we, as a small team with a lot of other work on our plates, will be able to achieve that timeline is another matter of course, but once I know what it is, I'll certainly give you an honest assessment of that.

cisaacstern commented 2 years ago

I realize it's a bit redundant, but just so we have everything in one place, could I ask you to open a new issue on that repository requesting we work on the list of IDs you provided in

@rfofrich, I'm working on this today, so I just went ahead and created this new tracker issue:

To everyone following this thread: I'm going to close this Issue now. @jbusecke and I will provide future updates on this topic on the new issue linked above. Thanks so much for your engagement and enthusiasm. I expect we'll have some progress to share within another week or so.