leap-stc / CMIP6-pCO2-testbed

Using the CMIP6 data to create a pCO2 testbed
Apache License 2.0

errors when trying to regrid #2

Open hatlenheimdalthea opened 11 months ago

hatlenheimdalthea commented 11 months ago

Running the notebook "regrid_members" in this repo raises:

ValueError: Dimensions {'lev'} do not exist. Expected one or more of Frozen({'y': 291, 'x': 360, 'time': 1032, 'vertex': 4, 'bnds': 2})

jbusecke commented 11 months ago

I think you need an additional check here:

def full_testbed_processing(ds: xr.Dataset) -> xr.Dataset:
    ds = ds.squeeze(drop=True)
    # select surface depth (for chl, TODO: Check if surface chlorophyll is available)
    ds = ds.isel(lev=0).drop('lev')

    ds = ds.sel(time=slice('1850', '2100'))

    # testing
    assert len(ds.time) == 3012
    assert ds.time.data[0].year == 1850

    # Processing
    ds_regridded = regrid(ds)
    ds_new_cal = replace_calendar(ds_regridded)

    return ds_new_cal

this line:

    ds = ds.isel(lev=0).drop('lev')

should probably be something like:

    if 'lev' in ds.dims:
        ds = ds.isel(lev=0).drop('lev')

Then this should apply to both 3D variables (chl) and surface ones (e.g. sos). Alternatively, we could ingest the surface chlorophyll data (I think it's chlos, but please double check!) and remove that line altogether.
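To illustrate the guard on both kinds of input, here is a minimal sketch using tiny synthetic datasets. The helper name `select_surface` and the array shapes are assumptions for illustration, not part of the repo, and it uses `isel(lev=0, drop=True)` in place of the `.drop('lev')` call above, which should have the same effect of removing the leftover scalar coordinate.

```python
import numpy as np
import xarray as xr


def select_surface(ds: xr.Dataset) -> xr.Dataset:
    """Take the first level if a 'lev' dimension exists; otherwise return ds unchanged."""
    if "lev" in ds.dims:
        # drop=True removes the leftover scalar 'lev' coordinate after selection
        ds = ds.isel(lev=0, drop=True)
    return ds


# Tiny synthetic stand-ins (shapes are arbitrary, not real CMIP6 output)
chl_3d = xr.Dataset(
    {"chl": (("time", "lev", "y", "x"), np.zeros((2, 3, 4, 5)))},
    coords={"lev": [5.0, 15.0, 25.0]},
)
sos_2d = xr.Dataset({"sos": (("time", "y", "x"), np.zeros((2, 4, 5)))})

# Both pass through the same function without raising
chl_surface = select_surface(chl_3d)
sos_surface = select_surface(sos_2d)
```

The 3D dataset loses its `lev` dimension, while the surface dataset is returned as it came in.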

hatlenheimdalthea commented 11 months ago

I don't understand why everything ran smoothly before with the exact same code (see all 18 members here: path = 'gs://leap-persistent/hatlenheimdalthea/testing'). Have I accidentally deleted some code or something? Anyway, I tried both solutions and I get the same error:


AssertionError                            Traceback (most recent call last)
Cell In[7], line 4
      2 for k,ds in ddict.items():
      3     print(f"Processing {k}")
----> 4     ds_out = full_testbed_processing(ds)
      6     ds_id = cmip6_dataset_id(ds_out, id_attrs=[
      7         'source_id',
      8         'variant_label',
   (...)
     11         'version',
     12     ])
     13     save_path = f"gs://leap-scratch/jbusecke/pco2-testing/{ds_id}"

Cell In[6], line 34, in full_testbed_processing(ds)
     31     ds = ds.sel(time=slice('1850', '2100'))
     33     # testing
---> 34     assert len(ds.time) == 3012
     35     assert ds.time.data[0].year == 1850
     37     # Processing

AssertionError:

jbusecke commented 11 months ago

Seems like that dataset does not have the expected number of timesteps?
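For reference, the 3012 in the assertion corresponds to monthly data covering 1850 through 2100 inclusive. A minimal sketch of that arithmetic (the helper name `expected_monthly_steps` is hypothetical, not part of the repo):

```python
def expected_monthly_steps(first_year: int, last_year: int) -> int:
    """Number of monthly timesteps from January of first_year through December of last_year."""
    n_years = last_year - first_year + 1  # both end years are included
    return n_years * 12


# Range checked by the assertion in full_testbed_processing
full_range = expected_monthly_steps(1850, 2100)

# The dataset in the original error report had 'time': 1032;
# this is how many years of monthly data that would be
years_in_reported_dataset = 1032 // 12
```

Here `full_range` comes out to 3012 and `years_in_reported_dataset` to 86. A length of 1032 would be consistent with, for example, a member that only covers 2015 to 2100, though that is a guess and should be checked against the actual time axis.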

jbusecke commented 11 months ago

You could do something like:

for name, ds in ds_dict.items():
    try:
        full_testbed_processing(ds)
        ...
    except Exception as e:
        print(f"{name} failed with {e}")

This would continue to process later datasets and then you get a printed list of the problematic datasets (which maybe you can fix).
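A slightly fuller sketch of that loop, which collects the failures in a dict instead of only printing them. `process_all` and `fake_processing` are hypothetical names for illustration; in the notebook, `full_testbed_processing` would take the place of the stand-in, and the integers below stand in for real datasets.

```python
from typing import Any, Callable, Dict, Tuple


def process_all(
    datasets: Dict[str, Any], process: Callable[[Any], Any]
) -> Tuple[Dict[str, Any], Dict[str, str]]:
    """Run `process` on every dataset; keep going when one of them fails."""
    results: Dict[str, Any] = {}
    failures: Dict[str, str] = {}
    for name, ds in datasets.items():
        try:
            results[name] = process(ds)
        except Exception as e:
            # A bare `assert` has an empty message, so record the exception type too
            failures[name] = f"{type(e).__name__}: {e}"
    return results, failures


def fake_processing(n_timesteps: int) -> int:
    """Stand-in for the real processing: only checks the number of timesteps."""
    assert n_timesteps == 3012, f"expected 3012 timesteps, got {n_timesteps}"
    return n_timesteps


# Integers stand in for datasets here; the keys are made-up member names
demo = {"member_a": 3012, "member_b": 1032}
results, failures = process_all(demo, fake_processing)

for name, reason in failures.items():
    print(f"{name} failed with {reason}")
```

Adding a message to the `assert` statements in the processing function would make the printed list more useful, since the traceback above ends in an empty `AssertionError:`.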

Side note: check whether those particular runs go beyond 2100! If so, the fix would be as easy as adding

ds = ds.sel(time=slice(None, '2100'))

to the 'full_testbed_processing' function.