linjonathan / tropical_cyclone_risk

A Physics-Based, Tropical Cyclone Downscaling Model
MIT License

CMIP6 time format problem #1

Closed levuvietphong closed 1 year ago

levuvietphong commented 1 year ago

Hi @linjonathan,

I got an error when I ran the model for downscaling GFDL-CM4 and other CMIP6 models:

File "/home/pvn/.conda/envs/tc_risk/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 5820, in get_slice_bound
    f"Cannot get {side} slice bound for non-unique "
KeyError: 'Cannot get left slice bound for non-unique label: cftime.DatetimeNoLeap(2021, 4, 16, 0, 0, 0, 0, has_year_zero=True)'

It looks like pandas has a problem reading the time format used in CMIP6 (cftime). xarray itself loads the NetCDF files properly. Do you have any recommendations for fixing this bug?

Thanks.

linjonathan commented 1 year ago

According to https://tedboy.github.io/pandas/gotchas/gotchas4.html, this error happens when the label is not unique within the dataset. So, this makes me think that the "cftime.DatetimeNoLeap(2021, 4, 16, 0, 0, 0, 0, has_year_zero=True)" label occurs multiple times within the dataset. Do you have other files in that directory that are being mistakenly read by open_mfdataset? You can check out which files are being read by the "_glob_prefix" function in util/input.py.

In the readme, the file input reading convention is: namelist.exp_prefixVAR*.nc

So if you have duplicate files or a different experiment that fits the namelist.exp_prefix matching, then you could run into this issue.
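A quick way to check both things (which files a prefix pattern picks up, and whether any time label is repeated) is sketched below. This is only an illustration using the standard library, not the code in util/input.py: the helper names `list_matching_files` and `find_duplicate_labels`, the directory, and the glob pattern are all hypothetical, and plain (year, month, day) tuples stand in for the cftime labels.

```python
import glob
import os
from collections import Counter


def list_matching_files(directory, pattern):
    """Return the sorted paths in `directory` that match a glob pattern.

    Hypothetical helper: it only mimics the idea of matching input files
    by prefix, it is not the project's _glob_prefix function.
    """
    return sorted(glob.glob(os.path.join(directory, pattern)))


def find_duplicate_labels(labels):
    """Return a dict mapping each repeated label to its number of occurrences."""
    counts = Counter(labels)
    return {label: n for label, n in counts.items() if n > 1}


if __name__ == "__main__":
    # Assumed directory and pattern; substitute your own data path and prefix.
    for path in list_matching_files(".", "*.nc"):
        print("matched:", path)

    # Stand-in time axis: the same date appears twice, as it would if two
    # overlapping files were concatenated along time.
    time_labels = [(2021, 4, 15), (2021, 4, 16), (2021, 4, 16), (2021, 4, 17)]
    print("duplicates:", find_duplicate_labels(time_labels))
```

With real data, the labels could be taken from the time coordinate of the dataset that xarray opened (for example `ds["time"].values`); any label reported here more than once points to overlapping or duplicated input files.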

levuvietphong commented 1 year ago

I think you are right. Other input files from other experiments may duplicate the time labels. I removed all the backup/redundant NetCDF files, and the model runs properly now.

Thanks.