roocs / dachar

Fix data being extracted with different Time types #3

Closed agstephens closed 4 years ago

agstephens commented 4 years ago

Here is an interesting xarray/netcdf issue:

import xarray as xr

nc_files = [
    '/badc/cmip5/data/cmip5/output1/MIROC/MIROC-ESM/rcp45/mon/ocean/Omon/r1i1p1/latest/zostoga/zostoga_Omon_MIROC-ESM_rcp45_r1i1p1_200601-210012.nc',
    '/badc/cmip5/data/cmip5/output1/MIROC/MIROC-ESM/rcp45/mon/ocean/Omon/r1i1p1/latest/zostoga/zostoga_Omon_MIROC-ESM_rcp45_r1i1p1_210101-230012.nc',
]

ds = xr.open_mfdataset(nc_files)
da = ds['zostoga']
tm = da.coords['time']

tm.max()

Error is:

E   TypeError: Cannot compare type 'Timestamp' with type 'DatetimeGregorian'

Possible fixes:

ellesmith88 commented 4 years ago

I don't have permission for the MIROC data, so I tested using the equivalent MPI-ESM-LR files:

nc_files = [
    '/badc/cmip5/data/cmip5/output1/MPI-M/MPI-ESM-LR/rcp45/mon/ocean/Omon/r1i1p1/latest/zostoga'
    '/zostoga_Omon_MPI-ESM-LR_rcp45_r1i1p1_200601-210012.nc',
    '/badc/cmip5/data/cmip5/output1/MPI-M/MPI-ESM-LR/rcp45/mon/ocean/Omon/r1i1p1/latest/zostoga'
    '/zostoga_Omon_MPI-ESM-LR_rcp45_r1i1p1_210101-230012.nc',
]

This gives the same kind of error:

Exception was: Cannot compare type 'Timestamp' with type 'DatetimeProlepticGregorian'

ellesmith88 commented 4 years ago

The reason for the error, from the xarray documentation:

One unfortunate limitation of using datetime64[ns] is that it limits the native representation of dates to those that fall between the years 1678 and 2262. When a netCDF file contains dates outside of these bounds, dates will be returned as arrays of cftime.datetime objects and a CFTimeIndex will be used for indexing.

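The second file (210101-230012) runs past 2262, so its times are decoded as cftime objects, while the first file (200601-210012) is within the range and is decoded as datetime64[ns]. Combining the two gives a time coordinate with mixed types, so comparisons such as max() fail.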
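The bounds quoted above follow from datetime64[ns] being a signed 64-bit count of nanoseconds since 1970-01-01. The sketch below is only an illustration using the standard library (so precision is truncated to microseconds); the helper name fits_in_datetime64_ns and the two end dates, taken loosely from the file names, are assumptions made for the example.

```python
from datetime import datetime, timedelta

# datetime64[ns] stores a signed 64-bit number of nanoseconds since the epoch.
EPOCH = datetime(1970, 1, 1)
MAX_NS = 2**63 - 1

# Representable range, truncated to microseconds (the stdlib's resolution).
upper_bound = EPOCH + timedelta(microseconds=MAX_NS // 1000)
lower_bound = EPOCH - timedelta(microseconds=MAX_NS // 1000)


def fits_in_datetime64_ns(when):
    """Hypothetical helper: True if `when` is representable as datetime64[ns]."""
    return lower_bound <= when <= upper_bound


# Approximate end dates implied by the file names (200601-210012, 210101-230012).
first_file_end = datetime(2100, 12, 1)
second_file_end = datetime(2300, 12, 1)

print(lower_bound, upper_bound)
print(fits_in_datetime64_ns(first_file_end), fits_in_datetime64_ns(second_file_end))
```

Under these assumptions the first file's end date fits in the range and the second's does not, which matches the mixed Timestamp/cftime behaviour seen in the error.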

agstephens commented 4 years ago

@ellesmith88 , good investigation work. Can we force it to use the cftime objects in all cases?

Also, please sign up to the "cmip5_research" role with your CEDA account in order to access the MIROC (and other) data.

ellesmith88 commented 4 years ago

This can be fixed using ds = xr.open_mfdataset(nc_files, use_cftime=True, combine='by_coords'), which needs xarray version 0.15. The max is then returned as a cftime object, which can be converted to a string using .strftime('%Y-%m-%dT%H:%M:%S'). This is detailed in test_scan.py.

agstephens commented 4 years ago

Thanks @ellesmith88, have you been able to test it out, for example by installing an updated xarray in a venv?

I've added a requirements.txt file to the repo, with a comment.

agstephens commented 4 years ago

@ellesmith88: Given that we have greatly varying timescales within the data, I suspect we should always set use_cftime=True. What do you think?

ellesmith88 commented 4 years ago

@agstephens Yes, tested and produced a json file just to check. Seems like the best idea, that way you know what you're getting.

Also, with xarray version 0.15, open_mfdataset requires the combine argument, but that argument is unknown to our current version, 0.11.

agstephens commented 4 years ago

Done.