pydata / xarray

N-D labeled arrays and datasets in Python
https://xarray.dev
Apache License 2.0

time decoding error with "days since" #521

Closed · rabernat closed this issue 9 years ago

rabernat commented 9 years ago

I am trying to use xray with some CESM POP model netCDF output, which supposedly follows CF-1.0 conventions. It is failing because the model's time units are "days since 0000-01-01 00:00:00". When calling open_dataset, I get the following error:

ValueError: unable to decode time units u'days since 0000-01-01 00:00:00' with the default calendar. Try opening your dataset with decode_times=False. Full traceback:
Traceback (most recent call last):
  File "/home/rpa/xray/xray/conventions.py", line 372, in __init__
    # Otherwise, tracebacks end up swallowed by Dataset.__repr__ when users
  File "/home/rpa/xray/xray/conventions.py", line 145, in decode_cf_datetime
    dates = _decode_datetime_with_netcdf4(flat_num_dates, units, calendar)
  File "/home/rpa/xray/xray/conventions.py", line 97, in _decode_datetime_with_netcdf4
    dates = np.asarray(nc4.num2date(num_dates, units, calendar))
  File "netCDF4/_netCDF4.pyx", line 4522, in netCDF4._netCDF4.num2date (netCDF4/_netCDF4.c:50388)
  File "netCDF4/_netCDF4.pyx", line 4337, in netCDF4._netCDF4._dateparse (netCDF4/_netCDF4.c:48234)
ValueError: year is out of range

Full metadata for the time variable:

    double time(time) ;
        time:long_name = "time" ;
        time:units = "days since 0000-01-01 00:00:00" ;
        time:bounds = "time_bound" ;
        time:calendar = "noleap" ;

I guess this is a problem with the underlying netCDF4 num2date function?
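
For reference, the failure can be reproduced directly with netCDF4-python of that era (hypothetical input value; newer versions, backed by cftime, behave differently):

import netCDF4 as nc4

# a year-zero reference date made the date parser raise immediately
nc4.num2date([0.0], 'days since 0000-01-01 00:00:00')
# ValueError: year is out of range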

rabernat commented 9 years ago

In fact I just found a netCDF issue on this topic! Apparently they don't think it should be supported. Unidata/netcdf4-python#442

jhamman commented 9 years ago

@rabernat -

Yes - this is all coming from the netCDF4.netcdftime module.

The workaround in xray is to use ds = xray.open_dataset(filename, decode_times=False) and then fix up the time variable "manually". You can use xray.decode_cf() or simply assign a new pandas time index to your time variable.
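
A minimal sketch of that manual fix (hypothetical filename; assumes monthly output that you re-base onto an arbitrary year inside the pandas-supported range):

import pandas as pd
import xray  # the package is now called xarray

ds = xray.open_dataset('pop_output.nc', decode_times=False)
# overwrite the raw numeric times with a pandas monthly index
ds['time'] = pd.date_range('2000-01-01', periods=len(ds['time']), freq='MS')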

As an aside, I also work with CESM output and this is a common problem with its netCDF output.

ocefpaf commented 9 years ago

@jhamman consider using cf_units in xray :wink:

(See https://github.com/Unidata/netcdf4-python/issues/442#issuecomment-129059576)

rabernat commented 9 years ago

The PR above fixes this issue. However, since my model years are in the range 100-200, I am still getting the warning

RuntimeWarning: Unable to decode time axis into full numpy.datetime64 objects, continuing using dummy netCDF4.datetime objects instead, reason: dates out of range

and eventually when I try to access the time data, an error with a very long stack trace ending with

pandas/tslib.pyx in pandas.tslib.Timestamp.__new__ (pandas/tslib.c:7638)()
pandas/tslib.pyx in pandas.tslib.convert_to_tsobject (pandas/tslib.c:21232)()
pandas/tslib.pyx in pandas.tslib._check_dts_bounds (pandas/tslib.c:23332)()
OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 100-02-01 00:00:00

I see there is a check in conventions.py that the year has to lie between 1678 and 2262. What is the reason for this?

jhamman commented 9 years ago

We try to cast all the time variables to a pandas time index. This gives xray the ability to use many of the fast and fancy timeseries tools that pandas has. One consequence of that is that non-standard calendars, such as the "noleap" calendar, must have dates inside the valid range of pandas' nanosecond-precision timestamps (years 1678 to 2262).

Does that make sense? Ideally, numpy and pandas would support custom calendars, but they don't, so at this point we're bound to their limits.
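
For reference, the limits of pandas' nanosecond-precision timestamps can be checked directly:

import pandas as pd

# datetime64[ns] can only represent roughly 585 years around the epoch
print(pd.Timestamp.min)  # 1677-09-21 00:12:43.145224193
print(pd.Timestamp.max)  # 2262-04-11 23:47:16.854775807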

rabernat commented 9 years ago

@jhamman Thanks for the clear explanation! One of the main uses for non-standard calendars would be climate model "control runs", which don't occur at any specific point in historical time but still have seasonal cycles, well-defined months, etc. It would be nice to have "group by" functionality for these datasets. But I do see how this is impossible with the current numpy datetime64 datatype. Perhaps the long-term fix is to implement non-standard calendars within numpy itself.
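
For concreteness, this is the kind of grouping meant here (a sketch, assuming a hypothetical dataset ds whose time coordinate has already been decoded to an index xarray can split on):

# climatological monthly means over a control run
monthly_climatology = ds.groupby('time.month').mean('time')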

jhamman commented 9 years ago

Perhaps the long-term fix is to implement non-standard calendars within numpy itself.

I agree, although that sounds like quite an undertaking. Maybe raise an issue over at numpy and ask if they would be interested in a multi-calendar api? If numpy could make it work, then I'm sure pandas could as well.

AJueling commented 5 years ago

In case anyone is still struggling with the CESM POP time units convention: with the new cftime support in version 0.12, the problem is (almost) solved.

I have slightly different CESM POP netCDF output with time attributes {'units': 'days since 0-1-1 00:00:00', 'calendar': '365_day'} and, crucially, a dimension d2 (without coordinates) that trips up the decode_cf function.

import xarray as xr  # version >= 0.12

# open without decoding times, drop the offending dimension (this also drops
# the 'time_bound' variable defined on it), then decode to cftime dates
ds = xr.open_dataset('some_CESM_output_file.nc', decode_times=False)
ds = ds.drop_dims(['d2'])
ds = xr.decode_cf(ds, use_cftime=True)

Now the xarray Dataset has a cftime.DatetimeNoLeap type time coordinate.

spencerkclark commented 5 years ago

Could you provide the output of ncdump -h or ds.info() on an example file?

AJueling commented 5 years ago
Of course, here is the `ds.info()` output:

xarray.Dataset {
dimensions:
    bnds = 2 ;
    d2 = 2 ;
    nlat = 2400 ;
    nlon = 3600 ;
    time = 1 ;
    z_t = 42 ;
    z_t_150m = 12 ;
    z_w = 42 ;
    z_w_bot = 42 ;
    z_w_top = 42 ;

variables:
    float64 time_bound(time, d2) ;
        time_bound:long_name = boundaries for time-averaging interval ;
        time_bound:units = days since 0000-01-01 00:00:00 ;
    float64 time(time) ;
        time:long_name = time ;
        time:units = days since 0-1-1 00:00:00 ;
        time:bounds = time_bnds ;
        time:calendar = 365_day ;
        time:standard_name = time ;
        time:axis = T ;
    [... grid variables: z_t, z_t_150m, z_w, z_w_top, z_w_bot, dz, dzw, ULONG, ULAT, TLONG, TLAT, KMT, KMU, REGION_MASK, UAREA, TAREA, HU, HT, DXU, DYU, DXT, DYT, HTN, HTE, HUS, HUW, ANGLE, ANGLET ...]
    [... scalar model constants: days_in_norm_year, grav, omega, radius, cp_sw, sound, vonkar, cp_air, rho_air, rho_sw, rho_fw, stefan_boltzmann, latent_heat_vapor, latent_heat_fusion, ocn_ref_salinity, sea_ice_salinity, T0_Kelvin, salt_to_ppt, ppt_to_salt, mass_to_Sv, heat_to_PW, salt_to_Svppt, salt_to_mmday, momentum_factor, hflux_factor, fwflux_factor, salinity_factor, sflux_factor, nsurface_t, nsurface_u ...]
    [... data variables (cell_methods = time: mean/maximum/minimum): KE, TEMP, SALT, SSH2, SHF, SFWF, EVAP_F, PREC_F, SNOW_F, MELT_F, ROFF_F, SALT_F, SENH_F, LWUP_F, LWDN_F, MELTH_F, IAGE, WVEL, UET, VNT, UES, VNS, PD, HMXL, XMXL, TMXL, HBLT, XBLT, TBLT, SSH, time_bnds, TAUX, TAUY, UVEL, VVEL ...]

// global attributes:
    :title = spinup_pd_maxcores_f05_t12 ;
    :Conventions = CF-1.0; http://www.cgd.ucar.edu/cms/eaton/netcdf/CF-current.htm ;
    :contents = Diagnostic and Prognostic Variables ;
    :source = CCSM POP2, the CCSM Ocean Component ;
    :revision = $Id: tavg.F90 34115 2012-01-25 22:35:19Z njn01 $ ;
    :calendar = All years have exactly 365 days. ;
    :start_time = This dataset was created on 2017-04-15 at 12:52:48.4 ;
    :cell_methods = cell_methods = time: mean ==> the variable values are averaged over the time interval between the previous time coordinate and the current one. cell_methods absent ==> the variable values are at the time given by the current time coordinate. ;
    :nsteps_total = 25052952 ;
    :tavg_sum = 86399.99999999974 ;
    :CDI = Climate Data Interface version 1.7.0 (http://mpimet.mpg.de/cdi) ;
    :CDO = Climate Data Operators version 1.7.0 (http://mpimet.mpg.de/cdo) ;
    :NCO = "4.6.0" ;
    [... :history and :history_of_appended_files (ncks, ncrename, and cdo timmean post-processing commands) elided ...]
}
spencerkclark commented 5 years ago

Thanks -- in looking at the metadata, it seems there is nothing unusual about the 'd2' dimension (in normal circumstances we should be able to decode N-D time variables to dates, regardless of their dimensionality).

My feeling is that the issue here remains the fact that cftime dates do not support year zero (see the upstream issue @rabernat mentioned earlier: Unidata/netcdf4-python#442). That said, it's surprising that dropping the 'd2' dimension (and with it the 'time_bound' variable) works around this issue, because the 'time' variable (which remains in the dataset) still has units with a reference date of year zero.

If you don't mind, could you provide me with two more things? (1) what ds.time looks like when the file is opened with decode_times=False, and (2) the full traceback you get when opening the file without decode_times=False.

AJueling commented 5 years ago

Opening the file as ds = xr.open_dataset('some_CESM_output_file.nc', decode_times=False) the time coordinate ds.time is at first simply an array of floats:

<xarray.DataArray 'time' (time: 1)>
array([73020.])
Coordinates:
  * time     (time) float64 7.302e+04
Attributes:
    long_name:      time
    units:          days since 0-1-1 00:00:00
    bounds:         time_bnds
    calendar:       365_day
    standard_name:  time
    axis:           T

and after decoding xr.decode_cf(ds, use_cftime=True).time returns

<xarray.DataArray 'time' (time: 1)>
array([cftime.DatetimeNoLeap(200, 1, 21, 0, 0, 0, 0, 3, 21)], dtype=object)
Coordinates:
  * time     (time) object 0200-01-21 00:00:00
Attributes:
    long_name:      time
    bounds:         time_bnds
    standard_name:  time
    axis:           T

The traceback from opening the file without decode_times=False complains about year 0 falling outside the range of the Gregorian and Julian calendars:

---------------------------------------------------------------------------
OutOfBoundsDatetime                       Traceback (most recent call last)
~/.conda/envs/CESM/lib/python3.6/site-packages/xarray/coding/times.py in _decode_datetime_with_pandas(flat_num_dates, units, calendar)
    128     try:
--> 129         ref_date = pd.Timestamp(ref_date)
    130     except ValueError:

pandas/_libs/tslibs/timestamps.pyx in pandas._libs.tslibs.timestamps.Timestamp.__new__()
pandas/_libs/tslibs/conversion.pyx in pandas._libs.tslibs.conversion.convert_to_tsobject()
pandas/_libs/tslibs/conversion.pyx in pandas._libs.tslibs.conversion.convert_str_to_tsobject()
pandas/_libs/tslibs/conversion.pyx in pandas._libs.tslibs.conversion.convert_str_to_tsobject()
pandas/_libs/tslibs/np_datetime.pyx in pandas._libs.tslibs.np_datetime.check_dts_bounds()

OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 0-01-01 00:00:00

During handling of the above exception, another exception occurred:

OutOfBoundsDatetime                       Traceback (most recent call last)
~/.conda/envs/CESM/lib/python3.6/site-packages/xarray/coding/times.py in decode_cf_datetime(num_dates, units, calendar, use_cftime)
    175         dates = _decode_datetime_with_pandas(flat_num_dates, units,
--> 176                                              calendar)
    177     except (OutOfBoundsDatetime, OverflowError):

~/.conda/envs/CESM/lib/python3.6/site-packages/xarray/coding/times.py in _decode_datetime_with_pandas(flat_num_dates, units, calendar)
    132         # strings, in which case we fall back to using cftime
--> 133         raise OutOfBoundsDatetime
    134

OutOfBoundsDatetime:

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
~/.conda/envs/CESM/lib/python3.6/site-packages/xarray/coding/times.py in _decode_cf_datetime_dtype(data, units, calendar, use_cftime)
     93         result = decode_cf_datetime(example_value, units, calendar,
---> 94                                     use_cftime)
     95     except Exception:

~/.conda/envs/CESM/lib/python3.6/site-packages/xarray/coding/times.py in decode_cf_datetime(num_dates, units, calendar, use_cftime)
    178         dates = _decode_datetime_with_cftime(
--> 179             flat_num_dates.astype(np.float), units, calendar)
    180

~/.conda/envs/CESM/lib/python3.6/site-packages/xarray/coding/times.py in _decode_datetime_with_cftime(num_dates, units, calendar)
    112         return np.asarray(cftime.num2date(num_dates, units, calendar,
--> 113                                           only_use_cftime_datetimes=True))
    114     else:

cftime/_cftime.pyx in cftime._cftime.num2date()

ValueError: zero not allowed as a reference year, does not exist in Julian or Gregorian calendars

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
<ipython-input> in <module>
----> 1 ds = xr.open_dataset(CESM_filename(domain='ocn', run='ctrl', y=200, m=1))  #, use_cftime=True)

~/.conda/envs/CESM/lib/python3.6/site-packages/xarray/backends/api.py in open_dataset(filename_or_obj, group, decode_cf, mask_and_scale, decode_times, autoclose, concat_characters, decode_coords, engine, chunks, lock, cache, drop_variables, backend_kwargs, use_cftime)
    392
    393     with close_on_error(store):
--> 394         ds = maybe_decode_store(store)
    395
    396     # Ensure source filename always stored in dataset object (GH issue #2550)

~/.conda/envs/CESM/lib/python3.6/site-packages/xarray/backends/api.py in maybe_decode_store(store, lock)
    322         store, mask_and_scale=mask_and_scale, decode_times=decode_times,
    323         concat_characters=concat_characters, decode_coords=decode_coords,
--> 324         drop_variables=drop_variables, use_cftime=use_cftime)
    325
    326     _protect_dataset_variables_inplace(ds, cache)

~/.conda/envs/CESM/lib/python3.6/site-packages/xarray/conventions.py in decode_cf(obj, concat_characters, mask_and_scale, decode_times, decode_coords, drop_variables, use_cftime)
    477     vars, attrs, coord_names = decode_cf_variables(
    478         vars, attrs, concat_characters, mask_and_scale, decode_times,
--> 479         decode_coords, drop_variables=drop_variables, use_cftime=use_cftime)
    480     ds = Dataset(vars, attrs=attrs)
    481     ds = ds.set_coords(coord_names.union(extra_coords).intersection(vars))

~/.conda/envs/CESM/lib/python3.6/site-packages/xarray/conventions.py in decode_cf_variables(variables, attributes, concat_characters, mask_and_scale, decode_times, decode_coords, drop_variables, use_cftime)
    399                 k, v, concat_characters=concat_characters,
    400                 mask_and_scale=mask_and_scale, decode_times=decode_times,
--> 401                 stack_char_dim=stack_char_dim, use_cftime=use_cftime)
    402             if decode_coords:
    403                 var_attrs = new_vars[k].attrs

~/.conda/envs/CESM/lib/python3.6/site-packages/xarray/conventions.py in decode_cf_variable(name, var, concat_characters, mask_and_scale, decode_times, decode_endianness, stack_char_dim, use_cftime)
    304     for coder in [times.CFTimedeltaCoder(),
    305                   times.CFDatetimeCoder(use_cftime=use_cftime)]:
--> 306         var = coder.decode(var, name=name)
    307
    308     dimensions, data, attributes, encoding = (

~/.conda/envs/CESM/lib/python3.6/site-packages/xarray/coding/times.py in decode(self, variable, name)
    417         calendar = pop_to(attrs, encoding, 'calendar')
    418         dtype = _decode_cf_datetime_dtype(data, units, calendar,
--> 419                                           self.use_cftime)
    420         transform = partial(
    421             decode_cf_datetime, units=units, calendar=calendar,

~/.conda/envs/CESM/lib/python3.6/site-packages/xarray/coding/times.py in _decode_cf_datetime_dtype(data, units, calendar, use_cftime)
     99                 'opening your dataset with decode_times=False.'
    100                 % (units, calendar_msg))
--> 101             raise ValueError(msg)
    102     else:
    103         dtype = getattr(result, 'dtype', np.dtype('object'))

ValueError: unable to decode time units 'days since 0000-01-01 00:00:00' with the default calendar. Try opening your dataset with decode_times=False.
spencerkclark commented 5 years ago

Great, that's helpful, thanks. I see what's happening now. There are a lot of tricky things going on, so bear with me.

Let's examine the output from ds.info() related to the time bounds and time variables:

float64 time_bound(time, d2) ;
time_bound:long_name = boundaries for time-averaging interval ;
time_bound:units = days since 0000-01-01 00:00:00 ;
float64 time(time) ;
time:long_name = time ;
time:units = days since 0-1-1 00:00:00 ;
time:bounds = time_bnds ;
time:calendar = 365_day ;
time:standard_name = time ;
time:axis = T ;

There are a few important things to note:

  1. In both the 'time_bound' and 'time' variables, the units attribute contains a reference date with year zero.
  2. 'time' has a calendar attribute of '365_day', while no calendar attribute is specified for 'time_bound'.
  3. 'time' has a 'bounds' attribute that points to a variable named 'time_bnds' instead of 'time_bound'.

For non-real-world calendars (e.g. 365_day), reference dates in cftime should allow year zero. This was fixed upstream in https://github.com/Unidata/netcdf4-python/pull/470. That being said, because of (2), the calendar for 'time_bound' is assumed to be a standard calendar; therefore you get this ValueError when decoding the times:

ValueError: zero not allowed as a reference year, does not exist in Julian or Gregorian calendars

Ultimately though, with https://github.com/pydata/xarray/pull/2571, we try to propagate the time-related attributes from the time coordinate to the associated bounds variable (so in normal circumstances we would use a 365_day calendar for the bounds as well). But because of (3) this is not possible: the 'bounds' attribute on the 'time' variable points to a variable name that does not exist.

In theory, another possible way to work around this would be to open the dataset with decode_times=False, add the appropriate calendar attribute to 'time_bound', and then decode the times:

import xarray as xr

ds = xr.open_dataset('some_CESM_output_file.nc', decode_times=False)
# copy the '365_day' calendar from 'time' onto 'time_bound' before decoding
ds.time_bound.attrs['calendar'] = ds.time.attrs['calendar']
ds = xr.decode_cf(ds, use_cftime=True)

Now, this may still not work depending on the values in the 'time_bound' variable (i.e. if any are less than 365.0), because cftime currently does not support year zero in date objects (even for non-real-world calendars). I think one could make the argument that this is inconsistent with allowing reference dates with year zero for those date types, so it would probably be worth opening an issue there to try and get that fixed upstream.

In conclusion, I'm afraid there is nothing we can do in xarray to automatically fix this situation. Issue (3) in the netCDF file is particularly unfortunate. If it weren't for that, I think all of these issues would be possible to work around, e.g. with https://github.com/pydata/xarray/pull/2571 here, or with fixes upstream.
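
For anyone landing here, a combined manual repair might look like the following sketch (assuming the bounds variable really is named 'time_bound' and, per the cftime limitation above, that none of its values fall within year zero):

import xarray as xr

ds = xr.open_dataset('some_CESM_output_file.nc', decode_times=False)
ds.time.attrs['bounds'] = 'time_bound'                        # repair issue (3)
ds.time_bound.attrs['calendar'] = ds.time.attrs['calendar']   # repair issue (2)
ds = xr.decode_cf(ds, use_cftime=True)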

spencerkclark commented 5 years ago

Now, this may still not work depending on the values in the 'time_bound' variable (i.e. if any are less than 365.0), because cftime currently does not support year zero in date objects (even for non-real-world calendars). I think one could make the argument that this is inconsistent with allowing reference dates with year zero for those date types, so it would probably be worth opening an issue there to try and get that fixed upstream.

I opened an issue in cftime regarding this: https://github.com/Unidata/cftime/issues/114.

rabernat commented 5 years ago

It's important to be clear that issues 2 and 3 that @spencerkclark pointed out are objectively errors in the metadata. We have worked very hard over many years to enable xarray to correctly parse CF-compliant dates with non-standard calendars. But xarray cannot and should not be expected to magically fix metadata that is inconsistent or incomplete.

You really need to bring these issues to the attention of whoever generated some_CESM_output_file.nc.

klindsay28 commented 5 years ago

@rabernat , it is not clear to me that issue 2 is an objective error in the metadata.

The CF conventions section on the bounds attribute states:

Since a boundary variable is considered to be part of a coordinate variable’s metadata, it is not necessary to provide it with attributes such as long_name and units.

Boundary variable attributes which determine the coordinate type (units, standard_name, axis and positive) or those which affect the interpretation of the array values (units, calendar, leap_month, leap_year and month_lengths) must always agree exactly with the same attributes of its associated coordinate, scalar coordinate or auxiliary coordinate variable. To avoid duplication, however, it is recommended that these are not provided to a boundary variable.

I conclude from this that software parsing CF metadata should have the variable identified by the bounds attribute inherit the attributes mentioned above from the variable that carries the bounds attribute. @spencerkclark describes this as a workaround. One could argue that, based on the CF conventions text, xarray would be justified in doing that automatically. (See the sketch below.)
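
In code, that inheritance could look something like this sketch (a hypothetical helper, not xarray's implementation; the attribute list comes from the CF text quoted above):

# attributes that CF says must agree between a coordinate and its bounds
CF_INHERITED_ATTRS = ('units', 'standard_name', 'axis', 'positive',
                      'calendar', 'leap_month', 'leap_year', 'month_lengths')

def inherit_bounds_attrs(ds, coord_name):
    """Copy CF-relevant attributes from a coordinate onto its bounds variable."""
    bounds_name = ds[coord_name].attrs.get('bounds')
    if bounds_name is None or bounds_name not in ds:
        return ds  # no bounds, or a broken 'bounds' attribute (issue 3)
    for attr in CF_INHERITED_ATTRS:
        if attr in ds[coord_name].attrs:
            ds[bounds_name].attrs.setdefault(attr, ds[coord_name].attrs[attr])
    return ds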

However, this is confounded by issue 3, that time.attrs.bounds /= 'time_bound', which I agree is an error in the metadata. As a CESM-POP developer, I'm surprised to see that. Raw model output from CESM-POP has time.attrs.bounds = 'time_bound'. So it seems like something in a post-processing workflow is changing time.attrs.bounds while preserving the name of the bounds variable itself. That is problematic.

If CESM-POP were to adhere more closely to the CF recommendation in this section, I think it would drop time_bound.attrs.units, not add time_bound.attrs.calendar. But I don't think that is what you are suggesting.

rabernat commented 5 years ago

@klindsay28 -- thanks for the clarification. You're clearly right about 2, and I was misinformed. The problem is that 3 makes it impossible to follow the CF convention rules to overcome 2 (which xarray would try to do).

klindsay28 commented 5 years ago

@AJueling , do you know the provenance of the file with time.attrs.bounds /= 'time_bound'? If that file is being produced by an NCAR or CESM supplied workflow, then I am willing to see if the workflow can be corrected to keep time.attrs.bounds = 'time_bound'. With this mismatch, it seems hopeless for xarray to automatically figure out how to handle this file as it was intended to be handled.

AJueling commented 5 years ago

Thank you all for the clarification! I will get in touch with the person who ran the model and get back to you as soon as possible.

LeparaLaMapara commented 3 years ago

I'm also getting the same error: ValueError: unable to decode time units 'months since 1955-01-01 00:00:00' with 'the default calendar'. Try opening your dataset with decode_times=False or installing cftime if it is not installed.