Closed oliviermarti closed 1 year ago
Thanks for sharing this – we had not come across this case yet (looking at CMIP6 data). This file seems to have two time variables (with identical time values in each), but I could imagine that some netCDF files would have two distinct time axes.
I see that this issue results because xcdat is expecting one time axis and has logic built around checking/decoding a single time axis, but this file has time_counter
and time_centered
. Ideally, we could find all time axes and decode them iteratively. One concern is this might effect how we use cfxarray – we may generally expect a single dimension variable.
This is going to need a closer look.
Thank you for your interest. Note that we could specify the name of the right time axis at the call of open_dataset. However, if different variables in the file use different time axis, we are screwed with this method. I'm not sure that IPSL file format is a very good one. But we have Tb of them ... !!
By specifying the name when you open the dataset, you could at least work with all the variables that share that axis. Then open a new session for the variables with the other time axis.
I like @rljacob's idea (at least as a possible stopgap solution). I think you could potentially implement this by taking in an optional variable in xcdat.open_[mf]dataset
, determining the .dims
(or maybe .coords
) associated with that variable` and dropping any dim/coord not associated with the variable.
I am hitting a similar error opening CESM2 LE data (which signals this issue may affect E3SM files, too).
import xcdat
fns = '/p/user_pub/climate_work/pochedley1/cesm2le/TREFHT/*1301.020*'
ds = xcdat.open_mfdataset(fns)
AttributeError: cf_xarray can't wrap attribute 'dims' because there are multiple values for 'longitude'. There is no unique mapping from 'longitude' to a value in 'dims'.
As a temporary solution, I am opening this data with xarray. xcdat basically uses xr.open_[mv]dataset
, but adds bounds (if they aren't there) and decodes some non-cf-compliant time axes to cftime arrays (if they are non-cf compliant). So xarray open dataset functionality is a reasonable workaround for many datasets.
@pochedls is there a way you can make the Receive multiple variables for key 'T': {'time_centered', 'time_counter'}. Expected only one
error more specific, by specifying where the different time coordinates come from. This could help the user figure out what is wrong. In the present case, I imagine one of the time axes is found thanks to the axis:T
attribute, and the other thanks to the coordinates
variable attribute
Also, I think adding something in the documentation about the logic of finding the axes and how the coordinates
attribute is handled could be useful. I'm guessing you want to follow the CF Convention as much as possible (#292)
Clarifying how you handle the coordinates
variable attribute in different cases would probably be useful, but I confess I'm not even sure how this attribute works. I have quickly checked the CF web site and found references to the coordinates
attribute in
I looked into this a little further and have some ideas to address the issue in Test 1 and Test 2.
The immediate problem is when opening the dataset and getting the time axis in the _has_cf_compliant_time
routine via ds.cf["T"]
, which returns:
KeyError: "Receive multiple variables for key 'T': {'time_centered', 'time_counter'}. Expected only one. Please pass a list ['T'] instead to get all variables matching 'T'."
This occurs because the dataset has multiple time axes. There are a couple potentially simple solutions to this:
ds.cf[["T"]]
instead of ds.cf["T"]
. You could check if one of the time coordinates is a dimension by checking this intersection: list(set(ds.cf[["T"]].coords) & set(ds.cf[["T"]].dims))
. If one axis satisfies this criteria, you could then choose this axis (e.g., updating the selection here) within _has_cf_compliant_time
.
A more complete solution might look like:
open_*dataset
functions: if decode_times=True
then get all temporal coordinates via time_axis_list = ds.cf[["T"]].coords
._has_cf_compliant_time
to accept a DataArray rather than a data path). Decode non-cf_compliant axes within the loop and update the original dataset with the decoded axes and gather the cf_compliant DataArrays (and their attributes) into two dictionaries (e.g., variable_dict
, attrs_dict
).
decode_non_cf_time
to accept an axis DataArray. You would also need to write a new function to replace get_bounds
to get the bounds from a specific axis (rather than by key). xr.conventions.cf_decoder(variable_dict, attrs_dict)
. Then update the original dataset with the decoded axes.I think this would decode all time axes, which I think is what we should do (if possible). I do think temporal operations will still need to be updated to be able to select the appropriate time coordinate (e.g., by selecting the time coordinate associated with a specified data_var).
One note that might simplify the main logic above (at Step 2). Our version of _get_cftime_coords
appears to work on both cf_compliant and non-cf_compliant data so a revised version of decode_non_cf_time
(e.g., replaced with decode_time
) might simply be applied to every time axis (which would save putting the cf_compliant data into a DataArray and having xarray decode). One reason we might not do this is that xarray may have more robust functionality to decode temporal data.
In looking at Test 3, I see another issue here. The time axis needed for temporal operations is not actually assigned as a dimension. I thought this could be fixed by dropping time_counter
and just using time_centered
: this would get rid of the multiple time axes which is the issue for Test 1 and 2 and would also be the appropriate coordinate for temporal operations:
import numpy as np, xarray as xr, xcdat as xc
TS_File ='https://thredds-su.ipsl.fr/thredds/dodsC/ipsl_thredds/omamce/TestCases/XCDAT/thetao_CF_1.nc'
ds = xr.open_dataset(TS_File, decode_times=False)
ds['time_counter'] = ds['time_centered']
ds = ds.drop_vars('time_centered')
ds = ds.rename({'time_counter': 'time_centered'})
ds.to_netcdf('test.nc', 'w', 'NETCDF3_CLASSIC')
ds = xc.open_dataset('test.nc')
ds.temporal.group_average('thetao', freq='year', weighted=True)
But this yields a different error:
AttributeError: 'DataArray' object has no attribute 'time'
@tomvothecoder - do you think this is a separate issue in the temporal averaging logic?
In looking at Test 3, I see another issue here. The time axis needed for temporal operations is not actually assigned as a dimension. I thought this could be fixed by dropping
time_counter
and just usingtime_centered
: this would get rid of the multiple time axes which is the issue for Test 1 and 2 and would also be the appropriate coordinate for temporal operations:import numpy as np, xarray as xr, xcdat as xc TS_File ='https://thredds-su.ipsl.fr/thredds/dodsC/ipsl_thredds/omamce/TestCases/XCDAT/thetao_CF_1.nc[](https://thredds-su.ipsl.fr/thredds/dodsC/ipsl_thredds/omamce/TestCases/XCDAT/thetao_CF_1.nc)' ds = xr.open_dataset(TS_File, decode_times=False) ds['time_counter'] = ds['time_centered'] ds = ds.drop_vars('time_centered') ds = ds.rename({'time_counter': 'time_centered'}) ds.to_netcdf('test.nc', 'w', 'NETCDF3_CLASSIC') ds = xc.open_dataset('test.nc') ds.temporal.group_average('thetao', freq='year', weighted=True)
But this yields a different error:
AttributeError: 'DataArray' object has no attribute 'time'
@tomvothecoder - do you think this is a separate issue in the temporal averaging logic?
Yes, this issue is related to the TemporalAccessor
class methods incorrectly performing static references to "time" as the name of the time dimension.
I opened up a separate issue and the related PR that fixes this:
I just wanted to note that I hit this error message when using the regridder (use the code in this issue with one of the two datasets below).
KeyError: "Receive multiple variables for key 'longitude': {'longitude', 'lon'}. Expected only one. Please pass a list ['longitude'] instead to get all variables matching 'longitude'."
* /p/css03/esgf_publish/CMIP6/CMIP/BCC/BCC-CSM2-MR/historical/r1i1p1f1/SImon/siconc/gn/v20200218/
* /p/css03/esgf_publish/CMIP6/CMIP/BCC/BCC-ESM1/historical/r1i1p1f1/SImon/siconc/gn/v20200218/
It's possible that efforts to address this issue (e.g., here) will fix the regridding issue.
Hi @oliviermarti and @jypeter,
Thank you for your patience as we worked to solve this issue. I just wanted to inform you that that this issue is now resolved in xcdat=0.4.0
(#343). You should be able to add bounds, decode times, and perform xcdat
temporal operations now.
Give it a shot and let us know how it goes!
Example code:
import numpy as np,
import xarray as xr
import xcdat as xc
print(xc.__version)
# 0.4.0
TS_File ='https://thredds-su.ipsl.fr/thredds/dodsC/ipsl_thredds/omamce/TestCases/XCDAT/thetao_CF_1.nc'
dTS = xc.open_dataset(TS_File, add_bounds=True, decode_times=True)
print(dTS)
# <xarray.Dataset>
# Dimensions: (y: 149, x: 182, olevel: 31, time_counter: 12,
# axis_nbounds: 2, bnds: 2)
# Coordinates:
# * olevel (olevel) float32 5.0 15.0 25.0 ... 4.75e+03 5.25e+03
# * time_counter (time_counter) object 1850-01-16 12:00:00 ... 1850-...
# Dimensions without coordinates: y, x, axis_nbounds, bnds
# Data variables:
# nav_lat (y, x) float32 ...
# nav_lon (y, x) float32 ...
# thetao (time_counter, olevel, y, x) float32 ...
# time_centered (time_counter) float64 ...
# time_centered_bounds (time_counter, axis_nbounds) float64 ...
# time_counter_bnds (time_counter, bnds) object 1850-01-01 18:00:00 ......
# olevel_bnds (olevel, bnds) float32 -0.000237 10.0 ... 5.5e+03
# Attributes:
# name: CM6-SW-pi-Par01_1m_18500101_18501231_grid_T
# description: Created by xios
# title: Created by xios
# Conventions: CF-1.6
# timeStamp: 2021-Oct-22 08:27:34 GMT
# uuid: bf661a39-5d4a-4999-aaae-56f09c3123c5
# LongName: IPSLCM6.1.10-LR
# NCO: netCDF Operators version 4.9.9 (Homepage...
# _NCProperties: version=2,netcdf=4.7.4,hdf5=1.10.6
# history: Thu Jul 28 16:24:47 2022: ncatted -a coo...
# DODS_EXTRA.Unlimited_Dimension: time_counter
tos_year = dTS.temporal.group_average("thetao", freq="year", weighted=True)
# <xarray.Dataset>
# Dimensions: (y: 149, x: 182, olevel: 31, bnds: 2, time_counter: 1)
# Coordinates:
# * olevel (olevel) float32 5.0 15.0 25.0 ... 4.25e+03 4.75e+03 5.25e+03
# * time_counter (time_counter) object 1850-01-01 00:00:00
# Dimensions without coordinates: y, x, bnds
# Data variables:
# nav_lat (y, x) float32 ...
# nav_lon (y, x) float32 ...
# olevel_bnds (olevel, bnds) float32 -0.000237 10.0 10.0 ... 5e+03 5.5e+03
# thetao (time_counter, olevel, y, x) float64 nan nan nan ... nan nan
# Attributes:
# name: CM6-SW-pi-Par01_1m_18500101_18501231_grid_T
# description: Created by xios
# title: Created by xios
# Conventions: CF-1.6
# timeStamp: 2021-Oct-22 08:27:34 GMT
# uuid: bf661a39-5d4a-4999-aaae-56f09c3123c5
# LongName: IPSLCM6.1.10-LR
# NCO: netCDF Operators version 4.9.9 (Homepage...
# _NCProperties: version=2,netcdf=4.7.4,hdf5=1.10.6
# history: Thu Jul 28 16:24:47 2022: ncatted -a coo...
# DODS_EXTRA.Unlimited_Dimension: time_counter
In your example file:
time_counter
is the axis dimension. time_counter
has two sets of coordinates:
time_counter
(dimension coordinate variable) -- no bounds are found, xcdat
adds them (time_counter_bnds
)time_centered
(non-dimension coordinate variable, aka "auxillary") -- has time_centered_bounds
What changed in xcdat=0.4.0
time_counter
and time_counter_bnds
Le 10 nov. 2022 à 23:46, Tom Vo @.***> a écrit :
Hi @oliviermarti https://github.com/oliviermarti and @jypeter https://github.com/jypeter,
Thank you for your patience as we worked to solve this issue. I just wanted to inform you that that this issue is now resolved in xcdat=0.4.0 (#343 https://github.com/xCDAT/xcdat/pull/343). You should be able to add bounds, decode times, and perform xcdat temporal operations now. Give it a shot and let us know how it goes!
Hi,
Sorry for this late response.
This xcdat revision is OK. I can read and manipulate my data now. Thanks !
Olivier
— Olivier Marti LSCE Bât 714 p. 1049 MERMAID Team Normal situation : +33 1 69 08 77 27 Corona lockdown : +33 6 45 36 43 74 @.***
What happened?
The IPSL Earth System Model output files has two time variables, and xcdat refuses to open them.
Developer Notes (@tomvothecoder, 8/30/22)
Dataset Info:
time_counter
time_counter
andtime_variable
mapped totime_counter
dimensionHow xcdat handles coords:
cf_xarray.cf["T"]
to get the coord var, which breaks if there are multiple coord vars foundcf_xarray.cf[["T"]]
to return a Dataset with all the coord vars mapped to an axisWhat did you expect to happen?
No response
Minimal Complete Verifiable Example
Relevant log output
Anything else we need to know?
No response
Environment
xcdat.version : '0.3.0'
xr.show_versions() :
INSTALLED VERSIONS
commit: None python: 3.10.5 | packaged by conda-forge | (main, Jun 14 2022, 07:03:09) [Clang 13.0.1 ] python-bits: 64 OS: Darwin OS-release: 21.5.0 machine: x86_64 processor: i386 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: ('en_US', 'UTF-8') libhdf5: 1.12.1 libnetcdf: 4.8.1
xarray: 2022.3.0 pandas: 1.4.3 numpy: 1.22.4 scipy: 1.8.1 netCDF4: 1.6.0 pydap: None h5netcdf: None h5py: None Nio: None zarr: None cftime: 1.6.1 nc_time_axis: 1.4.1 PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: None dask: 2022.6.1 distributed: 2022.6.1 matplotlib: 3.5.2 cartopy: 0.20.2 seaborn: None numbagg: None fsspec: 2022.5.0 cupy: None pint: 0.19.2 sparse: 0.13.0 setuptools: 63.1.0 pip: 22.1.2 conda: 4.13.0 pytest: None IPython: 8.4.0 sphinx: None