Closed ShihengDuan closed 1 year ago
When I look at the what I think is the original version of this dataset, I get the following:
import xcdat as xc
fn = '/p/css03/cmip5_css02/data/cmip5/output1/MIROC/MIROC5/piControl/mon/atmos/Amon/r1i1p1/tas/1/*.nc'
ds = xc.open_mfdataset(fn)
MergeError: conflicting values for variable 'lat_bnds' on objects to be combined. You can skip this check by specifying compat='override'.
See https://github.com/xCDAT/xcdat/issues/162 for more on this.
I think this error arises when the lat_bnds
values vary across files, which I suspect may be what is happening here. It looks like the data has been re-saved out in tas/tas_MIROC5_piControl.nc
. What does an ncdump -h
look like @ShihengDuan?
Here is the output from ncdump -h:
(duan) /home/duan5/ISIMIP/MIROC5/piControl/tas$ ncdump -h tas_MIROC5_piControl.nc
netcdf tas_MIROC5_piControl {
dimensions:
lat = 132 ;
lon = 256 ;
time = 10440 ;
bnds = 2 ;
variables:
double lat(lat) ;
lat:_FillValue = NaN ;
lat:bounds = "lat_bnds" ;
lat:units = "degrees_north" ;
lat:axis = "Y" ;
lat:long_name = "latitude" ;
lat:standard_name = "latitude" ;
double lon(lon) ;
lon:_FillValue = NaN ;
lon:bounds = "lon_bnds" ;
lon:units = "degrees_east" ;
lon:axis = "X" ;
lon:long_name = "longitude" ;
lon:standard_name = "longitude" ;
double time(time) ;
time:_FillValue = NaN ;
time:bounds = "time_bnds" ;
time:axis = "T" ;
time:long_name = "time" ;
time:standard_name = "time" ;
time:units = "days since 2000-01-01" ;
time:calendar = "noleap" ;
double time_bnds(lat, time, bnds) ;
time_bnds:_FillValue = NaN ;
time_bnds:coordinates = "height" ;
double lat_bnds(time, lat, bnds) ;
lat_bnds:_FillValue = NaN ;
lat_bnds:coordinates = "height" ;
double lon_bnds(lat, time, lon, bnds) ;
lon_bnds:_FillValue = NaN ;
lon_bnds:coordinates = "height" ;
double height ;
height:_FillValue = NaN ;
height:units = "m" ;
height:axis = "Z" ;
height:positive = "up" ;
height:long_name = "height" ;
height:standard_name = "height" ;
float tas(time, lat, lon) ;
tas:_FillValue = 1.e+20f ;
tas:standard_name = "air_temperature" ;
tas:long_name = "Near-Surface Air Temperature" ;
tas:units = "K" ;
tas:original_name = "T2" ;
tas:cell_methods = "time: mean" ;
tas:cell_measures = "area: areacella" ;
tas:history = "2011-09-15T06:51:00Z altered by CMOR: Treated scalar dimension: \'height\'. 2011-09-15T06:51:00Z altered by CMOR: replaced missing value flag (-999) with standard missing value (1e+20). 2011-09-15T06:51:00Z altered by CMOR: Inverted axis: lat." ;
tas:associated_files = "baseURL: http://cmip-pcmdi.llnl.gov/CMIP5/dataLocation gridspecFile: gridspec_atmos_fx_MIROC5_piControl_r0i0p0.nc areacella: areacella_fx_MIROC5_piControl_r0i0p0.nc" ;
tas:coordinates = "height" ;
tas:missing_value = 1.e+20f ;
// global attributes:
:institution = "AORI (Atmosphere and Ocean Research Institute, The University of Tokyo, Chiba, Japan), NIES (National Institute for Environmental Studies, Ibaraki, Japan), JAMSTEC (Japan Agency for Marine-Earth Science and Technology, Kanagawa, Japan)" ;
:institute_id = "MIROC" ;
:experiment_id = "piControl" ;
:source = "MIROC5 2010 atmosphere: MIROC-AGCM6 (T85L40); ocean: COCO (COCO4.5, 256x224 L50); sea ice: COCO (COCO4.5); land: MATSIRO (MATSIRO, L6); aerosols: SPRINTARS (SPRINTARS 5.00, T85L40)" ;
:model_id = "MIROC5" ;
:forcing = "N/A" ;
:parent_experiment_id = "N/A" ;
:parent_experiment_rip = "N/A" ;
:branch_time = 0. ;
:contact = "Masahiro Watanabe (hiro@aori.u-tokyo.ac.jp), Seita Emori (emori@nies.go.jp), Masayoshi Ishii (ism@jamstec.go.jp), Masahide Kimoto (kimoto@aori.u-tokyo.ac.jp)" ;
:references = "Watanabe et al., 2010: Improved climate simulation by MIROC5: Mean states, variability, and climate sensitivity. J. Climate, 23, 6312-6335" ;
:initialization_method = 1 ;
:physics_version = 1 ;
:tracking_id = "d5a62be5-f4e9-4e63-b63f-5fea30c77a52" ;
:product = "output" ;
:experiment = "pre-industrial control" ;
:frequency = "mon" ;
:creation_date = "2011-09-15T06:51:00Z" ;
:history = "2011-09-15T06:51:00Z CMOR rewrote data to comply with CF standards and CMIP5 requirements." ;
:Conventions = "CF-1.4" ;
:project_id = "CMIP5" ;
:table_id = "Table Amon (26 July 2011) 976b7fd1d9e1be31dddd28f5dc79b7a1" ;
:title = "MIROC5 model output prepared for CMIP5 pre-industrial control" ;
:parent_experiment = "N/A" ;
:modeling_realm = "atmos" ;
:realization = 1 ;
:cmor_version = "2.7.1" ;
I concatenated all the piControl simulation with the following code:
files = glob.glob('data/tas_Amon_*.nc')
# print(sorted(files))
tas = xa.open_mfdataset(sorted(files))
print(tas)
print(np.unique(tas.time, return_counts=True))
print('Done')
print(len(np.unique(tas.time)))
print(len(tas.time))
tas.to_netcdf('tas_MIROC5_piControl.nc')
I forgot why I didn't do tas=tas.compute()
before saving to nc files, but when I add this code, it still shows time dimension. See below:
(duan) /home/duan5/ISIMIP/MIROC5/piControl/tas$ python
Python 3.10.4 (main, Mar 31 2022, 08:41:55) [GCC 7.5.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import xcdat as xc
>>> data = xc.open_dataset('tas_MIROC5_piControl_compute.nc')
>>> data['lat_bnds']
<xarray.DataArray 'lat_bnds' (time: 10440, lat: 132, bnds: 2)>
[2756160 values with dtype=float64]
Coordinates:
* lat (lat) float64 -88.93 -87.54 -86.14 -84.74 ... 86.14 87.54 88.93
* time (time) object 2000-01-16 12:00:00 ... 2869-12-16 12:00:00
height float64 2.0
Dimensions without coordinates: bnds
>>> data = xc.open_dataset('tas_MIROC5_piControl.nc')
>>> data['lat_bnds']
<xarray.DataArray 'lat_bnds' (time: 10440, lat: 132, bnds: 2)>
[2756160 values with dtype=float64]
Coordinates:
* lat (lat) float64 -88.93 -87.54 -86.14 -84.74 ... 86.14 87.54 88.93
* time (time) object 2000-01-16 12:00:00 ... 2869-12-16 12:00:00
height float64 2.0
Dimensions without coordinates: bnds
>>>
tas_MIROC5_piControl.nc
is the original file and tas_MIROC5_piControl_compute.nc
is what I get when I do tas.compute()
before saving to nc file.
@ShihengDuan – I realize that when I go back to the original issue on this ticket, if I use xarray instead of xcdat when opening the dataset (or the original data), I think I have the same issue (lat_bnds
has time
as a dim). I think this may either be an issue with the underlying data that needs to be resolved or an issue with xarray (on which xcdat is based).
Do you agree?
Oh yes, I didn't realize that. I guess it is related to how the original data file is formatted?
I think this may either be an issue with the underlying data that needs to be resolved or an issue with xarray (on which xcdat is based).
Based on the error below, I believe this is an issue with the underlying data rather than with xarray.
import xcdat as xc
fn = '/p/css03/cmip5_css02/data/cmip5/output1/MIROC/MIROC5/piControl/mon/atmos/Amon/r1i1p1/tas/1/*.nc'
ds = xc.open_mfdataset(fn)
MergeError: conflicting values for variable 'lat_bnds' on objects to be combined. You can skip this check by specifying compat='override'.
Here is info from the FAQs that covers this specific issue and a workaround with xarray: https://xcdat.readthedocs.io/en/latest/faqs.html#how-do-i-open-a-multi-file-dataset-with-values-that-conflict
In xarray, the default setting for checking compatibility across a multi-file dataset is
compat='no_conflicts'
. If conflicting values exists between files,xarray raises MergeError: conflicting values for variable <VARIABLE NAME> on objects to be combined. You can skip this check by specifying compat="override"
.If you still intend on working with these datasets and recognize the source of the issue (e.g., minor floating point diffs), follow the instructions below. Please understand the potential implications before proceeding!
The xcdat/xarray workaround is to configure open_dataset()
/open_mfdataset()
with these settings:
import xcdat as xc
fn = '/p/css03/cmip5_css02/data/cmip5/output1/MIROC/MIROC5/piControl/mon/atmos/Amon/r1i1p1/tas/1/*.nc'
ds = xc.open_mfdataset(fn, compat="override", join="override", coords="minimal")
print(ds.lat_bnds)
# Notice how with `coords="minimal"`, time dimension is not being added to lat_bnds and lon_bnds
<xarray.DataArray 'lat_bnds' (lat: 128, bnds: 2)>
dask.array<open_dataset-752fd673e13445722f06a881808577f1lat_bnds, shape=(128, 2), dtype=float64, chunksize=(128, 2), chunktype=numpy.ndarray>
Coordinates:
* lat (lat) float64 -88.93 -87.54 -86.14 -84.74 ... 86.14 87.54 88.93
height float64 ...
Dimensions without coordinates: bnds
compat="override"
: skip comparing and pick variable from first join="override"
: if indexes are of same size, rewrite indexes to be those of the first object with that dimension. Indexes for the same dimension must have the same size in all objects.coords="minimal"
: Only coordinates in which the dimension already appears are included.
Thanks for the information. This makes sense to me.
What happened?
When I load data with xcdat, sometimes it generates additional dimensions for lat_bnds and lon_bnds. An example code is here:
It generates the following output:
For lon_bnds, an additional latitude dimension is generated.
The situation changes for different files. Sometimes I got
(time, lon, bnds)
for lon_bnds.The version of xCDAT is 0.4.0.
What did you expect to happen?
Keep only latitude/longitude and bnds dimensions for bnds variables.
Minimal Complete Verifiable Example
No response
Relevant log output
No response
Anything else we need to know?
No response
Environment
xCDAT is 0.4.0. xr.show_versions() yields the following