pydata / xarray

N-D labeled arrays and datasets in Python
https://xarray.dev
Apache License 2.0

to_netcdf / open_dataset is not idempotent #4512

Status: open. MVivien opened this issue 4 years ago.

MVivien commented 4 years ago

What happened: I created a Dataset from a DataArray whose name is equal to its dimension name and which has no coordinate. When I save this Dataset as netCDF and open the file again as a Dataset, the reopened Dataset has no data variables: the original variable has become a coordinate.

What you expected to happen: I would expect the to_netcdf / open_dataset round trip to be idempotent, i.e. to return a Dataset identical to the one I saved as netCDF.
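To make that expectation concrete, the round trip can be checked programmatically. The sketch below is a hypothetical helper (roundtrip_is_identical is not part of xarray) that writes a Dataset to a temporary netCDF file with to_netcdf, reads it back with open_dataset, and compares the two with Dataset.identical. It assumes a netCDF backend such as netCDF4 or scipy is installed.

```python
import os
import tempfile

import xarray as xr


def roundtrip_is_identical(ds: xr.Dataset) -> bool:
    """Hypothetical helper: write ds to a temporary netCDF file, read it back,
    and report whether the reopened Dataset is identical to the original."""
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "roundtrip.nc")
        ds.to_netcdf(path)
        # Load into memory and close the file before the directory is removed.
        with xr.open_dataset(path) as reopened:
            loaded = reopened.load()
    # identical() compares names, dimensions, values, coordinates and
    # attributes, so a data variable that came back as a coordinate fails it.
    return ds.identical(loaded)


# A float variable whose name differs from its dimension.
ds_ok = xr.Dataset({"temp": ("lat", [1.0, 2.0, 3.0, 4.0])})
print(roundtrip_is_identical(ds_ok))
```

For a variable whose name differs from its dimension the check is expected to pass. Judging by the output reported in this issue, it would be expected to return False for the lat dataset in the example that follows, at least on the xarray version listed under Environment.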

Minimal Complete Verifiable Example:

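import xarray as xr

# A 1-D DataArray whose name ('lat') equals its only dimension, with no coordinate
da = xr.DataArray(
    [1, 2, 3, 4],
    dims=['lat'],
    name='lat'
)
ds = da.to_dataset()  # 'lat' ends up as a data variable

# Round trip through a netCDF file
ds.to_netcdf('bug.nc')
ds2 = xr.open_dataset('bug.nc')

print(ds)   # 'lat' is listed under Data variables
print(ds2)  # 'lat' is listed under Coordinates; no data variables remain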

Output

<xarray.Dataset>
Dimensions:  (lat: 4)
Dimensions without coordinates: lat
Data variables:
    lat      (lat) int64 1 2 3 4

<xarray.Dataset>
Dimensions:  (lat: 4)
Coordinates:
  * lat      (lat) int64 1 2 3 4
Data variables:
    *empty*

Anything else we need to know?:

Environment:

Output of xr.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.12 |Anaconda, Inc.| (default, Sep 8 2020, 17:50:39) [GCC Clang 10.0.0 ]
python-bits: 64
OS: Darwin
OS-release: 19.0.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
libhdf5: 1.10.6
libnetcdf: 4.7.4

xarray: 0.16.1
pandas: 1.1.3
numpy: 1.19.2
scipy: 1.5.2
netCDF4: 1.5.4
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.2.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: 0.9.8.4
iris: None
bottleneck: None
dask: 2.30.0
distributed: None
matplotlib: 3.1.3
cartopy: None
seaborn: None
numbagg: None
pint: None
setuptools: 50.3.0.post20201006
pip: 20.2.3
conda: None
pytest: 6.1.0
IPython: 5.8.0
sphinx: None

dcherian commented 4 years ago

This is the same bug as in https://github.com/pydata/xarray/pull/4108#discussion_r431949790

> <xarray.Dataset>
> Dimensions:  (lat: 4)
> Dimensions without coordinates: lat
> Data variables:
>     lat      (lat) int64 1 2 3 4

This isn't xarray's data model IIUC: a variable with the same name as a dimension is treated as a coordinate variable (i.e., an indexed dimension).
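Following that data model, a Dataset can be checked for the problematic case before it is saved. The sketch below uses two hypothetical helpers (neither is xarray API): one lists data variables whose name matches one of their own dimensions, and the other renames only those variables with Dataset.rename_vars so that they are no longer read back as coordinates. Renaming the variable is an assumed workaround, not a fix agreed on in this thread.

```python
import xarray as xr


def dimension_named_data_vars(ds: xr.Dataset) -> list:
    """Hypothetical helper: list data variables whose name matches one of
    their own dimensions. When the file is reopened, such a variable is read
    back as a coordinate variable rather than a data variable."""
    return [name for name, var in ds.data_vars.items() if name in var.dims]


def rename_for_roundtrip(ds: xr.Dataset, suffix: str = "_values") -> xr.Dataset:
    """Hypothetical workaround: rename those data variables (not their
    dimensions) so their names no longer collide with a dimension name."""
    clashes = dimension_named_data_vars(ds)
    # rename_vars renames variables only; Dataset.rename would also rename
    # the dimension of the same name and keep the collision.
    return ds.rename_vars({name: name + suffix for name in clashes})


# The dataset from the issue: 'lat' is a data variable along dimension 'lat'.
ds = xr.DataArray([1, 2, 3, 4], dims=["lat"], name="lat").to_dataset()
print(dimension_named_data_vars(ds))

safe = rename_for_roundtrip(ds)
print(list(safe.data_vars), safe["lat_values"].dims)
```

Renaming the dimension when the DataArray is created (for example dims=['lat_index'], a hypothetical name) would avoid the collision equally well. Renaming the variable afterwards has the advantage of leaving dimension names, which other datasets may be aligned on, unchanged.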