pydata / xarray

N-D labeled arrays and datasets in Python
https://xarray.dev
Apache License 2.0

combine_first of Datasets changes dtype of variable present only in one Dataset #4220

Open equaeghe opened 4 years ago

equaeghe commented 4 years ago

What happened: I combined two Datasets using combine_first and, to my surprise, the dtype of one of the variables in the merged Dataset changed from bool to float64.

What you expected to happen: No change in dtype.

Minimal Complete Verifiable Example:

>>> import xarray as xr
>>> ds = xr.Dataset(coords={'abc': list('abc')})
>>> ds['x'] = ('abc', [1., 2., 3.])
>>> ds['y'] = ('abc', [-1., -2., -3.])
>>> ds['t'] = ('abc', [True, False, True])
>>> ds
<xarray.Dataset>
Dimensions:  (abc: 3)
Coordinates:
  * abc      (abc) <U1 'a' 'b' 'c'
Data variables:
    x        (abc) float64 1.0 2.0 3.0
    y        (abc) float64 -1.0 -2.0 -3.0
    t        (abc) bool True False True
>>> xy4b = ds[['x', 'y']].sel(abc=~ds.t) * 10
>>> xy4b.combine_first(ds)
<xarray.Dataset>
Dimensions:  (abc: 3)
Coordinates:
  * abc      (abc) object 'a' 'b' 'c'
Data variables:
    x        (abc) float64 1.0 20.0 3.0
    y        (abc) float64 -1.0 -20.0 -3.0
    t        (abc) float64 1.0 0.0 1.0

Anything else we need to know?: No.

Environment:

Output of xr.show_versions()

commit: None
python: 3.7.8 (default, Jul 5 2020, 21:51:42) [GCC 9.3.0]
python-bits: 64
OS: Linux
OS-release: 5.4.48-gentoo
machine: x86_64
processor: Intel(R) Core(TM) i7-2620M CPU @ 2.70GHz
byteorder: little
LC_ALL: None
LANG: nl_BE.UTF-8
LOCALE: nl_BE.UTF-8
libhdf5: 1.10.5
libnetcdf: 4.6.1

xarray: 0.12.1
pandas: 1.0.4
numpy: 1.18.5
scipy: 1.4.1
netCDF4: 1.5.3
pydap: None
h5netcdf: None
h5py: 2.10.0
Nio: None
zarr: None
cftime: 1.1.3
nc_time_axis: None
PseudonetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.3.2
dask: 1.2.0
distributed: None
matplotlib: 3.2.1
cartopy: None
seaborn: None
setuptools: 46.4.0
pip: 20.0.2
conda: None
pytest: None
IPython: 7.16.1
sphinx: 3.0.4

equaeghe commented 2 years ago

Issue still present in recent xarray.

Output of xr.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.9.9 (main, Jan 9 2022, 21:37:30) [GCC 11.2.0]
python-bits: 64
OS: Linux
OS-release: 5.15.26-gentoo-a
machine: x86_64
processor: AMD Ryzen 7 PRO 4750U with Radeon Graphics
byteorder: little
LC_ALL: None
LANG: nl_NL.UTF-8
LOCALE: ('nl_NL', 'UTF-8')
libhdf5: 1.10.5
libnetcdf: 4.8.1

xarray: 0.21.1
pandas: 1.4.1
numpy: 1.22.2
scipy: 1.7.3
netCDF4: 1.5.8
pydap: None
h5netcdf: None
h5py: 3.3.0
Nio: None
zarr: None
cftime: 1.5.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.3.2
dask: 2022.02.0
distributed: None
matplotlib: 3.5.1
cartopy: None
seaborn: 0.11.2
numbagg: None
fsspec: 2022.01.0
cupy: None
pint: None
sparse: None
setuptools: 60.9.2
pip: 22.0.3
conda: None
pytest: None
IPython: 7.31.1
sphinx: 4.4.0

/usr/lib/python3.9/site-packages/_distutils_hack/__init__.py:30: UserWarning: Setuptools is replacing distutils.
  warnings.warn("Setuptools is replacing distutils.")

kmuehlbauer commented 1 year ago

combine_first uses fillna under the hood; see #3570.
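
Until the underlying behaviour changes, one possible workaround is to cast affected variables back to their original dtype after combining. The following is only a minimal sketch under stated assumptions: `combine_first_keep_dtypes` is a hypothetical helper name, not part of xarray, and it only handles data variables that are present in exactly one of the two Datasets (the case reported here). It leaves a variable untouched if the combined result contains missing values, since those cannot be represented in a bool or integer dtype, and it does not address the coordinate dtype change (`<U1` to `object`) visible in the example output above.

```python
import xarray as xr


def combine_first_keep_dtypes(first, second):
    """Hypothetical helper (not xarray API): combine_first, then restore dtypes."""
    combined = first.combine_first(second)
    for name in list(combined.data_vars):
        in_first = name in first.data_vars
        in_second = name in second.data_vars
        if in_first == in_second:
            # Present in both inputs: keep whatever dtype combine_first produced.
            continue
        original_dtype = (first if in_first else second)[name].dtype
        var = combined[name]
        # Cast back only if the dtype changed and the result holds no missing values.
        if var.dtype != original_dtype and not bool(var.isnull().any()):
            combined[name] = var.astype(original_dtype)
    return combined


# Same data as in the report; the subset is selected by label here for simplicity.
ds = xr.Dataset(coords={"abc": list("abc")})
ds["x"] = ("abc", [1.0, 2.0, 3.0])
ds["y"] = ("abc", [-1.0, -2.0, -3.0])
ds["t"] = ("abc", [True, False, True])

xy4b = ds[["x", "y"]].sel(abc=["b"]) * 10
fixed = combine_first_keep_dtypes(xy4b, ds)
print(fixed["t"].dtype)
```

The cast is skipped whenever missing values are present because converting NaN to bool would silently turn it into True.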