pydata / xarray

N-D labeled arrays and datasets in Python
https://xarray.dev
Apache License 2.0

assert_identical fails to generate diff when an array is an attribute value #9153

Closed DocOtak closed 2 days ago

DocOtak commented 1 week ago

What happened?

I'm writing a bunch of tests for one of my code bases that check that the output of some operation done on a dataset has the expected results. As such, I'm using xarray.testing.assert_identical in these tests. I discovered that the assertion would occasionally fail with a ValueError rather than the expected AssertionError. Poking at different combinations of inputs, it appears to fail only when comparing Datasets with non-identical DataArrays that both contain an attribute whose value isn't comparable with == (for example, a NumPy array).

What did you expect to happen?

An AssertionError to be raised with the appropriate diff.

Minimal Complete Verifiable Example

import xarray as xr
import numpy as np

ds1 = xr.Dataset({"t1": xr.DataArray([1], attrs={"test": np.array([0,1,2,3], dtype="byte")})})
ds2 = xr.Dataset({"t1": xr.DataArray([2], attrs={"test": np.array([0,1,2,3], dtype="byte")})})
xr.testing.assert_identical(ds1, ds2)

Relevant log output

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[1], line 6
      4 ds1 = xr.Dataset({"t1": xr.DataArray([1], attrs={"test": np.array([0,1,2,3], dtype="byte")})})
      5 ds2 = xr.Dataset({"t1": xr.DataArray([2], attrs={"test": np.array([0,1,2,3], dtype="byte")})})
----> 6 xr.testing.assert_identical(ds1, ds2)

    [... skipping hidden 2 frame]

File ~/.dotfiles/pyenv/versions/3.12.3/envs/jupyter/lib/python3.12/site-packages/xarray/core/formatting.py:974, in diff_dataset_repr(a, b, compat)
    971 summary.append(diff_dim_summary(a, b))
    972 summary.append(diff_coords_repr(a.coords, b.coords, compat, col_width=col_width))
    973 summary.append(
--> 974     diff_data_vars_repr(a.data_vars, b.data_vars, compat, col_width=col_width)
    975 )
    977 if compat == "identical":
    978     summary.append(diff_attrs_repr(a.attrs, b.attrs, compat))

File ~/.dotfiles/pyenv/versions/3.12.3/envs/jupyter/lib/python3.12/site-packages/xarray/core/formatting.py:824, in _diff_mapping_repr(a_mapping, b_mapping, compat, title, summarizer, col_width, a_indexes, b_indexes)
    820 b_attrs = b_mapping[k].attrs
    822 attrs_to_print = set(a_attrs) ^ set(b_attrs)
    823 attrs_to_print.update(
--> 824     {k for k in set(a_attrs) & set(b_attrs) if a_attrs[k] != b_attrs[k]}
    825 )
    826 for m in (a_mapping, b_mapping):
    827     attr_s = "\n".join(
    828         "    " + summarize_attr(ak, av)
    829         for ak, av in m[k].attrs.items()
    830         if ak in attrs_to_print
    831     )

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
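
The failing expression in the traceback can be reproduced without xarray. The snippet below is a minimal sketch that only mirrors the set comprehension shown at formatting.py:824; the variable names `elementwise` and `error_message` are introduced here for illustration and are not part of xarray.

```python
import numpy as np

# Two attribute dicts holding equal arrays, as in the example above.
a_attrs = {"test": np.array([0, 1, 2, 3], dtype="byte")}
b_attrs = {"test": np.array([0, 1, 2, 3], dtype="byte")}

# `!=` on NumPy arrays is elementwise, so the result is an array, not a bool.
elementwise = a_attrs["test"] != b_attrs["test"]

# Using that array as the condition of `if` forces a truth-value check,
# which NumPy refuses for arrays with more than one element.
try:
    differing = {k for k in set(a_attrs) & set(b_attrs) if a_attrs[k] != b_attrs[k]}
except ValueError as err:
    error_message = str(err)
```

Note that the arrays do not have to differ for this to fail: the error comes from evaluating the elementwise result as a boolean, not from the comparison itself.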

Anything else we need to know?

Comparing the underlying DataArray objects works as expected:

import xarray as xr
import numpy as np

ds1 = xr.Dataset({"t1": xr.DataArray([1], attrs={"test": np.array([0,1,2,3], dtype="byte")})})
ds2 = xr.Dataset({"t1": xr.DataArray([2], attrs={"test": np.array([0,1,2,3], dtype="byte")})})
xr.testing.assert_identical(ds1.t1, ds2.t1)
Traceback

```python
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
Cell In[3], line 6
      4 ds1 = xr.Dataset({"t1": xr.DataArray([1], attrs={"test": np.array([0,1,2,3], dtype="byte")})})
      5 ds2 = xr.Dataset({"t1": xr.DataArray([2], attrs={"test": np.array([0,1,2,3], dtype="byte")})})
----> 6 xr.testing.assert_identical(ds1.t1, ds2.t1)

    [... skipping hidden 1 frame]

File ~/.dotfiles/pyenv/versions/3.12.3/envs/jupyter/lib/python3.12/site-packages/xarray/testing/assertions.py:215, in assert_identical(a, b, from_root)
    213 elif isinstance(a, DataArray):
    214     assert a.name == b.name
--> 215     assert a.identical(b), formatting.diff_array_repr(a, b, "identical")
    216 elif isinstance(a, (Dataset, Variable)):
    217     assert a.identical(b), formatting.diff_dataset_repr(a, b, "identical")

AssertionError: Left and right DataArray objects are not identical

Differing values:
L
    array([1])
R
    array([2])
```

This potentially looks related to #3711
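
For illustration, here is one way the attribute comparison could be made tolerant of array values. This is only a sketch under my own assumptions, not the code xarray uses or the fix that was applied: `attr_values_differ` and `differing_attr_keys` are hypothetical helpers, and `np.array_equal` is swapped in for the plain `!=`.

```python
import numpy as np


def attr_values_differ(a, b):
    """Hypothetical helper: return a single bool even for array-valued attrs."""
    if isinstance(a, np.ndarray) or isinstance(b, np.ndarray):
        # np.array_equal collapses the elementwise comparison to one bool
        # and returns False when the shapes do not match.
        return not np.array_equal(a, b)
    return a != b


def differing_attr_keys(a_attrs, b_attrs):
    """Hypothetical helper: keys present in only one mapping, or whose values differ."""
    keys = set(a_attrs) ^ set(b_attrs)
    keys.update(
        k for k in set(a_attrs) & set(b_attrs)
        if attr_values_differ(a_attrs[k], b_attrs[k])
    )
    return keys


same = differing_attr_keys(
    {"test": np.array([0, 1, 2, 3], dtype="byte")},
    {"test": np.array([0, 1, 2, 3], dtype="byte")},
)
changed = differing_attr_keys(
    {"test": np.array([0, 1, 2, 3], dtype="byte"), "units": "m"},
    {"test": np.array([0, 1, 2, 4], dtype="byte")},
)
```

With something along these lines, building the diff would no longer raise, so the AssertionError and its diff could be produced for the Dataset case as well.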

Environment

INSTALLED VERSIONS
------------------
commit: None
python: 3.12.3 (main, May 23 2024, 16:39:17) [Clang 15.0.0 (clang-1500.3.9.4)]
python-bits: 64
OS: Darwin
OS-release: 23.5.0
machine: arm64
processor: arm
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.2
libnetcdf: 4.9.3-development
xarray: 2024.6.0
pandas: 2.2.2
numpy: 1.26.4
scipy: 1.13.1
netCDF4: 1.6.5
pydap: None
h5netcdf: None
h5py: None
zarr: None
cftime: 1.6.3
nc_time_axis: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: 3.9.0
cartopy: None
seaborn: None
numbagg: None
fsspec: None
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: None
pip: 24.0
conda: None
pytest: None
mypy: None
IPython: 8.24.0
sphinx: None