pydata / xarray

N-D labeled arrays and datasets in Python
https://xarray.dev
Apache License 2.0

`test_repr_multiindex*` and `test_array_repr_dtypes_unix` tests failing on 32-bit platforms #9127

Status: Closed (mgorny closed 2 weeks ago)

mgorny commented 3 weeks ago

What happened?

When running the test suite on 32-bit platforms (e.g. x86), I'm getting the following test failures:

FAILED xarray/tests/test_dataarray.py::TestDataArray::test_repr_multiindex - ...
FAILED xarray/tests/test_dataarray.py::TestDataArray::test_repr_multiindex_long
FAILED xarray/tests/test_dataset.py::TestDataset::test_repr_multiindex - Asse...
FAILED xarray/tests/test_formatting.py::test_array_repr_dtypes_unix - Asserti...

(log below)

I think the tests assume specific object sizes for 64-bit platforms.
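The numbers in the log below are consistent with pointer width being the cause: an `object` array stores one pointer per element, so its size halves on a 32-bit build. A minimal sketch of that arithmetic, assuming the dataset total is the sum of its three coordinates (the helper names are hypothetical and not part of xarray):

```python
import struct

# Size of a C pointer in the running interpreter: 4 bytes on 32-bit, 8 on 64-bit.
POINTER_SIZE = struct.calcsize("P")


def object_nbytes(n_elements: int, pointer_size: int = POINTER_SIZE) -> int:
    """Bytes used by an object-dtype array, which holds one pointer per element."""
    return n_elements * pointer_size


def dataset_nbytes(n_elements: int, pointer_size: int = POINTER_SIZE) -> int:
    """Assumed total: two object coordinates plus one int64 coordinate."""
    return 2 * object_nbytes(n_elements, pointer_size) + n_elements * 8


for bits, size in ((32, 4), (64, 8)):
    print(
        f"{bits}-bit: x={object_nbytes(4, size)}B, "
        f"x_long={object_nbytes(32, size)}B, "
        f"dataset={dataset_nbytes(4, size)}B"
    )
```

With a 4-byte pointer this gives 16B, 128B and 64B, matching the 32-bit values in the log; with an 8-byte pointer it gives the 32B, 256B and 96B that the tests expect.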

This is Python 3.11.9 on 32-bit x86, retested on 380979fc213b3d1f53f097bab9b61851391be729.

What did you expect to happen?

Tests passing.

Minimal Complete Verifiable Example

# on a 32-bit architecture, literally:
python -m pytest

Relevant log output

=================================== FAILURES ===================================
______________________ TestDataArray.test_repr_multiindex ______________________

self = <xarray.tests.test_dataarray.TestDataArray object at 0xe9e027f0>

    def test_repr_multiindex(self) -> None:
        expected = dedent(
            """\
            <xarray.DataArray (x: 4)> Size: 32B
            array([0, 1, 2, 3], dtype=uint64)
            Coordinates:
              * x        (x) object 32B MultiIndex
              * level_1  (x) object 32B 'a' 'a' 'b' 'b'
              * level_2  (x) int64 32B 1 2 1 2"""
        )
>       assert expected == repr(self.mda)
E       AssertionError: assert '<xarray.Data...4 32B 1 2 1 2' == '<xarray.Data...4 32B 1 2 1 2'
E         
E         Skipping 97 identical leading characters in diff, use -v to show
E         Skipping 43 identical trailing characters in diff, use -v to show
E         - x) object 16B MultiIndex
E         ?           ^^
E         + x) object 32B MultiIndex
E         ?           ^^...
E         
E         ...Full output truncated (4 lines hidden), use '-vv' to show

/tmp/xarray/xarray/tests/test_dataarray.py:122: AssertionError
___________________ TestDataArray.test_repr_multiindex_long ____________________

self = <xarray.tests.test_dataarray.TestDataArray object at 0xe9e029d0>

    def test_repr_multiindex_long(self) -> None:
        mindex_long = pd.MultiIndex.from_product(
            [["a", "b", "c", "d"], [1, 2, 3, 4, 5, 6, 7, 8]],
            names=("level_1", "level_2"),
        )
        mda_long = DataArray(
            list(range(32)), coords={"x": mindex_long}, dims="x"
        ).astype(np.uint64)
        expected = dedent(
            """\
            <xarray.DataArray (x: 32)> Size: 256B
            array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
                   17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31],
                  dtype=uint64)
            Coordinates:
              * x        (x) object 256B MultiIndex
              * level_1  (x) object 256B 'a' 'a' 'a' 'a' 'a' 'a' ... 'd' 'd' 'd' 'd' 'd' 'd'
              * level_2  (x) int64 256B 1 2 3 4 5 6 7 8 1 2 3 4 ... 5 6 7 8 1 2 3 4 5 6 7 8"""
        )
>       assert expected == repr(mda_long)
E       AssertionError: assert '<xarray.Data...2 3 4 5 6 7 8' == '<xarray.Data...2 3 4 5 6 7 8'
E         
E         Skipping 228 identical leading characters in diff, use -v to show
E         Skipping 124 identical trailing characters in diff, use -v to show
E         - x) object 128B MultiIndex
E         ?           - ^
E         + x) object 256B MultiIndex
E         ?            ^^...
E         
E         ...Full output truncated (4 lines hidden), use '-vv' to show

/tmp/xarray/xarray/tests/test_dataarray.py:143: AssertionError
_______________________ TestDataset.test_repr_multiindex _______________________

self = <xarray.tests.test_dataset.TestDataset object at 0xe9d08170>

    def test_repr_multiindex(self) -> None:
        data = create_test_multiindex()
        expected = dedent(
            """\
            <xarray.Dataset> Size: 96B
            Dimensions:  (x: 4)
            Coordinates:
              * x        (x) object 32B MultiIndex
              * level_1  (x) object 32B 'a' 'a' 'b' 'b'
              * level_2  (x) int64 32B 1 2 1 2
            Data variables:
                *empty*"""
        )
        actual = "\n".join(x.rstrip() for x in repr(data).split("\n"))
        print(actual)
>       assert expected == actual
E       AssertionError: assert '<xarray.Data...\n    *empty*' == '<xarray.Data...\n    *empty*'
E         
E         Skipping 71 identical trailing characters in diff, use -v to show
E         - <xarray.Dataset> Size: 64B
E         ?                         -
E         + <xarray.Dataset> Size: 96B
E         ?                        +
E           Dimensions:  (x: 4)...
E         
E         ...Full output truncated (9 lines hidden), use '-vv' to show

/tmp/xarray/xarray/tests/test_dataset.py:351: AssertionError
----------------------------- Captured stdout call -----------------------------
<xarray.Dataset> Size: 64B
Dimensions:  (x: 4)
Coordinates:
  * x        (x) object 16B MultiIndex
  * level_1  (x) object 16B 'a' 'a' 'b' 'b'
  * level_2  (x) int64 32B 1 2 1 2
Data variables:
    *empty*
_________________________ test_array_repr_dtypes_unix __________________________

    @pytest.mark.skipif(
        ON_WINDOWS,
        reason="Default numpy's dtypes vary according to OS",
    )
    def test_array_repr_dtypes_unix() -> None:

        # Signed integer dtypes

        ds = xr.DataArray(np.array([0]), dims="x")
        actual = repr(ds)
        expected = """
    <xarray.DataArray (x: 1)> Size: 8B
    array([0])
    Dimensions without coordinates: x
            """.strip()
>       assert actual == expected
E       AssertionError: assert '<xarray.Data...oordinates: x' == '<xarray.Data...oordinates: x'
E         
E         Skipping 37 identical trailing characters in diff, use -v to show
E         - <xarray.DataArray (x: 1)> Size: 8B
E         ?                                 ^
E         + <xarray.DataArray (x: 1)> Size: 4B
E         ?                                 ^
E           array([

/tmp/xarray/xarray/tests/test_formatting.py:1090: AssertionError

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS
------------------
commit: 380979fc213b3d1f53f097bab9b61851391be729
python: 3.11.9 (main, May 6 2024, 20:29:08) [GCC 13.2.1 20240210]
python-bits: 32
OS: Linux
OS-release: 6.9.4-gentoo-dist
machine: x86_64
processor: AMD Ryzen 5 3600 6-Core Processor
byteorder: little
LC_ALL: None
LANG: C.UTF8
LOCALE: ('en_US', 'UTF-8')
libhdf5: None
libnetcdf: None

xarray: 2024.6.0
pandas: 2.2.2
numpy: 1.26.4
scipy: 1.13.1
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
zarr: None
cftime: 1.6.4
nc_time_axis: None
iris: None
bottleneck: 1.4.0rc5
dask: None
distributed: None
matplotlib: 3.9.0
cartopy: None
seaborn: None
numbagg: None
fsspec: None
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 69.5.1
pip: None
conda: None
pytest: 8.2.2
mypy: None
IPython: None
sphinx: None

max-sixty commented 2 weeks ago

I see some of those tests are skipped on Windows; possibly we should also check whether the platform is 32-bit and skip them there?

mgorny commented 2 weeks ago

Either that, or adjusting expected sizes depending on the platform.

max-sixty commented 2 weeks ago

These are looking at the reprs, so adjusting the expectations isn't easy.

We would definitely take a PR to skip them based on the platform though!

mgorny commented 2 weeks ago

Does skipping based on the value of sys.maxsize sound about right? I think that's the simplest way of determining whether we're dealing with a 64-bit platform.

dcherian commented 2 weeks ago

That seems to be the recommended way
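A minimal sketch of the `sys.maxsize` check discussed above. It uses the standard-library `unittest` so that it runs on its own; `IS_64BIT`, `requires_64bit` and the test class are hypothetical names, not anything defined in xarray's test suite.

```python
import sys
import unittest

# sys.maxsize is 2**31 - 1 on 32-bit builds and 2**63 - 1 on 64-bit builds.
IS_64BIT = sys.maxsize > 2**32

# Hypothetical decorator name. With pytest, the same condition could be passed
# to pytest.mark.skipif(not IS_64BIT, reason=...).
requires_64bit = unittest.skipUnless(
    IS_64BIT, "expected repr sizes assume a 64-bit platform"
)


@requires_64bit
class TestReprSizes(unittest.TestCase):
    def test_interpreter_is_64bit(self) -> None:
        # Only runs on 64-bit builds; 32-bit builds report it as skipped.
        self.assertEqual(sys.maxsize, 2**63 - 1)
```

The check looks at the interpreter rather than the machine, which matters here: the environment above reports `machine: x86_64` but `python-bits: 32`.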

keewis commented 2 weeks ago

We can also try hard-coding the dtype for those variables. Since the difference is in the repr, the default dtype of the platform is not actually important.
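As a rough illustration of the idea, using NumPy only and not the actual test code: an array created without a dtype picks up the platform's default integer, while an explicit dtype fixes the per-element size everywhere. The variable names are made up for this sketch.

```python
import numpy as np

# Platform-dependent: the default integer dtype varies with OS and word size.
default_int = np.array([0])

# Pinned: int64 is 8 bytes per element on every platform.
pinned_int = np.array([0], dtype=np.int64)

print(default_int.dtype, default_int.nbytes)
print(pinned_int.dtype, pinned_int.nbytes)
```

This would presumably address the `Size: 8B` versus `Size: 4B` mismatch in `test_array_repr_dtypes_unix`, but not the `object` coordinates, whose size follows the pointer width regardless of any dtype argument.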

mgorny commented 2 weeks ago

Actually, fixing expectations doesn't seem that hard, so I'm going to try doing that first. If you don't like that, we can look into other solutions.
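One way such an adjustment could look, sketched under the assumption that only the `object` coordinate sizes differ and that the column layout stays the same; this is not necessarily what the eventual change did. The function name is hypothetical.

```python
import struct
from textwrap import dedent


def expected_multiindex_repr(pointer_size: int = struct.calcsize("P")) -> str:
    """Build the expected repr with object sizes derived from the pointer width."""
    # Four elements, one pointer each: 16B on 32-bit, 32B on 64-bit.
    obj_nbytes = 4 * pointer_size
    return dedent(
        f"""\
        <xarray.DataArray (x: 4)> Size: 32B
        array([0, 1, 2, 3], dtype=uint64)
        Coordinates:
          * x        (x) object {obj_nbytes}B MultiIndex
          * level_1  (x) object {obj_nbytes}B 'a' 'a' 'b' 'b'
          * level_2  (x) int64 32B 1 2 1 2"""
    )


print(expected_multiindex_repr())
```

The `uint64` data and the `int64` level keep their fixed 32B sizes, so only the two `object` lines need to be templated.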