ValueError raised when comparing numpy / pandas iterables

jamescooke commented 1 year ago

Context

Given a small test that uses a numpy array, working in a clean Python 3.11 virtual environment:

pip install pytest numpy
cat > test.py

import numpy as np

def test() -> None:
    stuff = np.array([1, 2, 3])

    assert all(stuff == 1)

^d

Test fails as expected.

pytest test.py

Error output from Pytest is clear (just test's failure block shown):

_________________________________________________ test __________________________________________________

    def test() -> None:
        stuff = np.array([1, 2, 3])

>       assert all(stuff == 1)
E       assert False
E        +  where False = all(array([1, 2, 3]) == 1)

test.py:6: AssertionError

Current behaviour

When pytest-icdiff is installed and tests run, then an error is shown when the test fails because a ValueError is raised.

pip install pytest-icdiff
pytest test.py

_________________________________________________ test __________________________________________________

    def test() -> None:
        stuff = np.array([1, 2, 3])

>       assert all(stuff == 1)

test.py:6:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
venv311/lib/python3.11/site-packages/_pytest/assertion/rewrite.py:491: in _call_reprcompare
    custom = util._reprcompare(ops[i], each_obj[i], each_obj[i + 1])
venv311/lib/python3.11/site-packages/_pytest/assertion/__init__.py:141: in callbinrepr
    hook_result = ihook.pytest_assertrepr_compare(
venv311/lib/python3.11/site-packages/pluggy/_hooks.py:265: in __call__
    return self._hookexec(self.name, self.get_hookimpls(), kwargs, firstresult)
venv311/lib/python3.11/site-packages/pluggy/_manager.py:80: in _hookexec
    return self._inner_hookexec(hook_name, methods, kwargs, firstresult)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

config = <_pytest.config.Config object at 0x7fddd1908450>, op = '==', left = array([1, 2, 3]), right = 1

    def pytest_assertrepr_compare(config, op, left, right):
        if op != "==":
            return

        try:
>           if abs(left + right) < 19999:
E           ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

venv311/lib/python3.11/site-packages/pytest_icdiff.py:22: ValueError

This is happening because the +, abs and > operations in abs(left + right) < 19999 do not result in a single value. Instead they are propagated across the array, yielding a new array:

In [1]: import numpy as np

In [2]: left = np.array([1, 2, 3])

In [3]: right = 1

In [4]: left + right
Out[4]: array([2, 3, 4])

In [5]: abs(left + right) > 19999
Out[5]: array([False, False, False])

Numpy then warns the user if an Array of bool is treated as a bool:

In [6]: bool(_)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[6], line 1
----> 1 bool(_)

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

Expected behaviour

With pytest-icdiff installed, the original assert False where False = all(array([1, 2, 3]) == 1) test failure is shown - no ValueError is raised.

hjwp commented 1 year ago

this is a great bug report than you so much!

hjwp commented 1 year ago

do you want to have a go at fixing it? i'm thinking the fix would happen here:

https://github.com/hjwp/pytest-icdiff/blob/master/pytest_icdiff.py#L24

    except ValueError:
    # should change to
    except (TypeError, ValueError):

    ## also this whole section probably needs a comment.  eg
    try:
       # specialcase small numbers, let normal pytest handle the diff.
        if abs(left + right) < 19999:

ofc it'd be lovely to have a test first! otoh, adding numpy as a dependency is a big ask. i wonder if there's another way to reproduce a ValueError from doing that left + right? hmmm

jamescooke commented 1 year ago

this is a great bug report than you so much!

Thanks 😊

do you want to have a go at fixing it?

Yep - just finding the time 🙏🏻 . I agree with what I think you've said - handling the ValueError seems to solve the problem. I then went down the road of trying to build a failing test - but also agree that adding numpy is too much as a test dependency.

(I then went searching for a mock-numpy library like moto is to boto, but couldn't find anything... just libraries for building mock data with numpy / pandas 😞 )

My guess is I can build a fake numpy-like class and use it in a test... I'll give that a go. If successful, I'll get it into PR and if I fail, then I'll message back here with findings. 🤞🏻

jamescooke commented 1 year ago

Have managed to build a fake numpy array class and use it in a test to reproduce the raised ValueError, which has then been caught in the fix: #41

When the ValueError is caught, my guess is it's better to bail out with return and use pytest's default output, rather than to allow progress with pass and use pytest-icdiff.

That's because, personally, I didn't find the icdiff output in this scenario helpful. The == 1 isn't comparing the array() with 1 - it's projecting == 1 onto each element of the array. So instead of the output as this (screenshot for colours):

Screenshot from 2023-06-24 21-39-15

... what's probably required is a full data-diff, which is definitely out of scope for this lib. Something like:

>       assert all(stuff == 1)
E       assert False
E        +  where False = all([
E               True,
E               False,
E               False,
E           ])
E        +  where False = all([
E               1 == 1,
E               2 == 1,
E               3 == 1,
E           ])

test.py:6: AssertionError

Hope that all makes sense 🙏🏻

hjwp / pytest-icdiff