numba / numba

NumPy aware dynamic Python compiler using LLVM
https://numba.pydata.org/
BSD 2-Clause "Simplified" License

np.median raises AssertionError for empty arrays while numpy returns nan #9433

Open leoschwarz opened 8 months ago

leoschwarz commented 8 months ago

Reporting a bug

This is a very similar issue to #8451 but here it affects numpy.median and setting the error_model does not change the outcome.

import unittest
import numpy as np
from numba import njit

def median_numpy(arr):
    return np.median(arr)

@njit
def median_numba(arr):
    return np.median(arr)

class TestCase(unittest.TestCase):
    def test_median_numpy(self):
        self.assertTrue(np.isnan(median_numpy(np.array([], dtype=float))))
        self.assertTrue(np.isnan(median_numpy(np.array([]))))

    def test_median_numba(self):
        self.assertTrue(np.isnan(median_numba(np.array([], dtype=float))))
        self.assertTrue(np.isnan(median_numba(np.array([]))))

if __name__ == "__main__":
    unittest.main()

While numpy returns nan, numba raises an AssertionError:

Failure
Traceback (most recent call last):
  File "/Users/leo/code/msi/code/tests/unit/misc/test_numpy_util.py", line 88, in test_median_numba
    self.assertTrue(np.isnan(NumpyUtil.median_numba(np.array([], dtype=float))))
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/leo/.miniconda3/envs/exp-2023-10/lib/python3.11/site-packages/numba/np/arraymath.py", line 1556, in _select_two
    assert high > low  # by construction
AssertionError

kc611 commented 8 months ago

Thanks for reporting this. I can reproduce this locally.

The error is possibly due to a lack of guards around the internal implementation of np.median in Numba.

It originates from the following line:

https://github.com/numba/numba/blob/3edb458c69daeaaffbfc938145c367eda1cf043f/numba/np/arraymath.py#L1582

There, n is the length of the flattened array, which is 0 for an empty input. As a result, the selection logic starts with low equal to 0 and high equal to -1, which violates the assertion high > low shown in the traceback.
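To make the failing condition concrete, here is a minimal pure-Python sketch. It assumes the bounds are initialised as low = 0 and high = n - 1, which matches the values quoted above but has not been checked against the linked line. The helper names initial_bounds and assertion_holds are hypothetical and are not part of Numba.

```python
def initial_bounds(n):
    # Assumed initialisation: low = 0, high = n - 1 (hypothetical helper,
    # consistent with the values described in this comment).
    low = 0
    high = n - 1
    return low, high


def assertion_holds(n):
    # Mirrors the check from the traceback: `assert high > low`.
    low, high = initial_bounds(n)
    return high > low


# Empty input: high is -1, so the asserted condition is false.
print(initial_bounds(0), assertion_holds(0))
# Two elements: high is 1, so the asserted condition is true.
print(initial_bounds(2), assertion_holds(2))
```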

A possible fix is to simply check for this edge case and return nan if no elements exist within the array. Marking this as a good first issue.