scikit-hep / awkward

Manipulate JSON-like data with NumPy-like idioms.
https://awkward-array.org
BSD 3-Clause "New" or "Revised" License

CUDA test failure: `tests-cuda/test_3149_complex_reducers.py::test_block_boundary_prod_complex13`, but only the _first time_ it is run #3214

Closed by jpivarski 1 month ago

jpivarski commented 2 months ago

Version of Awkward Array

HEAD

Description and code to reproduce

The first time I ran pytest tests-cuda on a new system, I got one test failure:

=============================================================== test session starts ================================================================
platform linux -- Python 3.11.9, pytest-8.3.2, pluggy-1.5.0
Matplotlib: 3.9.1
Freetype: 2.12.1
rootdir: /home/jpivarski/irishep/awkward
configfile: pyproject.toml
plugins: forked-1.6.0, timeout-2.3.1, reverse-1.7.0, mpl-0.17.0, anyio-4.4.0, mock-3.14.0, cov-5.0.0, xdist-3.6.1
collected 694 items                                                                                                                                

tests-cuda/test_1276_cuda_num.py .........                                                                                                   [  1%]
tests-cuda/test_1276_cuda_transfers.py ................                                                                                      [  3%]
tests-cuda/test_1276_cupy_interop.py .                                                                                                       [  3%]
tests-cuda/test_1276_from_cupy.py .....                                                                                                      [  4%]
tests-cuda/test_1300_same_for_numba_cuda.py .......................                                                                          [  7%]
tests-cuda/test_1381_check_errors.py .                                                                                                       [  7%]
tests-cuda/test_1809_array_cuda_jit.py ..............                                                                                        [  9%]
tests-cuda/test_2327_array_interface.py .                                                                                                    [ 10%]
tests-cuda/test_2649_dlpack_support.py .                                                                                                     [ 10%]
tests-cuda/test_2922a_new_cuda_kernels.py ......................................................................                             [ 20%]
tests-cuda/test_2922b_new_cuda_kernels.py .............................                                                                      [ 24%]
tests-cuda/test_3065a_cuda_kernels.py ...................................................................................................... [ 39%]
.....................................................................................................................................        [ 58%]
tests-cuda/test_3065b_cuda_kernels.py ........................                                                                               [ 61%]
tests-cuda/test_3065c_cuda_kernels.py ...................................................                                                    [ 69%]
tests-cuda/test_3086_cuda_concatenate.py ....................                                                                                [ 72%]
tests-cuda/test_3115_array_typed_cuda_jit.py .                                                                                               [ 72%]
tests-cuda/test_3130_cuda_listarray_getitem_next.py ................                                                                         [ 74%]
tests-cuda/test_3136_cuda_argmin_and_argmax.py sssssss                                                                                       [ 75%]
tests-cuda/test_3136_cuda_reducers.py ..................                                                                                     [ 78%]
tests-cuda/test_3140_cuda_jagged_and_masked_getitem.py ..........................                                                            [ 81%]
tests-cuda/test_3140_cuda_slicing.py ....................                                                                                    [ 84%]
tests-cuda/test_3141_cuda_misc.py ......                                                                                                     [ 85%]
tests-cuda/test_3149_complex_reducers.py ......................F.........ssss                                                                [ 90%]
tests-cuda/test_3150_combinations_n_equal_2.py .....................                                                                         [ 93%]
tests-cuda/test_3162_block_boundary_reducers.py ......ss....                                                                                 [ 95%]
tests-cuda/test_3162_cuda_generic_reducer_operation.py .......................s.......                                                       [100%]

===================================================================== FAILURES =====================================================================
________________________________________________________ test_block_boundary_prod_complex13 ________________________________________________________

    def test_block_boundary_prod_complex13():
        np.random.seed(42)
        array = np.random.randint(50, size=1000)
        complex_array = np.vectorize(complex)(
            array[0 : len(array) : 2], array[1 : len(array) : 2]
        )
        content = ak.contents.NumpyArray(complex_array)
        cuda_content = ak.to_backend(content, "cuda", highlevel=False)
        cpt.assert_allclose(
            ak.prod(cuda_content, -1, highlevel=False),
            ak.prod(content, -1, highlevel=False),
        )

        offsets = ak.index.Index64(np.array([0, 5, 996, 1000], dtype=np.int64))
        depth1 = ak.contents.ListOffsetArray(offsets, content)
        cuda_depth1 = ak.to_backend(depth1, "cuda", highlevel=False)
>       cpt.assert_allclose(
            to_list(ak.prod(cuda_depth1, -1, highlevel=False)),
            to_list(ak.prod(depth1, -1, highlevel=False)),
        )

array      = array([38, 28, 14, 42,  7, 20, 38, 18, 22, 10, 10, 23, 35, 39, 23,  2, 21,
        1, 23, 43, 29, 37,  1, 20, 32, 11, ...19, 24,  3,  9,  2, 40, 44, 17, 46, 35, 46, 21, 33, 46,
        7, 39, 48, 43, 18, 41, 40, 36,  5, 25, 33, 44,  5, 36])
complex_array = array([38.+28.j, 14.+42.j,  7.+20.j, 38.+18.j, 22.+10.j, 10.+23.j,
       35.+39.j, 23. +2.j, 21. +1.j, 23.+43.j, 29.+...7.j, 46.+35.j, 46.+21.j,
       33.+46.j,  7.+39.j, 48.+43.j, 18.+41.j, 40.+36.j,  5.+25.j,
       33.+44.j,  5.+36.j])
content    = <NumpyArray dtype='complex128' len='500'>
    [38.+28.j 14.+42.j  7.+20.j 38.+18.j 22.+10.j 10.+23.j 35.+39.j 23. +2.j... 44.+17.j 46.+35.j 46.+21.j 33.+46.j  7.+39.j 48.+43.j 18.+41.j
     40.+36.j  5.+25.j 33.+44.j  5.+36.j]
</NumpyArray>
cuda_content = <NumpyArray dtype='complex128' len='500'>
    [38.+28.j 14.+42.j  7.+20.j 38.+18.j 22.+10.j 10.+23.j 35.+39.j 23. +2.j... 44.+17.j 46.+35.j 46.+21.j 33.+46.j  7.+39.j 48.+43.j 18.+41.j
     40.+36.j  5.+25.j 33.+44.j  5.+36.j]
</NumpyArray>
cuda_depth1 = <ListOffsetArray len='3'>
    <offsets><Index dtype='int64' len='4'>
        [   0    5  996 1000]
    </Index></offse... 7.+39.j 48.+43.j 18.+41.j 40.+36.j
          5.+25.j 33.+44.j  5.+36.j]
    </NumpyArray></content>
</ListOffsetArray>
depth1     = <ListOffsetArray len='3'>
    <offsets><Index dtype='int64' len='4'>
        [   0    5  996 1000]
    </Index></offse... 7.+39.j 48.+43.j 18.+41.j 40.+36.j
          5.+25.j 33.+44.j  5.+36.j]
    </NumpyArray></content>
</ListOffsetArray>
offsets    = <Index dtype='int64' len='4'>
    [   0    5  996 1000]
</Index>

tests-cuda/test_3149_complex_reducers.py:575: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
../../miniforge3/lib/python3.11/site-packages/cupy/testing/_array.py:24: in assert_allclose
    numpy.testing.assert_allclose(
        actual     = [(-29843744-33672352j), (nan+nanj), 0j]
        atol       = 0
        desired    = [(-29843744-33672352j), (nan+nanj), (1.4641000000000006-0j)]
        err_msg    = ''
        rtol       = 1e-07
        verbose    = True
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

args = (<function assert_allclose.<locals>.compare at 0x7072ae3f9580>, array([-29843744.-33672352.j,        nan      +nanj,
 ...      0.       +0.j]), array([-2.9843744e+07-33672352.j,            nan      +nanj,
        1.4641000e+00       -0.j]))
kwds = {'equal_nan': True, 'err_msg': '', 'header': 'Not equal to tolerance rtol=1e-07, atol=0', 'verbose': True}

    @wraps(func)
    def inner(*args, **kwds):
        with self._recreate_cm():
>           return func(*args, **kwds)
E           AssertionError: 
E           Not equal to tolerance rtol=1e-07, atol=0
E           
E           Mismatched elements: 1 / 3 (33.3%)
E           Max absolute difference: 1.4641
E           Max relative difference: 1.
E            x: array([-29843744.-33672352.j,        nan      +nanj,
E                          0.       +0.j])
E            y: array([-2.984374e+07-33672352.j,           nan      +nanj,
E                   1.464100e+00       -0.j])

args       = (<function assert_allclose.<locals>.compare at 0x7072ae3f9580>, array([-29843744.-33672352.j,        nan      +nanj,
 ...      0.       +0.j]), array([-2.9843744e+07-33672352.j,            nan      +nanj,
        1.4641000e+00       -0.j]))
func       = <function assert_array_compare at 0x7072f1849e40>
kwds       = {'equal_nan': True, 'err_msg': '', 'header': 'Not equal to tolerance rtol=1e-07, atol=0', 'verbose': True}
self       = <contextlib._GeneratorContextManager object at 0x7072f1870790>

../../miniforge3/lib/python3.11/contextlib.py:81: AssertionError
============================================================= short test summary info ==============================================================
SKIPPED [1] tests-cuda/test_3136_cuda_argmin_and_argmax.py:18: awkward_reduce_argmin and awkward_reduce_argmax are not implemented
SKIPPED [1] tests-cuda/test_3136_cuda_argmin_and_argmax.py:40: awkward_reduce_argmin and awkward_reduce_argmax are not implemented
SKIPPED [1] tests-cuda/test_3136_cuda_argmin_and_argmax.py:51: awkward_reduce_argmin and awkward_reduce_argmax are not implemented
SKIPPED [1] tests-cuda/test_3136_cuda_argmin_and_argmax.py:115: awkward_reduce_argmin and awkward_reduce_argmax are not implemented
SKIPPED [1] tests-cuda/test_3136_cuda_argmin_and_argmax.py:138: awkward_reduce_argmin and awkward_reduce_argmax are not implemented
SKIPPED [2] tests-cuda/test_3136_cuda_argmin_and_argmax.py:177: awkward_reduce_argmin and awkward_reduce_argmax are not implemented
SKIPPED [1] tests-cuda/test_3149_complex_reducers.py:773: awkward_reduce_argmax_complex is not implemented
SKIPPED [1] tests-cuda/test_3149_complex_reducers.py:795: awkward_reduce_argmax_complex is not implemented
SKIPPED [1] tests-cuda/test_3149_complex_reducers.py:817: awkward_reduce_argmin_complex is not implemented
SKIPPED [1] tests-cuda/test_3149_complex_reducers.py:839: awkward_reduce_argmin_complex is not implemented
SKIPPED [1] tests-cuda/test_3162_block_boundary_reducers.py:121: awkward_reduce_argmin is not implemented
SKIPPED [1] tests-cuda/test_3162_block_boundary_reducers.py:139: awkward_reduce_argmax is not implemented
SKIPPED [1] tests-cuda/test_3162_cuda_generic_reducer_operation.py:847: awkward_reduce_argmin is not implemented
FAILED tests-cuda/test_3149_complex_reducers.py::test_block_boundary_prod_complex13 - AssertionError: 
==================================================== 1 failed, 679 passed, 14 skipped in 21.50s ====================================================

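The failure is a wrong numerical result: for the third sublist, the CUDA backend returned 0j where the CPU backend returned (1.4641000000000006-0j). The other two entries agree.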
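For checking the numbers by hand, here is a minimal NumPy-only sketch of a per-sublist product. The helper name `segment_prod` and its bounds check are assumptions made for illustration; this is not Awkward's kernel. The locals printed in the traceback show `content` with len='500' while `offsets` ends at 1000, so the sketch rejects offsets that run past the content. Whether that mismatch is connected to the failure is not established in this thread.

```python
import numpy as np


def segment_prod(content, offsets):
    """Product of each sublist content[offsets[i]:offsets[i + 1]].

    Hypothetical reference helper, not part of Awkward Array.
    An empty sublist yields 1, as np.prod does for an empty slice.
    """
    content = np.asarray(content)
    offsets = np.asarray(offsets)
    if offsets[0] < 0 or np.any(np.diff(offsets) < 0):
        raise ValueError("offsets must be non-negative and non-decreasing")
    if offsets[-1] > len(content):
        raise ValueError(
            f"offsets end at {offsets[-1]} but content has only {len(content)} items"
        )
    return np.array(
        [np.prod(content[start:stop]) for start, stop in zip(offsets[:-1], offsets[1:])],
        dtype=content.dtype,
    )


# The first five complex values, copied from the traceback above.
first_five = np.array([38 + 28j, 14 + 42j, 7 + 20j, 38 + 18j, 22 + 10j])
print(segment_prod(first_five, [0, 5]))

# A placeholder array with the logged length of 500 and the logged offsets.
placeholder = np.zeros(500, dtype=np.complex128)
try:
    segment_prod(placeholder, [0, 5, 996, 1000])
except ValueError as err:
    print(err)
```

The product of the first five values is -29843744-33672352j, which matches the first entry of both `actual` and `desired` in the log.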

Subsequently re-running this test did not result in any errors. That's very strange. I tried to make a reproducer on Google Colab, but couldn't install CuPy on it.

I also tried uninstalling and reinstalling Awkward:

% pip uninstall awkward 
Found existing installation: awkward 2.6.7
Uninstalling awkward-2.6.7:
  Would remove:
    /home/jpivarski/miniforge3/lib/python3.11/site-packages/_awkward.pth
    /home/jpivarski/miniforge3/lib/python3.11/site-packages/awkward-2.6.7.dist-info/*
    /home/jpivarski/miniforge3/lib/python3.11/site-packages/awkward/juliapkg.json
Proceed (Y/n)? 
  Successfully uninstalled awkward-2.6.7
% pip install -e .
Obtaining file:///home/jpivarski/irishep/awkward
  Installing build dependencies ... done
  Checking if build backend supports build_editable ... done
  Getting requirements to build editable ... done
  Installing backend dependencies ... done
  Preparing editable metadata (pyproject.toml) ... done
Requirement already satisfied: awkward-cpp==37 in /home/jpivarski/miniforge3/lib/python3.11/site-packages (from awkward==2.6.7) (37)
Requirement already satisfied: fsspec>=2022.11.0 in /home/jpivarski/miniforge3/lib/python3.11/site-packages (from awkward==2.6.7) (2024.6.1)
Requirement already satisfied: importlib-metadata>=4.13.0 in /home/jpivarski/miniforge3/lib/python3.11/site-packages (from awkward==2.6.7) (8.2.0)
Requirement already satisfied: numpy>=1.18.0 in /home/jpivarski/miniforge3/lib/python3.11/site-packages (from awkward==2.6.7) (1.26.4)
Requirement already satisfied: packaging in /home/jpivarski/miniforge3/lib/python3.11/site-packages (from awkward==2.6.7) (24.1)
Requirement already satisfied: zipp>=0.5 in /home/jpivarski/miniforge3/lib/python3.11/site-packages (from importlib-metadata>=4.13.0->awkward==2.6.7) (3.20.0)
Building wheels for collected packages: awkward
  Building editable for awkward (pyproject.toml) ... done
  Created wheel for awkward: filename=awkward-2.6.7-py3-none-any.whl size=5067 sha256=0ddf47f970c3ab51619d8a5d6b0072a315f422f3883f77c4465ad3900915dd27
  Stored in directory: /tmp/pip-ephem-wheel-cache-acpm0m7u/wheels/56/e1/a6/2c4dae09851e882a1c0d9a375beb305bc10de51cda49eccf35
Successfully built awkward
Installing collected packages: awkward
Successfully installed awkward-2.6.7
% pytest tests-cuda/test_3149_complex_reducers.py::test_block_boundary_prod_complex13
=============================================================== test session starts ================================================================
platform linux -- Python 3.11.9, pytest-8.3.2, pluggy-1.5.0
Matplotlib: 3.9.1
Freetype: 2.12.1
rootdir: /home/jpivarski/irishep/awkward
configfile: pyproject.toml
plugins: forked-1.6.0, timeout-2.3.1, reverse-1.7.0, mpl-0.17.0, anyio-4.4.0, mock-3.14.0, cov-5.0.0, xdist-3.6.1
collected 1 item                                                                                                                                   

tests-cuda/test_3149_complex_reducers.py .                                                                                                   [100%]

================================================================ 1 passed in 3.69s =================================================================

But that didn't bring the failure back: the test passed.

Maybe this has nothing to do with being the first time, and it's a very rare synchronization bug.

I tried running it 100 times:

for x in 0 1 2 3 4 5 6 7 8 9; do for y in 0 1 2 3 4 5 6 7 8 9; do pytest tests-cuda/test_3149_complex_reducers.py::test_block_boundary_prod_complex13; done; done

but that didn't reproduce it either; the test passed every time.
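The same repetition can be done from Python so that failures are counted instead of read off the scrolling output. This is only a sketch: `count_failures`, `TARGET`, and `main` are names invented here, and it assumes pytest is installed for the interpreter running the script.

```python
import subprocess
import sys

# Test ID taken from the issue title.
TARGET = "tests-cuda/test_3149_complex_reducers.py::test_block_boundary_prod_complex13"


def count_failures(cmd, repeats):
    """Run cmd in a fresh process `repeats` times; return how many runs exited non-zero."""
    failures = 0
    for _ in range(repeats):
        result = subprocess.run(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        if result.returncode != 0:
            failures += 1
    return failures


def main(repeats=100):
    # Each run starts a new interpreter, like the shell loop above.
    failed = count_failures([sys.executable, "-m", "pytest", "-q", TARGET], repeats)
    print(f"{failed} failing runs out of {repeats}")
```

Call `main()` from the repository root. Each iteration is a fresh process, so this only tests the "rare flake" hypothesis, not an effect that depends on other tests having run first in the same process.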

@ianna, if you can't reproduce it, just close this issue.

ianna commented 2 months ago

@jpivarski - yes, I can reproduce it. It did not fail when I first ran that test file on its own:

python -m pytest tests-cuda/test_3149_complex_reducers.py

but it did fail when I ran the full CUDA test suite:

python -m pytest tests-cuda
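A pass in isolation and a failure in the full suite suggests the result depends on what ran earlier in the same process. One way to narrow that down is to run each earlier test file together with the failing test and see which pairs fail. The sketch below assumes that pytest runs command-line arguments in the order given (its default behaviour); `run_pytest` and `find_polluters` are hypothetical names, and nothing in this thread says which earlier test, if any single one, is responsible.

```python
import subprocess
import sys


def run_pytest(args):
    """Run pytest on args in one fresh process and return its exit code."""
    cmd = [sys.executable, "-m", "pytest", "-q", *args]
    return subprocess.run(cmd, stdout=subprocess.DEVNULL).returncode


def find_polluters(candidates, target, run=run_pytest):
    """Return the candidates for which running [candidate, target] exits non-zero.

    A candidate that fails on its own is also reported, so check those separately.
    """
    return [candidate for candidate in candidates if run([candidate, target]) != 0]
```

For example, `find_polluters(sorted(glob.glob("tests-cuda/test_*.py")), "tests-cuda/test_3149_complex_reducers.py::test_block_boundary_prod_complex13")` would try each file in turn (with `import glob`). If no single file triggers the failure, the cause may need several earlier tests combined, and this approach will not find it.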

ianna commented 1 month ago

Fixed in https://github.com/scikit-hep/awkward/pull/3235