numpy / numpy

The fundamental package for scientific computing with Python.
https://numpy.org

Simd test failure on 32 bit windows. #3680

Closed charris closed 11 years ago

charris commented 11 years ago
FAIL: simd tests on max/min
----------------------------------------------------------------------
Traceback (most recent call last):
  File "X:\Python27\lib\site-packages\numpy\core\tests\test_umath.py", line 678, in test_minmax_blocked
    msg=repr(inp) + '\n' + msg)
AssertionError: array([  0.,   1.,  nan,   3.,   4.,   5.,   6.,   7.,   8.,   9.,  10.], dtype=float32)
unary offset=(0, 0), size=11, dtype=<type 'numpy.float32'>, out of place

charris commented 11 years ago

@juliantaylor Can you take a look at this? I suspect it also shows up on 32 bit linux.

juliantaylor commented 11 years ago

Linux 32 bit works; does it also happen in 1.7?

Does Windows enable SSE2 on 32 bit by default? If not, it should use the same code path as 1.7.

Can you test it by removing the run_simd.. line in _maximum in loops.c.src:1473?
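
For context, the structure being discussed looks roughly like the standalone sketch below: the ufunc loop first offers the work to a SIMD helper and only runs the plain scalar code if that helper declines. All names here are hypothetical stand-ins, not the actual loops.c.src / simd.inc.src symbols.

/*
 * Sketch of the dispatch pattern under discussion: try the SIMD helper
 * first, fall back to the scalar nan-propagating loop if it declines.
 * Names are hypothetical, not the actual numpy internals.
 */
#include <math.h>
#include <stdio.h>

/* stand-in for the run_..._simd helper; returning 0 means "not handled,
 * use the scalar fallback" (e.g. no SSE2 available, array too small) */
static int
try_simd_maximum(const float *a, size_t n, float *out)
{
    (void)a; (void)n; (void)out;
    return 0;
}

static float
float_maximum(const float *a, size_t n)
{
    float m;
    if (try_simd_maximum(a, n, &m)) {
        return m;                       /* fast path handled it */
    }
    m = a[0];                           /* scalar, nan-propagating fallback */
    for (size_t i = 1; i < n; i++) {
        /* if a[i] is nan the comparison is false and isnan keeps it */
        m = (a[i] > m || isnan(a[i])) ? a[i] : m;
    }
    return m;
}

int main(void)
{
    const float a[] = {0, 1, NAN, 3, 4, 5, 6, 7, 8, 9, 10};
    /* expected result for the failing test case: nan, not 10.0 */
    printf("%f\n", float_maximum(a, sizeof(a) / sizeof(a[0])));
    return 0;
}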

charris commented 11 years ago

This was reported by @cgohlke. I don't have windows set up at the moment.

cgohlke commented 11 years ago

Numpy-1.7.1-win32-py2.7 passes all tests. When removing loops.c.src:1473, the test passes.

juliantaylor commented 11 years ago

Weird, why would Windows enable SSE on 32 bit machines by default? I thought they were all about compatibility.

I can only guess where the error comes in. Something must reset the FPU invalid flag, or the FPU flag support detection does not work properly on Windows (which might explain #3681). I have no Windows machine available, so can you put this at line 31 of numpy/core/src/umath/simd.inc.src:

#define NO_FLOATING_POINT_SUPPORT

to go into the fallback (slower) nan handling.

If that fixes it, we could just define that for Windows until someone can debug properly where the nan gets lost on Windows. It would be easier over a real-time chat; please contact me if someone with Windows wants to give it a try.
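
To illustrate the mechanism I am guessing at, here is a small standard-C program showing the kind of floating-point status flag involved: an invalid operation sets FE_INVALID, a later fetestexcept sees it, and anything that clears the flag in between makes the detection miss. This only demonstrates the flag machinery, not the simd.inc.src code itself.

/*
 * Minimal demonstration of the FPU "invalid" status flag, using the
 * standard C99 <fenv.h> interface.  Illustration only, not numpy code.
 */
#include <fenv.h>
#include <math.h>
#include <stdio.h>

int main(void)
{
    /* volatile stops the compiler from folding the operation away at
     * compile time, which would skip setting the runtime flag */
    volatile double zero = 0.0;
    volatile double inf = INFINITY;

    feclearexcept(FE_INVALID);
    volatile double x = zero * inf;     /* 0 * inf is an invalid operation */
    (void)x;
    printf("FE_INVALID set: %d\n", fetestexcept(FE_INVALID) != 0);

    /* if some intervening code clears the flag, a caller that tests it
     * afterwards misses the event -- one of the failure modes guessed at */
    feclearexcept(FE_INVALID);
    printf("FE_INVALID after clear: %d\n", fetestexcept(FE_INVALID) != 0);
    return 0;
}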

cgohlke commented 11 years ago

I recompiled with the /DNO_FLOATING_POINT_SUPPORT option and this test passes.

I agree that SSE should probably not be enabled by default on win32. It may break code on older computers.

I also ran the simd min/max test code outside the test suite and found that not all failures are deterministic. Sometimes np.min/max returns the nan, other times the floating point value. That could be why other tests that depend on np.min/max fail sometimes, but not always. For example:

======================================================================
FAIL: test_combinations (test_multiarray.TestArgmax)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "X:\Python27\lib\site-packages\numpy\core\tests\test_multiarray.py", line 1730, in test_combinations
    assert_equal(arr[np.argmax(arr)], np.max(arr), err_msg="%r"%arr)
  File "X:\Python27\lib\site-packages\numpy\testing\utils.py", line 304, in assert_equal
    raise AssertionError(msg)
AssertionError:
Items are not equal: [0, 1, 2, 3, nan]
 ACTUAL: nan
 DESIRED: 3.0

======================================================================
FAIL: test_combinations (test_multiarray.TestArgmin)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "X:\Python27\lib\site-packages\numpy\core\tests\test_multiarray.py", line 1798, in test_combinations
    assert_equal(arr[np.argmin(arr)], np.min(arr), err_msg="%r"%arr)
  File "X:\Python27\lib\site-packages\numpy\testing\utils.py", line 304, in assert_equal
    raise AssertionError(msg)
AssertionError:
Items are not equal: [0, 1, 2, 3, nan]
 ACTUAL: nan
 DESIRED: 0.0

With the NO_FLOATING_POINT_SUPPORT option these tests seem to always pass (but how can one be sure?).

juliantaylor commented 11 years ago

It randomly returns nan or not nan on the same data array? That can only be use of uninitialized memory, but I don't see an issue in the code (NO_FLOATING_POINT_SUPPORT would also be affected by that).

cgohlke commented 11 years ago

Yes, running the below code (taken from the test) several times with numpy-1.8dev-win32-py2.7 gives different results every time. For example: http://www.lfd.uci.edu/~gohlke/download/run1.txt http://www.lfd.uci.edu/~gohlke/download/run2.txt

As mentioned, using NO_FLOATING_POINT_SUPPORT fixes this and the code also runs correctly on numpy-1.8dev-win-amd64 and with numpy 1.7.

from __future__ import division, print_function

import numpy as np

def gen_alignment_data(dtype=np.float32, type='binary', max_size=24):
    """Yield (out, input(s), msg) tuples over various offsets and sizes,
    so that both aligned and unaligned code paths get exercised."""
    ufmt = 'unary offset=(%d, %d), size=%d, dtype=%r, %s'
    bfmt = 'binary offset=(%d, %d, %d), size=%d, dtype=%r, %s'
    for o in range(3):
        for s in range(o + 2, max(o + 3, max_size)):
            if type == 'unary':
                inp = lambda: np.arange(s, dtype=dtype)[o:]
                out = np.empty((s,), dtype=dtype)[o:]
                yield out, inp(), ufmt % (o, o, s, dtype, 'out of place')
                yield inp(), inp(), ufmt % (o, o, s, dtype, 'in place')
                yield out[1:], inp()[:-1], ufmt % \
                    (o + 1, o, s - 1, dtype, 'out of place')
                yield out[:-1], inp()[1:], ufmt % \
                    (o, o + 1, s - 1, dtype, 'out of place')
                yield inp()[:-1], inp()[1:], ufmt % \
                    (o, o + 1, s - 1, dtype, 'aliased')
                yield inp()[1:], inp()[:-1], ufmt % \
                    (o + 1, o, s - 1, dtype, 'aliased')
            if type == 'binary':
                inp1 = lambda: np.arange(s, dtype=dtype)[o:]
                inp2 = lambda: np.arange(s, dtype=dtype)[o:]
                out = np.empty((s,), dtype=dtype)[o:]
                yield out, inp1(), inp2(),  bfmt % \
                    (o, o, o, s, dtype, 'out of place')
                yield inp1(), inp1(), inp2(), bfmt % \
                    (o, o, o, s, dtype, 'in place1')
                yield inp2(), inp1(), inp2(), bfmt % \
                    (o, o, o, s, dtype, 'in place2')
                yield out[1:], inp1()[:-1], inp2()[:-1], bfmt % \
                    (o + 1, o, o, s - 1, dtype, 'out of place')
                yield out[:-1], inp1()[1:], inp2()[:-1], bfmt % \
                    (o, o + 1, o, s - 1, dtype, 'out of place')
                yield out[:-1], inp1()[:-1], inp2()[1:], bfmt % \
                    (o, o, o + 1, s - 1, dtype, 'out of place')
                yield inp1()[1:], inp1()[:-1], inp2()[:-1], bfmt % \
                    (o + 1, o, o, s - 1, dtype, 'aliased')
                yield inp1()[:-1], inp1()[1:], inp2()[:-1], bfmt % \
                    (o, o + 1, o, s - 1, dtype, 'aliased')
                yield inp1()[:-1], inp1()[:-1], inp2()[1:], bfmt % \
                    (o, o, o + 1, s - 1, dtype, 'aliased')

print(np.__version__)

for dt in [np.float32, np.float64]:
    for out, inp, msg in gen_alignment_data(dtype=dt, type='unary',
                                            max_size=15):
        a = np.arange(inp.size, dtype=dt)
        for i in range(inp.size):
            # place a nan at each position in turn; max() should return nan
            inp[:] = a
            inp[i] = np.nan
            print(inp.max(), msg, inp)
charris commented 11 years ago

There should be a compiler warning for uninitialized variables. I suspect an out of bounds pointer. I wonder if some SSE flags could reproduce this on 32 bit linux? Valgrind is another possibility.

cgohlke commented 11 years ago

I don't see any compiler warnings or other output during the build stage, only during config.

juliantaylor commented 11 years ago

It's not necessarily uninitialized memory; maybe win32 malloc aligns to 8 bytes, so it's random whether the data goes through the scalar peeling loops or the vectorized loops. 32 bit linux is not affected, even with SSE enabled.
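
To make that guess concrete: if the code peels scalar elements until the pointer is 16-byte aligned and hands the rest to the vectorized loop, then which loop sees a given element depends entirely on where malloc placed the buffer. A small sketch of that idea (hypothetical illustration, not the actual numpy loop):

/*
 * Leading elements before the first 16-byte boundary go through a scalar
 * peel loop, the rest through the vectorized loop, so which path sees the
 * nan depends on the address malloc happened to return.
 */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    float *a = malloc(11 * sizeof(float));
    if (a == NULL)
        return 1;

    /* number of leading floats the scalar peel loop would handle before the
     * pointer reaches a 16-byte boundary (0 to 3 elements) */
    size_t peel = ((16 - (uintptr_t)a % 16) % 16) / sizeof(float);

    printf("base address %% 16 = %u, peel = %u elements\n",
           (unsigned)((uintptr_t)a % 16), (unsigned)peel);
    printf("element 2 (the nan in the failing test) goes through the %s loop\n",
           2 < peel ? "scalar peel" : "vectorized");

    free(a);
    return 0;
}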

njsmith commented 11 years ago

Add some tests that intentionally run the loop on unaligned arrays (a, a[1:], a[2:], ...) and see if that makes the failure deterministic?

juliantaylor commented 11 years ago

gen_alignment_data does intentionally create misaligned data, but whether a or a[1:] is aligned is not checked, so it's not really non-deterministic: if it's broken, it breaks, just at which iteration is undefined.

juliantaylor commented 11 years ago

Debugging this is trivial: you just need to step through the assembler instructions (there are only about 10) and check which one drops or does not set the invalid flag; I just don't have Windows to do that. On the other hand, even if we knew which one misbehaves, the workaround would still be to just define NO_FLOATING_POINT_SUPPORT, so we could do that unconditionally on Windows.
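
A sketch of what defining it unconditionally for Windows might look like (hypothetical guard; the actual change may end up looking different):

#include <stdio.h>

/*
 * Hypothetical sketch only: a guard that would force the slower,
 * flag-independent nan handling on 32-bit MSVC builds.
 */
#if defined(_MSC_VER) && !defined(_WIN64)
#define NO_FLOATING_POINT_SUPPORT
#endif

int main(void)
{
#ifdef NO_FLOATING_POINT_SUPPORT
    puts("win32: using the fallback (flag-independent) nan handling");
#else
    puts("using the FPU-flag based fast path");
#endif
    return 0;
}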

charris commented 11 years ago

Both sound like a good answer to me ;) We really need to set up a Windows box for developer use and testing and see how far we can go with the free MKL licenses that have been offered by Intel. The terms seem rather restrictive as I understand them, i.e., restricted to a single developer rather than to the box, but maybe that can be negotiated.

njsmith commented 11 years ago

I suggest going ahead and defining NO_FLOATING_POINT_SUPPORT in the 1.8 release branch (but not master), so that this doesn't become a blocker, and we can always revisit that later if/when a real fix is found?

juliantaylor commented 11 years ago

@cgohlke #3691 should fix the issue, can you please verify it?

cgohlke commented 11 years ago

That seems to work. All tests pass on numpy-1.8.0.dev-49c22d3-win32-py2.7 with the patch. Looks like issue #3681 is gone too. Thank you!