thouis / numpy-trac-migration

numpy Trac to github issues migration
2 stars 3 forks source link

.max(0) on reshaped array returns inconsistent results. (Trac #2144) #5938

Open numpy-gitbot opened 11 years ago

numpy-gitbot commented 11 years ago

Original ticket http://projects.scipy.org/numpy/ticket/2144 on 2012-05-25 by atmention:thouis, assigned to unknown.

The following code fails on the current master (7a254bd)

import numpy as np

b = np.array([0, 1, 2, 3, 4, 5], np.int64)
a = b.reshape(3, 2)

while True:
    np.testing.assert_array_equal(np.atleast_1d(np.array(a.max(0), np.float)),
                                  np.atleast_1d(np.array(a.max(0), np.float)))

This is on OSX 10.6.8, numpy compiled with gcc 4.0.1

Python 2.7 (r27:82508, Jul  3 2010, 21:12:11) 
[GCC 4.0.1 (Apple Inc. build 5493)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy
>>> numpy.version.version
'1.7.0.dev-7a254bd'
numpy-gitbot commented 11 years ago

atmention:thouis wrote on 2012-05-25

Also, this is on x86_64. Running under i386, it does not seem to crash.

numpy-gitbot commented 11 years ago

atmention:mwiebe wrote on 2012-05-25

I've tried this on both windows and linux 64-bit, and unfortunately couldn't reproduce it.

{{

np.version '1.7.0.dev-3f45eaa' }}

numpy-gitbot commented 11 years ago

atmention:thouis wrote on 2012-05-25

This seems to exercise it more consistently for me than the script I posted:

while True; do python -m nose.core ../numpy.bisect/numpy/lib/tests/test_function_base.py:TestHistogramdd --pdb --pdb-failures; done

I've git bisected it down to this https://github.com/numpy/numpy/commit/aed9925a9d5fe9a407d0ca2c65cb577116c4d0f1

numpy-gitbot commented 11 years ago

atmention:mwiebe wrote on 2012-05-25

The function which should be doing the initialization which seems to be failing is here in the "else" case:

https://github.com/numpy/numpy/commit/aed9925a9d5fe9a407d0ca2c65cb577116c4d0f1#L11R2580

I've tried it on a Mac with 64-bit Python 2.7.1, built with llvm-gcc 4.2.1, but it didn't reproduce there either.

Can you try a few other reductions that trigger the same code-path? Instead of "a.max(0)", try "np.maximum.reduce(a, axis=0)"? The functions np.minimum.reduce, np.fmin.reduce, np.fmax.reduce should trigger it too.

numpy-gitbot commented 11 years ago

atmention:thouis wrote on 2012-05-26

I think I've traced it down to this line being executed at the last loop of the iteration it's controlling:

https://github.com/numpy/numpy/commit/aed9925a9d5fe9a407d0ca2c65cb577116c4d0f1#L11R2896

while (iternext(iter));

Before this line executes, the data is correct. Afterward, it has been scribbled over with something that looks suspiciously like a pointer's value.

The call to iternext at the end of the loop calls this: https://github.com/numpy/numpy/blob/aed9925a9d5fe9a407d0ca2c65cb577116c4d0f1/numpy/core/src/multiarray/nditer_templ.c.src#L252

npyiter_copy_from_buffers(iter);

Which only when it crashes, goes into this piece of code: https://github.com/numpy/numpy/blob/aed9925a9d5fe9a407d0ca2c65cb577116c4d0f1/numpy/core/src/multiarray/nditer_api.c#L1878

and executes this: https://github.com/numpy/numpy/blob/aed9925a9d5fe9a407d0ca2c65cb577116c4d0f1/numpy/core/src/multiarray/nditer_api.c#L1997

I'm reasonably certain it shouldn't be going into the if clause based on delta at line 1878, but I haven't unraveled enough of that code to know for sure.

However, changing the second <= to < at this line (since it looks like an off-by-one error, and every time it crashed, it was under the == case of <=): https://github.com/numpy/numpy/blob/aed9925a9d5fe9a407d0ca2c65cb577116c4d0f1/numpy/core/src/multiarray/nditer_api.c#L1872

seems to fix the crash. Mark, can you verify that my intuition about this comparison is correct?

All tests pass with this change, except one:

FAIL: test_where_param_buffer_output (test_ufunc.TestUfunc)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/tjones/numpy.git/numpy/core/tests/test_ufunc.py", line 598, in test_where_param_buffer_output
    assert_equal(c, [2,1.5,1.5,2,1.5,1.5,2,2,2,1.5])
  File "/Users/tjones/numpy.git/numpy/testing/utils.py", line 256, in assert_equal
    return assert_array_equal(actual, desired, err_msg, verbose)
  File "/Users/tjones/numpy.git/numpy/testing/utils.py", line 753, in assert_array_equal
    verbose=verbose, header='Arrays are not equal')
  File "/Users/tjones/numpy.git/numpy/testing/utils.py", line 677, in assert_array_compare
    raise AssertionError(msg)
AssertionError: 
Arrays are not equal

(mismatch 50.0%)
 x: array([ 1.5,  1.5,  1.5,  1.5,  1.5,  1.5,  1.5,  1.5,  1.5,  1.5])
 y: array([ 2. ,  1.5,  1.5,  2. ,  1.5,  1.5,  2. ,  2. ,  2. ,  1.5])
numpy-gitbot commented 11 years ago

atmention:thouis wrote on 2012-05-26

Based on some more testing, I expect that there is no off-by-one error, but rather there's an assumption that the buffer is not adjacent to the destination (assuming I'm reading the code correctly).

For the newly failing test, this is what some simple instrumenting reports:

test_where_param_buffer_output (test_ufunc.TestUfunc) ... Buffer allocated: 10294f670
Delta 80 compared to 80
FAIL
...

Perhaps the correct approach is to allocate the buffer with some "fencepost" space (8 bytes for alignment?) at the beginning, and use it with an offset to ensure that the assumption can't be violated.

numpy-gitbot commented 11 years ago

atmention:thouis wrote on 2012-05-26

This seems to have fixed it. I don't know if this is the best approach, or if it would be better to set a flag somewhere that indicates that data needs copying. This would seem safer than pointer math, but I'm not familiar enough with the npyiter code to know if that's a possible solution.

If someone could review this, I can submit a PR. https://github.com/thouis/numpy/commit/a86a828e8f7e63cda926be031cbfc35d49ee820e

numpy-gitbot commented 11 years ago

atmention:thouis wrote on 2012-05-26

I missed the buffer allocation in the previous change. This adds that.

https://github.com/thouis/numpy/commit/69cef4b60608674603918e39b047ea43ac3714c0

numpy-gitbot commented 11 years ago

atmention:njsmith wrote on 2012-05-26

Wow, that's a tricky bug. Nice analysis.

Not knowing anything about this code, I'm very dubious about the idea that adding some guard data is actually the right solution, though, as opposed to teaching the code to know where the actual boundaries of its buffers are...

numpy-gitbot commented 11 years ago

Milestone changed to NumPy 1.7 by atmention:njsmith on 2012-05-26

numpy-gitbot commented 11 years ago

atmention:thouis wrote on 2012-05-26

I agree, it's a less-good solution than something more obvious (to the code's author) and direct. Perhaps a flag somewhere within the iterator that's set if the buffer needs to be copied.

numpy-gitbot commented 11 years ago

atmention:mwiebe wrote on 2012-05-28

Great work tracking down this tricky bug! The flag idea you suggested seems like the best approach to fix it, I've implemented this in a branch here:

https://github.com/mwiebe/numpy/tree/nditer_buffer_flag

Can you verify this fixes your test case?

Thanks!

numpy-gitbot commented 11 years ago

atmention:thouis wrote on 2012-05-29

Yes, this seems to have fixed the issue for me.

numpy-gitbot commented 11 years ago

atmention:njsmith wrote on 2012-06-06

Merged in [de8c5368], so this is fixed.

However, trac won't let me close this bug.