Open numpy-gitbot opened 12 years ago
atmention:thouis wrote on 2012-05-25
Also, this is on x86_64. Running under i386, it does not seem to crash.
atmention:mwiebe wrote on 2012-05-25
I've tried this on both windows and linux 64-bit, and unfortunately couldn't reproduce it.
{{{
>>> np.__version__
'1.7.0.dev-3f45eaa'
}}}
atmention:thouis wrote on 2012-05-25
This seems to exercise it more consistently for me than the script I posted:
while true; do python -m nose.core ../numpy.bisect/numpy/lib/tests/test_function_base.py:TestHistogramdd --pdb --pdb-failures; done
I've git bisected it down to this https://github.com/numpy/numpy/commit/aed9925a9d5fe9a407d0ca2c65cb577116c4d0f1
atmention:mwiebe wrote on 2012-05-25
The initialization that seems to be failing should be done by this function, in its "else" case:
https://github.com/numpy/numpy/commit/aed9925a9d5fe9a407d0ca2c65cb577116c4d0f1#L11R2580
I've tried it on a Mac with 64-bit Python 2.7.1, built with llvm-gcc 4.2.1, but it didn't reproduce there either.
Can you try a few other reductions that trigger the same code-path? Instead of "a.max(0)", try "np.maximum.reduce(a, axis=0)"? The functions np.minimum.reduce, np.fmin.reduce, np.fmax.reduce should trigger it too.
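A minimal sketch of that check, assuming a small random 2-D float array stands in for the data in the original script (which is not reproduced in this thread). It runs each of the reductions named above repeatedly and compares the result against a plain-Python column-wise reference, so corrupted output shows up as a mismatch rather than depending on a crash. The function name, array shape, and repeat count are illustrative, and there is no guarantee this hits the same buffered code path.

```python
import numpy as np


def check_reductions(shape=(10, 4), repeats=200, seed=0):
    """Run several ufunc reductions along axis 0 and compare each result
    against a pure-Python column-wise reference."""
    rng = np.random.RandomState(seed)
    reductions = {
        "maximum": (np.maximum, max),
        "minimum": (np.minimum, min),
        "fmax": (np.fmax, max),
        "fmin": (np.fmin, min),
    }
    mismatches = []
    for _ in range(repeats):
        a = rng.rand(*shape)  # hypothetical test data, contains no NaNs
        columns = list(zip(*a.tolist()))
        for name, (ufunc, py_func) in reductions.items():
            expected = [py_func(col) for col in columns]
            result = ufunc.reduce(a, axis=0).tolist()
            if result != expected:
                mismatches.append((name, result, expected))
    return mismatches


mismatches = check_reductions()
print("mismatches:", len(mismatches))
```

On an unaffected build the list of mismatches should be empty.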
atmention:thouis wrote on 2012-05-26
I think I've traced it down to this line being executed at the last loop of the iteration it's controlling:
https://github.com/numpy/numpy/commit/aed9925a9d5fe9a407d0ca2c65cb577116c4d0f1#L11R2896
while (iternext(iter));
Before this line executes, the data is correct. Afterward, it has been scribbled over with something that looks suspiciously like a pointer's value.
The call to iternext at the end of the loop calls this: https://github.com/numpy/numpy/blob/aed9925a9d5fe9a407d0ca2c65cb577116c4d0f1/numpy/core/src/multiarray/nditer_templ.c.src#L252
npyiter_copy_from_buffers(iter);
Which, only when it crashes, goes into this piece of code: https://github.com/numpy/numpy/blob/aed9925a9d5fe9a407d0ca2c65cb577116c4d0f1/numpy/core/src/multiarray/nditer_api.c#L1878
and executes this: https://github.com/numpy/numpy/blob/aed9925a9d5fe9a407d0ca2c65cb577116c4d0f1/numpy/core/src/multiarray/nditer_api.c#L1997
I'm reasonably certain it shouldn't be going into the if clause based on delta at line 1878, but I haven't unraveled enough of that code to know for sure.
However, changing the second <= to < at this line (since it looks like an off-by-one error, and every time it crashed, it was under the == case of <=): https://github.com/numpy/numpy/blob/aed9925a9d5fe9a407d0ca2c65cb577116c4d0f1/numpy/core/src/multiarray/nditer_api.c#L1872
seems to fix the crash. Mark, can you verify that my intuition about this comparison is correct?
All tests pass with this change, except one:
FAIL: test_where_param_buffer_output (test_ufunc.TestUfunc)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/tjones/numpy.git/numpy/core/tests/test_ufunc.py", line 598, in test_where_param_buffer_output
    assert_equal(c, [2,1.5,1.5,2,1.5,1.5,2,2,2,1.5])
  File "/Users/tjones/numpy.git/numpy/testing/utils.py", line 256, in assert_equal
    return assert_array_equal(actual, desired, err_msg, verbose)
  File "/Users/tjones/numpy.git/numpy/testing/utils.py", line 753, in assert_array_equal
    verbose=verbose, header='Arrays are not equal')
  File "/Users/tjones/numpy.git/numpy/testing/utils.py", line 677, in assert_array_compare
    raise AssertionError(msg)
AssertionError:
Arrays are not equal
(mismatch 50.0%)
 x: array([ 1.5, 1.5, 1.5, 1.5, 1.5, 1.5, 1.5, 1.5, 1.5, 1.5])
 y: array([ 2. , 1.5, 1.5, 2. , 1.5, 1.5, 2. , 2. , 2. , 1.5])
atmention:thouis wrote on 2012-05-26
Based on some more testing, I now expect that there is no off-by-one error; rather, the code assumes that the buffer is never adjacent to the destination (assuming I'm reading the code correctly).
For the newly failing test, this is what some simple instrumenting reports:
test_where_param_buffer_output (test_ufunc.TestUfunc) ... Buffer allocated: 10294f670
Delta 80 compared to 80
FAIL
...
Perhaps the correct approach is to allocate the buffer with some "fencepost" space (8 bytes for alignment?) at the beginning, and use it with an offset to ensure that the assumption can't be violated.
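A minimal sketch of why the address alone cannot settle this, assuming the check has the form "0 <= delta <= buffer size" with delta being the operand pointer minus the buffer start. This is a plain-Python model with made-up integer addresses, not the NumPy C code; the function name and the addresses are hypothetical, and only the 80-byte size is taken from the instrumentation output above.

```python
def looks_like_buffer_pointer(ptr, buffer_start, buffer_nbytes, inclusive=True):
    """Guess from the address alone whether ptr belongs to the buffer.

    inclusive=True models a "delta <= size" upper bound,
    inclusive=False models the stricter "delta < size" variant.
    """
    delta = ptr - buffer_start
    upper_ok = delta <= buffer_nbytes if inclusive else delta < buffer_nbytes
    return 0 <= delta and upper_ok


BUFFER_START = 0x1000  # hypothetical address of the buffer
BUFFER_NBYTES = 80     # matches "Delta 80 compared to 80"

# Case 1: a pointer into the buffer that has been advanced past the last element.
end_of_buffer_ptr = BUFFER_START + BUFFER_NBYTES
# Case 2: a pointer to a separate array that happens to be allocated
# immediately after the buffer (assumed layout for illustration).
adjacent_array_ptr = BUFFER_START + BUFFER_NBYTES

for inclusive in (True, False):
    print(
        "inclusive" if inclusive else "strict",
        looks_like_buffer_pointer(end_of_buffer_ptr, BUFFER_START, BUFFER_NBYTES, inclusive),
        looks_like_buffer_pointer(adjacent_array_ptr, BUFFER_START, BUFFER_NBYTES, inclusive),
    )
```

Both cases produce the same address, so either form of the comparison gives them the same answer. In this model the inclusive form treats the adjacent array as buffer memory, while the strict form rejects a pointer that really did come from the buffer, which is consistent with the copy-back being skipped in the newly failing test.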
atmention:thouis wrote on 2012-05-26
This seems to have fixed it. I don't know if this is the best approach, or if it would be better to set a flag somewhere that indicates that data needs copying. This would seem safer than pointer math, but I'm not familiar enough with the npyiter code to know if that's a possible solution.
If someone could review this, I can submit a PR. https://github.com/thouis/numpy/commit/a86a828e8f7e63cda926be031cbfc35d49ee820e
atmention:thouis wrote on 2012-05-26
I missed the buffer allocation in the previous change. This adds that.
https://github.com/thouis/numpy/commit/69cef4b60608674603918e39b047ea43ac3714c0
atmention:njsmith wrote on 2012-05-26
Wow, that's a tricky bug. Nice analysis.
Not knowing anything about this code, I'm very dubious about the idea that adding some guard data is actually the right solution, though, as opposed to teaching the code to know where the actual boundaries of its buffers are...
Milestone changed to NumPy 1.7
by atmention:njsmith on 2012-05-26
atmention:thouis wrote on 2012-05-26
I agree, it's a less-good solution than something more obvious (to the code's author) and direct. Perhaps a flag somewhere within the iterator that's set if the buffer needs to be copied.
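A minimal sketch of the flag idea, assuming each operand carries a boolean that is set whenever its pointer is switched to the buffer. The class and method names are hypothetical and are not taken from the nditer code; the point is only that the copy-back decision reads recorded state instead of inferring it from pointer arithmetic.

```python
class OperandState:
    """Hypothetical per-operand bookkeeping, not NumPy's actual struct."""

    def __init__(self, buffer_start):
        self.buffer_start = buffer_start
        self.ptr = None
        self.using_buffer = False

    def point_at_buffer(self):
        # Record explicitly that the operand now reads/writes the buffer.
        self.ptr = self.buffer_start
        self.using_buffer = True

    def point_at_array(self, array_ptr):
        # The operand works directly on array memory; nothing to copy back.
        self.ptr = array_ptr
        self.using_buffer = False

    def advance(self, nbytes):
        self.ptr += nbytes

    def needs_copy_back(self):
        # The decision depends only on the flag, never on the address.
        return self.using_buffer


BUFFER_START = 0x1000  # hypothetical address
BUFFER_NBYTES = 80

buffered = OperandState(BUFFER_START)
buffered.point_at_buffer()
buffered.advance(BUFFER_NBYTES)  # now one past the end of the buffer

direct = OperandState(BUFFER_START)
direct.point_at_array(BUFFER_START + BUFFER_NBYTES)  # array adjacent to the buffer

print(buffered.ptr == direct.ptr, buffered.needs_copy_back(), direct.needs_copy_back())
```

The two operands end up at the same address, yet only the one that was explicitly pointed at the buffer reports that it needs copying.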
atmention:mwiebe wrote on 2012-05-28
Great work tracking down this tricky bug! The flag idea you suggested seems like the best approach to fix it; I've implemented this in a branch here:
https://github.com/mwiebe/numpy/tree/nditer_buffer_flag
Can you verify this fixes your test case?
Thanks!
atmention:thouis wrote on 2012-05-29
Yes, this seems to have fixed the issue for me.
atmention:njsmith wrote on 2012-06-06
Merged in [de8c5368], so this is fixed.
However, trac won't let me close this bug.
Original ticket http://projects.scipy.org/numpy/ticket/2144 on 2012-05-25 by atmention:thouis, assigned to unknown.
The following code fails on the current master (7a254bd)
This is on OSX 10.6.8, numpy compiled with gcc 4.0.1