Open thouis opened 11 years ago
Comment in Trac by atmention:thouis, 2012-05-25
Also, this is on x86_64. Running under i386, it does not seem to crash.
Comment in Trac by atmention:mwiebe, 2012-05-25
I've tried this on both windows and linux 64-bit, and unfortunately couldn't reproduce it.
{{
np.version '1.7.0.dev-3f45eaa' }}
Comment in Trac by atmention:thouis, 2012-05-25
This seems to exercise it more consistently for me than the script I posted: while True; do python -m nose.core ../numpy.bisect/numpy/lib/tests/test_function_base.py:TestHistogramdd --pdb --pdb-failures; done
I've git bisected it down to this commit: https://github.com/numpy/numpy/commit/aed9925a9d5fe9a407d0ca2c65cb577116c4d0f1
Comment in Trac by atmention:mwiebe, 2012-05-25
The function which should be doing the initialization which seems to be failing is here in the "else" case:
https://github.com/numpy/numpy/commit/aed9925a9d5fe9a407d0ca2c65cb577116c4d0f1#L11R2580
I've tried it on a Mac with 64-bit Python 2.7.1, built with llvm-gcc 4.2.1, but it didn't reproduce there either.
Can you try a few other reductions that trigger the same code-path? Instead of "a.max(0)", try "np.maximum.reduce(a, axis=0)"? The functions np.minimum.reduce, np.fmin.reduce, np.fmax.reduce should trigger it too.
Comment in Trac by atmention:thouis, 2012-05-26
I think I've traced it down to this line being executed at the last loop of the iteration it's controlling:
https://github.com/numpy/numpy/commit/aed9925a9d5fe9a407d0ca2c65cb577116c4d0f1#L11R2896 while (iternext(iter));
Before this line executes, the data is correct. Afterward, it has been scribbled over with something that looks suspiciously like a pointer's value.
The call to iternext at the end of the loop calls this: https://github.com/numpy/numpy/blob/aed9925a9d5fe9a407d0ca2c65cb577116c4d0f1/numpy/core/src/multiarray/nditer_templ.c.src#L252 npyiter_copy_from_buffers(iter);
Which only when it crashes, goes into this piece of code: https://github.com/numpy/numpy/blob/aed9925a9d5fe9a407d0ca2c65cb577116c4d0f1/numpy/core/src/multiarray/nditer_api.c#L1878
and executes this: https://github.com/numpy/numpy/blob/aed9925a9d5fe9a407d0ca2c65cb577116c4d0f1/numpy/core/src/multiarray/nditer_api.c#L1997
I'm reasonably certain it shouldn't be going into the if clause based on delta at line 1878, but I haven't unraveled enough of that code to know for sure.
However, changing the second <= to < at this line (since it looks like an off-by-one error, and every time it crashed, it was under the == case of <=): https://github.com/numpy/numpy/blob/aed9925a9d5fe9a407d0ca2c65cb577116c4d0f1/numpy/core/src/multiarray/nditer_api.c#L1872
seems to fix the crash. Mark, can you verify that my intuition about this comparison is correct?
Traceback (most recent call last):
File "/Users/tjones/numpy.git/numpy/core/tests/test_ufunc.py", line 598, in test_where_param_buffer_output
assert_equal(c, [2,1.5,1.5,2,1.5,1.5,2,2,2,1.5])
File "/Users/tjones/numpy.git/numpy/testing/utils.py", line 256, in assert_equal
return assert_array_equal(actual, desired, err_msg, verbose)
File "/Users/tjones/numpy.git/numpy/testing/utils.py", line 753, in assert_array_equal
verbose=verbose, header='Arrays are not equal')
File "/Users/tjones/numpy.git/numpy/testing/utils.py", line 677, in assert_array_compare
raise AssertionError(msg)
AssertionError:
Arrays are not equal
(mismatch 50.0%)
x: array([ 1.5, 1.5, 1.5, 1.5, 1.5, 1.5, 1.5, 1.5, 1.5, 1.5])
y: array([ 2. , 1.5, 1.5, 2. , 1.5, 1.5, 2. , 2. , 2. , 1.5])
Comment in Trac by atmention:thouis, 2012-05-26
Based on some more testing, I expect that there is no off-by-one error, but rather there's an assumption that the buffer is not adjacent to the destination (assuming I'm reading the code correctly).
For the newly failing test, this is what some simple instrumenting reports:
test_where_param_buffer_output (test_ufunc.TestUfunc) ... Buffer allocated: 10294f670
Delta 80 compared to 80
FAIL
...
Perhaps the correct approach is to allocate the buffer with some "fencepost" space (8 bytes for alignment?) at the beginning, and use it with an offset to ensure that the assumption can't be violated.
Comment in Trac by atmention:thouis, 2012-05-26
This seems to have fixed it. I don't know if this is the best approach, or if it would be better to set a flag somewhere that indicates that data needs copying. This would seem safer than pointer math, but I'm not familiar enough with the npyiter code to know if that's a possible solution.
If someone could review this, I can submit a PR. https://github.com/thouis/numpy/commit/a86a828e8f7e63cda926be031cbfc35d49ee820e
Comment in Trac by atmention:thouis, 2012-05-26
I missed the buffer allocation in the previous change. This adds that.
https://github.com/thouis/numpy/commit/69cef4b60608674603918e39b047ea43ac3714c0
Comment in Trac by atmention:njsmith, 2012-05-26
Wow, that's a tricky bug. Nice analysis.
Not knowing anything about this code, I'm very dubious about the idea that adding some guard data is actually the right solution, though, as opposed to teaching the code to know where the actual boundaries of its buffers are...
Comment in Trac by atmention:thouis, 2012-05-26
I agree, it's a less-good solution than something more obvious (to the code's author) and direct. Perhaps a flag somewhere within the iterator that's set if the buffer needs to be copied.
Comment in Trac by atmention:mwiebe, 2012-05-28
Great work tracking down this tricky bug! The flag idea you suggested seems like the best approach to fix it, I've implemented this in a branch here:
https://github.com/mwiebe/numpy/tree/nditer_buffer_flag
Can you verify this fixes your test case?
Thanks!
Comment in Trac by atmention:thouis, 2012-05-29
Yes, this seems to have fixed the issue for me.
Comment in Trac by atmention:njsmith, 2012-06-06
Merged in [de8c5368], so this is fixed.
However, trac won't let me close this bug.
Original ticket http://projects.scipy.org/numpy/ticket/2144 Reported 2012-05-25 by atmention:thouis, assigned to unknown.
The following code fails on the current master (7a254bd)
This is on OSX 10.6.8, numpy compiled with gcc 4.0.1 Python 2.7 (r27:82508, Jul 3 2010, 21:12:11) [GCC 4.0.1 (Apple Inc. build 5493)] on darwin Type "help", "copyright", "credits" or "license" for more information.