scikit-hep / awkward

Manipulate JSON-like data with NumPy-like idioms.
https://awkward-array.org
BSD 3-Clause "New" or "Revised" License
806 stars 81 forks source link

fix: prevent exponential memory growth in UnionArray #3119

Closed jpivarski closed 1 month ago

jpivarski commented 1 month ago

I decided to just make UnionArray.simplified not allow lazy carries. It should always return correct results and it can only have a performance impact on wide RecordArrays that go through UnionArray.simplified (perhaps via ak.concatenate).

The new test would fail for the old code: for 5 loops, the NumpyArray length gets up to 32, and we assert <= 2.

I only had to change one test, which was hard-coded to a particular Form. It was for testing allow_noncanonical_form=True in ak.from_buffers. I think that feature is still being tested with the updated test.

@agoose77, am I missing anything here? Would you do anything differently?

agoose77 commented 1 month ago

@jpivarski I agree with your reasoning. I haven't given much more thought to alternative solutions, but we can always revisit this in future!

ianna commented 1 month ago

@jpivarski - it looks like the tests on Mac are segfaulting and not only for this PR. I'm checking locally.

I can confirm that I can reproduce the segfault locally on MacOS 11.6 with pyarrow 16.1.0 (released yesterday). It happens in tests/test_0080_flatpandas_multiindex_rows_and_columns.py

Changing back to pyarrow 16.0.0 fixes the issue.

jpivarski commented 1 month ago

Although we could hold back pyarrow<16.1.0, that would have to be temporary; we'd have to remove the constraint after pyarrow gets fixed. We can only expect pyarrow to get fixed if the error is reported to them, so I tried to do that, but #3122.

ianna commented 1 month ago

Although we could hold back pyarrow<16.1.0, that would have to be temporary; we'd have to remove the constraint after pyarrow gets fixed. We can only expect pyarrow to get fixed if the error is reported to them, so I tried to do that, but #3122.

I uninstalled both awkward and awkward-cpp and have a minimum reproducer of a segfault with:

% python
Python 3.11.5 (main, Sep 11 2023, 08:19:27) [Clang 14.0.6 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyarrow
zsh: segmentation fault  python

The test_pyarrow.py has only one line: import pyarrow

 % lldb python test_pyarrow.py 
(lldb) target create "python"
Current executable set to '/Users/yana/anaconda3/bin/python' (x86_64).
(lldb) settings set -- target.run-args  "test_pyarrow.py"
(lldb) run
Process 34536 launched: '/Users/yana/anaconda3/bin/python' (x86_64)
Process 34536 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=EXC_I386_GPFLT)
    frame #0: 0x0000000129c8685e libarrow.1601.dylib`malloc_conf_init_helper + 302
libarrow.1601.dylib`malloc_conf_init_helper:
->  0x129c8685e <+302>: movzbl (%rbx), %eax
    0x129c86861 <+305>: testb  %al, %al
    0x129c86863 <+307>: je     0x129c8936e               ; <+11326>
    0x129c86869 <+313>: movq   %rbx, %rcx
(lldb) 
jpivarski commented 1 month ago

That reproducer must depend on the environment because it didn't segfault immediately on import for me. I also noticed in the output that this is x86-64. My laptop is ARM (M2). That's likely relevant.

Could you send as much information as you can to the Arrow developers? Since we also need a short-term solution, we can put a version cap on Arrow in our tests for now. There are a few pull requests that need it.

ianna commented 1 month ago

That reproducer must depend on the environment because it didn't segfault immediately on import for me. I also noticed in the output that this is x86-64. My laptop is ARM (M2). That's likely relevant.

Could you send as much information as you can to the Arrow developers? Since we also need a short-term solution, we can put a version cap on Arrow in our tests for now. There are a few pull requests that need it.

I've opened an issue https://github.com/apache/arrow/issues/41696

jpivarski commented 1 month ago

Awesome! It looks like they're dealing with it right away.