pydata / sparse

Sparse multi-dimensional arrays for the PyData ecosystem
https://sparse.pydata.org
BSD 3-Clause "New" or "Revised" License

Segmentation fault on arm64 #628

Open tillea opened 9 months ago

tillea commented 9 months ago

Describe the bug: When running the test suite on the arm64 architecture, Python 3.11 segfaults.

To Reproduce: The Debian continuous integration test runs on all Debian release architectures. It passed on amd64 but fails on arm64 and other architectures. Feel free to check the full build log.

Expected behavior: The test suite should pass on all architectures.

System

Kind regards, Andreas.

hameerabbasi commented 9 months ago

This is likely a problem with Numba code generation -- do the tests pass with Py3.10 and below on other architectures?

tillea commented 9 months ago

The tests used to pass with Py3.10. You can check the list of architectures, including test logs, on our CI page.

hameerabbasi commented 9 months ago

Let me rephrase, do the tests pass with Python 3.11 and sparse 0.14, but Numba 0.57.1? How about Python 3.10, sparse 0.15.1 and Numba 0.57.1?

I unfortunately don't have access to an ARM64 machine, so I cannot debug this personally, and would rely on reporters to isolate the issue.
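One way a reporter could record which of the combinations above was actually tested is a small script that prints the interpreter version, the CPU architecture, and the installed package versions. This is only a sketch using the standard library; `describe_environment` is a hypothetical helper name and is not part of sparse or Numba.

```python
import platform
from importlib import metadata


def describe_environment(packages=("sparse", "numba", "llvmlite", "numpy")):
    """Collect the interpreter version, architecture, and package versions.

    Hypothetical helper for bug reports; not part of sparse or Numba.
    """
    info = {
        "python": platform.python_version(),
        "machine": platform.machine(),  # e.g. "aarch64", "arm64", "x86_64"
    }
    for name in packages:
        try:
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            info[name] = None  # not installed in this environment
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

Pasting the output next to the test result for each combination (for example Python 3.11 with sparse 0.14 and Numba 0.57.1) would make it easier to see which component changes the outcome.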

tillea commented 9 months ago

On Wed, Jan 17, 2024 at 02:28:29AM -0800, Hameer Abbasi wrote:

Let me rephrase, do the tests pass with Python 3.11 and sparse 0.14, but Numba 0.57.1? How about Python 3.10, sparse 0.15.1 and Numba 0.57.1?

Sorry, I'm not the maintainer of this package and don't have the resources to test these cases. Kind regards, Andreas.

hameerabbasi commented 9 months ago

@mtsokol IIRC you had a Mac, is that Apple Silicon by any chance? Could you reproduce this bug with the software versions mentioned?

mtsokol commented 9 months ago

@mtsokol IIRC you had a Mac, is that Apple Silicon by any chance? Could you reproduce this bug with the software versions mentioned?

Unfortunately my Mac is an ancient MacBook Pro 2015 with Intel i7.

hameerabbasi commented 9 months ago

I've attempted to fix this in #634, please re-open if the issue isn't resolved.

tillea commented 6 months ago

Hi, (sorry, I cannot find any re-open button) I tried tag 0.16.0a4 (not sure whether this is considered alpha?) and the problem persists. In addition, I tried the amd64 test, which fails as well. Kind regards, Andreas.

hameerabbasi commented 6 months ago

@tillea I just tested locally, it doesn't fail for me in a Docker container -- You might want to look at https://github.com/numba/numba/issues/9109#issuecomment-2042747383 and backporting https://github.com/llvm/llvm-project/commit/2e1b838a889f9793d4bcd5dbfe10db9796b77143 to Debian's LLVM 14.

Relevant LLVM issue: https://github.com/llvm/llvm-project/issues/61402

detrout commented 6 months ago

Andreas asked me to help out with this bug as he has new Debian project leader responsibilities. I was slowly trying to help deal with the Numba side of the problems, but fell behind on understanding the LLVM fix. Currently I'm trying to get the llvmlite maintainer to update llvmlite so I can release numba 0.59.1.

hameerabbasi commented 6 months ago

@detrout Thank you for helping out -- Some background info from reading the Numba issue: it isn't an issue with Numba itself, but is present in Debian's LLVM 14 (and release LLVM 14, IIUC). It doesn't show up with Numba from PyPI or conda-forge because the LLVM patch is already applied in llvmlite on PyPI and in LLVM 14 from conda-forge, which is why I think backporting the patch might help.

hameerabbasi commented 4 months ago

I recently ran the test suite on both an Apple Silicon Mac and multiple arm64-based containers to try to reproduce this, but it ran fine. Can anyone, maybe @detrout, check what happens if llvmlite and numba are installed via PyPI instead of via apt? If the tests pass that way, that would confirm a packaging issue, and would point to https://github.com/pydata/sparse/issues/628#issuecomment-2076366957 being a possible cause.
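To tell which copy of llvmlite and numba a given test run actually used, something like the following could help. It is a minimal sketch built on `importlib.metadata`; `install_origin` is a hypothetical helper name. It assumes the packages ship the optional `INSTALLER` metadata file (pip writes `pip` into it; what distribution packages write there, if anything, may vary), so the install location is reported as well.

```python
from importlib import metadata


def install_origin(name):
    """Return (version, installer, location) for a distribution, or None.

    Hypothetical helper; relies only on standard packaging metadata.
    """
    try:
        dist = metadata.distribution(name)
    except metadata.PackageNotFoundError:
        return None
    # INSTALLER is optional metadata; read_text returns None when it is absent.
    installer = (dist.read_text("INSTALLER") or "unknown").strip()
    # Directory that holds the distribution's files (e.g. a site-packages
    # or dist-packages directory).
    location = str(dist.locate_file(""))
    return dist.version, installer, location


if __name__ == "__main__":
    for pkg in ("llvmlite", "numba", "sparse"):
        print(pkg, install_origin(pkg))
```

On Debian, apt-installed Python packages usually live under a `dist-packages` directory, while a pip install into a virtual environment lands in that environment's `site-packages`, so the reported location is often enough to distinguish the two.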