Closed: sklam closed this issue 9 months ago
Dear users, I'm trying to run a script for a Python 3 program and I get this error 9 times out of 10. My script works, but randomly. When the code is too long, sometimes the error appears, sometimes it doesn't. So I have to run the script multiple times in the hope that it reaches the end. I paste the error that I get:
Assertion failed: (isInt<33>(Addend) && "Invalid page reloc value."), function encodeAddend, file /Users/ci/miniconda3-arm64/conda-bld/llvmdev_1643905487494/work/lib/ExecutionEngine/RuntimeDyld/Targets/RuntimeDyldMachOAArch64.h, line 210. zsh: abort python3 test2.py
I'm running on a MacBook Pro with an M1 Pro. I have no idea how to solve this error; I don't even know if I can really solve it or if it depends on LLVM. Do you know something about that? Thanks in advance, I hope for an answer...
@Francyrad thank you for asking about this. You do indeed encounter the same error as reported in this issue. You may consider Numba (more precisely, LLVM) to be broken on M1. There is currently no known fix or workaround, and we are not sure if this has been reported upstream to LLVM or if there is a fix in progress. IIRC @sklam also checked LLVM 14 and it appears as though this has not been fixed. My only remaining guess here would be to try to run your script in a Docker container on the M1 using a linux-aarch64 docker image. Performance should not be too bad as the hardware will not be simulated in this case. Note, however, that I am guessing at this and it may very well also not work.
TL;DR: Running Numba on an M1 may cause the aborts you see above, and the only known workaround is to use different hardware.
@esc thank you for your answer, I hope someone will be able to fix it. Please let me know when it is fixed by commenting on this issue.
Thank you again
Yes, we hope so too. If you subscribe to this issue, you will receive updates regarding this quest.
Hi, is there any update on this? I'm on Python 3.9 and LLVM 11.1.0 on an M1 Mac, and am having the same issue right now when running multi-processing of a forecast model (AutoCES) under the statsforecast package. I've tried to bootstrap dev versions of both numba (0.57.0.dev0+1257.gce69f3010) and llvmlite (0.40.0.dev0+70.ge6901e0) from the GitHub repos, but still failed and keep facing this issue.
It seems like the temporary fix by https://github.com/numba/numba/pull/8583 is not working for me.
I have other models tested without issues, but they all use numba in the backend to speed up the computing. The only difference that I can think of is that this specific model uses complex values rather than real-number values.
With numba (0.46) and llvmlite (0.39), exactly the same error is raised when running. However, with the dev versions of numba (0.57.0.dev0+1257.gce69f3010) and llvmlite (0.40.0.dev0+70.ge6901e0), the multiprocessing basically just gets stuck in the terminal without any errors raised. (But I'm pretty sure it's still the same issue.)
Can anyone help here? Thanks @esc @sklam
I still have the issue. Sometimes I waste more of my time trying to run my scripts than actually working.
No, unfortunately not, there is no known workaround; it's broken in LLVM 11 and 14 (the version supported by the next Numba/llvmlite release). I am not aware of anyone working on a fix at present, so your best bet for now will be to use non-M1/Apple silicon hardware, i.e. change hardware. So sorry I don't have better news for you.
@sklam for reference, was this ever reported to the LLVM issue tracker and if so, can you post the issue ID please? Thank you.
Just wanted to mention that I'm having the same issue on a Mac M1, llvm-openmp 16.0.2 and llvmlite 0.40.0! I run into this issue when solving systems of PDEs using py-pde. I've subscribed to this issue and fingers crossed that it will get fixed in the near future.
@iamlll another bug that I encounter is that parallelisation with OpenMP doesn't work with the following chips: M1 Pro, M1 Max and M1 Ultra.
It works only with the M1.
Is there some LLVM tracker where it is possible to file a report?
@iamlll The reason you are seeing this with llvmlite 0.40.0 is because it is based on LLVM 14 and that is indeed buggy.
So how can we solve the problem with OpenMP?
This is a problem of the LLVM JIT that we are using (MCJIT); we need to migrate to OrcJIT (https://github.com/numba/llvmlite/pull/919) so we can use JITLink, and hopefully that will fix it.
This issue is about the M1 LLVM RuntimeDyld "Invalid page reloc value" assertion error -- you are inquiring about a different issue here. In order to keep the noise down, please open a new issue for the OpenMP problems you are seeing, thank you!
The issue is not limited to Apple M1 or macOS. We're seeing it on Neoverse-N1 running Ubuntu 20.04 ever since we upgraded to Numba 0.57. These are server machines, and not just one. Unfortunately, we cannot downgrade Numba because we need CUDA 12.1 support.
Error message:
python: /root/miniconda3/envs/buildenv/conda-bld/llvmdev_1680642098205/work/llvm/lib/ExecutionEngine/RuntimeDyld/RuntimeDyldELF.cpp:507: void llvm::RuntimeDyldELF::resolveAArch64Relocation(const llvm::SectionEntry&, uint64_t, uint64_t, uint32_t, int64_t): Assertion `isInt<33>(Result) && "overflow check failed for relocation"' failed.
System info:
uname -a: Linux <hostname_redacted> 5.15.0-46-generic #49~20.04.1-Ubuntu SMP Mon Aug 8 18:51:21 UTC 2022 aarch64 aarch64 aarch64 GNU/Linux
cat /etc/os-release | grep PRETTY
PRETTY_NAME="Ubuntu 20.04.4 LTS"
lscpu
Architecture: aarch64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 80
On-line CPU(s) list: 0-79
Thread(s) per core: 1
Core(s) per socket: 80
Socket(s): 1
NUMA node(s): 1
Vendor ID: ARM
Model: 1
Model name: Neoverse-N1
Stepping: r3p1
Frequency boost: disabled
CPU max MHz: 3000.0000
CPU min MHz: 1000.0000
BogoMIPS: 50.00
L1d cache: 5 MiB
L1i cache: 5 MiB
L2 cache: 80 MiB
NUMA node0 CPU(s): 0-79
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Retbleed: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1: Mitigation; __user pointer sanitization
Vulnerability Spectre v2: Mitigation; CSV2, BHB
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected
Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp ssbs
free -m
total used free shared buff/cache available
Mem: 514318 5282 73564 6 435471 504537
Swap: 2047 213 1834
I can confirm being able to reproduce a similar issue on a non-M1 AArch64 - in general we can overflow relocations. The assertion is a little different because Linux on AArch64 uses RuntimeDyldELF and not RuntimeDyldMachO, but I think the principle (and the root cause) is the same; I need to investigate further to be sure. At present I'm reproducing with DALI like:
/opt/dali/dali/test/python# DALI_EXTRA_PATH=/opt/dali_extra python -m nose2 --verbose --plugin=nose2_test_timer.plugin --with-timer --timer-color --timer-top-n 20 -A '!slow' -s operator_1 test_numba_func
test_numba_func.test_multiple_ins ... ok
test_numba_func.test_split_images_col ... ok
test_numba_func.test_numba_func:1
[(10, 10, 10)], <class 'numpy.uint8'>, <function set_all_values_to_255_batch at ... ok
test_numba_func.test_numba_func:2
[(10, 10, 10)], <class 'numpy.uint8'>, <function set_all_values_to_255_sample a ... python: /root/miniconda3/envs/buildenv/conda-bld/llvmdev_1680642098205/work/llvm/lib/ExecutionEngine/RuntimeDyld/RuntimeDyldELF.cpp:507: void llvm::RuntimeDyldELF::resolveAArch64Relocation(const llvm::SectionEntry&, uint64_t, uint64_t, uint32_t, int64_t): Assertion `isInt<33>(Result) && "overflow check failed for relocation"' failed.
Aborted (core dumped)
and I need to figure out how to make a Numba-only reproducer.
I'm working on a system very similar to the one reported by @mzient in https://github.com/numba/numba/issues/8567#issuecomment-1556803212 - just some minor OS / kernel version differences.
I couldn't trigger this issue with @sklam's script from https://github.com/numba/numba/issues/8567#issue-1432286236, even after hundreds of runs on a Linux AArch64 system. However, the following (still using DALI, but without needing a test harness) does reproduce the issue pretty reliably:
which gives this on almost every run:
$ python test_standalone.py
python: /root/miniconda3/envs/buildenv/conda-bld/llvmdev_1680642098205/work/llvm/lib/ExecutionEngine/RuntimeDyld/RuntimeDyldELF.cpp:507: void llvm::RuntimeDyldELF::resolveAArch64Relocation(const llvm::SectionEntry&, uint64_t, uint64_t, uint32_t, int64_t): Assertion `isInt<33>(Result) && "overflow check failed for relocation"' failed.
Aborted (core dumped)
I've broken the Linux-specific variant of this issue into #9001 to avoid spamming everyone here as I post updates whilst I debug. Please watch / subscribe to that if you want to track as I'm working on the issue on Linux AArch64.
I'm finding this surprisingly hard to reproduce on macOS. My environment is:
@sklam Anything I might be missing here compared to the setup you used to reproduce the issue?
This issue recently presented itself to me. Any suggestions where i might dig into when trying to contribute a fix? I see we are somewhere between llvm and llvm lite?
https://github.com/numba/numba/issues/9001#issuecomment-1581424023 describes the issue - the GOT is allocated more than 4GB away from a text section it refers to. If you'd like to start digging in, I'd suggest looking into the RuntimeDyld allocator to devise a strategy that ensures this can't happen. I understand JITLink has a slab allocator already which can help avoid this issue, but I didn't yet get a chance to look into it further.
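To make the arithmetic concrete, here is a small, self-contained Python illustration (not Numba or LLVM code; the addresses below are invented for the example) of why a GOT page more than 4 GiB away from the referencing code cannot be encoded: ADRP-style page relocations carry a sign-extended 33-bit page offset, which is exactly what LLVM's isInt<33> assertion checks.

def is_int33(value: int) -> bool:
    # Mirrors LLVM's isInt<33>(): does the value fit in a signed 33-bit integer?
    return -(1 << 32) <= value < (1 << 32)

def page(addr: int) -> int:
    # ADRP works in units of 4 KiB pages, so relocations compare page addresses.
    return addr & ~0xFFF

code_addr = 0x0000_7000_0000_0000   # hypothetical address of a JITed text page
got_near = code_addr + 0x4000_0000  # ~1 GiB away: encodable
got_far = code_addr + 0x1_2000_0000 # ~4.5 GiB away: overflows the 33-bit addend

for got in (got_near, got_far):
    addend = page(got) - page(code_addr)
    status = "encodable" if is_int33(addend) else "overflow -> assertion fires"
    print(hex(addend), status)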
@carstenr I had a little more thought about this recently... One of the problems that makes it hard to think about a fix is that reproducing the issue is a giant pain at present - if you're able to do anything to take the existing reproducers and simplify them at all, that would help make it easier for someone (or yourself) to understand the issue and work on a fix.
I can give you a script that is able to reproduce it quite often if that can help
Yes please!
please write to me at francyrad.info@gmail.com
The script and the file that it reads are quite big
Alright, that means we have two large cases to reproduce then. We will focus on reducing them as much as possible.
Another thought I think worth sharing - it should be possible to get to a reproducer that doesn't depend on Numba at all - if it's minimised as much as possible, it would just involve calls to llvmlite. (Or even simpler than that, a small C++ source that links to LLVM only, to even take llvmlite out of the loop - but I think the "just llvmlite" case would already be a good starting point)
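As a very rough starting point (an untested sketch on my part, not a known reproducer), something like the following repeatedly JIT-compiles a tiny module through llvmlite's MCJIT binding and keeps every engine alive so that allocations pile up and code sections drift apart in the address space; the IR, the function name add and the iteration count are arbitrary placeholders, and it may well need extra pressure (e.g. much larger functions) before it actually trips the relocation overflow.

import llvmlite.binding as llvm

llvm.initialize()
llvm.initialize_native_target()
llvm.initialize_native_asmprinter()

target_machine = llvm.Target.from_default_triple().create_target_machine()

# A trivial module; what it computes is irrelevant - each iteration
# JIT-compiles a fresh copy of it via MCJIT.
IR = r"""
define i64 @add(i64 %a, i64 %b) {
entry:
  %sum = add i64 %a, %b
  ret i64 %sum
}
"""

engines = []
for i in range(100000):
    mod = llvm.parse_assembly(IR)
    mod.verify()
    engine = llvm.create_mcjit_compiler(mod, target_machine)
    engine.finalize_object()
    engine.get_function_address("add")
    engines.append(engine)  # keep everything alive so allocations accumulate
    if i % 1000 == 0:
        print(i)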
Might take a while to get there as our developers naturally have a strong python background. We will start with a minimal nixtla setup, which is where this popped up for us. And from there on we will work our way down.
Bump.
I am consistently seeing this on M1 Pro and M2. It's a bit involved, but it occurs with ~30% probability in my code.
Are you still looking for a reproducer @gmarkall ?
FYI by googling I noticed that when porting Julia to ARM they also hit the same bug. Look at https://github.com/JuliaLang/julia/issues/36617 and search in the page for "Assertion failed: (isInt<33>(Addend) && "Invalid page reloc value."),".
Apparently, if this can help at all, the PR that fixed the issue was https://github.com/JuliaLang/julia/pull/43664 ...
The problem is still present
Luckily, and coincidentally, I was working on a reproducer today, and I now have a pretty good one, which I'm going to add to #9001 because I'm tackling the issue on Linux AArch64 at present.
In case you want to try it, it's:
from numba import njit

@njit
def f(x, y):
    return x + y

i = 0
while True:
    print(i)
    t = tuple(range(i))
    f(t, (1j,))
    i += 1
executed with:
$ ulimit -s 1048576
$ python repro.py
gives:
0
1
2
3
4
5
6
7
8
9
python: /opt/conda/conda-bld/llvmdev_1684517249134/work/llvm/lib/ExecutionEngine/RuntimeDyld/RuntimeDyldELF.cpp:507: void llvm::RuntimeDyldELF::resolveAArch64Relocation(const llvm::SectionEntry&, uint64_t, uint64_t, uint32_t, int64_t): Assertion `isInt<33>(Result) && "overflow check failed for relocation"' failed.
Aborted (core dumped)
It'd be interesting to know if that also triggers the error on your Mac. You might need to do something similar to my ulimit invocation above to increase the stack limit.
I can't set the ulimit to such large numbers on Mac. It errors with:
ulimit: value exceeds hard limit
The largest ulimit I can set is ulimit -s 65520, but it is not crashing for now...
What number did it get to before you stopped it?
That would be great if you could give it a go!
@PhilipVinc Is it still running? :-)
@gmarkall it crashes at 1001 but I think this is due to some check in numba itself?
999
1000
1001
Traceback (most recent call last):
File "/Users/filippo.vicentini/Dropbox/Ricerca/Codes/Python/netket/repro.py", line 12, in <module>
File "/Users/filippo.vicentini/Documents/pythonenvs/netket/python-3.11.2/lib/python3.11/site-packages/numba/core/dispatcher.py", line 471, in _compile_for_args
error_rewrite(e, 'unsupported_error')
File "/Users/filippo.vicentini/Documents/pythonenvs/netket/python-3.11.2/lib/python3.11/site-packages/numba/core/dispatcher.py", line 409, in error_rewrite
raise e.with_traceback(None)
numba.core.errors.UnsupportedError: Failed in nopython mode pipeline (step: ensure features that are in use are in a valid form)
Tuple 'x' length must be smaller than 1000.
Large tuples lead to the generation of a prohibitively large LLVM IR which causes excessive memory pressure and large compile times.
As an alternative, the use of a 'list' is recommended in place of a 'tuple' as lists do not suffer from this problem.
File "repro.py", line 3:
<source missing, REPL/exec in use?>
EDIT: This is with ulimit -s 65520
@PhilipVinc Thanks - indeed, that was a Numba limitation. I think in #9001 and https://github.com/gmarkall/numba-issue-9001 we're getting close to a really good reproducer now, so there's probably no need for additional testing here - thanks for everything you've looked into so far :-)
An LLVM Discourse discussion has been started about a potential fix: https://discourse.llvm.org/t/llvm-rtdyld-aarch64-abi-relocation-restrictions/74616
@Francyrad @PhilipVinc @carstenr It's early work at the moment, but if you're able to build llvmlite from source with the PR https://github.com/numba/llvmlite/pull/1009, and let me know whether you still observe the issue with it (or observe any other issues) that would be good feedback - hopefully this resolves the issue, but there's a lot of testing / review to be done to have confidence in the strategy.
I have experienced this issue repeatedly over the past month, getting errors similar to the following for my ~150 line code for solving a specific PDE:
Assertion failed: (isInt<33>(Addend) && "Invalid page reloc value."), function encodeAddend, file /Users/ci/miniconda3-arm64/conda-bld/llvmdev_1643905487494/work/lib/ExecutionEngine/RuntimeDyld/Targets/RuntimeDyldMachOAArch64.h, line 210.
@gmarkall, I'm not quite sure how to build from source but am happy to try and test it out.
@jacobjivanov Thanks for sharing this info - fortunately you don't need to build from source to test the fix now, as it's part of the llvmlite 0.42 / Numba 0.59 release candidates. You can follow the instructions here to install the Numba and llvmlite release candidates: https://numba.discourse.group/t/ann-numba-0-59-0rc1-and-llvmlite-0-42-0rc1/2329
If you try this, I'd really appreciate if you can let me know whether it appears to have solved the issue for you.
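In case it helps, a quick way to double-check which versions actually ended up active in your environment (just a sanity check, not part of the official instructions) is:

# Print the importable versions; the release candidates should report
# 0.59.0rc1 (numba) and 0.42.0rc1 (llvmlite) or later.
import numba
import llvmlite

print("numba:", numba.__version__)
print("llvmlite:", llvmlite.__version__)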
@gmarkall, I can't confirm whether it'll ever fail, but it no longer fails for the particular script that would fail roughly 50% of the time previously. Ran it ~20 times with different initial conditions.
@gmarkall Your work is greatly appreciated! Switching to the release candidate also solved the issue for one of our packages which would occasionally fail.
With llvmlite now at 0.42.0 and the new memory manager merged, can we close this?
I've not heard of any reports of this issue manifesting in llvmlite 0.42, so I think so.
Alright, let's put a proverbial checkmark behind this issue. We always have the option to re-open it in case it resurfaces.
@gmarkall thank you again for the fix for this, it is much appreciated!
We are seeing an LLVM assertion error occurring randomly in our build farm.
The error message is:
The earliest report is from Gitter on July 15, 2022.
The error can be triggered with the below script on bdb2384. The error usually occurs within 10 iterations.
The error occurs in both LLVM 11 and LLVM 14.
The current hypothesis is that the LLVM RuntimeDyld is mishandling far jumps. To relate this to the reproducer above, the situation can be created by:
1. running test_too_big_to_freeze (the compilation and execution bits in the tests can be commented out and it will still trigger the error);
2. making a large allocation;
3. then running test_fill_diagonal_basic. The assertion error occurs here. The guess is that the JITed code emitted for the stencil tests is reused here, and the large allocation in between helps make sure there is a gap/fragmentation in the memory space such that the fill_diagonal functions are JITed somewhere far away.
The Julia devs are pointing to a broken large code model in the LLVM RuntimeDyld for MachO aarch64. See https://github.com/JuliaLang/julia/issues/42295#issuecomment-1008427270 and https://github.com/JuliaLang/julia/pull/43664.