Open ghost opened 3 years ago
We at IBM are concerned that there is a latent bug in RuntimeDyldELF for PPC64LE, but apparently there is no reproducer.
source: https://github.com/numba/numba/issues/4026#issuecomment-705771587
I can say with near 100% confidence that (beside this bug), no other latent bugs should be in the code (re this issue). The compiler-side code I reviewed (PPCISelLowering.cpp) is very clean, with the right amount of in-source documentation.
The work/reviews on e.g. https://reviews.llvm.org/D81126?id=274770 can be easily followed.
With a little bit effort for clarifying RuntimeDyldELF (only the code that directly affects ppc), all should be good. The clarity of RuntimeDyldELF should become similar to that of PPCISelLowering.cpp, even if the future direction will be JITLink
@abebeos Thanks for the patch(es) and for fixing this! As promised here I raised this issue at yesterday's development meeting. Outcome was that we're going to hopefully roll these fixes into the 0.54 release of Numba along with attempting an upgrade to LLVM 11.
@abebeos thank you again for submitting this! We appreciate your efforts to improve Numba. Regarding the timeline: the 0.54 release cycle will most likely begin in a few weeks from now (end of Jan/ beg. of Feb). At that point I can commit to reviewing and validating your patch. I should also note, ahead of time, that we may need this patch to be ported to LLVM 11.0 since we will be attempting an upgrade from LLVM 10 to LLVM 11 as part of the 0.54 release.
@abebeos thank you, I will take it into consideration.
Dear @abebeos,
Judging from your response of "Ouch!" I assume that you are disappointed about the proposed timeline for validating your proposed patches? If that is the case, I am sorry that the project schedule does to align with your expectations.
Also, thank you for pre-compiling those packages but unfortunately they will be of little use for me. We will want to build and test the Numba stack (llvmdev
-> llvmlite
-> numba
) across the supported versions of Python and Numpy on the corresponding supported architectures as part of our efforts in quality assurance.
Thank you in advance for your patience and understanding in these matters.
-esc
(spawned from #4026)
Summary
A missing
nop
after a branch-instruction results in RuntimeDyldELF overwriting an instruction with the TOC restoration code. This happnes only in reloc-mode=static.The workaround in numba was to change (for ppc)
reloc_mode
back topic
(https://github.com/numba/numba/commit/31ddd81d559c4aa79fb25e4c7b45b44451df95d5)See bug-identification by
obilaniu
within https://github.com/numba/numba/issues/4026#issuecomment-487296000Reproduction
Suggested Fix
The fix was identified during the work-in to the compiler-side of the problem. Whilst looking for similar work on the relevant (pointed out by
nemanjai
within https://github.com/numba/numba/issues/4026#issuecomment-657941602)PPCISelLowering.cpp / callsShareTOCBase()
, the comments on the following commit indicated its relevance for the problem:https://reviews.llvm.org/rG2812c1515627904e31605bbd4f25a887a1f8eb12
The backported (llvm-10.x) patch (incl. adopted tests):
Binaries
Patched llvm-10 / llvmlite-0.35.0 available (temporarily, for testing):
Validation
(numba repo does not automatically test on PowerPC, see https://github.com/numba/llvmlite/pull/684#issuecomment-753946728)
I've validated the fix manually on Power8 hardware, see https://github.com/numba/llvmlite/pull/684#issuecomment-755295463.
The validation should be trivial for any dev with a ready dev-system.
Validation with pre-compiled binaries (After doing the Reproduction steps above):