oracle / graal

GraalVM compiles Java applications into native executables that start instantly, scale fast, and use fewer compute resources 🚀
https://www.graalvm.org
Other
20.4k stars 1.64k forks source link

[GR-52023] Re-evaluate LibC.abort implementation #8352

Open SergejIsbrecht opened 9 months ago

SergejIsbrecht commented 9 months ago

Describe the issue

Currently LibC#abort calls exit(EXIT_CODE_ABORT) instead of sending signal 6. This is a problem, because PosixSubstrateSegfaultHandler calls it^0 after handling SEGV. Without sending SIGABRT, the handled signal is swallowed and the kernel does not trigger a core dump. For crash analysis core dumps are necessary, as well as the VM state printed by PosixSubstrateSegfaultHandler.

        /*
         * Using the abort system call has unexpected performance implications on Oracle Enterprise
         * Linux: Storing the crash dump information takes minutes even for tiny images. Therefore,
         * we just exit with an otherwise unused exit code.
         */

https://github.com/oracle/graal/blob/680090f27f71ec8cbf29b4eab3fe490a5559a9f1/substratevm/src/com.oracle.svm.core/src/com/oracle/svm/core/headers/LibC.java#L102

I would like to to change the current LibC#abort implementation to call SIGABRT, instead of exiting with 99.

^0 https://github.com/oracle/graal/blob/680090f27f71ec8cbf29b4eab3fe490a5559a9f1/substratevm/src/com.oracle.svm.core.posix/src/com/oracle/svm/core/posix/PosixSubstrateSegfaultHandler.java#L66

fniephaus commented 9 months ago

Thanks for the ticket, @SergejIsbrecht. This design decision is relatively old and we are considering changing the behavior. Please allow some time for this though as it requires significantly more work on our end than just changing a single line :)

SergejIsbrecht commented 9 months ago

@fniephaus ,

thank you for looking into it. Maybe some works from my side. On aarch64 Linux core dumps are created very fast. We did not experience any of the described behavior.

it requires significantly more work

Could you maybe elaborate on it a little bit more? We changed it as a work-around for now. Maybe we overlooked something important?

fniephaus commented 2 months ago

Sorry for the very slow response...

Could you maybe elaborate on it a little bit more?

IIRC, we heavily rely on this behavior in our CI setup, and that needs to be addressed before we can move this forward. If you can work around this at the moment, that's good.