Closed: jeromegn closed this issue 3 years ago
@jeromegn This is most definitely a mimalloc issue. Could you provide a few more details, such as your OS, build environment, and what code is causing this (or does it happen randomly in a huge app)? Also, does it happen once, always, multiple random times, etc?
This is probably better posted on the mimalloc issue tracker.
This is on Ubuntu 18.04, kernel 5.0.0 64-bit, release mode with debug symbols. Built in a docker container using the rust 1.44.0 official docker image.
This is a large app and it appears to happen randomly :/
I can post this in the mimalloc repo's issues. I was initially wondering if maybe the segfault was caused by a misuse of mimalloc in this crate :)
The "pinned" commit of mimalloc in mimalloc-sys is a bit old. I wonder if this might have been fixed since? For instance, this issue about a segfault (doesn't seem related) has been fixed ~Apr 6: https://github.com/microsoft/mimalloc/issues/221
@jeromegn Just pulled the latest changes from the mimalloc upstream on master. Could you try testing with the latest version in the master branch?
Thanks! I've rolled it out and will be monitoring for segfaults.
Segfaults started happening again even with the latest version. Updating the issue in msft's repo.
In the stack trace, std::sys::unix::stack_overflow::imp::signal_handler() suggests you are hitting a stack overflow before the mimalloc issue. The comment on that handler in the Rust standard library reads:
// Signal handler for the SIGSEGV and SIGBUS handlers. We've got guard pages
// (unmapped pages) at the end of every thread's stack, so if a thread ends
// up running into the guard page it'll trigger this handler. We want to
// detect these cases and print out a helpful error saying that the stack
// has overflowed. All other signals, however, should go back to what they
// were originally supposed to do.
//
// This handler currently exists purely to print an informative message
// whenever a thread overflows its stack. We then abort to exit and
// indicate a crash, but to avoid a misleading SIGSEGV that might lead
// users to believe that unsafe code has accessed an invalid pointer; the
// SIGSEGV encountered when overflowing the stack is expected and
// well-defined.
//
// If this is not a stack overflow, the handler un-registers itself and
// then returns (to allow the original signal to be delivered again).
// Returning from this kind of signal handler is technically not defined
// to work when reading the POSIX spec strictly, but in practice it turns
// out many large systems and all implementations allow returning from a
// signal handler to work. For a more detailed explanation see the
// comments on #26458.
unsafe extern "C" fn signal_handler(
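One way to test the stack-overflow hypothesis is to run the suspect code on a thread with a much larger stack and see whether the crash goes away. The following is only a minimal sketch under assumptions: `recurse` is a hypothetical stand-in for the real workload, `run_with_stack` is an illustrative helper (not part of this crate or mimalloc), and the 64 MiB figure is arbitrary.

```rust
use std::hint::black_box;
use std::thread;

/// Hypothetical stand-in for the real workload: recurses `depth` times.
/// Each call declares a 1 KiB buffer so that, at least in unoptimized
/// builds, every level of recursion consumes stack space.
fn recurse(depth: u32) -> u32 {
    // black_box discourages the optimizer from removing the buffer.
    let frame = black_box([0u8; 1024]);
    if depth == 0 {
        return frame[0] as u32;
    }
    // The buffer is all zeros, so the result is simply `depth`.
    1 + recurse(depth - 1) + frame[1023] as u32
}

/// Illustrative helper (an assumption, not an existing API): runs `work`
/// on a new thread whose stack is `stack_bytes` large and returns the
/// outcome of joining that thread.
fn run_with_stack<T, F>(stack_bytes: usize, work: F) -> thread::Result<T>
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,
{
    thread::Builder::new()
        .name("big-stack".into())
        .stack_size(stack_bytes)
        .spawn(work)
        .expect("failed to spawn thread")
        .join()
}
```

Calling `run_with_stack(64 * 1024 * 1024, || recurse(5_000))` runs the closure with a 64 MiB stack. Note that a real stack overflow still aborts the whole process rather than surfacing as an `Err` from `join`, so the signal to look for is whether the crash stops occurring once the stack is enlarged.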
Thanks @Speedy37. If I understand correctly, that's consistent with the resolution of https://github.com/microsoft/mimalloc/issues/257?
It probably is. It would be great to verify whether the OS limitation hit by mimalloc's secure-mode guard pages is what triggers that signal_handler.
FWIW, I'm still using the dev branch for this fix. I think they finally merged the fix upstream? Maybe mimalloc could be updated in the main branch in this repo?
Not a very descriptive title, but I'm not sure what's causing the segfault.
(This might be a mimalloc issue)
Here's a stacktrace, using mimalloc 0.1.19:
I'm not sure when it's happening either.
I can probably find more information if you point me at what you'd need :)