Closed · netguino closed this issue 1 month ago
A simple way to reproduce:
https://gist.github.com/netguino/f1b76a5256637379f37bbdddc4b74f45
Use these files, and run `mirrord exec -- bundle exec rake test`.
It seems that the `execve` system call is the one causing the issue.
Commenting out the `DetourGuard` stops the issue from being triggered. Using `hook_guard_fn` also breaks things (tested it for peace of mind).
I'm not sure why this happens though. Maybe it blows up the stack because we keep creating these guards and they stay alive forever (since `execve` doesn't really return)?
```rust
#[mirrord_layer_macro::hook_fn]
pub(crate) unsafe extern "C" fn execve_detour(
    path: *const c_char,
    argv: *const *const c_char,
    envp: *const *const c_char,
) -> c_int {
    use crate::{common::CheckedInto, detour::DetourGuard};
    // let _guard = DetourGuard::new();

    // Hopefully `envp` is a properly null-terminated list.
    if let Detour::Success(envp) = prepare_execve_envp(envp.checked_into()) {
        FN_EXECVE(path, argv, envp.leak())
    } else {
        FN_EXECVE(path, argv, envp)
    }
}
```
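For context, a detour guard like this typically follows the RAII-over-a-thread-local-flag pattern: construction sets a per-thread bypass flag, and `Drop` clears it. The sketch below illustrates that pattern with hypothetical names; it is not mirrord's actual implementation.

```rust
use std::cell::Cell;

thread_local! {
    // Hypothetical stand-in for the layer's bypass flag: true while a
    // hook is already running on this thread, so nested hooks bypass.
    static DETOUR_BYPASS: Cell<bool> = Cell::new(false);
}

struct DetourGuard;

impl DetourGuard {
    // Returns Some only if no guard is currently active on this thread.
    fn new() -> Option<Self> {
        DETOUR_BYPASS.with(|flag| {
            if flag.get() {
                None
            } else {
                flag.set(true);
                Some(DetourGuard)
            }
        })
    }
}

impl Drop for DetourGuard {
    fn drop(&mut self) {
        // Clears the flag again; this is the destructor that TLS
        // teardown would want to run.
        DETOUR_BYPASS.with(|flag| flag.set(false));
    }
}

fn main() {
    {
        let outer = DetourGuard::new();
        assert!(outer.is_some());
        // While the guard lives, a nested hook sees the flag and bypasses.
        assert!(DetourGuard::new().is_none());
    }
    // Guard dropped at end of scope: flag is cleared again.
    assert!(DetourGuard::new().is_some());
    println!("guard pattern ok");
}
```

With this pattern, a guard held across a call that never returns (like a successful `execve`) would keep the flag set for the remainder of the thread's life, and its destructor only fires if TLS teardown runs.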
@Razz4780 any ideas?
From the std `thread_local` documentation:
/// # Platform-specific behavior
///
/// Note that a "best effort" is made to ensure that destructors for types
/// stored in thread local storage are run, but not all platforms can guarantee
/// that destructors will be run for all types in thread local storage. For
/// example, there are a number of known caveats where destructors are not run:
///
/// 1. On Unix systems when pthread-based TLS is being used, destructors will
/// not be run for TLS values on the main thread when it exits. Note that the
/// application will exit immediately after the main thread exits as well.
/// 2. On all platforms it's possible for TLS to re-initialize other TLS slots
/// during destruction. Some platforms ensure that this cannot happen
/// infinitely by preventing re-initialization of any slot that has been
/// destroyed, but not all platforms have this guard. Those platforms that do
/// not guard typically have a synthetic limit after which point no more
/// destructors are run.
/// 3. When the process exits on Windows systems, TLS destructors may only be
/// run on the thread that causes the process to exit. This is because the
/// other threads may be forcibly terminated.
I think exec might trigger calling the destructors, which clean up the guard. Perhaps we need to leak the value before exec?
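A minimal sketch of that idea: leaking the value with `std::mem::forget` means its destructor is simply never run, so nothing can fire during exec's TLS teardown. `Guard` here is a hypothetical stand-in, not mirrord's actual type.

```rust
use std::mem;

// Hypothetical stand-in for the layer's guard type; the real guard
// would clear a thread-local bypass flag in its Drop impl.
struct Guard;

impl Drop for Guard {
    fn drop(&mut self) {
        // If this ran during exec teardown, state would be mutated
        // mid-exec. Abort so the leak below is observable.
        eprintln!("destructor ran");
        std::process::abort();
    }
}

fn main() {
    let guard = Guard;
    // Leak the guard: Drop is never invoked, so nothing runs when
    // thread-local storage is torn down before/after exec.
    mem::forget(guard);
    println!("guard leaked, no destructor ran");
}
```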
Tried to reproduce on:
Ran tests with:

1. `execve` hook - intproxy-no-guard.log
2. `execve` hook - intproxy-normal-guard.log
3. `std::mem::forget` in `execve` hook - intproxy-guard-forget.log

Looks like there's no difference between cases 2. and 3. DNS resolution is correctly hooked in only one of the tests (you can see one `GetAddrInfoRequest` for `py-serv`).
In case 1. the process hangs, and stopping it with ctrl+c does not trigger intproxy exit (intproxy lingers until manually killed).
Bug Description
It seems that whenever we invoke `mirrord exec`, we are running out of stack, and mirrord is silently disappearing.
In this case, we are running Ruby version 3.2.1 on Linux.
Can confirm this doesn't happen on macOS.
Can confirm 3.111.0 does not exhibit this issue.
Steps to Reproduce
Backtrace
Relevant Logs
Your operating system and version
Linux 6.8.5 kernel
Local process
bundle
Local process version
ruby 3.2.1, bundler 2.4.6
Additional Info
Unfortunately, no bypass in the socket logs.