Closed richiejp closed 1 year ago
That is really strange, what the test does it to pass NULL stack to sys_clone3 and expects error to be returned from kernel. If return from syscall(sys_clone3) fails after kernel errors out this may be a MUSL bug.
T
That is really strange, what the test does it to pass NULL stack to sys_clone3 and expects error to be returned from kernel. If return from syscall(sys_clone3) fails after kernel errors out this may be a MUSL bug.
This reminds me that previously musl (or another libc except glibc) has segfaulted when deliberately given bad pointers intended for the kernel. I don't think it is considered a bug by musl or uclibc etc.
The problem is reproducible but maybe it is really not considered a bug.
According to https://man7.org/linux/man-pages/man2/clone.2.html the behavior is well defined when stack is null but it seems like this documentation is glibc specific. MUSL doesn't seem to do a check for null, see https://github.com/esmil/musl/blob/master/src/linux/clone.c (in contrast do glibc, see https://github.com/bminor/glibc/blob/5d00c201b9a2da768a79ea8d5311f257871c0b43/sysdeps/unix/sysv/linux/x86_64/clone.S#L60). So maybe we should disable this test unless glibc is used?
By the way, I have also checked whether helpers on the ltp-side take null pointers into account. I think there's no problem on the ltp side in that regard. (The only place where the stack pointer is not just passed but modified takes the null-case into account.)
Ah looks like I managed to misread the source, the test uses the old ltp_clone() which may end up calling libc clone().
Also it looks like this was fixed recently in MUSL, see https://git.musl-libc.org/cgit/musl/commit/src/linux/clone.c?id=fa4a8abd06a401822cc8ba4e352a219544c0118d
However I do not see how the old version could have been broken, all it does is to pack the parameters and jump to assembly syscall wrapper.
which may end up calling libc clone()
I looked into that and it definitely does.
Also it looks like this was fixed recently in MUSL
Right, the link to GitHub mentioned before seems outdated.
However I do not see how the old version could have been broken, all it does is to pack the parameters and jump to assembly syscall wrapper.
I would have assumed that neither the syscall wrapper nor the handling of the syscall itself within the kernel do additional checks for null.
@Martchus looks like there are checks for stack pointer and size in the new clone3 syscall, but not in the old clone, that one just seems to pack the parameters into a structure and call copy_process()
, so I suppose that kernel actually starts a process with invalid stack that just crashes afterwards.
The segfault happens in the clone assembly wrapper:
(gdb) set follow-fork-mode child
(gdb) run
Starting program: /opt/ltp/testcases/bin/clone04
tst_test.c:1684: TINFO: LTP version: 20230516
tst_test.c:1568: TINFO: Timeout per run is 0h 00m 30s
[Attaching after process 15296 fork to child process 15297]
[New inferior 2 (process 15297)]
[Detaching after fork from parent process 15296]
[Inferior 1 (process 15296) detached]
Thread 2.1 "clone04" received signal SIGSEGV, Segmentation fault.
[Switching to process 15297]
__clone () at src/thread/x86_64/clone.s:16
16 src/thread/x86_64/clone.s: No such file or directory.
(gdb) c
Continuing.
Program terminated with signal SIGSEGV, Segmentation fault.
The program no longer exists.
(gdb) tst_test.c:1628: TBROK: Test killed by SIGSEGV!
Note that you can install debug symbols and gdb with apk add gdb musl-dbg
.
@richiejp hmm, before or after the sycall call?
Hmm, looks like line 16 is just before the sycall mov %rcx,(%rsi)
and if I'm not mistaken rsi
is second function parameter which is the stack pointer so we apparently attempt put something on the stack, no wonder it segfaults.
Yup, definitely before the syscall according to strace.
Also it looks like this was fixed recently in MUSL, see https://git.musl-libc.org/cgit/musl/commit/src/linux/clone.c?id=fa4a8abd06a401822cc8ba4e352a219544c0118d
Cyril found that the glibc code also puts something (the argument) on the stack: https://github.com/bminor/glibc/blob/5d00c201b9a2da768a79ea8d5311f257871c0b43/sysdeps/unix/sysv/linux/x86_64/clone.S#L63
We agreed with Cyril it's a bug. Thus instead of rewriting the test (alternative solution from Cyril: to run the test in child, wait in the parent and convert SIGSEGV to TPASS) it'll be better to point users to the fix (fa4a8abd ("fix public clone function to be safe and usable by applications"). I'll post tonight the patch which adds musl-git
support to C API and point this commit in tags.
Cyril found that the glibc code also puts something (the argument) on the stack: https://github.com/bminor/glibc/blob/5d00c201b9a2da768a79ea8d5311f257871c0b43/sysdeps/unix/sysv/linux/x86_64/clone.S#L63
That was actually my message via IRC :-)
OK, fixed, thanks.
Not yet, patch has been sent, but not yet merged: https://patchwork.ozlabs.org/project/ltp/list/?series=374767&state=* https://patchwork.ozlabs.org/project/ltp/patch/20230925152450.89228-1-pvorel@suse.cz/
clone04 segfaults when built with Alpine 3.18. Most likely musl related issue.