linux-test-project / ltp

Linux Test Project (mailing list: https://lists.linux.it/listinfo/ltp)
https://linux-test-project.readthedocs.io/
GNU General Public License v2.0
2.31k stars 1.01k forks source link

clone04 segfault in Alpine container #1081

Closed richiejp closed 1 year ago

richiejp commented 1 year ago

clone04 segfaults when built with Alpine 3.18. Most likely musl related issue.

# clone04
tst_test.c:1684: TINFO: LTP version: 20230516
tst_test.c:1568: TINFO: Timeout per run is 0h 00m 30s
tst_test.c:1628: TBROK: Test killed by SIGSEGV!

Summary:
passed   0
failed   0
broken   1
skipped  0
warnings 0
richiejp commented 1 year ago

container: https://hub.docker.com/repository/docker/richiejp/ltp/general Containerfile: https://github.com/richiejp/ltp/blob/alpine-container/Containerfile

metan-ucw commented 1 year ago

That is really strange, what the test does it to pass NULL stack to sys_clone3 and expects error to be returned from kernel. If return from syscall(sys_clone3) fails after kernel errors out this may be a MUSL bug.

richiejp commented 1 year ago

T

That is really strange, what the test does it to pass NULL stack to sys_clone3 and expects error to be returned from kernel. If return from syscall(sys_clone3) fails after kernel errors out this may be a MUSL bug.

This reminds me that previously musl (or another libc except glibc) has segfaulted when deliberately given bad pointers intended for the kernel. I don't think it is considered a bug by musl or uclibc etc.

Martchus commented 1 year ago

The problem is reproducible but maybe it is really not considered a bug.

According to https://man7.org/linux/man-pages/man2/clone.2.html the behavior is well defined when stack is null but it seems like this documentation is glibc specific. MUSL doesn't seem to do a check for null, see https://github.com/esmil/musl/blob/master/src/linux/clone.c (in contrast do glibc, see https://github.com/bminor/glibc/blob/5d00c201b9a2da768a79ea8d5311f257871c0b43/sysdeps/unix/sysv/linux/x86_64/clone.S#L60). So maybe we should disable this test unless glibc is used?

Martchus commented 1 year ago

By the way, I have also checked whether helpers on the ltp-side take null pointers into account. I think there's no problem on the ltp side in that regard. (The only place where the stack pointer is not just passed but modified takes the null-case into account.)

metan-ucw commented 1 year ago

Ah looks like I managed to misread the source, the test uses the old ltp_clone() which may end up calling libc clone().

Also it looks like this was fixed recently in MUSL, see https://git.musl-libc.org/cgit/musl/commit/src/linux/clone.c?id=fa4a8abd06a401822cc8ba4e352a219544c0118d

However I do not see how the old version could have been broken, all it does is to pack the parameters and jump to assembly syscall wrapper.

Martchus commented 1 year ago

which may end up calling libc clone()

I looked into that and it definitely does.

Also it looks like this was fixed recently in MUSL

Right, the link to GitHub mentioned before seems outdated.

However I do not see how the old version could have been broken, all it does is to pack the parameters and jump to assembly syscall wrapper.

I would have assumed that neither the syscall wrapper nor the handling of the syscall itself within the kernel do additional checks for null.

metan-ucw commented 1 year ago

@Martchus looks like there are checks for stack pointer and size in the new clone3 syscall, but not in the old clone, that one just seems to pack the parameters into a structure and call copy_process(), so I suppose that kernel actually starts a process with invalid stack that just crashes afterwards.

richiejp commented 1 year ago

The segfault happens in the clone assembly wrapper:

(gdb) set follow-fork-mode child
(gdb) run
Starting program: /opt/ltp/testcases/bin/clone04
tst_test.c:1684: TINFO: LTP version: 20230516
tst_test.c:1568: TINFO: Timeout per run is 0h 00m 30s
[Attaching after process 15296 fork to child process 15297]
[New inferior 2 (process 15297)]
[Detaching after fork from parent process 15296]
[Inferior 1 (process 15296) detached]

Thread 2.1 "clone04" received signal SIGSEGV, Segmentation fault.
[Switching to process 15297]
__clone () at src/thread/x86_64/clone.s:16
16      src/thread/x86_64/clone.s: No such file or directory.
(gdb) c
Continuing.

Program terminated with signal SIGSEGV, Segmentation fault.
The program no longer exists.
(gdb) tst_test.c:1628: TBROK: Test killed by SIGSEGV!

Note that you can install debug symbols and gdb with apk add gdb musl-dbg.

metan-ucw commented 1 year ago

@richiejp hmm, before or after the sycall call?

metan-ucw commented 1 year ago

Hmm, looks like line 16 is just before the sycall mov %rcx,(%rsi) and if I'm not mistaken rsi is second function parameter which is the stack pointer so we apparently attempt put something on the stack, no wonder it segfaults.

richiejp commented 1 year ago

Yup, definitely before the syscall according to strace.

pevik commented 1 year ago

Also it looks like this was fixed recently in MUSL, see https://git.musl-libc.org/cgit/musl/commit/src/linux/clone.c?id=fa4a8abd06a401822cc8ba4e352a219544c0118d

Cyril found that the glibc code also puts something (the argument) on the stack: https://github.com/bminor/glibc/blob/5d00c201b9a2da768a79ea8d5311f257871c0b43/sysdeps/unix/sysv/linux/x86_64/clone.S#L63

We agreed with Cyril it's a bug. Thus instead of rewriting the test (alternative solution from Cyril: to run the test in child, wait in the parent and convert SIGSEGV to TPASS) it'll be better to point users to the fix (fa4a8abd ("fix public clone function to be safe and usable by applications"). I'll post tonight the patch which adds musl-git support to C API and point this commit in tags.

Martchus commented 1 year ago

Cyril found that the glibc code also puts something (the argument) on the stack: https://github.com/bminor/glibc/blob/5d00c201b9a2da768a79ea8d5311f257871c0b43/sysdeps/unix/sysv/linux/x86_64/clone.S#L63

That was actually my message via IRC :-)

richiejp commented 1 year ago

OK, fixed, thanks.

pevik commented 1 year ago

Not yet, patch has been sent, but not yet merged: https://patchwork.ozlabs.org/project/ltp/list/?series=374767&state=* https://patchwork.ozlabs.org/project/ltp/patch/20230925152450.89228-1-pvorel@suse.cz/