Closed q53 closed 2 months ago
> launching with --platform=systrap ends up with "panic: seccomp failed: invalid argument" in debug logs (on all hosts), but it looks like another bug
Yes. Can you file another bug with more details about this Systrap seccomp issue? Turning on debug logging should show the seccomp-bpf program being sent to the kernel prior to getting EINVAL. I believe it may be due to your 4.18.0 kernel (that's quite old) not supporting SECCOMP_IOCTL_NOTIF_*, but I thought some fallback code for this was added for older kernels.
> Can you file another bug with more details about this Systrap seccomp issue?
I think it is about the FSGSBASE instructions. The kernel is too old and doesn't support them: https://www.kernel.org/doc/html/v5.9/x86/x86_64/fsgs.html#accessing-fs-gs-base-with-the-fsgsbase-instructions
@q53 Could you try to reproduce the issue with this patch:
diff --git a/pkg/ring0/lib_amd64.go b/pkg/ring0/lib_amd64.go
index fe69b6988..d42a587f4 100644
--- a/pkg/ring0/lib_amd64.go
+++ b/pkg/ring0/lib_amd64.go
@@ -117,6 +117,7 @@ func Init(fs cpuid.FeatureSet) {
 	hasXSAVEOPT = fs.UseXsaveopt()
 	hasXSAVE = fs.UseXsave()
 	hasFSGSBASE = fs.HasFeature(cpuid.X86FeatureFSGSBase)
+	hasFSGSBASE = false
 	validXCR0Mask = uintptr(fs.ValidXCR0Mask())
 	if hasXSAVE {
 		XCR0DisabledMask := uintptr((1 << 9) | (1 << 17) | (1 << 18))
> @q53 Could you try to reproduce the issue with this patch:
It does not work.
> I think it is about the FSGSBASE instructions. The kernel is too old and doesn't support them: https://www.kernel.org/doc/html/v5.9/x86/x86_64/fsgs.html#accessing-fs-gs-base-with-the-fsgsbase-instructions
On the other host, which has the same kernel version but an AMD processor, it works fine, so I do not believe it is a kernel version issue.
Oops. I hadn't read the description to the end and assumed that runsc itself failed with "Illegal instruction". Actually, it is the app inside gVisor that failed with this error. We need to find out which instruction triggers the signal. Could you reproduce the issue with the following patch and attach the runsc debug log:
diff --git a/pkg/sentry/kernel/task_signals.go b/pkg/sentry/kernel/task_signals.go
index 22d6bcddf..d8112b70e 100644
--- a/pkg/sentry/kernel/task_signals.go
+++ b/pkg/sentry/kernel/task_signals.go
@@ -202,6 +202,7 @@ func (t *Task) deliverSignal(info *linux.SignalInfo, act linux.SigAction) taskRu
 		}
 
 		t.Debugf("Signal %d, PID: %d, TID: %d, fault addr: %#x: terminating thread group", info.Signo, ucs.Pid, ucs.Tid, ucs.FaultAddr)
+		t.DebugDumpState()
 		eventchannel.Emit(ucs)
 
 		t.PrepareGroupExit(linux.WaitStatusTerminationSignal(sig))
@q53 The xgetbv instruction triggers a fault. According to the output of lscpu, your CPU doesn't support it. The question is why the app is trying to use it. Could you show the output of cat /proc/cpuinfo from the gVisor container?
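For comparing the host's lscpu output with what the container sees, a small helper can check whether a given flag appears in /proc/cpuinfo-style text. This is only a sketch: the function name `hasCPUFlag` and the sample text are made up for illustration and are not taken from the reporter's machine.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// sampleCPUInfo is illustrative text in /proc/cpuinfo format.
// It is NOT the reporter's actual output.
const sampleCPUInfo = `processor	: 0
model name	: Example CPU
flags		: fpu vme sse sse2 ssse3 sse4_1 sse4_2
`

// hasCPUFlag reports whether any "flags" line in cpuinfo-formatted
// text lists the given flag as an exact, whitespace-separated token.
func hasCPUFlag(cpuinfo, flag string) bool {
	sc := bufio.NewScanner(strings.NewReader(cpuinfo))
	for sc.Scan() {
		key, val, ok := strings.Cut(sc.Text(), ":")
		if !ok || strings.TrimSpace(key) != "flags" {
			continue
		}
		for _, f := range strings.Fields(val) {
			if f == flag {
				return true
			}
		}
	}
	return false
}

func main() {
	for _, flag := range []string{"xsave", "sse2"} {
		fmt.Printf("%s present: %v\n", flag, hasCPUFlag(sampleCPUInfo, flag))
	}
}
```

To use it against a real system, read /proc/cpuinfo with os.ReadFile and pass the contents as the first argument, once on the host and once inside the container.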
I think I figured out the root cause of this issue. Golang uses xgetbv if cpuid reports OSXSAVE: https://github.com/golang/go/blob/959b3fd4265d7e4efb18af454cd18799ed70b8fe/src/internal/cpu/cpu_x86.go#L122
The kvm platform always sets OSXSAVE: https://github.com/google/gvisor/blob/e87ab0a3018d1e5a622ed5b0e13e413dd30a86d2/pkg/sentry/platform/kvm/kvm_amd64.go#L237
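The mismatch described above can be modelled with the two relevant CPUID.1:ECX bits: bit 26 (XSAVE, the CPU implements XSAVE/XGETBV) and bit 27 (OSXSAVE, the OS has enabled it). The sketch below is a simplified model, not the Go runtime's or gVisor's actual code; the function names are hypothetical.

```go
package main

import "fmt"

const (
	cpuidXSAVE   = 1 << 26 // CPUID.1:ECX bit 26: CPU supports XSAVE/XGETBV
	cpuidOSXSAVE = 1 << 27 // CPUID.1:ECX bit 27: OS has enabled XSAVE
)

// runsXGETBVIfOSXSAVE models a program that executes XGETBV whenever
// the OSXSAVE bit is reported, without looking at the XSAVE bit.
func runsXGETBVIfOSXSAVE(ecx uint32) bool {
	return ecx&cpuidOSXSAVE != 0
}

// xgetbvSupported models the condition under which XGETBV is actually
// expected to work: the CPU has XSAVE and the OS has enabled it.
func xgetbvSupported(ecx uint32) bool {
	return ecx&cpuidXSAVE != 0 && ecx&cpuidOSXSAVE != 0
}

func main() {
	// Hypothetical guest-visible ECX: OSXSAVE forced on, XSAVE absent.
	ecx := uint32(cpuidOSXSAVE)
	fmt.Println("program would execute XGETBV:", runsXGETBVIfOSXSAVE(ecx))
	fmt.Println("XGETBV expected to work:     ", xgetbvSupported(ecx))
}
```

When the first check passes and the second does not, the instruction is executed on a CPU that lacks it, which is consistent with the "Illegal instruction" seen here only on the Intel host.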
> I think I figured out the root cause of this issue. Golang uses xgetbv, if cpuid reports OSXSAVE: https://github.com/golang/go/blob/959b3fd4265d7e4efb18af454cd18799ed70b8fe/src/internal/cpu/cpu_x86.go#L122
> The kvm platform always set OSXSAVE:
Building with that line commented out does not trigger the error.
Description
Reproduced on one specific host with an Intel CPU; another host with an AMD CPU has no issues. Version release-20240624.0 is affected as well. Launching with --platform=systrap ends up with "panic: seccomp failed: invalid argument" in the debug logs (on all hosts), but that looks like a different bug.
Steps to reproduce
docker -D -l debug run -i --runtime runsc-kvm --rm --name=test docker.io/library/registry:latest
time="2024-07-05T22:57:36Z" level=debug msg="[hijack] End of stdout"
runsc version
docker version (if using docker)
uname
4.18.0-425.3.1.el8.x86_64 #1 SMP Tue Nov 8 14:08:25 EST 2022 x86_64 x86_64 x86_64 GNU/Linux
kubectl (if using Kubernetes)
No response
repo state (if built from source)
No response
runsc debug logs (if available)