sfc-gh-jyin opened 1 month ago
After some investigation, I found that one potential cause is that with gVisor, the Sentry implements its own `runsc-memfd`-backed memory and maps the application's virtual address space to memfd offsets with its own VMAs. However, even after the initial page fault, memory accesses (especially memory writes) are slower.
This does not seem to be directly related to gVisor: after the initial page fault following mmap, memory access should behave the same as on a native Linux kernel. The issue seems to be memfd-backed memory itself. For some reason, on the c6gd AWS instance family, memory writes through a memfd are consistently around 5% slower than writes to directly mapped memory.
@sfc-gh-jyin I think I found the real root cause of this issue. gVisor never sets the SSBS bit in pstate. With the patch below, I get the same results in gVisor and outside of it:
```diff
diff --git a/pkg/sentry/arch/arch_aarch64.go b/pkg/sentry/arch/arch_aarch64.go
index 04262f6c5..e4a1d0187 100644
--- a/pkg/sentry/arch/arch_aarch64.go
+++ b/pkg/sentry/arch/arch_aarch64.go
@@ -257,12 +257,14 @@ func (s *State) FullRestore() bool {
 func New(arch Arch) *Context64 {
     switch arch {
     case ARM64:
-        return &Context64{
+        c := &Context64{
             State{
                 fpState: fpu.NewState(),
             },
-            []fpu.State(nil),
+            []fpu.State{nil},
         }
+        c.Regs.Pstate |= linux.PSR_SSBS_BIT
+        return c
     }
     panic(fmt.Sprintf("unknown architecture %v", arch))
 }
diff --git a/pkg/sentry/arch/signal_arm64.go b/pkg/sentry/arch/signal_arm64.go
index 1118d6a7f..959d6068b 100644
--- a/pkg/sentry/arch/signal_arm64.go
+++ b/pkg/sentry/arch/signal_arm64.go
@@ -157,7 +157,7 @@ func (regs *Registers) validRegs() bool {
     }
     // Force PSR to a valid 64-bit EL0t
-    regs.Pstate &= linux.PSR_N_BIT | linux.PSR_Z_BIT | linux.PSR_C_BIT | linux.PSR_V_BIT
+    regs.Pstate &= linux.PSR_N_BIT | linux.PSR_Z_BIT | linux.PSR_C_BIT | linux.PSR_V_BIT | linux.PSR_SSBS_BIT
     return false
 }
```
This isn't a proper fix. We need to figure out when SSBS should be set.
Thank you @avagin! I tried your patch and it did help! Can we get this fix merged into main? Also, do you know why this issue did not manifest to a similar degree on c7gd instances?
@avagin I have another question... Based on my understanding, `PSR_SSBS_BIT` controls the mitigation for security vulnerabilities introduced by speculative execution. Can you share some information on why setting this flag would improve performance on gVisor?
Further, adding to @sfc-gh-jyin's questions, is there a reason that c7 instances would not experience this slowdown? I believe c6 are Graviton2 (Neoverse N1) and c7 are Graviton3 (Neoverse V1).
@sfc-gh-jyin it isn't only about gVisor. When you run your test on Linux, this bit is set in pstate, and that is why you see better performance. If you care about security and want to be safe from SSB, you probably want to disable speculative store bypass (`PR_SPEC_STORE_BYPASS`), which effectively drops `PSR_SSBS_BIT` from pstate.
More info about the meaning of this bit can be found here: https://developer.arm.com/documentation/ddi0595/2020-12/AArch64-Registers/SSBS--Speculative-Store-Bypass-Safe.
The last line in my previous comment says that the patch isn't a fix; it is only there to explain what is going on. We need to figure out when we can/need to set this bit. It should not be set by default, so that SSB protection stays in place.
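To illustrate the userspace side of this, a process can query its own speculative-store-bypass policy with `prctl`. A minimal sketch via ctypes, with constant values copied from `linux/prctl.h` (this only inspects the status; it is not part of the gVisor fix):

```python
import ctypes

# Constants from linux/prctl.h.
PR_GET_SPECULATION_CTRL = 52
PR_SPEC_STORE_BYPASS = 0

# Status bits returned by PR_GET_SPECULATION_CTRL.
PR_SPEC_PRCTL = 1 << 0
PR_SPEC_ENABLE = 1 << 1
PR_SPEC_DISABLE = 1 << 2
PR_SPEC_FORCE_DISABLE = 1 << 3

# Load libc symbols from the current process.
libc = ctypes.CDLL(None, use_errno=True)

def get_ssb_status():
    """Return the raw speculation-control status for this process,
    or a negative value if the kernel rejects the call."""
    return libc.prctl(PR_GET_SPECULATION_CTRL, PR_SPEC_STORE_BYPASS, 0, 0, 0)

status = get_ssb_status()
if status < 0:
    print("PR_GET_SPECULATION_CTRL not supported here")
elif status & (PR_SPEC_DISABLE | PR_SPEC_FORCE_DISABLE):
    print("SSB mitigation enforced (SSBS clear on arm64): safer, slower")
else:
    print("SSB mitigation not enforced: faster, but exposed to SSB")
```

On arm64, forcing the mitigation with `PR_SET_SPECULATION_CTRL` is what clears SSBS in pstate, which matches the behavior described above.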
@jaingaurav My guess is that they found another way to mitigate SSB on these CPUs.
Description
Hello,
We are currently benchmarking the CPU performance of gVisor compared to plain Docker, and found that the same Python program runs consistently slower in gVisor than on the native kernel, or even in Docker.
Note that we are aware of the overhead introduced by the additional hooks for syscalls, but we are testing CPU performance, and our test script does not issue syscalls.
The largest difference we observed so far is on an AWS c6gd.2xlarge instance. However, when running the same suite on the c7 instance family, gVisor's performance is close to the native kernel. So we are wondering what the root cause might be, and how we can configure gVisor to perform better.

Test environment: AWS c6gd.2xlarge instance with the AL2 AMI. Python version: 3.7.16. Test script: a very simple pi calculation.

Running on native kernel:
Running with docker container:
Running with runsc:
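The original test script is not reproduced above; a CPU-bound pi calculation of the kind described might look like this hypothetical sketch (Leibniz series, no syscalls in the hot loop):

```python
def calc_pi(iterations):
    """Approximate pi with the Leibniz series: pure CPU work."""
    acc = 0.0
    sign = 1.0
    for k in range(iterations):
        acc += sign / (2 * k + 1)
        sign = -sign
    return 4.0 * acc

if __name__ == "__main__":
    print(calc_pi(10_000_000))
```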
In all three cases, the process consumes nearly 100% of a CPU the entire time. However, when I use the perf tool to check the stats, it shows that the process started by gVisor runs around 10~15% slower in terms of instructions per cycle:

docker version (if using docker)
No response
uname
5.10.216-204.855.amzn2.aarch64 #1 SMP Sat May 4 16:53:24 UTC 2024 aarch64 aarch64 aarch64 GNU/Linux
kubectl (if using Kubernetes)
No response
repo state (if built from source)
No response
runsc debug logs (if available)
No response