google / gvisor

Application Kernel for Containers
https://gvisor.dev
Apache License 2.0

Poor performance when switching to multiple CPU Cores #10793

Open Gal-Lahat opened 3 months ago

Gal-Lahat commented 3 months ago

Description

I’m experiencing significant performance degradation when running gVisor with more than one CPU core. With a single CPU, everything works as expected. However, when I allocate two or more CPU cores, the container becomes extremely slow and sits at around 20% CPU usage even while idle, rendering it nearly unusable.

This issue occurs consistently across every container I’ve tested, including completely empty ones, which exhibit the same degradation. Interestingly, even when a container runs only a single thread, allocating multiple cores (e.g., 4) causes all of them to show high load, contributing to the overall slowdown.

Steps to reproduce

1.  Run any container with GVisor using a single CPU.
•   Expected Result: The container performs normally with low CPU usage.
2.  Run the same container with two or more CPU cores.
•   Actual Result: The container becomes very slow, with high idle CPU usage (~20%) and poor performance.
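
For what it’s worth, the comparison above can be reproduced with plain docker run (a sketch; assumes runsc is registered as a Docker runtime named runsc, and the image/command are only placeholders):

```shell
# 1 CPU: container behaves normally
docker run --rm -d --name one-cpu --runtime=runsc --cpus=1 alpine sleep 300

# 2 CPUs: container reportedly sits at ~20% CPU while idle
docker run --rm -d --name two-cpu --runtime=runsc --cpus=2 alpine sleep 300

# compare idle CPU usage side by side
docker stats --no-stream one-cpu two-cpu
```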

runsc version

runsc version release-20240807.0
spec: 1.1.0-rc.1

docker version (if using docker)

Server: Docker Engine - Community
 Engine:
  Version:          27.1.2
  API version:      1.46 (minimum version 1.24)
  Go version:       go1.21.13
  Git commit:       f9522e5
  Built:            Mon Aug 12 11:51:03 2024
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.7.20
  GitCommit:        8fc6bcff51318944179630522a095cc9dbf9f353
 runsc:
  Version:          release-20240807.0
  GitCommit:        
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

uname

No response

kubectl (if using Kubernetes)

No response

repo state (if built from source)

No response

runsc debug logs (if available)

No response

Gal-Lahat commented 3 months ago

Here are some high-frequency syscalls observed on the host while running an idle Express.js app with 0 requests (about 30% CPU under runsc):

•   sys_enter_write: ~6 million calls, suggesting a large volume of write operations.
•   sys_enter_futex: ~30,000 calls, indicating heavy use of synchronization primitives such as mutexes.
•   sys_enter_nanosleep: ~4,000 calls, implying the program frequently sleeps for short periods.

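
For context, counts like these can be collected on the host with perf tracepoint counters (a hypothetical invocation; the interval and scope would need adjusting):

```shell
# count entries into write/futex/nanosleep system-wide for 10 seconds
sudo perf stat \
  -e 'syscalls:sys_enter_write,syscalls:sys_enter_futex,syscalls:sys_enter_nanosleep' \
  -a sleep 10
```
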
ayushr2 commented 3 months ago

Could you share a reproducer workload (a Dockerfile or similar)? And what environment are you using? What CPU? What Linux version? Which runsc platform (if you are not explicitly setting the --platform flag, you must be using the systrap platform)?

Gal-Lahat commented 3 months ago

Most of my tests run in a Docker Compose environment with a service built from a Dockerfile. The runsc runtime is set as the default on the Docker daemon, and all of this runs on a VPS hosted on Contabo. I run docker-compose up --build as a simple way to execute it.

I’m using Ubuntu 20.04. Here’s a summary of the CPU information from the VPS:

•   CPU MHz: 2496.248
•   Hypervisor vendor: KVM
•   Virtualization type: full
•   Caches: L1 - 256 KiB, L2 - 2 MiB, L3 - 16 MiB

I haven’t explicitly set the --platform flag, so I assume it’s using the systrap platform. Below is an example of one of the services defined in my Docker Compose setup:

2pls5ib68:
    build:
      context: ./2pls5ib68
      dockerfile: Dockerfile
    ports:
      - 3011:80
    networks:
      - 2pls5ib68
    restart: always
    logging:
      driver: local
      options:
        max-size: 2m
        max-file: "3"
    deploy:
      resources:
        limits:
          memory: 6G
    volumes:
      - /loop-devices-mount/2pls5ib68:/app

If you need anything else, like more configuration details or further clarification, please let me know.

Gal-Lahat commented 3 months ago

I actually experimented with the legacy ptrace platform, and it resolved some of the performance issues I was facing. Specifically, idle CPU usage dropped significantly, from around 30% to 0.5%. This is a noticeable improvement, although overall performance still isn’t optimal. I still need to run more tests under heavy CPU workloads to determine whether the improvement is limited to idle performance or applies across the board. Based on this, it seems there may be an issue with the newer systrap platform that needs further investigation.
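
For anyone wanting to run the same comparison, the two platforms can be registered side by side in /etc/docker/daemon.json (a sketch; the runsc install path is assumed):

```shell
# register a second runsc runtime that uses the legacy ptrace platform
cat >/etc/docker/daemon.json <<'EOF'
{
  "runtimes": {
    "runsc": { "path": "/usr/local/bin/runsc" },
    "runsc-ptrace": {
      "path": "/usr/local/bin/runsc",
      "runtimeArgs": ["--platform=ptrace"]
    }
  }
}
EOF
sudo systemctl restart docker
```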

ayushr2 commented 3 months ago

@avagin @konstantin-s-bogom for systrap. Yeah if switching to ptrace improves things, then likely an issue with systrap.

What is the application doing though? Like what CPU-intensive workload are you using? So we can reproduce.

EtiennePerot commented 3 months ago

+1 to a reproducer workload; multi-core applications use multiple cores in different ways and Systrap tries to do some heuristics to work well with most of them. So being able to reproduce what this specific application is doing is necessary in order to understand this problem.

I'd also note that Contabo is notorious for heavily oversubscribing its machines and for unreliable, inconsistent performance over time. I've experienced this first-hand: with disk I/O bandwidth, I'd see a 10x performance difference from one day to the next. Look up reviews for Contabo online and that's usually the first thing they mention. The other thing they mention is the low price, not coincidentally.

So I suggest reproducing this on your local machine or on some other dedicated hardware. I'm not putting the blame on Contabo; it's quite likely that there is something suboptimal about the way Systrap uses multiple cores for this particular workload, as it has had this type of problem in the past (see issue #9119). All I'm saying is that Contabo is not a reliable environment for performance measurements.

EtiennePerot commented 2 months ago

Another thing you may want to try is to build runsc after changing the following line:

https://github.com/google/gvisor/blob/043ce9c5d2b0ee7d5d476c5c8475cd44b724027a/pkg/sentry/platform/systrap/systrap.go#L335

to:

        neverEnableFastPath = true

From the way the variable is named, this sounds like it would hurt performance, and in most cases it should. Setting it to true disables Systrap's "fast path" feature, which uses spare CPU cores to achieve faster syscall handling. But on a very busy system, which I suspect may be the case here, the fast path might hurt more than it helps, so it's worth seeing what happens with it disabled.
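
A sketch of that experiment (assumes a gVisor source checkout and that the variable is currently initialized to false; verify the exact line in systrap.go before patching):

```shell
git clone https://github.com/google/gvisor && cd gvisor
# flip the flag discussed above
sed -i 's/neverEnableFastPath = false/neverEnableFastPath = true/' \
  pkg/sentry/platform/systrap/systrap.go
# build runsc per the repo's build docs (Bazel driven via the Makefile)
mkdir -p bin
make copy TARGETS=runsc DESTINATION=bin/
```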

senju-hashirama commented 1 day ago

Hey, we experienced a similar issue while running a gRPC server. We noticed that no matter how much the load increased, CPU utilization stayed below 20%.

The host system was configured with an Intel(R) Xeon(R) Silver 4114 CPU at 2.20 GHz and 64 GB of RAM, running Ubuntu 22.04.3 LTS. runsc: release-20240807.0, using the default systrap platform.

(image attachment)