opencontainers / runc

CLI tool for spawning and running containers according to the OCI specification
https://www.opencontainers.org/
Apache License 2.0

Running container killed with failure to write to cgroup.procs #1326

Open sboschman opened 7 years ago

sboschman commented 7 years ago

On our Jenkins CI infrastructure we run Maven builds inside a Docker container. Unfortunately, once in a while the build container crashes during the Maven build with a failure writing a pid to the cgroup.procs file.

```
Feb 13 23:56:41 myhost dockerd-current[5659]: time="2017-02-13T23:56:41.695619134+01:00" level=error msg="Error running exec in container: rpc error: code = 13 desc = invalid header field value \"oci runtime error: exec failed: container_linux.go:247: starting container process caused \\"process_linux.go:87: adding pid 21890 to cgroups caused \\\\"failed to write 21890 to cgroup.procs: write /sys/fs/cgroup/cpu,cpuacct/system.slice/docker-f2ea7bd5f37f4d5719fec4a05fdb58401207c98b6abbfa02e497af4bc167ec08.scope/cgroup.procs: invalid argument\\\\"\\"\n\

Feb 14 04:34:52 myhost dockerd-current[5659]: time="2017-02-14T04:34:52.084545467+01:00" level=error msg="Error running exec in container: rpc error: code = 13 desc = invalid header field value \"oci runtime error: exec failed: container_linux.go:247: starting container process caused \\"process_linux.go:87: adding pid 36656 to cgroups caused \\\\"failed to write 36656 to cgroup.procs: write /sys/fs/cgroup/cpu,cpuacct/system.slice/docker-2b412ea4871f6b1ed33547224bac344ea677f1814604a501a44e42ce84b64854.scope/cgroup.procs: invalid argument\\\\"\\"\n\""

Feb 14 06:20:22 myhost dockerd-current[5659]: time="2017-02-14T06:20:22.751841415+01:00" level=error msg="Error running exec in container: rpc error: code = 13 desc = invalid header field value \"oci runtime error: exec failed: container_linux.go:247: starting container process caused \\"process_linux.go:87: adding pid 30239 to cgroups caused \\\\"failed to write 30239 to cgroup.procs: write /sys/fs/cgroup/cpu,cpuacct/system.slice/docker-1418520af414c01054dce8ca3777b616289ee41a0de7ad135af8e2e740472a49.scope/cgroup.procs: invalid argument\\\\"\\"\n\""
```

I assume the error is thrown from https://github.com/opencontainers/runc/blob/v1.0.0-rc2/libcontainer/cgroups/utils.go#L422 , which boils down to https://github.com/golang/go/blob/master/src/io/ioutil/ioutil.go#L76 and https://github.com/golang/go/blob/master/src/os/file.go#L139


```
 Running: 3
 Paused: 0
 Stopped: 7
Images: 54
Server Version: 1.12.6
Storage Driver: devicemapper
 Pool Name: docker-thinpool
 Pool Blocksize: 524.3 kB
 Base Device Size: 10.74 GB
 Backing Filesystem: xfs
 Data file:
 Metadata file:
 Data Space Used: 10.18 GB
 Data Space Total: 45.1 GB
 Data Space Available: 34.92 GB
 Metadata Space Used: 4.526 MB
 Metadata Space Total: 4.295 GB
 Metadata Space Available: 4.29 GB
 Thin Pool Minimum Free Space: 4.509 GB
 Udev Sync Supported: true
 Deferred Removal Enabled: true
 Deferred Deletion Enabled: true
 Deferred Deleted Device Count: 0
 Library Version: 1.02.136 (2016-11-05)
Logging Driver: journald
Cgroup Driver: systemd
Plugins:
 Volume: local
 Network: overlay host null bridge
Swarm: inactive
Runtimes: oci runc
Default Runtime: oci
Security Options: seccomp selinux
Kernel Version: 4.9.7-201.fc25.x86_64
Operating System: Fedora 25 (Atomic Host)
OSType: linux
Architecture: x86_64
Number of Docker Hooks: 2
CPUs: 56
Total Memory: 125.8 GiB
Name: myhost
ID: XDZT:BINX:3JZJ:BABH:6WSS:T2D5:Z5XJ:FM3Y:HOG7:XB33:T22Z:F2IS
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Insecure Registries:
 127.0.0.0/8
Registries: docker.io (secure)
```

hqhq commented 7 years ago

Given that all the errors happened during docker exec, while the process was joining the cpu,cpuacct cgroup, the only possibilities I can think of are that the process somehow got PF_NO_SETAFFINITY set (usually not possible from userspace), or that the process was set to be an RT process without rt_runtime allocated in the cgroup.

chinglinwen commented 5 years ago

Does this relate to https://github.com/opencontainers/runc/issues/1884 ?

I have the same error text:

"note": "Liveness probe failed: OCI runtime exec failed: exec failed: container_linux.go:348: starting container process caused \"process_linux.go:90: adding pid 27257 to cgroups caused \\\"failed to write 27257 to cgroup.procs: write /sys/fs/cgroup/cpu,cpuacct/kubepods/burstable/poda16dca42-8cfc-11e9-8753-767ef6f517db/443e19668182ba1351c93af648fad2f8b839990567d5fd4c612c152800888301/cgroup.procs: invalid argument\\\"\": unknown\r\n",
 "type": "Warning",

readiness check:

        readinessProbe:
          exec:
            command:
            - /bin/sh
            - -c
            - redis-cli
            - -h
            - ${POD_IP}
            - -p
            - "19000"
            - ping
          failureThreshold: 3
          initialDelaySeconds: 5
          periodSeconds: 5
          successThreshold: 1
          timeoutSeconds: 1

kubernetes: v1.14.1
os: CentOS Linux release 7.4.1708 (Core)
kernel: 4.14.15-1.el7.elrepo.x86_64
docker: 18.06.2-ce, API version: 1.38 (minimum version 1.12)

zeusro commented 5 years ago

@chinglinwen I am in a similar situation.

Readiness probe failed: rpc error: code = 2 desc = oci runtime error: exec failed: container_linux.go:262: starting container process caused "process_linux.go:86: adding pid 16166 to cgroups caused \"failed to write 16166 to cgroup.procs: write /sys/fs/cgroup/cpu,cpuacct/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod6e7e876e_9957_11e9_a845_00163e08cd06.slice/docker-941ddc07fc84ba668df4821403a6b051c85aad4cf6c64153aae0e9a0977d943d.scope/cgroup.procs: invalid argument\

 Kernel Version:             3.10.0-693.2.2.el7.x86_64
 OS Image:                   CentOS Linux 7 (Core)
 Operating System:           linux
 Architecture:               amd64
 Container Runtime Version:  docker://17.6.2
 Kubelet Version:            v1.12.6-aliyun.1
 Kube-Proxy Version:         v1.12.6-aliyun.1

nnvema commented 5 years ago

I face the same problem running on kops 1.11.7:

Liveness probe failed: rpc error: code = 2 desc = oci runtime error: exec failed: container_linux.go:247: starting container process caused "process_linux.go:87: adding pid 27268 to cgroups caused \"failed to write 27268 to cgroup.procs: write /sys/fs/cgroup/cpu,cpuacct/kubepods/burstable/pod612392f6-a439-11e9-9830-0e60496b67de/958ab969a162d91c9e58cb9e84db295083dfb3e4aa833e7575d3d042bffce720/cgroup.procs: invalid argument\""

  Normal   Killing    20m (x9 over 83d)   kubelet,   Killing container with id docker://kd-inventory:Container failed liveness probe.. Container will be killed and recreated.

ilyesAj commented 3 years ago

Any updates on this issue?

kolyshkin commented 3 years ago

@ilyesAj do you see this, too? If yes, can you peek into the kernel logs (dmesg) and see if there's anything from the OOM killer? I suspect this is a race between runc trying to start the exec and the kernel killing the exec'ed process.