moby / moby

The Moby Project - a collaborative project for the container ecosystem to assemble container-based systems
https://mobyproject.org/
Apache License 2.0

Docker container stopped responding to external commands after killed by OOM killer #42392

Status: Open (opened by ghost 3 years ago)

ghost commented 3 years ago

Description: The container does not respond to external commands after being killed by the OOM killer.

Steps to reproduce the issue:

  1. Start a container with memory restriction
  2. Wait for it to exhaust its memory limit and be killed by the kernel's OOM killer
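The reproduction steps above could be scripted roughly as follows. This is a minimal sketch assuming the Docker CLI is installed; the helper names, the image name `example/memory-hog:latest` and the `512m` limit are hypothetical placeholders, not values from this report. It sets `--restart always` explicitly because Docker's default restart policy is `no`, so an exited container is only restarted automatically when a policy is configured.

```python
import shlex
from typing import List


def build_run_args(image: str, memory: str, restart: str = "always") -> List[str]:
    """Build a `docker run` argument list for a memory-limited container.

    Hypothetical helper: the image, memory and restart values are placeholders.
    """
    return [
        "docker", "run", "--detach",
        "--memory", memory,    # hard memory limit enforced through the cgroup
        "--restart", restart,  # restart policy; Docker's default is "no"
        image,
    ]


def build_inspect_args(container: str) -> List[str]:
    """Build a `docker inspect` call printing the OOMKilled flag and restart count."""
    return [
        "docker", "inspect",
        "--format", "{{.State.OOMKilled}} {{.RestartCount}}",
        container,
    ]


if __name__ == "__main__":
    # Only prints the commands; pass them to subprocess.run() to actually execute them.
    print(shlex.join(build_run_args("example/memory-hog:latest", "512m")))
    print(shlex.join(build_inspect_args("CONTAINER_ID")))  # placeholder container ID
```

The inspect command can help confirm whether an OOM kill and any subsequent restart were recorded for the container, assuming the daemon still answers it.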

Describe the results you received: The container did not restart automatically after being killed by the OOM killer, and it stopped responding to external commands.

Describe the results you expected: The container should restart automatically after being killed by the OOM killer.

Additional information you deem important (e.g. issue happens only occasionally):

Output of docker version:

docker version

Client:
 Version:           19.03.6-ce
 API version:       1.40
 Go version:        go1.13.4
 Git commit:        369ce74
 Built:             Fri May 29 04:01:26 2020
 OS/Arch:           linux/amd64
 Experimental:      false

Server:
 Engine:
  Version:          19.03.6-ce
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.13.4
  Git commit:       369ce74
  Built:            Fri May 29 04:01:57 2020
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.3.2
  GitCommit:        ff48f57fc83a8c44cf4ad5d672424a98ba37ded6
 runc:
  Version:          1.0.0-rc10
  GitCommit:        dc9208a3303feef5b3839f4323d9beb36df0a9dd
 docker-init:
  Version:          0.18.0
  GitCommit:        fec3683

Output of docker info:

docker info

Client:
 Debug Mode: false

Server:
 Containers: 41
  Running: 41
  Paused: 0
  Stopped: 0
 Images: 26
 Server Version: 19.03.6-ce
 Storage Driver: overlay2
  Backing Filesystem: xfs
  Supports d_type: true
  Native Overlay Diff: true
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: ff48f57fc83a8c44cf4ad5d672424a98ba37ded6
 runc version: dc9208a3303feef5b3839f4323d9beb36df0a9dd
 init version: fec3683
 Security Options:
  seccomp
   Profile: default
 Kernel Version: 4.14.181-140.257.amzn2.x86_64
 Operating System: Amazon Linux 2
 OSType: linux
 Architecture: x86_64
 CPUs: 32
 Total Memory: 249.2GiB
 Name: ip-10-70-10-146.f58
 ID: TL4H:2BMQ:7UNO:OSER:BPX5:UB3Q:TIJV:FV5D:QQUZ:FSAF:6RNR:SKYG
 Docker Root Dir: /docker_volumes/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: true

Additional environment details (AWS, VirtualBox, physical, etc.):

OS

NAME="Amazon Linux"
VERSION="2"
ID="amzn"
ID_LIKE="centos rhel fedora"
VERSION_ID="2"
PRETTY_NAME="Amazon Linux 2"
ANSI_COLOR="0;33"
CPE_NAME="cpe:2.3:o:amazon:amazon_linux:2"
HOME_URL="https://amazonlinux.com/"
Amazon Linux release 2 (Karoo)

Kernel log showing the container being killed by the OOM killer:

Mar 22 19:45:44 ip-10-70-10-146 kernel: Task in /docker/f4ab55f90fc8769391709dad3d55fef2359fc5e768f4d0f9ad3e9621ae56a028 killed as a result of limit of /docker/f4ab55f90fc8769391709dad3d55fef2359fc5e768f4d0f9ad3e9621ae56a028
Mar 22 19:45:44 ip-10-70-10-146 kernel: memory: usage 12582912kB, limit 12582912kB, failcnt 42572901
Mar 22 19:45:44 ip-10-70-10-146 kernel: memory+swap: usage 12582912kB, limit 25165824kB, failcnt 0
Mar 22 19:45:44 ip-10-70-10-146 kernel: kmem: usage 95144kB, limit 9007199254740988kB, failcnt 0
Mar 22 19:45:44 ip-10-70-10-146 kernel: Memory cgroup stats for /docker/f4ab55f90fc8769391709dad3d55fef2359fc5e768f4d0f9ad3e9621ae56a028: cache:1744388KB rss:10743380KB rss_huge:0KB shmem:0KB mapped_file:8KB dirty:1744380KB writeback:0KB
Mar 22 19:45:44 ip-10-70-10-146 kernel: [ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name
Mar 22 19:45:44 ip-10-70-10-146 kernel: [16199]     0 16199     5014      443      14       3        0             0 entrypoint.sh
Mar 22 19:45:44 ip-10-70-10-146 kernel: [16355]     0 16355  3265964  2690605    5844      14        0             0 mongod
Mar 22 19:45:44 ip-10-70-10-146 kernel: Memory cgroup out of memory: Kill process 16355 (mongod) score 857 or sacrifice child
Mar 22 19:45:44 ip-10-70-10-146 kernel: Killed process 16355 (mongod) total-vm:13063856kB, anon-rss:10739916kB, file-rss:22504kB, shmem-rss:0kB
Mar 22 19:45:44 ip-10-70-10-146 kernel: oom_reaper: reaped process 16355 (mongod), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
Mar 22 19:45:45 ip-10-70-10-146 containerd: time="2021-03-22T19:45:45.124026371-05:00" level=info msg="shim reaped" id=f4ab55f90fc8769391709dad3d55fef2359fc5e768f4d0f9ad3e9621ae56a028
Mar 22 19:45:45 ip-10-70-10-146 dockerd: time="2021-03-22T19:45:45.134203014-05:00" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
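The `memory:` and `memory+swap:` lines above show that this was a cgroup limit hit ("Memory cgroup out of memory") rather than host-wide memory pressure: `memory` usage equals its 12582912 kB limit, while `memory+swap` is below its limit. A small parser makes that check explicit. This is a sketch; the helper names are hypothetical, and it assumes the kernel's `kB` values are KiB (1024-byte units), which makes the two limits 12 GiB and 24 GiB.

```python
import re
from typing import NamedTuple, Optional

# Matches memcg summary lines from the kernel OOM report, for example:
#   "memory: usage 12582912kB, limit 12582912kB, failcnt 42572901"
# "memory+swap" is listed before "memory" so the longer name wins.
_MEMCG_LINE = re.compile(
    r"(?P<counter>memory\+swap|memory|kmem): usage (?P<usage>\d+)kB, "
    r"limit (?P<limit>\d+)kB, failcnt (?P<failcnt>\d+)"
)

KIB_PER_GIB = 1024 * 1024


class MemcgCounter(NamedTuple):
    counter: str    # "memory", "memory+swap" or "kmem"
    usage_kib: int
    limit_kib: int
    failcnt: int    # number of times usage hit the limit

    @property
    def at_limit(self) -> bool:
        return self.usage_kib >= self.limit_kib


def parse_memcg_line(line: str) -> Optional[MemcgCounter]:
    """Return the parsed counter, or None if the line is not a memcg summary."""
    match = _MEMCG_LINE.search(line)
    if match is None:
        return None
    return MemcgCounter(
        counter=match["counter"],
        usage_kib=int(match["usage"]),
        limit_kib=int(match["limit"]),
        failcnt=int(match["failcnt"]),
    )


if __name__ == "__main__":
    log_lines = [
        "kernel: memory: usage 12582912kB, limit 12582912kB, failcnt 42572901",
        "kernel: memory+swap: usage 12582912kB, limit 25165824kB, failcnt 0",
    ]
    for line in log_lines:
        c = parse_memcg_line(line)
        if c is not None:
            print(f"{c.counter}: {c.usage_kib / KIB_PER_GIB:.1f} GiB used of "
                  f"{c.limit_kib / KIB_PER_GIB:.1f} GiB, at limit: {c.at_limit}")
```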

After that, the container became unresponsive to commands. I tried to:

  1. restart container
  2. stop container
  3. kill container
  4. docker logs
  5. restart daemon

None of these commands responded; the containers only started again after the host OS was rebooted.
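To tell a command that fails from one that never returns, each of the commands above can be run with a timeout. This is a minimal sketch; `probe`, the `CONTAINER_ID` placeholder and the 10-second timeout are hypothetical choices, not part of the original report.

```python
import subprocess
import sys
from typing import Sequence


def probe(cmd: Sequence[str], timeout_s: float = 30.0) -> str:
    """Run cmd and classify it as 'ok', 'error', 'hung' or 'missing'."""
    try:
        # On timeout, subprocess.run kills the child before raising TimeoutExpired.
        result = subprocess.run(cmd, capture_output=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return "hung"
    except FileNotFoundError:
        return "missing"  # the executable is not installed or not on PATH
    return "ok" if result.returncode == 0 else "error"


if __name__ == "__main__":
    container = sys.argv[1] if len(sys.argv) > 1 else "CONTAINER_ID"  # placeholder
    checks = [
        ["docker", "logs", "--tail", "10", container],
        ["docker", "stop", container],
        ["docker", "kill", container],
    ]
    for cmd in checks:
        print(" ".join(cmd), "->", probe(cmd, timeout_s=10.0))
```

Pass the affected container's ID as the first argument.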

thaJeztah commented 3 years ago

From the output of docker version and docker info, I think this is a build of Docker maintained by Amazon (we don't currently build packages for Amazon Linux), and both Docker and containerd are quite outdated (note that containerd 1.3 has reached EOL). Are you able to reproduce this on current versions of Docker and containerd?

ykuksenko commented 3 years ago

It seems #42437 reports the same issue with newer Docker/containerd versions.

whbackus commented 1 year ago

Hi, I have the same issue. This is my environment:

Client: Docker Engine - Community
 Version:           20.10.6
 API version:       1.41
 Go version:        go1.13.15
 Git commit:        370c289
 Built:             Fri Apr 9 22:47:17 2021
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server: Docker Engine - Community
 Engine:
  Version:          20.10.6
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.13.15
  Git commit:       8728dd2
  Built:            Fri Apr 9 22:45:28 2021
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.6.4
  GitCommit:        212e8b6fa2f44b9c21b2798135fc6fb7c53efc16
 runc:
  Version:          1.1.1
  GitCommit:        v1.1.1-0-g52de29d
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0