moby / moby

The Moby Project - a collaborative project for the container ecosystem to assemble container-based systems
https://mobyproject.org/
Apache License 2.0

Docker is out of sync for OOM killed containers #39316

Status: Closed (gufranmmu closed this issue 5 years ago)

gufranmmu commented 5 years ago

Description: Docker is out of sync with OOM-killed containers. The container cannot be removed, and docker inspect shows it as running.

Steps to reproduce the issue:

  1. OOM kill a container
  2. In the logs, see: dockerd[7615]: time="2019-06-02T19:58:43.930286416Z" level=warning msg="containerd: event not sent to subscriber" event=oom
  3. Try to remove the container with docker kill followed by docker rm
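
For anyone trying step 1 on a maintained version, here is a minimal sketch of one way to force an OOM kill. It is not the reporter's actual setup: the `alpine` image, the 32 MiB limit, the `tail /dev/zero` memory hog, and the function names are all assumptions made for illustration. The helpers only build the docker CLI argument lists, so they can be reviewed before anything is run.

```python
import subprocess


def oom_repro_command(name, image="alpine", limit="32m"):
    """Build a `docker run` invocation that should exceed its memory limit.

    Assumptions (not from the report): the image, the limit, and the
    `tail /dev/zero` memory hog are illustrative choices.
    """
    return [
        "docker", "run", "-d",
        "--name", name,
        "--memory", limit,        # hard memory limit for the container
        "--memory-swap", limit,   # same value as --memory, so no extra swap
        image,
        "sh", "-c", "tail /dev/zero",  # buffers an endless line until killed
    ]


def oom_check_command(name):
    """Build a `docker inspect` invocation that prints State.OOMKilled."""
    return ["docker", "inspect", "--format", "{{.State.OOMKilled}}", name]


def run(cmd):
    """Run a command and return its stdout as stripped text."""
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout.strip()
```

A possible use is `run(oom_repro_command("oom-test"))`, waiting a few seconds, then `run(oom_check_command("oom-test"))`. On a daemon that tracks the event correctly, the second call would be expected to report the OOM kill rather than a container that still looks like it is running.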

Describe the results you received: An orphaned container that can be neither killed nor removed, with an incorrect container status.

Describe the results you expected: Docker cleanly handles OOM-killed containers and shows the proper state.

Additional information you deem important (e.g. issue happens only occasionally):

Output of docker version:

Client:
 Version:      1.13.1
 API version:  1.26
 Go version:   go1.7.5
 Git commit:   092cba3
 Built:        Wed Feb  8 06:38:28 2017
 OS/Arch:      linux/amd64

Server:
 Version:      1.13.1
 API version:  1.26 (minimum version 1.12)
 Go version:   go1.7.5
 Git commit:   092cba3
 Built:        Wed Feb  8 06:38:28 2017
 OS/Arch:      linux/amd64
 Experimental: false

Output of docker info:

Containers: 20
 Running: 19
 Paused: 0
 Stopped: 1
Images: 64
Server Version: 1.13.1
Storage Driver: overlay2
 Backing Filesystem: xfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local rexray flocker
 Network: bridge cilium host macvlan null overlay
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: aa8187dbd3b7ad67d8e5e3a15115d3eef43a7ed1
runc version: 9df8b306d01f59d3a8029be411de015b7304dd8f
init version: 949e6fa
Security Options:
 seccomp
  Profile: default
Kernel Version: 4.20.6-1.el7.elrepo.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 36
Total Memory: 212.5 GiB
Name: avafpprod-slave-44
ID: JTVS:ELFI:MSS3:7X5S:IYRF:EMQM:5OPY:FUWN:MVLW:4AEE:H6LZ:7A7D
Docker Root Dir: /ephemeral/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
 docker-registry.marathon.slave.mesos:5000
 nexus.marathon.mesos:5000
 nexus.marathon.mesos:5001
 nexus.marathon.mesos:5002
 127.0.0.0/8
Live Restore Enabled: false

Additional environment details (AWS, VirtualBox, physical, etc.): OpenStack instance with Mesos and Kubernetes installed.

e28446e38f88        /thrift-server:3.0.0                                                                 "/bin/sh -c '/scri..."    22 hours ago        Up 22 hours                                    
journalctl -u docker.service
-- Logs begin at Sun 2019-06-02 19:58:45 UTC, end at Tue 2019-06-04 11:32:39 UTC. --
Jun 02 19:58:45 avafpprod-slave-44 dockerd[7615]: time="2019-06-02T19:58:43.929793663Z" level=warning msg="containerd: event not sent to subscriber" event=oom
Jun 02 19:58:45 avafpprod-slave-44 dockerd[7615]: time="2019-06-02T19:58:43.929801651Z" level=warning msg="containerd: event not sent to subscriber" event=oom
docker top e28446e38f88
Error response from daemon: rpc error: code = 2 desc = containerd: container not found
docker inspect e28446e38f88
[
    {
        "Id": "e28446e38f8897049a38284047eaf4551fe917818035893f09af129111220444",
        "Created": "2019-06-03T12:34:57.17386558Z",
        "Path": "/bin/sh",
        "Args": [
            "-c",
            "/scripts/run.sh &&   /scripts/run_thrift.sh &&   /scripts/run_kong.sh &&   /opt/spark/sbin/restart_thriftserver.sh"
        ],
        "State": {
            "Status": "running",
            "Running": true,
            "Paused": false,
            "Restarting": false,
            "OOMKilled": false,
            "Dead": false,
            "Pid": 27914,
            "ExitCode": 0,
            "Error": "",
            "StartedAt": "2019-06-03T12:34:59.386017836Z",
            "FinishedAt": "0001-01-01T00:00:00Z"
        },
docker rm e28446e38f88
Error response from daemon: You cannot remove a running container e28446e38f8897049a38284047eaf4551fe917818035893f09af129111220444. Stop the container before attempting removal or use -f
docker kill e28446e38f88
e28446e38f88
docker rm e28446e38f88
Error response from daemon: You cannot remove a running container e28446e38f8897049a38284047eaf4551fe917818035893f09af129111220444. Stop the container before attempting removal or use -f
docker rm -f e28446e38f88
Error response from daemon: Unable to remove filesystem for e28446e38f8897049a38284047eaf4551fe917818035893f09af129111220444: remove /ephemeral/docker/containers/e28446e38f8897049a38284047eaf4551fe917818035893f09af129111220444/shm: device or resource busy
/sys/fs/cgroup/devices/docker/
29a71a3cd2fc083302f214236e6748246ecac3b84d0b4933138598a397f4561d/ 842092d3510726766c96b880374d252f09fc4e1afddfa95e1ad4b5fbc70f814c/
2f5fddf98db20c26fb16b1c04f83817b05d3fb9eb68667711d848384588ad650/ db711a4b2b5acf4f5525615a5727a21e10eb7bcb5492a23456b636985ad96b77/
3aaf059b8cf05e208777d243813fef5c71caa1d263f200476274a30a20dfeb51/ df5eca1c716ffc53006054b8ceabb673a71353769045715d7770dd8ae7b64579/
64af1a7b09ef61792ed7b3028cbb3d40682efc3be73d2e6e7041791054548967/ f41627d7343ef031fe84f665c5f6e862c0635e57e9d71d1d80319ae15c39671c/
70cdda0d7f77d15d7ed351776bf19415dc5ef24de5eb7a23d8dd022b9fc24537/
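
The mismatch in the output above (docker inspect reports "Running": true with a Pid, while containerd reports the container as not found) can be checked for mechanically. The following is a rough sketch under stated assumptions, not part of Docker: the function names are hypothetical, and the liveness check simply looks for /proc/<pid>, so it has to run in the host PID namespace on Linux and can be fooled by PID reuse.

```python
import json
import os


def pid_alive(pid):
    """Return True if /proc/<pid> exists (Linux only, host PID namespace)."""
    return pid > 0 and os.path.exists(f"/proc/{pid}")


def stale_running_containers(inspect_json, is_alive=pid_alive):
    """Return IDs of containers reported as running whose PID is gone.

    `inspect_json` is the text printed by `docker inspect <id>...`,
    which is a JSON array with one object per container.
    """
    stale = []
    for container in json.loads(inspect_json):
        state = container.get("State", {})
        # Running according to the daemon, but no such process on the host.
        if state.get("Running") and not is_alive(state.get("Pid", 0)):
            stale.append(container["Id"])
    return stale
```

Feeding it the complete output of `docker inspect` for the suspect containers would list those whose recorded process no longer exists. Treat the result as a hint for further investigation, not as proof of a desynchronized daemon.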

thaJeztah commented 5 years ago

Docker 1.13 reached EOL over two years ago, and is no longer maintained. It's highly discouraged to run such an old version, as it has known unpatched vulnerabilities (including container escapes allowing a container to get root access on the host).

Closing this issue because of the above, but feel free to open a new ticket if you're able to reproduce this on a currently maintained version.

gufranmmu commented 5 years ago

Thanks, @thaJeztah! A good reminder to update the Docker version :)

xuegege5290 commented 2 years ago

Hey, I ran into this problem too. Have you found the cause of "event not sent to subscriber"?