moby / moby

The Moby Project - a collaborative project for the container ecosystem to assemble container-based systems
https://mobyproject.org/
Apache License 2.0
Docker commands (rm/kill/inspect/...) hang on a container reported as running that has already exited #42894

Open Sh4d1 opened 3 years ago

Sh4d1 commented 3 years ago

Description

Context:

A running container (launched with docker-compose) with a restart: no policy, whose process exits with a status code of 0. Here is the docker-compose file (docker-compose version 1.25.4, build 8d51620a), with some information anonymized as ***:

```yaml
version: '3'
services:
  ***:
    image: ***
    container_name: ***
    restart: never
    network_mode: bridge
    hostname: ***
    command: ['***', '***']
    volumes:
      - ./data:/data
      - /etc/ssl/certs:/etc/ssl/certs:ro

    logging:
      driver: fluentd
      options:
        fluentd-address: localhost
        fluentd-async-connect: 'true'
        fluentd-buffer-limit: 2M
```

Looking at this file, restart: never is not a valid policy, yet docker-compose does not complain, so I guess the default restart policy (no) is the one in use (this was fixed in a later docker-compose release).
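
For reference, a minimal sketch of the kind of check that would have flagged this value. It assumes the four restart values documented for the Compose file format (`no`, `always`, `on-failure`, `unless-stopped`); the function name `is_valid_restart_policy` is hypothetical and is not part of docker-compose.

```python
# Restart values documented for the Compose file format (assumption: no
# retry-count suffix is considered here). "never" is not among them.
VALID_RESTART_POLICIES = {"no", "always", "on-failure", "unless-stopped"}


def is_valid_restart_policy(value: str) -> bool:
    """Return True if `value` is one of the documented restart policies."""
    return value in VALID_RESTART_POLICIES


# The value used in the compose file above is rejected, the default is accepted.
print(is_valid_restart_policy("never"))
print(is_valid_restart_policy("no"))
```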

Issue:

When trying to stop/kill/inspect/rm this container, every docker <action> <container_id> command hangs.
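
A hang like this can be told apart from a slow command by running it under a timeout. The following is only a sketch: the helper name `probe` and the 5-second default are assumptions, and the script only calls `docker inspect` when a container ID is passed as the first argument.

```python
import subprocess
import sys


def probe(cmd, timeout_s=5.0):
    """Run `cmd` and return 'ok', 'failed', or 'hung' (no exit within timeout_s)."""
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,  # the child is killed if it does not finish in time
        )
    except subprocess.TimeoutExpired:
        return "hung"
    return "ok" if result.returncode == 0 else "failed"


# Usage (hypothetical): python probe.py <container_id>
if __name__ == "__main__" and len(sys.argv) > 1:
    print("inspect:", probe(["docker", "inspect", sys.argv[1]]))
```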

I've found https://github.com/moby/moby/issues/30927, which is fairly old, and https://github.com/moby/moby/issues/40817 (see below, but I don't have any hung runc processes).

The stuck container ID here is 1c71de80ad77d3b39579833bd61b80ae51c313ad7d629e1f658f3d6aeeb28602, and the process linked to this container does not exist.
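
One way to confirm that a recorded PID no longer exists on the host is a signal-0 probe. This is a minimal sketch for a POSIX host; the helper name `pid_exists` is hypothetical, and the PID to pass in is whatever the daemon recorded for the container.

```python
import os


def pid_exists(pid: int) -> bool:
    """Return True if a process with this PID currently exists on the host."""
    try:
        # Signal 0 performs the existence and permission checks only;
        # no signal is actually delivered to the process.
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists but belongs to another user.
        return True
    return True


# The current interpreter is always a live process.
print(pid_exists(os.getpid()))
```

Note that a PID can be reused by an unrelated process, so a positive result alone does not prove the container's process is still alive.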

What I've seen:

{
    "ID": "1c71de80ad77d3b39579833bd61b80ae51c313ad7d629e1f658f3d6aeeb28602",
    "Labels": {
        "com.docker/engine.bundle.path": "/var/run/docker/containerd/1c71de80ad77d3b39579833bd61b80ae51c313ad7d629e1f658f3d6aeeb28602"
    },
    "Image": "",
    "Runtime": {
        "Name": "io.containerd.runc.v2",
        "Options": {
            "type_url": "containerd.runc.v1.Options",
            "value": "MgRydW5jOhwvdmFyL3J1bi9kb2NrZXIvcnVudGltZS1ydW5j"
        }
    },
    "SnapshotKey": "",
    "Snapshotter": "",
    "CreatedAt": "2021-09-28T12:49:45.578665569Z",
    "UpdatedAt": "2021-09-28T12:49:45.578665569Z",
    "Extensions": null,
    "Spec": {
        "ociVersion": "1.0.2-dev",
        "process": {
            "user": {
                "uid": 101,
                "gid": 101,
                "additionalGids": [
                    101
                ]
            },
            "args": [
                "***",
                "***"
            ],
            "env": [
                "PATH=/opt/venv/bin:/usr/local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
                "HOSTNAME=***",
                "LANG=C.UTF-8"
            ],
            "cwd": "/",
            "capabilities": {
                "bounding": [
                    "CAP_CHOWN",
                    "CAP_DAC_OVERRIDE",
                    "CAP_FSETID",
                    "CAP_FOWNER",
                    "CAP_MKNOD",
                    "CAP_NET_RAW",
                    "CAP_SETGID",
                    "CAP_SETUID",
                    "CAP_SETFCAP",
                    "CAP_SETPCAP",
                    "CAP_NET_BIND_SERVICE",
                    "CAP_SYS_CHROOT",
                    "CAP_KILL",
                    "CAP_AUDIT_WRITE"
                ],
                "inheritable": [
                    "CAP_CHOWN",
                    "CAP_DAC_OVERRIDE",
                    "CAP_FSETID",
                    "CAP_FOWNER",
                    "CAP_MKNOD",
                    "CAP_NET_RAW",
                    "CAP_SETGID",
                    "CAP_SETUID",
                    "CAP_SETFCAP",
                    "CAP_SETPCAP",
                    "CAP_NET_BIND_SERVICE",
                    "CAP_SYS_CHROOT",
                    "CAP_KILL",
                    "CAP_AUDIT_WRITE"
                ]
            },
            "apparmorProfile": "docker-default",
            "oomScoreAdj": 0
        },
        "root": {
            "path": "/var/lib/docker/overlay2/0ddc2295cbbe3affd23cfd474488f88d715ae4413f8f655830fbaf0274441002/merged"
        },
        "hostname": "***",
        "mounts": [
            {
                "destination": "/proc",
                "type": "proc",
                "source": "proc",
                "options": [
                    "nosuid",
                    "noexec",
                    "nodev"
                ]
            },
            {
                "destination": "/dev",
                "type": "tmpfs",
                "source": "tmpfs",
                "options": [
                    "nosuid",
                    "strictatime",
                    "mode=755",
                    "size=65536k"
                ]
            },
            {
                "destination": "/dev/pts",
                "type": "devpts",
                "source": "devpts",
                "options": [
                    "nosuid",
                    "noexec",
                    "newinstance",
                    "ptmxmode=0666",
                    "mode=0620",
                    "gid=5"
                ]
            },
            {
                "destination": "/sys",
                "type": "sysfs",
                "source": "sysfs",
                "options": [
                    "nosuid",
                    "noexec",
                    "nodev",
                    "ro"
                ]
            },
            {
                "destination": "/sys/fs/cgroup",
                "type": "cgroup",
                "source": "cgroup",
                "options": [
                    "ro",
                    "nosuid",
                    "noexec",
                    "nodev"
                ]
            },
            {
                "destination": "/dev/mqueue",
                "type": "mqueue",
                "source": "mqueue",
                "options": [
                    "nosuid",
                    "noexec",
                    "nodev"
                ]
            },
            {
                "destination": "/data",
                "type": "bind",
                "source": "***",
                "options": [
                    "rbind",
                    "rprivate"
                ]
            },
            {
                "destination": "/etc/resolv.conf",
                "type": "bind",
                "source": "/var/lib/docker/containers/1c71de80ad77d3b39579833bd61b80ae51c313ad7d629e1f658f3d6aeeb28602/resolv.conf",
                "options": [
                    "rbind",
                    "rprivate"
                ]
            },
            {
                "destination": "/etc/hostname",
                "type": "bind",
                "source": "/var/lib/docker/containers/1c71de80ad77d3b39579833bd61b80ae51c313ad7d629e1f658f3d6aeeb28602/hostname",
                "options": [
                    "rbind",
                    "rprivate"
                ]
            },
            {
                "destination": "/etc/hosts",
                "type": "bind",
                "source": "/var/lib/docker/containers/1c71de80ad77d3b39579833bd61b80ae51c313ad7d629e1f658f3d6aeeb28602/hosts",
                "options": [
                    "rbind",
                    "rprivate"
                ]
            },
            {
                "destination": "/dev/shm",
                "type": "bind",
                "source": "/var/lib/docker/containers/1c71de80ad77d3b39579833bd61b80ae51c313ad7d629e1f658f3d6aeeb28602/mounts/shm",
                "options": [
                    "rbind",
                    "rprivate"
                ]
            },
            {
                "destination": "/etc/ssl/certs",
                "type": "bind",
                "source": "/etc/ssl/certs",
                "options": [
                    "rbind",
                    "ro",
                    "rprivate"
                ]
            }
        ],
        "hooks": {
            "prestart": [
                {
                    "path": "/proc/5754/exe",
                    "args": [
                        "libnetwork-setkey",
                        "-exec-root=/var/run/docker",
                        "1c71de80ad77d3b39579833bd61b80ae51c313ad7d629e1f658f3d6aeeb28602",
                        "8d348b5ac848"
                    ]
                }
            ]
        },
        "linux": {
            "sysctl": {
                "net.ipv4.ip_unprivileged_port_start": "0"
            },
            "resources": {
                "devices": [
                    {
                        "allow": false,
                        "access": "rwm"
                    },
                    {
                        "allow": true,
                        "type": "c",
                        "major": 1,
                        "minor": 5,
                        "access": "rwm"
                    },
                    {
                        "allow": true,
                        "type": "c",
                        "major": 1,
                        "minor": 3,
                        "access": "rwm"
                    },
                    {
                        "allow": true,
                        "type": "c",
                        "major": 1,
                        "minor": 9,
                        "access": "rwm"
                    },
                    {
                        "allow": true,
                        "type": "c",
                        "major": 1,
                        "minor": 8,
                        "access": "rwm"
                    },
                    {
                        "allow": true,
                        "type": "c",
                        "major": 5,
                        "minor": 0,
                        "access": "rwm"
                    },
                    {
                        "allow": true,
                        "type": "c",
                        "major": 5,
                        "minor": 1,
                        "access": "rwm"
                    },
                    {
                        "allow": false,
                        "type": "c",
                        "major": 10,
                        "minor": 229,
                        "access": "rwm"
                    }
                ],
                "memory": {
                    "disableOOMKiller": false
                },
                "cpu": {
                    "shares": 0
                },
                "blockIO": {
                    "weight": 0
                }
            },
            "cgroupsPath": "/docker/1c71de80ad77d3b39579833bd61b80ae51c313ad7d629e1f658f3d6aeeb28602",
            "namespaces": [
                {
                    "type": "mount"
                },
                {
                    "type": "network"
                },
                {
                    "type": "uts"
                },
                {
                    "type": "pid"
                },
                {
                    "type": "ipc"
                }
            ],
            "seccomp": {
                "defaultAction": "SCMP_ACT_ERRNO",
                "architectures": [
                    "SCMP_ARCH_X86_64",
                    "SCMP_ARCH_X86",
                    "SCMP_ARCH_X32"
                ],
                "syscalls": [
                    {
                        "names": [
                            "accept",
                            "accept4",
                            "access",
                            "adjtimex",
                            "alarm",
                            "bind",
                            "brk",
                            "capget",
                            "capset",
                            "chdir",
                            "chmod",
                            "chown",
                            "chown32",
                            "clock_adjtime",
                            "clock_adjtime64",
                            "clock_getres",
                            "clock_getres_time64",
                            "clock_gettime",
                            "clock_gettime64",
                            "clock_nanosleep",
                            "clock_nanosleep_time64",
                            "close",
                            "close_range",
                            "connect",
                            "copy_file_range",
                            "creat",
                            "dup",
                            "dup2",
                            "dup3",
                            "epoll_create",
                            "epoll_create1",
                            "epoll_ctl",
                            "epoll_ctl_old",
                            "epoll_pwait",
                            "epoll_pwait2",
                            "epoll_wait",
                            "epoll_wait_old",
                            "eventfd",
                            "eventfd2",
                            "execve",
                            "execveat",
                            "exit",
                            "exit_group",
                            "faccessat",
                            "faccessat2",
                            "fadvise64",
                            "fadvise64_64",
                            "fallocate",
                            "fanotify_mark",
                            "fchdir",
                            "fchmod",
                            "fchmodat",
                            "fchown",
                            "fchown32",
                            "fchownat",
                            "fcntl",
                            "fcntl64",
                            "fdatasync",
                            "fgetxattr",
                            "flistxattr",
                            "flock",
                            "fork",
                            "fremovexattr",
                            "fsetxattr",
                            "fstat",
                            "fstat64",
                            "fstatat64",
                            "fstatfs",
                            "fstatfs64",
                            "fsync",
                            "ftruncate",
                            "ftruncate64",
                            "futex",
                            "futex_time64",
                            "futimesat",
                            "getcpu",
                            "getcwd",
                            "getdents",
                            "getdents64",
                            "getegid",
                            "getegid32",
                            "geteuid",
                            "geteuid32",
                            "getgid",
                            "getgid32",
                            "getgroups",
                            "getgroups32",
                            "getitimer",
                            "getpeername",
                            "getpgid",
                            "getpgrp",
                            "getpid",
                            "getppid",
                            "getpriority",
                            "getrandom",
                            "getresgid",
                            "getresgid32",
                            "getresuid",
                            "getresuid32",
                            "getrlimit",
                            "get_robust_list",
                            "getrusage",
                            "getsid",
                            "getsockname",
                            "getsockopt",
                            "get_thread_area",
                            "gettid",
                            "gettimeofday",
                            "getuid",
                            "getuid32",
                            "getxattr",
                            "inotify_add_watch",
                            "inotify_init",
                            "inotify_init1",
                            "inotify_rm_watch",
                            "io_cancel",
                            "ioctl",
                            "io_destroy",
                            "io_getevents",
                            "io_pgetevents",
                            "io_pgetevents_time64",
                            "ioprio_get",
                            "ioprio_set",
                            "io_setup",
                            "io_submit",
                            "io_uring_enter",
                            "io_uring_register",
                            "io_uring_setup",
                            "ipc",
                            "kill",
                            "lchown",
                            "lchown32",
                            "lgetxattr",
                            "link",
                            "linkat",
                            "listen",
                            "listxattr",
                            "llistxattr",
                            "_llseek",
                            "lremovexattr",
                            "lseek",
                            "lsetxattr",
                            "lstat",
                            "lstat64",
                            "madvise",
                            "membarrier",
                            "memfd_create",
                            "mincore",
                            "mkdir",
                            "mkdirat",
                            "mknod",
                            "mknodat",
                            "mlock",
                            "mlock2",
                            "mlockall",
                            "mmap",
                            "mmap2",
                            "mprotect",
                            "mq_getsetattr",
                            "mq_notify",
                            "mq_open",
                            "mq_timedreceive",
                            "mq_timedreceive_time64",
                            "mq_timedsend",
                            "mq_timedsend_time64",
                            "mq_unlink",
                            "mremap",
                            "msgctl",
                            "msgget",
                            "msgrcv",
                            "msgsnd",
                            "msync",
                            "munlock",
                            "munlockall",
                            "munmap",
                            "nanosleep",
                            "newfstatat",
                            "_newselect",
                            "open",
                            "openat",
                            "openat2",
                            "pause",
                            "pidfd_open",
                            "pidfd_send_signal",
                            "pipe",
                            "pipe2",
                            "poll",
                            "ppoll",
                            "ppoll_time64",
                            "prctl",
                            "pread64",
                            "preadv",
                            "preadv2",
                            "prlimit64",
                            "pselect6",
                            "pselect6_time64",
                            "pwrite64",
                            "pwritev",
                            "pwritev2",
                            "read",
                            "readahead",
                            "readlink",
                            "readlinkat",
                            "readv",
                            "recv",
                            "recvfrom",
                            "recvmmsg",
                            "recvmmsg_time64",
                            "recvmsg",
                            "remap_file_pages",
                            "removexattr",
                            "rename",
                            "renameat",
                            "renameat2",
                            "restart_syscall",
                            "rmdir",
                            "rseq",
                            "rt_sigaction",
                            "rt_sigpending",
                            "rt_sigprocmask",
                            "rt_sigqueueinfo",
                            "rt_sigreturn",
                            "rt_sigsuspend",
                            "rt_sigtimedwait",
                            "rt_sigtimedwait_time64",
                            "rt_tgsigqueueinfo",
                            "sched_getaffinity",
                            "sched_getattr",
                            "sched_getparam",
                            "sched_get_priority_max",
                            "sched_get_priority_min",
                            "sched_getscheduler",
                            "sched_rr_get_interval",
                            "sched_rr_get_interval_time64",
                            "sched_setaffinity",
                            "sched_setattr",
                            "sched_setparam",
                            "sched_setscheduler",
                            "sched_yield",
                            "seccomp",
                            "select",
                            "semctl",
                            "semget",
                            "semop",
                            "semtimedop",
                            "semtimedop_time64",
                            "send",
                            "sendfile",
                            "sendfile64",
                            "sendmmsg",
                            "sendmsg",
                            "sendto",
                            "setfsgid",
                            "setfsgid32",
                            "setfsuid",
                            "setfsuid32",
                            "setgid",
                            "setgid32",
                            "setgroups",
                            "setgroups32",
                            "setitimer",
                            "setpgid",
                            "setpriority",
                            "setregid",
                            "setregid32",
                            "setresgid",
                            "setresgid32",
                            "setresuid",
                            "setresuid32",
                            "setreuid",
                            "setreuid32",
                            "setrlimit",
                            "set_robust_list",
                            "setsid",
                            "setsockopt",
                            "set_thread_area",
                            "set_tid_address",
                            "setuid",
                            "setuid32",
                            "setxattr",
                            "shmat",
                            "shmctl",
                            "shmdt",
                            "shmget",
                            "shutdown",
                            "sigaltstack",
                            "signalfd",
                            "signalfd4",
                            "sigprocmask",
                            "sigreturn",
                            "socket",
                            "socketcall",
                            "socketpair",
                            "splice",
                            "stat",
                            "stat64",
                            "statfs",
                            "statfs64",
                            "statx",
                            "symlink",
                            "symlinkat",
                            "sync",
                            "sync_file_range",
                            "syncfs",
                            "sysinfo",
                            "tee",
                            "tgkill",
                            "time",
                            "timer_create",
                            "timer_delete",
                            "timer_getoverrun",
                            "timer_gettime",
                            "timer_gettime64",
                            "timer_settime",
                            "timer_settime64",
                            "timerfd_create",
                            "timerfd_gettime",
                            "timerfd_gettime64",
                            "timerfd_settime",
                            "timerfd_settime64",
                            "times",
                            "tkill",
                            "truncate",
                            "truncate64",
                            "ugetrlimit",
                            "umask",
                            "uname",
                            "unlink",
                            "unlinkat",
                            "utime",
                            "utimensat",
                            "utimensat_time64",
                            "utimes",
                            "vfork",
                            "vmsplice",
                            "wait4",
                            "waitid",
                            "waitpid",
                            "write",
                            "writev"
                        ],
                        "action": "SCMP_ACT_ALLOW"
                    },
                    {
                        "names": [
                            "ptrace"
                        ],
                        "action": "SCMP_ACT_ALLOW"
                    },
                    {
                        "names": [
                            "personality"
                        ],
                        "action": "SCMP_ACT_ALLOW",
                        "args": [
                            {
                                "index": 0,
                                "value": 0,
                                "op": "SCMP_CMP_EQ"
                            }
                        ]
                    },
                    {
                        "names": [
                            "personality"
                        ],
                        "action": "SCMP_ACT_ALLOW",
                        "args": [
                            {
                                "index": 0,
                                "value": 8,
                                "op": "SCMP_CMP_EQ"
                            }
                        ]
                    },
                    {
                        "names": [
                            "personality"
                        ],
                        "action": "SCMP_ACT_ALLOW",
                        "args": [
                            {
                                "index": 0,
                                "value": 131072,
                                "op": "SCMP_CMP_EQ"
                            }
                        ]
                    },
                    {
                        "names": [
                            "personality"
                        ],
                        "action": "SCMP_ACT_ALLOW",
                        "args": [
                            {
                                "index": 0,
                                "value": 131080,
                                "op": "SCMP_CMP_EQ"
                            }
                        ]
                    },
                    {
                        "names": [
                            "personality"
                        ],
                        "action": "SCMP_ACT_ALLOW",
                        "args": [
                            {
                                "index": 0,
                                "value": 4294967295,
                                "op": "SCMP_CMP_EQ"
                            }
                        ]
                    },
                    {
                        "names": [
                            "arch_prctl"
                        ],
                        "action": "SCMP_ACT_ALLOW"
                    },
                    {
                        "names": [
                            "modify_ldt"
                        ],
                        "action": "SCMP_ACT_ALLOW"
                    },
                    {
                        "names": [
                            "clone"
                        ],
                        "action": "SCMP_ACT_ALLOW",
                        "args": [
                            {
                                "index": 0,
                                "value": 2114060288,
                                "op": "SCMP_CMP_MASKED_EQ"
                            }
                        ]
                    },
                    {
                        "names": [
                            "chroot"
                        ],
                        "action": "SCMP_ACT_ALLOW"
                    }
                ]
            },
            "maskedPaths": [
                "/proc/asound",
                "/proc/acpi",
                "/proc/kcore",
                "/proc/keys",
                "/proc/latency_stats",
                "/proc/timer_list",
                "/proc/timer_stats",
                "/proc/sched_debug",
                "/proc/scsi",
                "/sys/firmware"
            ],
            "readonlyPaths": [
                "/proc/bus",
                "/proc/fs",
                "/proc/irq",
                "/proc/sys",
                "/proc/sysrq-trigger"
            ]
        }
    }
}
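
The `Runtime.Options.value` field in the dump above is a base64-encoded protobuf message. As a sketch, it can be decoded without the `.proto` definitions by walking its length-delimited fields. The mapping of field numbers 6 and 7 to the runc binary name and root directory is an assumption based on containerd's runc v2 options message; the helper names are hypothetical.

```python
import base64


def read_varint(buf, pos):
    """Decode one protobuf varint starting at `pos`; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def decode_length_delimited(b64_value):
    """Map field number -> raw bytes for a message made only of wire-type-2 fields."""
    buf = base64.b64decode(b64_value)
    fields = {}
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 0x07
        if wire_type != 2:
            raise ValueError(f"unsupported wire type {wire_type}")
        length, pos = read_varint(buf, pos)
        fields[field_number] = buf[pos:pos + length]
        pos += length
    return fields


options = decode_length_delimited("MgRydW5jOhwvdmFyL3J1bi9kb2NrZXIvcnVudGltZS1ydW5j")
print(options)  # {6: b'runc', 7: b'/var/run/docker/runtime-runc'}
```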

3 directories, 7 files

- `cat /var/lib/docker/containers/1c71de80ad77d3b39579833bd61b80ae51c313ad7d629e1f658f3d6aeeb28602/config.v2.json`
<details>
  <summary>Click to see</summary>

```json
{
  "StreamConfig": {},
  "State": {
    "Running": true,
    "Paused": false,
    "Restarting": false,
    "OOMKilled": false,
    "RemovalInProgress": false,
    "Dead": false,
    "Pid": 6108,
    "ExitCode": 0,
    "Error": "",
    "StartedAt": "2021-09-28T12:49:45.945753185Z",
    "FinishedAt": "2021-09-28T12:49:44.593011751Z",
    "Health": null
  },
  "ID": "1c71de80ad77d3b39579833bd61b80ae51c313ad7d629e1f658f3d6aeeb28602",
  "Created": "2021-09-07T12:27:41.549873025Z",
  "Managed": false,
  "Path": "***",
  "Args": [
    "***"
  ],
  "Config": {
    "Hostname": "***",
    "Domainname": "",
    "User": "***",
    "AttachStdin": false,
    "AttachStdout": false,
    "AttachStderr": false,
    "Tty": false,
    "OpenStdin": false,
    "StdinOnce": false,
    "Env": [
      "PATH=/opt/venv/bin:/usr/local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
      "LANG=C.UTF-8"
    ],
    "Cmd": [
      "***",
      "***"
    ],
    "Image": "***",
    "Volumes": {
      "/data": {},
      "/etc/ssl/certs": {}
    },
    "WorkingDir": "",
    "Entrypoint": null,
    "OnBuild": null,
    "Labels": {
      "com.docker.compose.config-hash": "318cec37934e0aa3669251c5d0c65762842db6aa7221a07df941f247529fb92d",
      "com.docker.compose.container-number": "1",
      "com.docker.compose.oneoff": "False",
      "com.docker.compose.project": "***",
      "com.docker.compose.project.config_files": "***",
      "com.docker.compose.project.working_dir": "***",
      "com.docker.compose.service": "***",
      "com.docker.compose.version": "1.25.4"
    }
  },
  "Image": "sha256:88efb1b4c07c8e691f504779ba534e7d079171a67a30ce4d90b6e894833e8da4",
  "NetworkSettings": {
    "Bridge": "",
    "SandboxID": "06dbfe1d245992abf3f075ecd893b1b6a44957519cbee8b77e3acaca579dc625",
    "HairpinMode": false,
    "LinkLocalIPv6Address": "",
    "LinkLocalIPv6PrefixLen": 0,
    "Networks": {
      "bridge": {
        "IPAMConfig": null,
        "Links": null,
        "Aliases": null,
        "NetworkID": "8b71e7a01854df19d6bf23ecbf76c1379317f39d34965d0c1992df62b40ed2e7",
        "EndpointID": "4395f7f3b45e21fd5b7516a771265d6ce81d9e93b01fd1c0d30767f642e98c6a",
        "Gateway": "100.64.0.1",
        "IPAddress": "100.64.0.4",
        "IPPrefixLen": 24,
        "IPv6Gateway": "",
        "GlobalIPv6Address": "",
        "GlobalIPv6PrefixLen": 0,
        "MacAddress": "02:42:64:40:00:04",
        "DriverOpts": null,
        "IPAMOperational": false
      }
    },
    "Service": null,
    "Ports": {},
    "SandboxKey": "/var/run/docker/netns/06dbfe1d2459",
    "SecondaryIPAddresses": null,
    "SecondaryIPv6Addresses": null,
    "IsAnonymousEndpoint": false,
    "HasSwarmEndpoint": false
  },
  "LogPath": "",
  "Name": "***",
  "Driver": "overlay2",
  "OS": "linux",
  "MountLabel": "",
  "ProcessLabel": "",
  "RestartCount": 0,
  "HasBeenStartedBefore": true,
  "HasBeenManuallyStopped": false,
  "MountPoints": {
    "/data": {
      "Source": "***",
      "Destination": "/data",
      "RW": true,
      "Name": "",
      "Driver": "",
      "Type": "bind",
      "Relabel": "rw",
      "Propagation": "rprivate",
      "Spec": {
        "Type": "bind",
        "Source": "***",
        "Target": "/data"
      },
      "SkipMountpointCreation": false
    },
    "/etc/ssl/certs": {
      "Source": "/etc/ssl/certs",
      "Destination": "/etc/ssl/certs",
      "RW": false,
      "Name": "",
      "Driver": "",
      "Type": "bind",
      "Relabel": "ro",
      "Propagation": "rprivate",
      "Spec": {
        "Type": "bind",
        "Source": "/etc/ssl/certs",
        "Target": "/etc/ssl/certs",
        "ReadOnly": true
      },
      "SkipMountpointCreation": false
    }
  },
  "SecretReferences": null,
  "ConfigReferences": null,
  "AppArmorProfile": "docker-default",
  "HostnamePath": "/var/lib/docker/containers/1c71de80ad77d3b39579833bd61b80ae51c313ad7d629e1f658f3d6aeeb28602/hostname",
  "HostsPath": "/var/lib/docker/containers/1c71de80ad77d3b39579833bd61b80ae51c313ad7d629e1f658f3d6aeeb28602/hosts",
  "ShmPath": "/var/lib/docker/containers/1c71de80ad77d3b39579833bd61b80ae51c313ad7d629e1f658f3d6aeeb28602/mounts/shm",
  "ResolvConfPath": "/var/lib/docker/containers/1c71de80ad77d3b39579833bd61b80ae51c313ad7d629e1f658f3d6aeeb28602/resolv.conf",
  "SeccompProfile": "",
  "NoNewPrivileges": false,
  "LocalLogCacheMeta": {
    "HaveNotifyEnabled": true
  }
}
```

</details>

Then I tried:

Then I fixed it with a known fix (had to fix the issue):

- `rm -Rf /var/lib/docker/containers/1c71de80ad77d3b39579833bd61b80ae51c313ad7d629e1f658f3d6aeeb28602`

rm: cannot remove '/var/lib/docker/containers/1c71de80ad77d3b39579833bd61b80ae51c313ad7d629e1f658f3d6aeeb28602/mounts/shm': Device or resource busy

- `umount /var/lib/docker/containers/1c71de80ad77d3b39579833bd61b80ae51c313ad7d629e1f658f3d6aeeb28602/mounts/shm` ok
- `rm -Rf /var/lib/docker/containers/1c71de80ad77d3b39579833bd61b80ae51c313ad7d629e1f658f3d6aeeb28602` ok
- `systemctl start docker`

Sep 28 15:57:40 dockerd[17375]: time="2021-09-28T15:57:40.517315214Z" level=info msg="Loading containers: start."
Sep 28 15:57:40 dockerd[17375]: time="2021-09-28T15:57:40.517551932Z" level=error msg="failed to load container" container=2d36abfab7fa136d8d36359c2cae30314d83ddadc37985d25531a5f0a1529779 error="open /var/lib/docker/containers/2d36abfab7fa136d8d36359c2cae30314d83ddadc37985d25531a5f0a1529779/config.v2.json: no such file or directory"
Sep 28 15:57:40 dockerd[17375]: time="2021-09-28T15:57:40.518271443Z" level=error msg="failed to load container" container=1c71de80ad77d3b39579833bd61b80ae51c313ad7d629e1f658f3d6aeeb28602 error="open /var/lib/docker/containers/1c71de80ad77d3b39579833bd61b80ae51c313ad7d629e1f658f3d6aeeb28602/config.v2.json: no such file or directory"
Sep 28 15:57:40 dockerd[17375]: time="2021-09-28T15:57:40.878044379Z" level=info msg="Removing stale sandbox 06dbfe1d245992abf3f075ecd893b1b6a44957519cbee8b77e3acaca579dc625 (1c71de80ad77d3b39579833bd61b80ae51c313ad7d629e1f658f3d6aeeb28602)"
Sep 28 15:57:40 dockerd[17375]: time="2021-09-28T15:57:40.896801126Z" level=warning msg="Error (Unable to complete atomic operation, key modified) deleting object [endpoint 8b71e7a01854df19d6bf23ecbf76c1379317f39d34965d0c1992df62b40ed2e7 4395f7f3b45e21fd5b7516a771265d6ce81d9e93b01fd1c0d30767f642e98c6a], retrying...."
Sep 28 15:57:41 dockerd[17375]: time="2021-09-28T15:57:41.845363492Z" level=info msg="Loading containers: done."
Sep 28 15:57:41 dockerd[17375]: time="2021-09-28T15:57:41.885190387Z" level=info msg="Docker daemon" commit=75249d8 graphdriver(s)=overlay2 version=20.10.8
Sep 28 15:57:41 dockerd[17375]: time="2021-09-28T15:57:41.885304842Z" level=info msg="Daemon has completed initialization"
Sep 28 15:57:41 dockerd[17375]: time="2021-09-28T15:57:41.931954248Z" level=info msg="API listen on /var/run/docker.sock"
Sep 28 15:57:41 systemd[1]: Started Docker Application Container Engine.
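The `Device or resource busy` error from the first `rm -Rf` is consistent with `mounts/shm` still being a mount point, which is why the `umount` had to come first. As a rough sketch (not part of the original report), the mounts living under a container directory can be listed from text in the format of `/proc/self/mounts` before deleting anything. The function name `mounts_under` is made up for this illustration.

```python
def mounts_under(prefix, mounts_text):
    """Return mount points under `prefix`, deepest first.

    `mounts_text` is expected to look like /proc/self/mounts:
    one mount per line, with the mount point as the second field.
    Note: the kernel escapes spaces in paths as \\040; this sketch
    does not decode such escapes.
    """
    prefix = prefix.rstrip("/")
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        mount_point = fields[1]
        if mount_point == prefix or mount_point.startswith(prefix + "/"):
            found.append(mount_point)
    # Longest paths first, so nested mounts come before their parents.
    return sorted(found, key=len, reverse=True)
```

On a real host the input would be the contents of `/proc/self/mounts`; any path this returns would need a `umount` before the directory can be removed.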


**Steps to reproduce the issue:**

Seems pretty random and kind of rare 😅 

**Output of `docker version`:**

Client: Docker Engine - Community
 Version:           20.10.8
 API version:       1.41
 Go version:        go1.16.6
 Git commit:        3967b7d
 Built:             Fri Jul 30 19:54:27 2021
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server: Docker Engine - Community
 Engine:
  Version:          20.10.8
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.16.6
  Git commit:       75249d8
  Built:            Fri Jul 30 19:52:33 2021
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.4.9
  GitCommit:        e25210fe30a0a703442421b0f60afac609f950a3
 runc:
  Version:          1.0.1
  GitCommit:        v1.0.1-0-g4144b63
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0


**Output of `docker info`:**

Client:
 Context:    default
 Debug Mode: false
 Plugins:
  app: Docker App (Docker Inc., v0.9.1-beta3)
  buildx: Build with BuildKit (Docker Inc., v0.6.1-docker)
  scan: Docker Scan (Docker Inc., v0.8.0)

Server:
 Containers: 8
  Running: 5
  Paused: 0
  Stopped: 3
 Images: 9
 Server Version: 20.10.8
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 1
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: e25210fe30a0a703442421b0f60afac609f950a3
 runc version: v1.0.1-0-g4144b63
 init version: de40ad0
 Security Options:
  apparmor
  seccomp
   Profile: default
 Kernel Version: 5.4.0-73-generic
 Operating System: Ubuntu 20.04.3 LTS
 OSType: linux
 Architecture: x86_64
 CPUs: 8
 Total Memory: 22.81GiB
 ID: XAWD:7ZZL:Q2TZ:NTTV:ZILV:S335:B3PR:BJRD:76XP:7KVY:CHPS:OGPS
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 HTTP Proxy: :3128
 HTTPS Proxy: :3128
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false
 Default Address Pools:
   Base: 100.64.0.0/15, Size: 24

WARNING: No swap limit support



**Additional environment details (AWS, VirtualBox, physical, etc.):**
VMs with libvirt on a physical hypervisor

zq-david-wang commented 3 years ago

I had a similar issue: some containers that show up as running in `docker ps` were already gone (no PID found). It seems to me that the internal state maintained by dockerd is inconsistent with the real state. This mostly happened when the system was under memory/IO pressure, and restarting Docker restored the state. Maybe you should check for OOM-kill or hang errors in the kernel log.
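A rough way to spot this mismatch without going through the (possibly hanging) Docker API is to compare the on-disk state with `/proc`. The sketch below assumes that each container directory under `/var/lib/docker/containers` holds a `config.v2.json` with a `State` object like the dump in the issue description, and that the script runs as root in the host PID namespace. The helper names (`pid_alive`, `is_ghost`, `find_ghosts`) are made up for this illustration.

```python
import json
import os


def pid_alive(pid):
    """Return True if a process with this PID exists (Linux /proc check)."""
    return pid > 0 and os.path.exists(f"/proc/{pid}")


def is_ghost(state, alive=pid_alive):
    """True if the state claims Running but its PID is not alive."""
    return bool(state.get("Running")) and not alive(int(state.get("Pid") or 0))


def find_ghosts(containers_dir="/var/lib/docker/containers", alive=pid_alive):
    """Return IDs of containers whose saved state looks stale."""
    ghosts = []
    for cid in sorted(os.listdir(containers_dir)):
        path = os.path.join(containers_dir, cid, "config.v2.json")
        try:
            with open(path) as f:
                state = json.load(f).get("State", {})
        except (OSError, ValueError):
            continue  # missing or unreadable config: skip this entry
        if is_ghost(state, alive):
            ghosts.append(cid)
    return ghosts
```

This is only a heuristic: a PID can be reused by an unrelated process, in which case a stale container would not be flagged.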

akerouanton commented 3 years ago

Hello @Sh4d1, when your CLI commands hang, could you generate a stack trace of dockerd? It would help maintainers/contributors figure out where and why the daemon is stuck. You can find how to create a stack trace here and how to retrieve it here.
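For reference, the Docker daemon dumps its goroutine stacks when it receives `SIGUSR1`, and keeps running afterwards. A minimal sketch of triggering that is shown below. It assumes the default pidfile location `/var/run/docker.pid` and root privileges, and the helper names are made up for this illustration.

```python
import os
import signal


def read_pid(pidfile="/var/run/docker.pid"):
    """Read the daemon PID from its pidfile (default path is an assumption)."""
    with open(pidfile) as f:
        return int(f.read().strip())


def request_stack_dump(pid, kill=os.kill):
    """Ask dockerd to dump its goroutine stacks by sending SIGUSR1.

    `kill` is injectable so the function can be exercised without
    signalling a real process.
    """
    kill(pid, signal.SIGUSR1)
```

After calling `request_stack_dump(read_pid())`, the daemon log should contain either the stack trace itself or the path of the file it was written to.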

Sh4d1 commented 3 years ago

👋 ah good to know! I'll attach it here if it happens again, thanks!

akerouanton commented 3 years ago

I looked a bit more at your description: given that the containerd task is gone but the containerd container and the netns are still there, I believe Docker is stuck somewhere here (daemon.Cleanup() calls containerd to delete its container and removes the netns): https://github.com/moby/moby/blob/4283e93e6431c5ff6d59aed2104f0942ae40c838/daemon/monitor.go#L27-L63

I see you're using fluentd in async mode. Do you know if the fluentd server was still running when you tried to stop/kill/rm the container? There's a bug that prevents the fluentd logger from stopping: when there are logs to send but the fluentd server is down, it stays blocked in an exponential-backoff retry loop. This bug produces the same symptoms (e.g. hanging docker commands).
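To illustrate the failure mode in general terms, here is a small sketch of a retry loop with exponential backoff that stays interruptible. It is not the fluent-logger-golang code. All names (`send_with_backoff`, `send`, `stop`) are hypothetical. The point is that the wait between retries checks a stop signal, so a shutdown request is not stuck behind the backoff.

```python
import threading


def send_with_backoff(send, stop, base=0.01, max_wait=1.0, max_retries=10):
    """Call send() until it succeeds, retries run out, or stop is set.

    Returns "sent", "stopped", or "gave_up".
    """
    wait = base
    for _ in range(max_retries):
        if stop.is_set():
            return "stopped"
        try:
            send()
            return "sent"
        except ConnectionError:
            # Event.wait() returns True as soon as stop is set, so the
            # backoff sleep ends early when a shutdown is requested.
            if stop.wait(wait):
                return "stopped"
            wait = min(wait * 2, max_wait)  # exponential growth, capped
    return "gave_up"
```

A loop that instead sleeps unconditionally and retries forever cannot be stopped while the server is down, which matches the hanging behaviour described above.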

Sh4d1 commented 3 years ago

Indeed, I suspected fluentd at first, but found no clue. IIRC there were some issues with fluentd! I'm going to wait for it to happen again and get the stack trace then. Thanks!

sparrc commented 2 years ago

Hello from AWS ECS, we believe we have also seen this issue, and as the original opener mentioned, it seems rare and hard to reproduce.

We have also noted the relationship to the fluentd log driver, and we have some reason to believe that recent fixes in the fluent-logger-golang library may have fixed it. These fixes were pulled into moby master and backported to docker 20.10.13 here: https://github.com/moby/moby/pull/43147

Has anyone seen this issue using docker 20.10.13+?

vnovy commented 1 year ago

I see this or a similar issue with the following Docker logging config (Docker version 23.0.1):

"log-driver": "fluentd",
"log-opts": {
  "mode": "non-blocking",
  "fluentd-async": "false",
  "fluentd-address": "tcp://x.x.x.x:24224",
  "tag": "docker.{{.ID}}",
  "fluentd-sub-second-precision": "true"
}

It seems to be 100% reproducible:

1. Start the stack with fluentd running.
2. Stop fluentd.
3. Wait a moment.
4. Remove the stack.
5. Containers that tried to log something after fluentd stopped become ghosts.

With "fluentd-async": "true", the problem does not appear.
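Since the reproduction depends on fluentd being down when the stack is removed, a pre-check of the configured endpoint can help confirm which situation a host is in. The sketch below is an illustration only. It handles `tcp://host:port` and bare `host[:port]` forms of `fluentd-address`, falls back to the driver's default port 24224, and rejects other schemes. The function names are made up.

```python
import socket
from urllib.parse import urlsplit

DEFAULT_FLUENTD_PORT = 24224  # default port of the fluentd logging driver


def parse_fluentd_address(address):
    """Return (host, port) for a tcp:// or bare host[:port] address."""
    if "://" not in address:
        address = "tcp://" + address  # bare "localhost" or "host:port"
    parts = urlsplit(address)
    if parts.scheme != "tcp" or not parts.hostname:
        raise ValueError(f"unsupported fluentd address: {address}")
    return parts.hostname, parts.port or DEFAULT_FLUENTD_PORT


def fluentd_reachable(address, timeout=2.0):
    """True if a TCP connection to the fluentd endpoint succeeds."""
    host, port = parse_fluentd_address(address)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

A successful TCP connection only shows that something is listening on the port, not that fluentd is healthy.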