vmware-archive / vsphere-storage-for-docker

vSphere Storage for Docker
https://vmware.github.io/vsphere-storage-for-docker
Apache License 2.0
251 stars 95 forks source link

Container stuck in starting after mounting vsphere volume the second time #2083

Open danielbjornadal opened 6 years ago

danielbjornadal commented 6 years ago

Intro

After creating a vsphere volume, and mounting it to a container for the second time, the container is stuck in starting state.

Versions

ESXi VIB Driver: 0.21.2

# cat /proc/version
Linux version 4.9.0-4-amd64 (debian-kernel@lists.debian.org) (gcc version 6.3.0 20170516 (Debian 6.3.0-18) ) #1 SMP Debian 4.9.65-3+deb9u1 (2017-12-23)
# docker version
Client:
 Version:       17.12.1-ce
 API version:   1.35
 Go version:    go1.9.4
 Git commit:    7390fc6
 Built: Tue Feb 27 22:17:21 2018
 OS/Arch:       linux/amd64

Server:
 Engine:
  Version:      17.12.1-ce
  API version:  1.35 (minimum version 1.12)
  Go version:   go1.9.4
  Git commit:   7390fc6
  Built:        Tue Feb 27 22:15:56 2018
  OS/Arch:      linux/amd64
  Experimental: false
# docker plugin ls
ID                  NAME                DESCRIPTION                           ENABLED
feafba05062c        vsphere:latest      VMWare vSphere Docker Volume plugin   true

Container stuck in starting state

# docker run -d -t --name voltest --volume-driver vsphere -v MyVolume2@t-ESXi.Data01:/mnt busybox

# cat /var/log/vsphere-storage-for-docker.log 
...
2018-04-18 08:56:50.600179505 +0000 UTC [INFO] Mounting volume name="MyVolume2@t-ESXi.Data01" 
2018-04-18 08:56:51.068885679 +0000 UTC [INFO] Device file found. device="/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:1:0"
# docker exec -it 15ba1ef53525 bash
OCI runtime exec failed: exec failed: container_linux.go:348: starting container process caused "exec: \"bash\": executable file not found in $PATH": unknown
# docker volume inspect MyVolume2@t-ESXi.Data01
[
    {
        "CreatedAt": "0001-01-01T00:00:00Z",
        "Driver": "vsphere:latest",
        "Labels": null,
        "Mountpoint": "/mnt/vmdk/MyVolume2@t-ESXi.Data01/",
        "Name": "MyVolume2@t-ESXi.Data01",
        "Options": {},
        "Scope": "global",
        "Status": {
            "access": "read-write",
            "attach-as": "independent_persistent",
            "attached to VM": "t-cloudapp03i",
            "attachedVMDevice": {
                "ControllerPciBusNumber": "15.0",
                "ControllerPciSlotNumber": "160",
                "Unit": "1"
            },
            "capacity": {
                "allocated": "15MB",
                "size": "100MB"
            },
            "clone-from": "None",
            "created": "Wed Apr 18 07:40:00 2018",
            "created by VM": "t-cloudapp03i",
            "datastore": "t-ESXi.Data01",
            "diskformat": "thin",
            "fstype": "ext4",
            "status": "attached"
        }
    }
]
# docker inspect 15ba1ef53525
[
    {
        "Id": "15ba1ef53525086e0eb8e8b72faa2bd9f14100bddd3cc9da4db8be901e59b986",
        "Created": "2018-04-18T08:05:27.750664686Z",
        "Path": "sh",
        "Args": [],
        "State": {
            "Status": "running",
            "Running": true,
            "Paused": false,
            "Restarting": false,
            "OOMKilled": false,
            "Dead": false,
            "Pid": 3948,
            "ExitCode": 0,
            "Error": "",
            "StartedAt": "2018-04-18T08:05:28.680182951Z",
            "FinishedAt": "0001-01-01T00:00:00Z"
        },
        "Image": "sha256:8ac48589692a53a9b8c2d1ceaa6b402665aa7fe667ba51ccc03002300856d8c7",
        "ResolvConfPath": "/var/lib/docker/containers/15ba1ef53525086e0eb8e8b72faa2bd9f14100bddd3cc9da4db8be901e59b986/resolv.conf",
        "HostnamePath": "/var/lib/docker/containers/15ba1ef53525086e0eb8e8b72faa2bd9f14100bddd3cc9da4db8be901e59b986/hostname",
        "HostsPath": "/var/lib/docker/containers/15ba1ef53525086e0eb8e8b72faa2bd9f14100bddd3cc9da4db8be901e59b986/hosts",
        "LogPath": "/var/lib/docker/containers/15ba1ef53525086e0eb8e8b72faa2bd9f14100bddd3cc9da4db8be901e59b986/15ba1ef53525086e0eb8e8b72faa2bd9f14100bddd3cc9da4db8be901e59b986-json.log",
        "Name": "/voltest",
        "RestartCount": 0,
        "Driver": "overlay2",
        "Platform": "linux",
        "MountLabel": "",
        "ProcessLabel": "",
        "AppArmorProfile": "",
        "ExecIDs": null,
        "HostConfig": {
            "Binds": [
                "MyVolume2@t-ESXi.Data01:/mnt"
            ],
            "ContainerIDFile": "",
            "LogConfig": {
                "Type": "json-file",
                "Config": {}
            },
            "NetworkMode": "default",
            "PortBindings": {},
            "RestartPolicy": {
                "Name": "no",
                "MaximumRetryCount": 0
            },
            "AutoRemove": false,
            "VolumeDriver": "vsphere",
            "VolumesFrom": null,
            "CapAdd": null,
            "CapDrop": null,
            "Dns": [],
            "DnsOptions": [],
            "DnsSearch": [],
            "ExtraHosts": null,
            "GroupAdd": null,
            "IpcMode": "shareable",
            "Cgroup": "",
            "Links": null,
            "OomScoreAdj": 0,
            "PidMode": "",
            "Privileged": false,
            "PublishAllPorts": false,
            "ReadonlyRootfs": false,
            "SecurityOpt": null,
            "UTSMode": "",
            "UsernsMode": "",
            "ShmSize": 67108864,
            "Runtime": "runc",
            "ConsoleSize": [
                0,
                0
            ],
            "Isolation": "",
            "CpuShares": 0,
            "Memory": 0,
            "NanoCpus": 0,
            "CgroupParent": "",
            "BlkioWeight": 0,
            "BlkioWeightDevice": [],
            "BlkioDeviceReadBps": null,
            "BlkioDeviceWriteBps": null,
            "BlkioDeviceReadIOps": null,
            "BlkioDeviceWriteIOps": null,
            "CpuPeriod": 0,
            "CpuQuota": 0,
            "CpuRealtimePeriod": 0,
            "CpuRealtimeRuntime": 0,
            "CpusetCpus": "",
            "CpusetMems": "",
            "Devices": [],
            "DeviceCgroupRules": null,
            "DiskQuota": 0,
            "KernelMemory": 0,
            "MemoryReservation": 0,
            "MemorySwap": 0,
            "MemorySwappiness": null,
            "OomKillDisable": false,
            "PidsLimit": 0,
            "Ulimits": null,
            "CpuCount": 0,
            "CpuPercent": 0,
            "IOMaximumIOps": 0,
            "IOMaximumBandwidth": 0
        },
        "GraphDriver": {
            "Data": {
                "LowerDir": "/var/lib/docker/overlay2/7e5af563b20f1e9d9a7b672349864ed6351b3449dbe4b9ef9b436f7959baea4d-init/diff:/var/lib/docker/overlay2/6f2bdd4bd14d78d8f024e5aa7d7ec1023765e0bb9f2f2cdcde34ce06272340f6/diff",
                "MergedDir": "/var/lib/docker/overlay2/7e5af563b20f1e9d9a7b672349864ed6351b3449dbe4b9ef9b436f7959baea4d/merged",
                "UpperDir": "/var/lib/docker/overlay2/7e5af563b20f1e9d9a7b672349864ed6351b3449dbe4b9ef9b436f7959baea4d/diff",
                "WorkDir": "/var/lib/docker/overlay2/7e5af563b20f1e9d9a7b672349864ed6351b3449dbe4b9ef9b436f7959baea4d/work"
            },
            "Name": "overlay2"
        },
        "Mounts": [
            {
                "Type": "volume",
                "Name": "MyVolume2@t-ESXi.Data01",
                "Source": "/var/lib/docker/plugins/feafba05062cd091a19e4200d62af9394df7dcdee351384e482ab0c79864beab/rootfs/mnt/vmdk/MyVolume2@t-ESXi.Data01",
                "Destination": "/mnt",
                "Driver": "vsphere:latest",
                "Mode": "",
                "RW": true,
                "Propagation": ""
            }
        ],
        "Config": {
            "Hostname": "15ba1ef53525",
            "Domainname": "",
            "User": "",
            "AttachStdin": false,
            "AttachStdout": false,
            "AttachStderr": false,
            "Tty": true,
            "OpenStdin": false,
            "StdinOnce": false,
            "Env": [
                "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
            ],
            "Cmd": [
                "sh"
            ],
            "ArgsEscaped": true,
            "Image": "busybox",
            "Volumes": null,
            "WorkingDir": "",
            "Entrypoint": null,
            "OnBuild": null,
            "Labels": {}
        },
        "NetworkSettings": {
            "Bridge": "",
            "SandboxID": "f4e074987f4e3a6dd5fed6eafb7a92e8a01f9a7a1c041a59e180e787565ec5a3",
            "HairpinMode": false,
            "LinkLocalIPv6Address": "",
            "LinkLocalIPv6PrefixLen": 0,
            "Ports": {},
            "SandboxKey": "/var/run/docker/netns/f4e074987f4e",
            "SecondaryIPAddresses": null,
            "SecondaryIPv6Addresses": null,
            "EndpointID": "28c3388614982fe7fd65e164edd9390ed071fa776b6bfbc525dd74ad9f4a0cc6",
            "Gateway": "172.17.0.1",
            "GlobalIPv6Address": "",
            "GlobalIPv6PrefixLen": 0,
            "IPAddress": "172.17.0.3",
            "IPPrefixLen": 16,
            "IPv6Gateway": "",
            "MacAddress": "02:42:ac:11:00:03",
            "Networks": {
                "bridge": {
                    "IPAMConfig": null,
                    "Links": null,
                    "Aliases": null,
                    "NetworkID": "d57371cd50ffc8bd82dc73440f0d5c6f9ed998730ce18c840e4f10750332bece",
                    "EndpointID": "28c3388614982fe7fd65e164edd9390ed071fa776b6bfbc525dd74ad9f4a0cc6",
                    "Gateway": "172.17.0.1",
                    "IPAddress": "172.17.0.3",
                    "IPPrefixLen": 16,
                    "IPv6Gateway": "",
                    "GlobalIPv6Address": "",
                    "GlobalIPv6PrefixLen": 0,
                    "MacAddress": "02:42:ac:11:00:03",
                    "DriverOpts": null
                }
            }
        }
    }
]

Container stopped

# docker stop 15ba1ef53525
15ba1ef53525
# cat /var/log/vsphere-storage-for-docker.log
...
2018-04-18 08:58:42.219891086 +0000 UTC [INFO] Unmounting Volume name="MyVolume2@t-ESXi.Data01"
# docker volume inspect MyVolume2@t-ESXi.Data01
[
    {
        "CreatedAt": "0001-01-01T00:00:00Z",
        "Driver": "vsphere:latest",
        "Labels": null,
        "Mountpoint": "/mnt/vmdk/MyVolume2@t-ESXi.Data01/",
        "Name": "MyVolume2@t-ESXi.Data01",
        "Options": {},
        "Scope": "global",
        "Status": {
            "access": "read-write",
            "attach-as": "independent_persistent",
            "capacity": {
                "allocated": "15MB",
                "size": "100MB"
            },
            "clone-from": "None",
            "created": "Wed Apr 18 07:40:00 2018",
            "created by VM": "t-cloudapp03i",
            "datastore": "t-ESXi.Data01",
            "diskformat": "thin",
            "fstype": "ext4",
            "status": "detached"
        }
    }
]
# docker inspect 15ba1ef53525
[
    {
        "Id": "15ba1ef53525086e0eb8e8b72faa2bd9f14100bddd3cc9da4db8be901e59b986",
        "Created": "2018-04-18T08:05:27.750664686Z",
        "Path": "sh",
        "Args": [],
        "State": {
            "Status": "exited",
            "Running": false,
            "Paused": false,
            "Restarting": false,
            "OOMKilled": false,
            "Dead": false,
            "Pid": 0,
            "ExitCode": 137,
            "Error": "",
            "StartedAt": "2018-04-18T08:05:28.680182951Z",
            "FinishedAt": "2018-04-18T08:52:56.926181744Z"
        },
        "Image": "sha256:8ac48589692a53a9b8c2d1ceaa6b402665aa7fe667ba51ccc03002300856d8c7",
        "ResolvConfPath": "/var/lib/docker/containers/15ba1ef53525086e0eb8e8b72faa2bd9f14100bddd3cc9da4db8be901e59b986/resolv.conf",
        "HostnamePath": "/var/lib/docker/containers/15ba1ef53525086e0eb8e8b72faa2bd9f14100bddd3cc9da4db8be901e59b986/hostname",
        "HostsPath": "/var/lib/docker/containers/15ba1ef53525086e0eb8e8b72faa2bd9f14100bddd3cc9da4db8be901e59b986/hosts",
        "LogPath": "/var/lib/docker/containers/15ba1ef53525086e0eb8e8b72faa2bd9f14100bddd3cc9da4db8be901e59b986/15ba1ef53525086e0eb8e8b72faa2bd9f14100bddd3cc9da4db8be901e59b986-json.log",
        "Name": "/voltest",
        "RestartCount": 0,
        "Driver": "overlay2",
        "Platform": "linux",
        "MountLabel": "",
        "ProcessLabel": "",
        "AppArmorProfile": "",
        "ExecIDs": null,
        "HostConfig": {
            "Binds": [
                "MyVolume2@t-ESXi.Data01:/mnt"
            ],
            "ContainerIDFile": "",
            "LogConfig": {
                "Type": "json-file",
                "Config": {}
            },
            "NetworkMode": "default",
            "PortBindings": {},
            "RestartPolicy": {
                "Name": "no",
                "MaximumRetryCount": 0
            },
            "AutoRemove": false,
            "VolumeDriver": "vsphere",
            "VolumesFrom": null,
            "CapAdd": null,
            "CapDrop": null,
            "Dns": [],
            "DnsOptions": [],
            "DnsSearch": [],
            "ExtraHosts": null,
            "GroupAdd": null,
            "IpcMode": "shareable",
            "Cgroup": "",
            "Links": null,
            "OomScoreAdj": 0,
            "PidMode": "",
            "Privileged": false,
            "PublishAllPorts": false,
            "ReadonlyRootfs": false,
            "SecurityOpt": null,
            "UTSMode": "",
            "UsernsMode": "",
            "ShmSize": 67108864,
            "Runtime": "runc",
            "ConsoleSize": [
                0,
                0
            ],
            "Isolation": "",
            "CpuShares": 0,
            "Memory": 0,
            "NanoCpus": 0,
            "CgroupParent": "",
            "BlkioWeight": 0,
            "BlkioWeightDevice": [],
            "BlkioDeviceReadBps": null,
            "BlkioDeviceWriteBps": null,
            "BlkioDeviceReadIOps": null,
            "BlkioDeviceWriteIOps": null,
            "CpuPeriod": 0,
            "CpuQuota": 0,
            "CpuRealtimePeriod": 0,
            "CpuRealtimeRuntime": 0,
            "CpusetCpus": "",
            "CpusetMems": "",
            "Devices": [],
            "DeviceCgroupRules": null,
            "DiskQuota": 0,
            "KernelMemory": 0,
            "MemoryReservation": 0,
            "MemorySwap": 0,
            "MemorySwappiness": null,
            "OomKillDisable": false,
            "PidsLimit": 0,
            "Ulimits": null,
            "CpuCount": 0,
            "CpuPercent": 0,
            "IOMaximumIOps": 0,
            "IOMaximumBandwidth": 0
        },
        "GraphDriver": {
            "Data": {
                "LowerDir": "/var/lib/docker/overlay2/7e5af563b20f1e9d9a7b672349864ed6351b3449dbe4b9ef9b436f7959baea4d-init/diff:/var/lib/docker/overlay2/6f2bdd4bd14d78d8f024e5aa7d7ec1023765e0bb9f2f2cdcde34ce06272340f6/diff",
                "MergedDir": "/var/lib/docker/overlay2/7e5af563b20f1e9d9a7b672349864ed6351b3449dbe4b9ef9b436f7959baea4d/merged",
                "UpperDir": "/var/lib/docker/overlay2/7e5af563b20f1e9d9a7b672349864ed6351b3449dbe4b9ef9b436f7959baea4d/diff",
                "WorkDir": "/var/lib/docker/overlay2/7e5af563b20f1e9d9a7b672349864ed6351b3449dbe4b9ef9b436f7959baea4d/work"
            },
            "Name": "overlay2"
        },
        "Mounts": [
            {
                "Type": "volume",
                "Name": "MyVolume2@t-ESXi.Data01",
                "Source": "/var/lib/docker/plugins/feafba05062cd091a19e4200d62af9394df7dcdee351384e482ab0c79864beab/rootfs/mnt/vmdk/MyVolume2@t-ESXi.Data01",
                "Destination": "/mnt",
                "Driver": "vsphere:latest",
                "Mode": "",
                "RW": true,
                "Propagation": ""
            }
        ],
        "Config": {
            "Hostname": "15ba1ef53525",
            "Domainname": "",
            "User": "",
            "AttachStdin": false,
            "AttachStdout": false,
            "AttachStderr": false,
            "Tty": true,
            "OpenStdin": false,
            "StdinOnce": false,
            "Env": [
                "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
            ],
            "Cmd": [
                "sh"
            ],
            "ArgsEscaped": true,
            "Image": "busybox",
            "Volumes": null,
            "WorkingDir": "",
            "Entrypoint": null,
            "OnBuild": null,
            "Labels": {}
        },
        "NetworkSettings": {
            "Bridge": "",
            "SandboxID": "f4e074987f4e3a6dd5fed6eafb7a92e8a01f9a7a1c041a59e180e787565ec5a3",
            "HairpinMode": false,
            "LinkLocalIPv6Address": "",
            "LinkLocalIPv6PrefixLen": 0,
            "Ports": {},
            "SandboxKey": "/var/run/docker/netns/f4e074987f4e",
            "SecondaryIPAddresses": null,
            "SecondaryIPv6Addresses": null,
            "EndpointID": "",
            "Gateway": "",
            "GlobalIPv6Address": "",
            "GlobalIPv6PrefixLen": 0,
            "IPAddress": "",
            "IPPrefixLen": 0,
            "IPv6Gateway": "",
            "MacAddress": "",
            "Networks": {
                "bridge": {
                    "IPAMConfig": null,
                    "Links": null,
                    "Aliases": null,
                    "NetworkID": "d57371cd50ffc8bd82dc73440f0d5c6f9ed998730ce18c840e4f10750332bece",
                    "EndpointID": "",
                    "Gateway": "",
                    "IPAddress": "",
                    "IPPrefixLen": 0,
                    "IPv6Gateway": "",
                    "GlobalIPv6Address": "",
                    "GlobalIPv6PrefixLen": 0,
                    "MacAddress": "",
                    "DriverOpts": null
                }
            }
        }
    }
]
govint commented 6 years ago

@danielbjornadal. was able to run this test with docker CE 18.03.0 and unable to repro the issue. But from your logs looks like the plugin is able to report the status of the volume accurately (attached/detached). Suggest opening an issue on Docker as this doesn't seem to be a plugin issue.

danielbjornadal commented 6 years ago

@govint. We upgraded to 18.03.0-ce, and the issue still persist. We have tried on both debian and ubuntu-xenial.

# docker version
Client:
 Version:       18.03.0-ce
 API version:   1.37
 Go version:    go1.9.4
 Git commit:    0520e24
 Built: Wed Mar 21 23:10:06 2018
 OS/Arch:       linux/amd64
 Experimental:  false
 Orchestrator:  swarm

Server:
 Engine:
  Version:      18.03.0-ce
  API version:  1.37 (minimum version 1.12)
  Go version:   go1.9.4
  Git commit:   0520e24
  Built:        Wed Mar 21 23:08:35 2018
  OS/Arch:      linux/amd64
  Experimental: false
# docker exec -it voltest1 bash
OCI runtime exec failed: exec failed: container_linux.go:348: starting container process caused "exec: \"bash\": executable file not found in $PATH": unknown

Is there any debugging we could try on the ESXi host?

govint commented 6 years ago

@danielbjornadal, is this the problem,

docker ps -a CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES b8c51d612bd7 busybox "sh" 5 seconds ago Up 4 seconds voltest1 4f55637b76c2 busybox "sh" 18 hours ago Up 18 hours voltest

docker exec -it b8c51d612bd7 bash OCI runtime exec failed: exec failed: container_linux.go:348: starting container process caused "exec: \"bash\": executable file not found in $PATH": unknown

docker exec -it b8c51d612bd7 /bin/bash OCI runtime exec failed: exec failed: container_linux.go:348: starting container process caused "exec: \"/bin/bash\": stat /bin/bash: no such file or directory": unknown root@sc-rdops-vm02-dhcp-52-237:~# docker exec -it b8c51d612bd7 /bin/sh

/ # ps PID USER TIME COMMAND 1 root 0:00 sh 13 root 0:00 /bin/sh 18 root 0:00 ps / #

bash doesn't exist in the busybox image

danielbjornadal commented 6 years ago

@govint My bad. I was so focused on the problem, I did not check that busybox had bash.

The problem started with using this plugin through Rancher. If you install the plugin on the Rancher Agent host, and try to start a container using the vpshere volume plugin (using ubuntu:xenial this time :), the container is stuck in a starting state.

image

image

If I then try to to bash into the container from the host, it works. Don't know if this is a plugin or Rancher problem, but it looks like that Rancher can't see the container status.

# docker exec -it voltest3 bash
root@aab205625af5:

Rancher Server: v1.6.17-rc3 Rancher Agent: v1.2.10

govint commented 6 years ago

@danielbjornadal ok, without Rancher this problem doesn;t seem to be an issue. Could you check with Rancher on this issue first. If on Rancher the plugin shows the proper status attached/detached at least we know the plugin is mounting/unmounting the volume whenever it's requested to do so. I don't believe this is a plugin issue.

danielbjornadal commented 6 years ago

Rancher issue: https://github.com/rancher/rancher/issues/12863 Plugin attach and detach the volume as it should.