vmware / vic

vSphere Integrated Containers Engine is a container runtime for vSphere.
http://vmware.github.io/vic
Other
639 stars 173 forks source link

Container VM powered on OOB cannot be killed #6558

Open sflxn opened 6 years ago

sflxn commented 6 years ago

As a user of VIC, I should be able to kill a container that I powered on out of band (not using docker CLI).

Acceptance Criteria:

  1. Create container
  2. User vSphere UI to power on the VM
  3. Docker kill the container

OOB containers can be stopped but not killed. When attempting to kill the container, it reports the container is not running.

hickeng commented 6 years ago

I think the severity of this has been understated:

$ docker create -it alpine
e72aece618d2d34eaa3e72a71124fb382604b5eb40ee0f77831b0f78bb022403
$ docker ps -a
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS                  NAMES
e72aece618d2        alpine              "/bin/sh"           21 hours ago        Created                                    lucid_hoover
$ govc vm.power -k -on lucid_hoover*
Powering on VirtualMachine:1584... OK
$ docker ps -a
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS                  NAMES
e72aece618d2        alpine              "/bin/sh"           21 hours ago        Running                                    lucid_hoover
$ docker attach e7
You cannot attach to a stopped container, start it first
$ docker kill e7
Error response from daemon: Cannot kill container e7: e72aece618d2d34eaa3e72a71124fb382604b5eb40ee0f77831b0f78bb022403 is not running
$ docker inspect e7 | head -n 10
[
    {
        "Id": "e72aece618d2d34eaa3e72a71124fb382604b5eb40ee0f77831b0f78bb022403",
        "Created": "2017-11-01T17:07:38Z",
        "Path": "/bin/sh",
        "Args": [],
        "State": {
            "Status": "running",
            "Running": false,
            "Paused": false,

Note the inconsistent state in inspect output between Status and Running.

While investigating, noticed that there's an inverted test - this will panic if state is nil, err should always be checked first and ideally check the actual condition you care about, i.e. state != nil:

    if state, err := c.State(vc); !state.Running && err == nil {
        return fmt.Errorf("%s is not running", vc.ContainerID)
    }

https://github.com/vmware/vic/blob/master/lib/apiservers/engine/backends/container_proxy.go#L962

This is almost certainly due to StartTime for the session being zero, meaning we're not setting it in the tether when we launch the process or we are not launching the process. I do not recall a reason why this should be set only by the portlayer.

If it's a bridge network the container is not fully configured, given that an IP address is assigned by docker start. In the HA scenario that's not an issue because the container will reuse the already assigned IP, however if starting from cold then there's no address configured. If there's no IP assigned for a network interface configured as Static the container should fail to initialize and stop but evidently this is not occurring.

Requires the following tests:

The following addressed the observed issues but I'm not familiar enough with the network scope management to know if this would have cascade consequences so no PR:

$ git diff --patch
diff --git a/lib/portlayer/network/context.go b/lib/portlayer/network/context.go
index 77784ea..c61d747 100644
--- a/lib/portlayer/network/context.go
+++ b/lib/portlayer/network/context.go
@@ -696,7 +696,7 @@ func (c *Context) bindContainer(h *exec.Handle) ([]*Endpoint, error) {
                }()

                var eip *net.IP
-               if ne.Static {
+               if ne.Static && ne.IP != nil {
                        eip = &ne.IP.IP
                } else if !ip.IsUnspecifiedIP(ne.Assigned.IP) {
                        // for VCH restart, we need to reserve
@@ -730,7 +730,7 @@ func (c *Context) bindContainer(h *exec.Handle) ([]*Endpoint, error) {
                }

                if !ip.IsUnspecifiedIP(e.IP()) {
-                       ne.IP = &net.IPNet{
+                       ne.Assigned = net.IPNet{
                                IP:   e.IP(),
                                Mask: e.Scope().Subnet().Mask,
                        }
@@ -1184,7 +1184,11 @@ func (c *Context) AddContainer(h *exec.Handle, options *AddContainerOptions) err
                ne.Network.Pools[i] = *p
        }

-       ne.Static = false
+       // no DHCP on bridge networks
+       if s.Type() == constants.BridgeScopeType {
+               ne.Static = true
+       }
+
        if len(options.IP) > 0 && !ip.IsUnspecifiedIP(options.IP) {
                ne.Static = true
                ne.IP = &net.IPNet{
diff --git a/lib/tether/ops_linux.go b/lib/tether/ops_linux.go
index e6791a5..caaafc0 100644
--- a/lib/tether/ops_linux.go
+++ b/lib/tether/ops_linux.go
@@ -616,10 +616,19 @@ func ApplyEndpoint(nl Netlink, t *BaseOperations, endpoint *NetworkEndpoint) err
                newIP = &endpoint.DHCP.Assigned
        } else {
                newIP = endpoint.IP
-               if newIP.IP.Equal(net.IPv4zero) {
+               if newIP != nil && newIP.IP.Equal(net.IPv4zero) {
                        // managed externally
                        return nil
                }
+
+               // if it's not got an explicit static address, but isn't using dhcp client (e.g. bridge network)
+               if newIP == nil || ip.IsUnspecifiedIP(newIP.IP) {
+                       newIP = &endpoint.Assigned
+               }
+
+               if ip.IsUnspecifiedIP(newIP.IP) {
+                       return fmt.Errorf("no IP address assigned for static interface: %s", endpoint.ID)
+               }
        }

        var old *net.IPNet
diff --git a/lib/tether/tether.go b/lib/tether/tether.go
index 22d4d05..56e1bff 100644
--- a/lib/tether/tether.go
+++ b/lib/tether/tether.go
@@ -857,6 +857,8 @@ func (t *tether) launch(session *SessionConfig) error {

        // Set the Started key to "true" - this indicates a successful launch
        session.Started = "true"
+       // Set the start time - this may have been set previously OOB but this is definitive so just overwrite
+       session.StartTime = time.Now().UTC().Unix()

        // Write the PID to the associated PID file
        cmdname := path.Base(session.Cmd.Path)
hickeng commented 6 years ago

I have pushed my local changes to address this issue to https://github.com/hickeng/vic/tree/6558 This does not include robot tests or I'd have made a PR. I'm not sure of the current approach for using govc out-of-band to manipulate cVMs given we're trying to make the tests function in a shared environment.

mhagen-vmware commented 6 years ago

Should be fine, we already have tests that do close to this, just change stop to kill: https://github.com/vmware/vic/blob/master/tests/test-cases/Group1-Docker-Commands/1-07-Docker-Stop.robot#L132