Open sflxn opened 6 years ago
I think the severity of this has been understated:
$ docker create -it alpine
e72aece618d2d34eaa3e72a71124fb382604b5eb40ee0f77831b0f78bb022403
$ docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
e72aece618d2 alpine "/bin/sh" 21 hours ago Created lucid_hoover
$ govc vm.power -k -on lucid_hoover*
Powering on VirtualMachine:1584... OK
$ docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
e72aece618d2 alpine "/bin/sh" 21 hours ago Running lucid_hoover
$ docker attach e7
You cannot attach to a stopped container, start it first
$ docker kill e7
Error response from daemon: Cannot kill container e7: e72aece618d2d34eaa3e72a71124fb382604b5eb40ee0f77831b0f78bb022403 is not running
$ docker inspect e7 | head -n 10
[
{
"Id": "e72aece618d2d34eaa3e72a71124fb382604b5eb40ee0f77831b0f78bb022403",
"Created": "2017-11-01T17:07:38Z",
"Path": "/bin/sh",
"Args": [],
"State": {
"Status": "running",
"Running": false,
"Paused": false,
Note the inconsistent state in inspect output between Status
and Running
.
While investigating, noticed that there's an inverted test - this will panic if state is nil, err should always be checked first and ideally check the actual condition you care about, i.e. state != nil
:
if state, err := c.State(vc); !state.Running && err == nil {
return fmt.Errorf("%s is not running", vc.ContainerID)
}
https://github.com/vmware/vic/blob/master/lib/apiservers/engine/backends/container_proxy.go#L962
This is almost certainly due to StartTime for the session being zero, meaning we're not setting it in the tether when we launch the process or we are not launching the process. I do not recall a reason why this should be set only by the portlayer.
If it's a bridge network the container is not fully configured, given that an IP address is assigned by docker start
. In the HA scenario that's not an issue because the container will reuse the already assigned IP, however if starting from cold then there's no address configured.
If there's no IP assigned for a network interface configured as Static the container should fail to initialize and stop but evidently this is not occurring.
Requires the following tests:
docker start
test should fail (IP unassigned by power off event)The following addressed the observed issues but I'm not familiar enough with the network scope management to know if this would have cascade consequences so no PR:
$ git diff --patch
diff --git a/lib/portlayer/network/context.go b/lib/portlayer/network/context.go
index 77784ea..c61d747 100644
--- a/lib/portlayer/network/context.go
+++ b/lib/portlayer/network/context.go
@@ -696,7 +696,7 @@ func (c *Context) bindContainer(h *exec.Handle) ([]*Endpoint, error) {
}()
var eip *net.IP
- if ne.Static {
+ if ne.Static && ne.IP != nil {
eip = &ne.IP.IP
} else if !ip.IsUnspecifiedIP(ne.Assigned.IP) {
// for VCH restart, we need to reserve
@@ -730,7 +730,7 @@ func (c *Context) bindContainer(h *exec.Handle) ([]*Endpoint, error) {
}
if !ip.IsUnspecifiedIP(e.IP()) {
- ne.IP = &net.IPNet{
+ ne.Assigned = net.IPNet{
IP: e.IP(),
Mask: e.Scope().Subnet().Mask,
}
@@ -1184,7 +1184,11 @@ func (c *Context) AddContainer(h *exec.Handle, options *AddContainerOptions) err
ne.Network.Pools[i] = *p
}
- ne.Static = false
+ // no DHCP on bridge networks
+ if s.Type() == constants.BridgeScopeType {
+ ne.Static = true
+ }
+
if len(options.IP) > 0 && !ip.IsUnspecifiedIP(options.IP) {
ne.Static = true
ne.IP = &net.IPNet{
diff --git a/lib/tether/ops_linux.go b/lib/tether/ops_linux.go
index e6791a5..caaafc0 100644
--- a/lib/tether/ops_linux.go
+++ b/lib/tether/ops_linux.go
@@ -616,10 +616,19 @@ func ApplyEndpoint(nl Netlink, t *BaseOperations, endpoint *NetworkEndpoint) err
newIP = &endpoint.DHCP.Assigned
} else {
newIP = endpoint.IP
- if newIP.IP.Equal(net.IPv4zero) {
+ if newIP != nil && newIP.IP.Equal(net.IPv4zero) {
// managed externally
return nil
}
+
+ // if it's not got an explicit static address, but isn't using dhcp client (e.g. bridge network)
+ if newIP == nil || ip.IsUnspecifiedIP(newIP.IP) {
+ newIP = &endpoint.Assigned
+ }
+
+ if ip.IsUnspecifiedIP(newIP.IP) {
+ return fmt.Errorf("no IP address assigned for static interface: %s", endpoint.ID)
+ }
}
var old *net.IPNet
diff --git a/lib/tether/tether.go b/lib/tether/tether.go
index 22d4d05..56e1bff 100644
--- a/lib/tether/tether.go
+++ b/lib/tether/tether.go
@@ -857,6 +857,8 @@ func (t *tether) launch(session *SessionConfig) error {
// Set the Started key to "true" - this indicates a successful launch
session.Started = "true"
+ // Set the start time - this may have been set previously OOB but this is definitive so just overwrite
+ session.StartTime = time.Now().UTC().Unix()
// Write the PID to the associated PID file
cmdname := path.Base(session.Cmd.Path)
I have pushed my local changes to address this issue to https://github.com/hickeng/vic/tree/6558
This does not include robot tests or I'd have made a PR. I'm not sure of the current approach for using govc
out-of-band to manipulate cVMs given we're trying to make the tests function in a shared environment.
Should be fine, we already have tests that do close to this, just change stop to kill: https://github.com/vmware/vic/blob/master/tests/test-cases/Group1-Docker-Commands/1-07-Docker-Stop.robot#L132
As a user of VIC, I should be able to kill a container that I powered on out of band (not using docker CLI).
Acceptance Criteria:
OOB containers can be stopped but not killed. When attempting to kill the container, it reports the container is not running.