vmware / vic

vSphere Integrated Containers Engine is a container runtime for vSphere.
http://vmware.github.io/vic

Regression: VIC power-off failure on stop / kill for unresponsive containers #6199

Closed corrieb closed 7 years ago

corrieb commented 7 years ago

User Statement:

In the situation where a container process is not responding to signals, a stop or a kill should cause the container to be powered off.

Details:

Note: This worked in 1.1.1, but does not work in 1.2 rc3.
Note: This appears to be a problem only on ESXi - at least as far as I can reproduce it.

Steps to reproduce: Build a very simple container that will have an unresponsive process:

$ cat Dockerfile
FROM debian
COPY run.sh /
CMD /run.sh
$ cat run.sh
#!/bin/bash
sleep infinity

The above will be unresponsive because the script does not pass signals on to the sleep process. There are plenty of other reasons, though, why a process may become unresponsive.

Then run the container as a daemon and try to stop it:

$ docker run --name test -d bensdoings/test
$ docker stop test
Error response from daemon: Server error from portlayer: The attempted operation cannot be performed in the current state (Powered on).
$ docker kill test
Error response from daemon: Cannot kill container ca1: Server error from portlayer: The attempted operation cannot be performed in the current state (Powered on).
$ docker ps
CONTAINER ID        IMAGE               COMMAND                CREATED             STATUS              PORTS               NAMES
ca1f0b049979        bensdoings/test     "/bin/sh -c /run.sh"   2 minutes ago       Up 34 seconds                           silly_swanson

The ESXi client reports the issue, but powering off via the UI works just fine. The problem with this is that it becomes impossible to clean up the container from the docker CLI.

corrieb commented 7 years ago

Making this high priority since it is a clear regression. Just because I have not seen it fail on VC doesn't mean it's not possible. We can reduce the priority once more triage is done, if necessary.

mdubya66 commented 7 years ago

I traced this failure back to build 12176; all failures look the same. Build 12146 passed.

mdubya66 commented 7 years ago

Build https://ci.vcna.io/vmware/vic/12176 was for PR https://github.com/vmware/vic/pull/5532 which fixed issue https://github.com/vmware/vic/issues/5445

mdubya66 commented 7 years ago

No repro in this: https://storage.cloud.google.com/vic-engine-builds/vic_12146.tar.gz
Repro in this: https://storage.cloud.google.com/vic-engine-builds/vic_12176.tar.gz

mdubya66 commented 7 years ago

Interesting: running kill before stop worked.

(mwilliamson) ib-aus-office-88:~/Downloads/vic_1.2.0rc1 $ docker -H 172.16.127.245:2376 --tls kill test
test
(mwilliamson) ib-aus-office-88:~/Downloads/vic_1.2.0rc1 $ docker -H 172.16.127.245:2376 --tls ps -a
CONTAINER ID        IMAGE               COMMAND                CREATED             STATUS              PORTS               NAMES
b0cc18db8763        bensdoings/test     "/bin/sh -c /run.sh"   59 seconds ago      Stopped                                 test
(mwilliamson) ib-aus-office-88:~/Downloads/vic_1.2.0rc1 $

anchal-agrawal commented 7 years ago

Portlayer logs:

Sep  1 2017 20:16:12.342Z DEBUG [BEGIN] [github.com/vmware/vic/lib/portlayer/exec.(*Container).stop:419] b560212219597e651d8c3aa075231b81a90951879cd534bed38b5c6ce9ee6da3
Sep  1 2017 20:16:12.342Z DEBUG Setting container b560212219597e651d8c3aa075231b81a90951879cd534bed38b5c6ce9ee6da3 state: Stopping
Sep  1 2017 20:16:12.342Z INFO  sending kill -TERM b560212219597e651d8c3aa075231b81a90951879cd534bed38b5c6ce9ee6da3
Sep  1 2017 20:16:12.342Z DEBUG [BEGIN] [github.com/vmware/vic/lib/portlayer/exec.(*containerBase).startGuestProgram:181] b560212219597e651d8c3aa075231b81a90951879cd534bed38b5c6ce9ee6da3:kill
Sep  1 2017 20:16:12.543Z DEBUG [ END ] [github.com/vmware/vic/lib/portlayer/exec.(*containerBase).startGuestProgram:181] [200.658513ms] b560212219597e651d8c3aa075231b81a90951879cd534bed38b5c6ce9ee6da3:kill
Sep  1 2017 20:16:12.543Z INFO  waiting 10s for b560212219597e651d8c3aa075231b81a90951879cd534bed38b5c6ce9ee6da3 to power off
Sep  1 2017 20:16:12.550Z DEBUG vSphere Event Guest operation Start Program performed on Virtual machine serene_bohr-b56021221959. for eventID(3738) ignored by the event collector
Sep  1 2017 20:16:22.347Z WARN  timeout (10s) waiting for b560212219597e651d8c3aa075231b81a90951879cd534bed38b5c6ce9ee6da3 to power off via SIGTERM
Sep  1 2017 20:16:22.347Z INFO  sending kill -KILL b560212219597e651d8c3aa075231b81a90951879cd534bed38b5c6ce9ee6da3
Sep  1 2017 20:16:22.347Z DEBUG [BEGIN] [github.com/vmware/vic/lib/portlayer/exec.(*containerBase).startGuestProgram:181] b560212219597e651d8c3aa075231b81a90951879cd534bed38b5c6ce9ee6da3:kill
Sep  1 2017 20:16:32.348Z DEBUG [ END ] [github.com/vmware/vic/lib/portlayer/exec.(*containerBase).startGuestProgram:181] [10.000316765s] b560212219597e651d8c3aa075231b81a90951879cd534bed38b5c6ce9ee6da3:kill
Sep  1 2017 20:16:32.348Z WARN  sending kill -KILL b560212219597e651d8c3aa075231b81a90951879cd534bed38b5c6ce9ee6da3: Post https://10.160.233.140/sdk: context deadline exceeded
Sep  1 2017 20:16:32.348Z INFO  waiting 10s for b560212219597e651d8c3aa075231b81a90951879cd534bed38b5c6ce9ee6da3 to power off
Sep  1 2017 20:16:32.348Z WARN  timeout (10s) waiting for b560212219597e651d8c3aa075231b81a90951879cd534bed38b5c6ce9ee6da3 to power off via SIGKILL
Sep  1 2017 20:16:32.348Z WARN  stopping b560212219597e651d8c3aa075231b81a90951879cd534bed38b5c6ce9ee6da3 via hard power off due to: failed to shutdown b560212219597e651d8c3aa075231b81a90951879cd534bed38b5c6ce9ee6da3 via kill signals [TERM KILL]
Sep  1 2017 20:16:32.348Z DEBUG op=300.43 (delta:4.694µs): [NewOperation] op=300.43 (delta:2.539µs) [github.com/vmware/vic/pkg/vsphere/tasks.WaitForResult:68]
Sep  1 2017 20:16:32.468Z DEBUG vSphere Event serene_bohr-b56021221959 on  sc-rdops-vm08-dhcp-233-140.eng.vmware.com in ha-datacenter is stopping for eventID(3739) ignored by the event collector
Sep  1 2017 20:16:32.495Z ERROR op=300.43 (delta:147.319499ms): unexpected fault on task retry: &types.InvalidPowerState{InvalidState:types.InvalidState{VimFault:types.VimFault{MethodFault:types.MethodFault{FaultCause:(*types.LocalizedMethodFault)(nil), FaultMessage:[]types.LocalizableMessage(nil)}}}, RequestedState:"poweredOn", ExistingState:"poweredOn"}
Sep  1 2017 20:16:32.512Z WARN  invalid power state during power off: poweredOn
Sep  1 2017 20:16:32.512Z DEBUG Set container b560212219597e651d8c3aa075231b81a90951879cd534bed38b5c6ce9ee6da3 state: Running
Sep  1 2017 20:16:32.512Z DEBUG Container(b560212219597e651d8c3aa075231b81a90951879cd534bed38b5c6ce9ee6da3) closing 0 log followers
Sep  1 2017 20:16:32.512Z DEBUG [ END ] [github.com/vmware/vic/lib/portlayer/exec.(*Container).stop:419] [20.170182858s] b560212219597e651d8c3aa075231b81a90951879cd534bed38b5c6ce9ee6da3
Sep  1 2017 20:16:32.512Z DEBUG [ END ] [github.com/vmware/vic/lib/portlayer/exec.Commit:35] [20.170207746s] b560212219597e651d8c3aa075231b81a90951879cd534bed38b5c6ce9ee6da3
Sep  1 2017 20:16:32.512Z ERROR CommitHandler error on handle(2888b8365e17b5af6d07f3226c2560fd) for b560212219597e651d8c3aa075231b81a90951879cd534bed38b5c6ce9ee6da3: task.Error{LocalizedMethodFault:(*types.LocalizedMethodFault)(0xc420c30dc0)}
Sep  1 2017 20:16:32.512Z DEBUG [ END ] [github.com/vmware/vic/lib/apiservers/portlayer/restapi/handlers.(*ContainersHandlersImpl).CommitHandler:218] [20.171235067s] handle(2888b8365e17b5af6d07f3226c2560fd)

This message looks concerning, since RequestedState is "poweredOn":

Sep  1 2017 20:16:32.495Z ERROR op=300.43 (delta:147.319499ms): unexpected fault on task retry: &types.InvalidPowerState{InvalidState:types.InvalidState{VimFault:types.VimFault{MethodFault:types.MethodFault{FaultCause:(*types.LocalizedMethodFault)(nil), FaultMessage:[]types.LocalizableMessage(nil)}}}, RequestedState:"poweredOn", ExistingState:"poweredOn"}
hickeng commented 7 years ago

@anchal-agrawal We should add that container definition to our testing for container lifecycle operations - looks like it's forcing a path we rarely hit otherwise.

anchal-agrawal commented 7 years ago

hostd:

2017-09-01T20:16:23.270Z verbose hostd[88C1B70] [Originator@6876 sub=PropertyProvider opID=aa17a9c3 user=root] RecordOp ASSIGN: info, haTask--vim.vm.guest.ProcessManager.startProgram-111839286. Applied change to temp map.
2017-09-01T20:16:23.335Z info hostd[88C1B70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/598c7948-ecca159b-869b-02001079965b/b560212219597e651d8c3aa075231b81a90951879cd534bed38b5c6ce9ee6da3/b560212219597e651d8c3aa075231b81a90951879cd534bed38b5c6ce9ee6da3.vmx opID=aa17a9c3 user=root] State Transition (VM_STATE_ON -> VM_STATE_GUEST_OPERATION)
2017-09-01T20:16:23.335Z verbose hostd[88C1B70] [Originator@6876 sub=PropertyProvider opID=aa17a9c3 user=root] RecordOp ASSIGN: disabledMethod, 68. Sent notification immediately.
...
2017-09-01T20:16:33.362Z info hostd[701D9D0] [Originator@6876 sub=Vimsvc.ha-eventmgr opID=aa17a9c4 user=root] Event 3739 : serene_bohr-b56021221959 on  sc-rdops-vm08-dhcp-233-140.eng.vmware.com in ha-datacenter is stopping
2017-09-01T20:16:33.362Z error hostd[701D9D0] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/598c7948-ecca159b-869b-02001079965b/b560212219597e651d8c3aa075231b81a90951879cd534bed38b5c6ce9ee6da3/b560212219597e651d8c3aa075231b81a90951879cd534bed38b5c6ce9ee6da3.vmx opID=aa17a9c4 user=root] Invalid transition requested (VM_STATE_GUEST_OPERATION -> VM_STATE_POWERING_OFF): Invalid state
2017-09-01T20:16:33.362Z info hostd[701D9D0] [Originator@6876 sub=Default opID=aa17a9c4 user=root] AdapterServer caught exception: vim.fault.InvalidPowerState
2017-09-01T20:16:33.362Z info hostd[701D9D0] [Originator@6876 sub=Vimsvc.TaskManager opID=aa17a9c4 user=root] Task Completed : haTask-68-vim.VirtualMachine.powerOff-111839287 Status error

container-logs.zip

anchal-agrawal commented 7 years ago

hostd logs from build 12146 (last known build without this issue) during a docker stop. I've added notes below each snippet.


2017-09-05T15:44:22.783Z verbose hostd[C844B70] [Originator@6876 sub=PropertyProvider opID=aa187164 user=root] RecordOp ASSIGN: info, haTask--vim.vm.guest.ProcessManager.startProgram-111877484. Applied change to temp map.
2017-09-05T15:44:22.784Z info hostd[C844B70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/598c7948-ecca159b-869b-02001079965b/751467b7a2e2aa5384fa315b4b72bd77814f6b81476f607871091ee42d35213e/751467b7a2e2aa5384fa315b4b72bd77814f6b81476f607871091ee42d35213e.vmx opID=aa187164 user=root] State Transition (VM_STATE_ON -> VM_STATE_GUEST_OPERATION)

State has changed from VM_STATE_ON -> VM_STATE_GUEST_OPERATION for a guest program task.

2017-09-05T15:45:22.787Z info hostd[D5F3B70] [Originator@6876 sub=Vimsvc.ha-eventmgr] Event 4392 : Guest operation Start Program performed on Virtual machine inspiring_archimedes-751467b7a2e2.
2017-09-05T15:45:22.788Z info hostd[D5F3B70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/598c7948-ecca159b-869b-02001079965b/751467b7a2e2aa5384fa315b4b72bd77814f6b81476f607871091ee42d35213e/751467b7a2e2aa5384fa315b4b72bd77814f6b81476f607871091ee42d35213e.vmx] State Transition (VM_STATE_GUEST_OPERATION -> VM_STATE_ON)

State has changed from VM_STATE_GUEST_OPERATION -> VM_STATE_ON after 1 minute.

2017-09-05T15:45:22.789Z info hostd[D5F3B70] [Originator@6876 sub=Solo.Vmomi] Activation [N5Vmomi10ActivationE:0x0c56c748] : Invoke done [startProgram] on [vim.vm.guest.ProcessManager:ha-guest-operations-process-manager]
2017-09-05T15:45:22.789Z verbose hostd[D5F3B70] [Originator@6876 sub=Solo.Vmomi] Arg vm:
--> 'vim.VirtualMachine:80'
2017-09-05T15:45:22.789Z verbose hostd[D5F3B70] [Originator@6876 sub=Solo.Vmomi] Arg auth:
--> (vim.vm.guest.NamePasswordAuthentication) {
-->    interactiveSession = false,
-->    username = "751467b7a2e2aa5384fa315b4b72bd77814f6b81476f607871091ee42d35213e",
-->    password = (not shown)
--> }
2017-09-05T15:45:22.789Z verbose hostd[D5F3B70] [Originator@6876 sub=Solo.Vmomi] Arg spec:
--> (vim.vm.guest.ProcessManager.ProgramSpec) {
-->    programPath = "kill",
-->    arguments = "KILL",
-->    workingDirectory = <unset>,
-->    envVariables = <unset>
--> }
2017-09-05T15:45:22.789Z info hostd[D5F3B70] [Originator@6876 sub=Solo.Vmomi] Throw vmodl.fault.SystemError
2017-09-05T15:45:22.789Z info hostd[D5F3B70] [Originator@6876 sub=Solo.Vmomi] Result:
--> (vmodl.fault.SystemError) {
-->    faultCause = (vmodl.MethodFault) null,
-->    faultMessage = <unset>,
-->    reason = "vix error codes = (3016, 0).
--> "
-->    msg = ""
--> }

Results of the kill -KILL task.

2017-09-05T15:45:22.797Z info hostd[8943B70] [Originator@6876 sub=Vimsvc.TaskManager opID=aa187171 user=root] Task Created : haTask-80-vim.VirtualMachine.powerOff-111877489

Power off task created.

2017-09-05T15:45:22.797Z info hostd[C781B70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/598c7948-ecca159b-869b-02001079965b/751467b7a2e2aa5384fa315b4b72bd77814f6b81476f607871091ee42d35213e/751467b7a2e2aa5384fa315b4b72bd77814f6b81476f607871091ee42d35213e.vmx opID=aa187171 user=root] State Transition (VM_STATE_ON -> VM_STATE_POWERING_OFF)

Valid state transition from VM_STATE_ON -> VM_STATE_POWERING_OFF. docker stop succeeds.


Based on this, the original issue is likely happening because #5532 applied a 10-second context timeout to the startGuestProgram func: https://github.com/vmware/vic/pull/5532/files#diff-67f57bbb6bec9888ba685e947a8a00cfR303. This can be seen in the logs in https://github.com/vmware/vic/issues/6199#issuecomment-326691369 - there's a 10-second interval between the kill -KILL and the power off tasks, causing the invalid VM_STATE_GUEST_OPERATION -> VM_STATE_POWERING_OFF transition. The unknown here is why the kill -KILL task takes 1 minute to transition back from VM_STATE_GUEST_OPERATION to VM_STATE_ON (pinging @sflxn @dougm @hickeng).
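
To make the sequence concrete, here is a minimal sketch of the stop flow implied by the portlayer log above. The function and callback names are made up (this is not the VIC code): each signal is delivered via a guest-program call bounded by a 10-second context, followed by a bounded wait for power off, before falling back to a hard power off. If the SIGKILL guest call burns the whole 10 seconds and hostd is still in VM_STATE_GUEST_OPERATION, the immediate hard power off hits InvalidPowerState.

```
// Sketch only: names and structure are illustrative, not the VIC code.
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// stopSequence mirrors the portlayer log above: a signal is sent via a
// guest-program call bounded by a 10s context, then we wait (10s) for power
// off, and only after TERM and KILL both fail do we hard power off.
func stopSequence(op context.Context,
	sendSignal func(context.Context, string) error,
	waitForPowerOff func(context.Context) error,
	hardPowerOff func(context.Context) error) error {

	for _, sig := range []string{"TERM", "KILL"} {
		ctx, cancel := context.WithTimeout(op, 10*time.Second)
		err := sendSignal(ctx, sig) // may return "context deadline exceeded"
		cancel()
		if err != nil {
			fmt.Printf("sending kill -%s: %s\n", sig, err)
		}

		wctx, wcancel := context.WithTimeout(op, 10*time.Second)
		err = waitForPowerOff(wctx)
		wcancel()
		if err == nil {
			return nil
		}
		fmt.Printf("timeout (10s) waiting for power off via SIG%s\n", sig)
	}

	// If the KILL guest operation is still open on the hostd side, this is
	// where InvalidPowerState shows up.
	return hardPowerOff(op)
}

func main() {
	// Stubs simulating an unresponsive container: signal delivery times out
	// and the VM never powers off on its own.
	err := stopSequence(context.Background(),
		func(ctx context.Context, sig string) error { <-ctx.Done(); return ctx.Err() },
		func(ctx context.Context) error { <-ctx.Done(); return errors.New("timeout") },
		func(ctx context.Context) error { return errors.New("InvalidPowerState: poweredOn") },
	)
	fmt.Println("stop result:", err)
}
```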

anchal-agrawal commented 7 years ago

@corrieb I think that while the failure scenario here is easy to come up with, it is developer-focused and production containers/images would likely not encounter it often. Nevertheless, I agree that this should be a high-priority fix.

sflxn commented 7 years ago

I agree with the assessment you've come up with. The timeouts are causing the problem. I increased the timeout and no longer see the issue.

When we originally implemented stop, we noticed that Docker waited 10 seconds for SIGTERM to take effect and then sent SIGKILL. The 10s is arbitrary; there was no comment in their code explaining why 10 seconds was the magic number. Also, there was no timeout for SIGKILL. The PR that added respect for the stop timeout now forces SIGKILL to also be limited to 10s. If we increase the timeout, the issue goes away. I'm not saying that's the right thing to do, since @hickeng did more in his PR and I'm not sure whether the timeout is integral to the other issues he was addressing.

One last comment: Docker uses 10 seconds for SIGTERM. Since we have VMs, we can always use a longer timeout if the issue demands it.

sflxn commented 7 years ago

I tried to reproduce this with other containers, e.g. busybox (/bin/top), mysql, and nginx. They all stopped promptly. I think this may be an important issue, but perhaps not high priority if we can't find a container other than this one that doesn't stop.

chengwang86 commented 7 years ago

I tested it using the latest master branch on a vc cluster (vsphere 6.5); here is what I observed:

Same test on a local ESXi host using the latest master branch,

Same test on a local ESXi host using vic-1.1.1 binaries

I concur with the previous analysis: enforcing the 10s timeout for startGuestProgram() when sending SIGKILL may leave the container VM in VM_STATE_GUEST_OPERATION, and the subsequent powerOff() operation then fails due to an invalid state transition (hostd.log: Invalid transition requested (VM_STATE_GUEST_OPERATION -> VM_STATE_POWERING_OFF): Invalid state).


So far there are three possible fixes in my mind:

  1. Remove the 10s timeout for startGuestProgram() (then we won't have the invalid state transition).
  2. When startGuestProgram() returns with context deadline exceeded (timeout), reset the VM state to its previous state instead of leaving it in VM_STATE_GUEST_OPERATION.
  3. Retry the hard power off when we get InvalidPowerState; after the VM state transitions from VM_STATE_GUEST_OPERATION back to VM_STATE_ON, the subsequent hard power off should succeed (see the sketch below).
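
A rough sketch of option 3 using plain govmomi calls. The portlayer actually goes through its own task-retry wrapper (pkg/vsphere/tasks), so the helper name and shape here are illustrative only: detect the InvalidPowerState fault on the power-off task and retry after giving hostd a moment to transition back from VM_STATE_GUEST_OPERATION to VM_STATE_ON.

```
package sketch

import (
	"context"
	"time"

	"github.com/vmware/govmomi/object"
	"github.com/vmware/govmomi/task"
	"github.com/vmware/govmomi/vim25/types"
)

// hardPowerOffWithRetry is a hypothetical helper, not VIC code: retry a hard
// power off while the task faults with InvalidPowerState (i.e. the VM is
// still mid guest-operation on the hostd side).
func hardPowerOffWithRetry(ctx context.Context, vm *object.VirtualMachine) error {
	var lastErr error
	for attempt := 0; attempt < 5; attempt++ {
		t, err := vm.PowerOff(ctx)
		if err != nil {
			return err
		}
		lastErr = t.Wait(ctx)
		if lastErr == nil {
			return nil
		}
		if te, ok := lastErr.(task.Error); ok {
			if _, ok := te.Fault().(*types.InvalidPowerState); ok {
				// hostd should return to VM_STATE_ON once the guest operation
				// completes or is abandoned; back off briefly and retry.
				time.Sleep(2 * time.Second)
				continue
			}
		}
		return lastErr
	}
	return lastErr
}
```
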
chengwang86 commented 7 years ago

According to @dougm, the container VM will only be in VM_STATE_GUEST_OPERATION during the call to startProgram(). PR https://github.com/vmware/vic/pull/6331 makes the toolbox StartCommand() async, which may solve this issue.

chengwang86 commented 7 years ago

PR #5532 sets the timeout for both SIGTERM and SIGKILL to 10s in order to honor docker stop -t: Seconds to wait for stop before killing it (default 10). @hickeng, could you please confirm, or correct me if I'm wrong?

However, we don't really need a timeout for SIGKILL. Removing the timeout on SIGKILL in base.go: shutdown() and base.go: kill() should help resolve this issue (see the sketch below).
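
In other words, something like the following hypothetical sketch (not the actual base.go code; sendSignal stands in for the startGuestProgram-based kill, and the wait-for-power-off steps between the two signals are elided): keep the docker stop -t deadline on the SIGTERM guest call, but issue SIGKILL without an extra deadline so hostd can finish the guest operation and leave VM_STATE_GUEST_OPERATION before a power off is attempted.

```
package sketch

import (
	"context"
	"log"
	"time"
)

// sendStopSignals illustrates the suggested change only; it is not VIC code.
func sendStopSignals(op context.Context, stopTimeout time.Duration,
	sendSignal func(ctx context.Context, sig string) error) error {

	// SIGTERM stays bounded so that docker stop -t (default 10s) is honored.
	ctx, cancel := context.WithTimeout(op, stopTimeout)
	defer cancel()
	if err := sendSignal(ctx, "TERM"); err != nil {
		log.Printf("sending kill -TERM: %s", err)
	}

	// SIGKILL uses the parent context with no additional deadline: the guest
	// operation can complete on its own schedule, so hostd is not left in
	// VM_STATE_GUEST_OPERATION when the subsequent power off is attempted.
	return sendSignal(op, "KILL")
}
```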

hickeng commented 7 years ago

@chengwang86 correct - we were seeing hangs waiting for the guest operation without that timeout in place. The 10s is the default value of the -t parameter of docker stop and can be adjusted - setting -t 2 should result in rapid shutdown, and this timeout was necessary to honor that option. We now have proper async handling in the toolbox, so the VM_STATE_GUEST_OPERATION state should be very brief - if it's not, we should find out why. VM_STATE_GUEST_OPERATION should also transition back to VM_STATE_ON even if the guest operation in the portlayer has been cancelled, so it will not block indefinitely irrespective of the startGuestProgram timeout.

This collision is essentially "task-in-progress" - I suggest:

  1. add the VM_STATE_GUEST_OPERATION detail for InvalidPowerState as one of the events that triggers a retry.
  2. find out why VM_STATE_GUEST_OPERATION is persisting for as long as it is - hopefully not another hostsync related item.
chengwang86 commented 7 years ago

@hickeng @dougm The async toolbox StartCommand (PR #6331) does not make a difference to the resulting VM state.

Without PR #6331, here is the end of tether.debug for a container where docker stop failed (the same behavior is seen using vic-1.2 and vic-1.1.1, where the command didn't fail):

Sep 14 2017 21:52:07.928Z DEBUG [BEGIN] [github.com/vmware/vic/lib/tether.(*Toolbox).containerStartCommand:226] TERM
Sep 14 2017 21:52:07.929Z DEBUG [BEGIN] [github.com/vmware/vic/lib/tether.(*Toolbox).kill:165] TERM
Sep 14 2017 21:52:07.929Z INFO  sending signal TERM (15) to 71bf67932e1d8d06c663ce4a13165f64b73bfcc8cf31723bf609af6d74284e3a
Sep 14 2017 21:52:07.930Z DEBUG [ END ] [github.com/vmware/vic/lib/tether.(*Toolbox).kill:165] [1.260126ms] TERM
Sep 14 2017 21:52:07.930Z DEBUG [ END ] [github.com/vmware/vic/lib/tether.(*Toolbox).containerStartCommand:226] [1.864447ms] TERM
Sep 14 2017 21:52:07.964Z DEBUG Inspecting children with status change
Sep 14 2017 21:52:07.966Z DEBUG Reaped process 293, return code: -1
Sep 14 2017 21:52:07.967Z DEBUG Removed child pid: 293
Sep 14 2017 21:52:07.968Z DEBUG [BEGIN] [github.com/vmware/vic/lib/tether.(*tether).handleSessionExit:685] handling exit of session 71bf67932e1d8d06c663ce4a13165f64b73bfcc8cf31723bf609af6d74284e3a
Sep 14 2017 21:52:07.968Z DEBUG Waiting on session.wait

Portlayer log (same behavior with or without the async toolbox StartCommand):

Sep 14 2017 21:52:08.134Z INFO  sending kill -TERM 71bf67932e1d8d06c663ce4a13165f64b73bfcc8cf31723bf609af6d74284e3a
Sep 14 2017 21:52:08.134Z DEBUG [BEGIN] [github.com/vmware/vic/lib/portlayer/exec.(*containerBase).startGuestProgram:182] 71bf67932e1d8d06c663ce4a13165f64b73bfcc8cf31723bf609af6d74284e3a:kill
Sep 14 2017 21:52:08.299Z DEBUG [ END ] [github.com/vmware/vic/lib/portlayer/exec.(*containerBase).startGuestProgram:182] [165.340516ms] 71bf67932e1d8d06c663ce4a13165f64b73bfcc8cf31723bf609af6d74284e3a:kill
... ...
Sep 14 2017 21:52:18.150Z WARN  timeout (10s) waiting for 71bf67932e1d8d06c663ce4a13165f64b73bfcc8cf31723bf609af6d74284e3a to power off via SIGTERM
Sep 14 2017 21:52:18.150Z INFO  sending kill -KILL 71bf67932e1d8d06c663ce4a13165f64b73bfcc8cf31723bf609af6d74284e3a
Sep 14 2017 21:52:18.150Z DEBUG [BEGIN] [github.com/vmware/vic/lib/portlayer/exec.(*containerBase).startGuestProgram:182] 71bf67932e1d8d06c663ce4a13165f64b73bfcc8cf31723bf609af6d74284e3a:kill
Sep 14 2017 21:52:28.151Z DEBUG [ END ] [github.com/vmware/vic/lib/portlayer/exec.(*containerBase).startGuestProgram:182] [10.000953661s] 71bf67932e1d8d06c663ce4a13165f64b73bfcc8cf31723bf609af6d74284e3a:kill
Sep 14 2017 21:52:28.151Z WARN  sending kill -KILL 71bf67932e1d8d06c663ce4a13165f64b73bfcc8cf31723bf609af6d74284e3a: Post https://192.168.213.129/sdk: context deadline exceeded
... ...
Sep 14 2017 21:52:28.344Z ERROR op=319.74 (delta:192.437524ms): unexpected fault on task retry: &types.InvalidPowerState{InvalidState:types.InvalidState{VimFault:types.VimFault{MethodFault:types.MethodFault{FaultCause:(*types.LocalizedMethodFault)(nil), FaultMessage:[]types.LocalizableMessage(nil)}}}, RequestedState:"poweredOn", ExistingState:"poweredOn"}
Sep 14 2017 21:52:28.381Z WARN  invalid power state during power off: poweredOn

My observation/understanding (after adding the code changes for async toolbox StartCommand):

chengwang86 commented 7 years ago

Update:

With vic-1.2, docker kill still works on this container.

Sep 15 2017 16:28:50.269Z INFO  sending kill -KILL 4e55341ac718286a6cd593c4cafbd10299c0144d6a64263fef2569900dc78859
Sep 15 2017 16:28:50.269Z DEBUG [BEGIN] [github.com/vmware/vic/lib/portlayer/exec.(*containerBase).startGuestProgram:182] 4e55341ac718286a6cd593c4cafbd10299c0144d6a64263fef2569900dc78859:kill
Sep 15 2017 16:28:50.542Z DEBUG [ END ] [github.com/vmware/vic/lib/portlayer/exec.(*containerBase).startGuestProgram:182] [272.897018ms] 4e55341ac718286a6cd593c4cafbd10299c0144d6a64263fef2569900dc78859:kill
Sep 15 2017 16:28:50.542Z INFO  waiting 10s for 4e55341ac718286a6cd593c4cafbd10299c0144d6a64263fef2569900dc78859 to power off
Sep 15 2017 16:28:50.562Z DEBUG vSphere Event Guest operation Start Program performed on Virtual machine test-4e55341ac718. for eventID(37925) ignored by the event collector
Sep 15 2017 16:29:00.276Z WARN  killing 4e55341ac718286a6cd593c4cafbd10299c0144d6a64263fef2569900dc78859 via hard power off
Sep 15 2017 16:29:00.276Z DEBUG op=299.23 (delta:7.605µs): [NewOperation] op=299.23 (delta:4.253µs) [github.com/vmware/vic/pkg/vsphere/tasks.WaitForResult:68]
Sep 15 2017 16:29:00.334Z DEBUG vSphere Event test-4e55341ac718 on  localhost.localdomain in ha-datacenter is stopping for eventID(37926) ignored by the event collector
Sep 15 2017 16:29:00.625Z DEBUG Found 2 subscribers to id: 37927 - vsphere.VMEvent: test-4e55341ac718 on  localhost.localdomain in ha-datacenter is powered off

chengwang86 commented 7 years ago

Based on the tether.debug log posted here, I suspect that the tether process hangs at session.wait.Wait() in [lib/tether/tether.go: handleSessionExit()](https://github.com/vmware/vic/blob/master/lib/tether/tether.go#L688), since the last line in the tether log is Waiting on session.wait. Probably because the container runs sleep infinity?

Therefore, we only observe sending signal TERM in tether.debug but never sending signal KILL. Maybe this is why sending kill -KILL via StartGuestProgram() in the portlayer hangs ...

@hickeng @dougm Any thoughts?

chengwang86 commented 7 years ago

Update: @hickeng @caglar10ur

Based on discussion with @emlin, I suspect that there might be something wrong with the function tether_linux.go: childReaper() (it probably hits an edge case in this issue): https://github.com/vmware/vic/blob/master/lib/tether/tether_linux.go#L67

    pid, err := syscall.Wait4(-1, &status, flag, nil)
    // pid 0 means no processes wish to report status
    if pid == 0 || err == syscall.ECHILD {
        log.Debug("No more child processes to reap")
        break
    }
    ... ...
    if !status.Exited() && !status.Signaled() {
        log.Debugf("Received notifcation about non-exit status change for %d: %d", pid, status)
        // no reaping or exit handling required
        continue
    }

    exitCode := status.ExitStatus()
    log.Debugf("Reaped process %d, return code: %d", pid, exitCode)

    session, ok := t.removeChildPid(pid)
    if ok {
        log.Debugf("Removed child pid: %d", pid)
        session.Lock()
        session.ExitStatus = exitCode

        t.handleSessionExit(session) // ****** (Cheng: this is where tether hangs: waiting on session.wait)
        session.Unlock()
        continue
    }

I started a container with docker run -d debian /bin/bash -c "echo sleep infinity > run.sh; chmod u+x run.sh; /run.sh" and then ran docker stop (which failed). From the tether log, I found that status.Exited() was true, status.Signaled() was false, and the exitCode was -1.

For a normal container, it would be safe for us to say that the container process should exit after calling docker stop (SIGTERM), and tether could handleSessionExit(session).

However, this container was running sleep infinity; it blocked the SIGTERM signal and didn't exit. Therefore, in t.handleSessionExit(session), tether was waiting on the session and hanging forever. Since we hold session.Lock() before t.handleSessionExit(), the toolbox was not able to grab the lock and send SIGKILL to kill the container process (thus it took the portlayer's startGuestProgram() a long time to finish when sending SIGKILL).
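
A minimal standalone sketch of that locking pattern (hypothetical code, not the tether source): one goroutine holds the session mutex while waiting on the session WaitGroup, so the goroutine that would deliver the KILL (and allow the WaitGroup to complete) can never acquire the lock.

```
package main

import (
	"fmt"
	"sync"
	"time"
)

// session mimics only the relevant fields: a mutex plus a WaitGroup that is
// released when the child's stdout/stderr are closed.
type session struct {
	sync.Mutex
	wait sync.WaitGroup
}

func main() {
	s := &session{}
	s.wait.Add(1) // stands in for a child that ignored SIGTERM and is still running

	// Reaper path: lock the session, then block in "handleSessionExit".
	go func() {
		s.Lock()
		defer s.Unlock()
		s.wait.Wait() // never returns: nothing else can make progress
	}()

	// Toolbox path: wants the same lock in order to signal the child with KILL.
	killed := make(chan struct{})
	go func() {
		time.Sleep(100 * time.Millisecond) // let the reaper take the lock first
		s.Lock()                           // blocks behind the reaper goroutine
		s.wait.Done()                      // would represent the child finally exiting
		s.Unlock()
		close(killed)
	}()

	select {
	case <-killed:
		fmt.Println("KILL path ran")
	case <-time.After(2 * time.Second):
		fmt.Println("deadlock: KILL path never acquired the session lock")
	}
}
```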

To fix:

chengwang86 commented 7 years ago

Update:

I tried the above fix #2 in my local test and I got sending kill -KILL ... : ServerFaultCode: A general system error occurred: vix error codes = (1, 0). But this time a subsequent hard power off worked as expected.

caglar10ur commented 7 years ago

I think what you are after is https://github.com/golang/go/issues/19798

hickeng commented 7 years ago

Used the following to recreate: docker run -d --net=external alpine /bin/ash -c "/bin/ash -c 'sleep 20000'"

stack dump from pprof in the cVM after calling stop and confirming we'd entered handleSessionExit

```
goroutine profile: total 16

1 @ 0x40def2 0x4403f4 0x805612 0x4594d1
# 0x4403f3 os/signal.signal_recv+0x103 /usr/local/go/src/runtime/sigqueue.go:116
# 0x805611 os/signal.loop+0x21 /usr/local/go/src/os/signal/signal_unix.go:22

1 @ 0x42c1fa 0x427317 0x426959 0x4e7a68 0x4e7ad4 0x4eaa45 0x506dce 0x505039 0x7811af 0x77fef8 0x77fbe0 0x7808ef 0x4594d1
# 0x426958 net.runtime_pollWait+0x58 /usr/local/go/src/runtime/netpoll.go:164
# 0x4e7a67 net.(*pollDesc).wait+0x37 /usr/local/go/src/net/fd_poll_runtime.go:75
# 0x4e7ad3 net.(*pollDesc).waitRead+0x33 /usr/local/go/src/net/fd_poll_runtime.go:80
# 0x4eaa44 net.(*netFD).accept+0x1e4 /usr/local/go/src/net/fd_unix.go:430
# 0x506dcd net.(*TCPListener).accept+0x2d /usr/local/go/src/net/tcpsock_posix.go:136
# 0x505038 net.(*TCPListener).AcceptTCP+0x48 /usr/local/go/src/net/tcpsock.go:215
# 0x7811ae net/http.tcpKeepAliveListener.Accept+0x2e /usr/local/go/src/net/http/server.go:3044
# 0x77fef7 net/http.(*Server).Serve+0x227 /usr/local/go/src/net/http/server.go:2643
# 0x77fbdf net/http.(*Server).ListenAndServe+0xaf /usr/local/go/src/net/http/server.go:2585
# 0x7808ee net/http.ListenAndServe+0x7e /usr/local/go/src/net/http/server.go:2787

1 @ 0x42c1fa 0x42c2de 0x404931 0x4045a5 0x85d25a 0x8743f3 0x42bdaa 0x4594d1
# 0x85d259 github.com/vmware/vic/lib/tether.(*tether).Start+0x209 /home/vagrant/vicsmb/src/github.com/vmware/vic/lib/tether/tether.go:529
# 0x8743f2 main.main+0x5c2 /home/vagrant/vicsmb/src/github.com/vmware/vic/cmd/tether/main_linux.go:103
# 0x42bda9 runtime.main+0x209 /usr/local/go/src/runtime/proc.go:185

1 @ 0x42c1fa 0x42c2de 0x404931 0x4045a5 0x879437 0x4594d1
# 0x879436 main.startSignalHandler.func1+0x56 /home/vagrant/vicsmb/src/github.com/vmware/vic/cmd/tether/main_linux.go:129

1 @ 0x42c1fa 0x42c2de 0x43caf1 0x43c6f4 0x46d7aa 0x85e11a 0x868db6 0x869284 0x4594d1
# 0x43c6f3 sync.runtime_Semacquire+0x33 /usr/local/go/src/runtime/sema.go:47
# 0x46d7a9 sync.(*WaitGroup).Wait+0x79 /usr/local/go/src/sync/waitgroup.go:131
# 0x85e119 github.com/vmware/vic/lib/tether.(*tether).handleSessionExit+0x109 /home/vagrant/vicsmb/src/github.com/vmware/vic/lib/tether/tether.go:688
# 0x868db5 github.com/vmware/vic/lib/tether.(*tether).childReaper.func1.1+0x355 /home/vagrant/vicsmb/src/github.com/vmware/vic/lib/tether/tether_linux.go:144
# 0x869283 github.com/vmware/vic/lib/tether.(*tether).childReaper.func1+0xa3 /home/vagrant/vicsmb/src/github.com/vmware/vic/lib/tether/tether_linux.go:157

1 @ 0x42c1fa 0x42c2de 0x43caf1 0x43c7d4 0x46c38d 0x863670 0x86a74f 0x83d85a 0x838e79 0x846838 0x842c67 0x84547e 0x4594d1
# 0x43c7d3 sync.runtime_SemacquireMutex+0x33 /usr/local/go/src/runtime/sema.go:62
# 0x46c38c sync.(*Mutex).Lock+0x9c /usr/local/go/src/sync/mutex.go:87
# 0x86366f github.com/vmware/vic/lib/tether.(*Toolbox).containerAuthenticate+0xbf /home/vagrant/vicsmb/src/github.com/vmware/vic/lib/tether/toolbox.go:212
# 0x86a74e github.com/vmware/vic/lib/tether.(*Toolbox).(github.com/vmware/vic/lib/tether.containerAuthenticate)-fm+0x7e /home/vagrant/vicsmb/src/github.com/vmware/vic/lib/tether/toolbox.go:147
# 0x83d859 github.com/vmware/vic/vendor/github.com/vmware/govmomi/toolbox.(*CommandServer).authenticate+0x2e9 /home/vagrant/vicsmb/src/github.com/vmware/vic/vendor/github.com/vmware/govmomi/toolbox/command.go:689
# 0x838e78 github.com/vmware/vic/vendor/github.com/vmware/govmomi/toolbox.(*CommandServer).Dispatch+0x398 /home/vagrant/vicsmb/src/github.com/vmware/vic/vendor/github.com/vmware/govmomi/toolbox/command.go:158
# 0x846837 github.com/vmware/vic/vendor/github.com/vmware/govmomi/toolbox.(*CommandServer).Dispatch-fm+0x47 /home/vagrant/vicsmb/src/github.com/vmware/vic/vendor/github.com/vmware/govmomi/toolbox/command.go:81
# 0x842c66 github.com/vmware/vic/vendor/github.com/vmware/govmomi/toolbox.(*Service).Dispatch+0x176 /home/vagrant/vicsmb/src/github.com/vmware/vic/vendor/github.com/vmware/govmomi/toolbox/service.go:236
# 0x84547d github.com/vmware/vic/vendor/github.com/vmware/govmomi/toolbox.(*Service).Start.func1+0x2cd /home/vagrant/vicsmb/src/github.com/vmware/vic/vendor/github.com/vmware/govmomi/toolbox/service.go:185

1 @ 0x42c1fa 0x43b604 0x43a26c 0x45609d 0x4594d1
# 0x42c1f9 runtime.gopark+0x139 /usr/local/go/src/runtime/proc.go:271
# 0x43b603 runtime.selectgoImpl+0x1363 /usr/local/go/src/runtime/select.go:423
# 0x43a26b runtime.selectgo+0x1b /usr/local/go/src/runtime/select.go:238
# 0x45609c runtime.ensureSigM.func1+0x2dc /usr/local/go/src/runtime/signal_unix.go:434

1 @ 0x42c1fa 0x43b604 0x43a26c 0x80782b 0x46dc29 0x805f58 0x8061d8 0x86f756 0x86fa93 0x870167 0x4594d1
# 0x80782a github.com/vmware/vic/pkg/serial.(*RawConn).Read+0x42a /home/vagrant/vicsmb/src/github.com/vmware/vic/pkg/serial/rawconn.go:129
# 0x46dc28 io.ReadAtLeast+0xa8 /usr/local/go/src/io/io.go:307
# 0x805f57 github.com/vmware/vic/pkg/serial.ReadAtLeastN+0x87 /home/vagrant/vicsmb/src/github.com/vmware/vic/pkg/serial/handshake.go:72
# 0x8061d7 github.com/vmware/vic/pkg/serial.HandshakeServer+0x187 /home/vagrant/vicsmb/src/github.com/vmware/vic/pkg/serial/handshake.go:164
# 0x86f755 main.backchannel+0x1f5 /home/vagrant/vicsmb/src/github.com/vmware/vic/cmd/tether/attach.go:315
# 0x86fa92 main.(*attachServerSSH).establish+0xe2 /home/vagrant/vicsmb/src/github.com/vmware/vic/cmd/tether/attach.go:363
# 0x870166 main.(*attachServerSSH).run+0x1b6 /home/vagrant/vicsmb/src/github.com/vmware/vic/cmd/tether/attach.go:407

1 @ 0x42c1fa 0x43b604 0x43a26c 0x854983 0x4594d1
# 0x854982 github.com/vmware/vic/lib/tether.(*BaseOperations).dhcpLoop+0xca2 /home/vagrant/vicsmb/src/github.com/vmware/vic/lib/tether/ops_linux.go:675

1 @ 0x42c1fa 0x43b604 0x43a26c 0x878b02 0x4594d1
# 0x878b01 main.backchannel.func1+0x101 /home/vagrant/vicsmb/src/github.com/vmware/vic/cmd/tether/attach.go:296

1 @ 0x4594d1

1 @ 0x47dae5 0x47a1d5 0x4775a9 0x49758f 0x4940b6 0x46e1e0 0x46e048 0x869b6d 0x4594d1
# 0x47dae4 syscall.Syscall+0x4 /usr/local/go/src/syscall/asm_linux_amd64.s:18
# 0x47a1d4 syscall.read+0x54 /usr/local/go/src/syscall/zsyscall_linux_amd64.go:783
# 0x4775a8 syscall.Read+0x48 /usr/local/go/src/syscall/syscall_unix.go:162
# 0x49758e os.(*File).read+0x4e /usr/local/go/src/os/file_unix.go:165
# 0x4940b5 os.(*File).Read+0x75 /usr/local/go/src/os/file.go:101
# 0x46e1df io.copyBuffer+0xff /usr/local/go/src/io/io.go:390
# 0x46e047 io.CopyBuffer+0x87 /usr/local/go/src/io/io.go:371
# 0x869b6c github.com/vmware/vic/lib/tether.establishNonPty.func2+0x12c /home/vagrant/vicsmb/src/github.com/vmware/vic/lib/tether/tether_linux.go:322

1 @ 0x47dae5 0x47a1d5 0x4775a9 0x49758f 0x4940b6 0x46e1e0 0x46e048 0x869f4d 0x4594d1
# 0x47dae4 syscall.Syscall+0x4 /usr/local/go/src/syscall/asm_linux_amd64.s:18
# 0x47a1d4 syscall.read+0x54 /usr/local/go/src/syscall/zsyscall_linux_amd64.go:783
# 0x4775a8 syscall.Read+0x48 /usr/local/go/src/syscall/syscall_unix.go:162
# 0x49758e os.(*File).read+0x4e /usr/local/go/src/os/file_unix.go:165
# 0x4940b5 os.(*File).Read+0x75 /usr/local/go/src/os/file.go:101
# 0x46e1df io.copyBuffer+0xff /usr/local/go/src/io/io.go:390
# 0x46e047 io.CopyBuffer+0x87 /usr/local/go/src/io/io.go:371
# 0x869f4c github.com/vmware/vic/lib/tether.establishNonPty.func3+0x12c /home/vagrant/vicsmb/src/github.com/vmware/vic/lib/tether/tether_linux.go:341

1 @ 0x47dae5 0x47a1d5 0x4775a9 0x49758f 0x4940b6 0x808855 0x4594d1
# 0x47dae4 syscall.Syscall+0x4 /usr/local/go/src/syscall/asm_linux_amd64.s:18
# 0x47a1d4 syscall.read+0x54 /usr/local/go/src/syscall/zsyscall_linux_amd64.go:783
# 0x4775a8 syscall.Read+0x48 /usr/local/go/src/syscall/syscall_unix.go:162
# 0x49758e os.(*File).read+0x4e /usr/local/go/src/os/file_unix.go:165
# 0x4940b5 os.(*File).Read+0x75 /usr/local/go/src/os/file.go:101
# 0x808854 github.com/vmware/vic/pkg/serial.(*RawConn).Read.func2+0x54 /home/vagrant/vicsmb/src/github.com/vmware/vic/pkg/serial/rawconn.go:117

1 @ 0x47dae5 0x47a1d5 0x4775a9 0x4e91ab 0x4fc4e0 0x776018 0x4594d1
# 0x47dae4 syscall.Syscall+0x4 /usr/local/go/src/syscall/asm_linux_amd64.s:18
# 0x47a1d4 syscall.read+0x54 /usr/local/go/src/syscall/zsyscall_linux_amd64.go:783
# 0x4775a8 syscall.Read+0x48 /usr/local/go/src/syscall/syscall_unix.go:162
# 0x4e91aa net.(*netFD).Read+0x13a /usr/local/go/src/net/fd_unix.go:246
# 0x4fc4df net.(*conn).Read+0x6f /usr/local/go/src/net/net.go:181
# 0x776017 net/http.(*connReader).backgroundRead+0x57 /usr/local/go/src/net/http/server.go:656

1 @ 0x8011d2 0x800fd0 0x7fd645 0x8049d1 0x804ce3 0x77d2f4 0x77e730 0x77fa92 0x77be62 0x4594d1
# 0x8011d1 runtime/pprof.writeRuntimeProfile+0xa1 /usr/local/go/src/runtime/pprof/pprof.go:632
# 0x800fcf runtime/pprof.writeGoroutine+0x9f /usr/local/go/src/runtime/pprof/pprof.go:594
# 0x7fd644 runtime/pprof.(*Profile).WriteTo+0x3b4 /usr/local/go/src/runtime/pprof/pprof.go:302
# 0x8049d0 net/http/pprof.handler.ServeHTTP+0x1d0 /usr/local/go/src/net/http/pprof/pprof.go:209
# 0x804ce2 net/http/pprof.Index+0x1e2 /usr/local/go/src/net/http/pprof/pprof.go:221
# 0x77d2f3 net/http.HandlerFunc.ServeHTTP+0x43 /usr/local/go/src/net/http/server.go:1942
# 0x77e72f net/http.(*ServeMux).ServeHTTP+0x12f /usr/local/go/src/net/http/server.go:2238
# 0x77fa91 net/http.serverHandler.ServeHTTP+0x91 /usr/local/go/src/net/http/server.go:2568
# 0x77be61 net/http.(*conn).serve+0x611 /usr/local/go/src/net/http/server.go:1825
```

ps output from exec session before stop:

  715 root       0:00 /.tether/lib64/ld-linux-x86-64.so.2 --library-path /.teth
  725 root       0:00 /bin/ash -c /bin/ash -c 'sleep 20000'
  729 root       0:00 /bin/ash -c sleep 20000
  730 root       0:00 sleep 20000
  731 root       0:00 [flush-8:0]
  732 root       0:00 /bin/ash
  733 root       0:00 ps -ef

ps output from exec session after stop:

  715 root       0:00 /.tether/lib64/ld-linux-x86-64.so.2 --library-path /.teth
  729 root       0:00 /bin/ash -c sleep 20000
  730 root       0:00 sleep 20000
  731 root       0:00 [flush-8:0]
  732 root       0:00 /bin/ash
  734 root       0:00 ps -ef

We can see from the ps output that the primary process is killed but the children are not - this is causing us to block waiting for stdout/stderr to close. Confirmed the reported behaviour, and that it doesn't happen if running with a TTY (-t).

hickeng commented 7 years ago

From regular docker we can see that the children are created as part of a process group (5th field of the stat output):

vagrant@devbox:/mnt/test/volumes$ docker run -d alpine /bin/ash -c "/bin/ash -c 'sleep 20000'"
9f98cd49920b2f06d8a2f2ba7cc0976842093e389cc868282f953104fc858a33
vagrant@devbox:/mnt/test/volumes$ ps -ef | grep sleep
root     14786 14765  2 21:32 ?        00:00:00 /bin/ash -c /bin/ash -c 'sleep 20000'
root     14808 14786  0 21:32 ?        00:00:00 /bin/ash -c sleep 20000
root     14809 14808  0 21:32 ?        00:00:00 sleep 20000
vagrant  14816 17041  0 21:32 pts/1    00:00:00 grep --color=auto sleep
root     21595 21538  0 Oct04 ?        00:00:00 /tmp/nfs-sleep infinity
vagrant@devbox:/mnt/test/volumes$ cd /proc/14808
vagrant@devbox:/proc/14808$ cat stat
14808 (ash) S 14786 14786 14786 0 -1 4194304 47 0 0 0 0 0 0 0 20 0 1 0 71471791 1548288 1 18446744073709551615 1 1 0 0 0 0 0 4 2 0 0 0 17 1 0 0 0 0 0 0 0 0 0 0 0 0 0
vagrant@devbox:/proc/14808$

I've tried setting SysProcAttr.Setpgid for the primary process in tether and then killing the group, but for some reason it doesn't take effect unless running in a TTY - speculating that it needs one of the other SysProcAttr settings that the pty.Start path configures, but unconfirmed.
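
For reference, a hedged sketch of the process-group approach described above (illustrative, not the tether implementation): start the primary process with Setpgid (or Setsid, which PR #6528 ended up using) and signal the negative pgid so the nested shells and the sleep are killed along with the group leader. It uses /bin/sh rather than alpine's /bin/ash for portability.

```
package main

import (
	"log"
	"os/exec"
	"syscall"
	"time"
)

func main() {
	cmd := exec.Command("/bin/sh", "-c", "/bin/sh -c 'sleep 20000'")
	// Put the child in its own process group; Setsid would also create a new
	// group (plus a new session).
	cmd.SysProcAttr = &syscall.SysProcAttr{Setpgid: true}
	if err := cmd.Start(); err != nil {
		log.Fatal(err)
	}

	time.Sleep(time.Second) // give the nested shells time to start

	// A negative pid targets the whole process group, so the nested shell and
	// the sleep are signalled too, not just the leader.
	pgid, err := syscall.Getpgid(cmd.Process.Pid)
	if err != nil {
		log.Fatal(err)
	}
	if err := syscall.Kill(-pgid, syscall.SIGKILL); err != nil {
		log.Fatal(err)
	}

	_ = cmd.Wait()
	log.Println("process group killed")
}
```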

hickeng commented 7 years ago

Created PR #6528, which addresses the observed behaviour; however, we are still missing an actual understanding of why using setsid worked, why setpgid did not, and what the actual impact of setsid is for non-TTY processes (it was already in use for TTY).

There's also a neatness issue observed (we really should provide a better error message if the container is stopped underneath an active exec session):

/ # ps -ef
PID   USER     TIME   COMMAND
    1 root       0:06 /.tether/tether
<snip>
  307 root       0:00 /bin/ash -c /bin/ash -c 'sleep 20000'
  309 root       0:00 /bin/ash -c sleep 20000
  310 root       0:00 sleep 20000
  312 root       0:00 /bin/ash
  315 root       0:00 ps -ef
/ # Error response from daemon: Server error from portlayer: [GET /tasks][500] inspectInternalServerError  &{Code:0 Message:unknown task ID: 6be41f3668d07440d96aacb31f74e057e74f2693cdba996982b1c7cda26bf344}
vagrant@devbox:~/vicsmb/src/github.com/vmware/vic$

hickeng commented 7 years ago

Moving to Sprint19 - functional PR in hand (#6528); however, it won't be merged by the end of Sprint18.