hyperhq / hyperd

HyperContainer Daemon
http://www.hypercontainer.io
Apache License 2.0
1.98k stars 196 forks source link

hyperd will crash under great pressure #537

Closed YaoZengzeng closed 7 years ago

YaoZengzeng commented 7 years ago

I'm now doing k8s node e2e test, the container runtime is frakti & hyperd. However, during the process of testing, the hyperd always crashed, which makes me so annoyed. The log is as follows:

E0309 02:24:19.847643 3965 container.go:1054] Pod[k8s_POD.0_pod-configmaps-3b91a054-046f-11e7-8d09-42010a8c0006_e2e-tests-secrets-0ss3b_3b91f88d-046f-11e7-b2fb-42010a8c0006_aa209b8e] Con[b842642bfbb9(k8s_env-test.0_pod-configmaps-3b91a054-046f-11e7-8d09-42010a8c0006_e2e-tests-secrets-0ss3b_3b91f88d-046f-11e7-b2fb-42010a8c0006_18d2fe90)] failed to umount root volume: not a directory E0309 02:24:19.847678 3965 decommission.go:272] Pod[k8s_POD.0_pod-configmaps-3b91a054-046f-11e7-8d09-42010a8c0006_e2e-tests-secrets-0ss3b_3b91f88d-046f-11e7-b2fb-42010a8c0006_aa209b8e] Con[b842642bfbb9(k8s_env-test.0_pod-configmaps-3b91a054-046f-11e7-8d09-42010a8c0006_e2e-tests-secrets-0ss3b_3b91f88d-046f-11e7-b2fb-42010a8c0006_18d2fe90)] failed to umount root volume E0309 02:24:19.847690 3965 container.go:64] ContainerRemove failed: not a directory E0309 02:24:19.852968 3965 container.go:1054] Pod[k8s_POD.0_pod-update-activedeadlineseconds-3b8d6e8d-046f-11e7-afaf-42010a8c0006_e2e-tests-pods-1245p_3b8dfb4e-046f-11e7-b2fb-42010a8c0006_9acb0442] Con[adf54fc9e08e(k8s_nginx.0_pod-update-activedeadlineseconds-3b8d6e8d-046f-11e7-afaf-42010a8c0006_e2e-tests-pods-1245p_3b8dfb4e-046f-11e7-b2fb-42010a8c0006_d04ab55f)] failed to umount root volume: not a directory E0309 02:24:19.852986 3965 decommission.go:272] Pod[k8s_POD.0_pod-update-activedeadlineseconds-3b8d6e8d-046f-11e7-afaf-42010a8c0006_e2e-tests-pods-1245p_3b8dfb4e-046f-11e7-b2fb-42010a8c0006_9acb0442] Con[adf54fc9e08e(k8s_nginx.0_pod-update-activedeadlineseconds-3b8d6e8d-046f-11e7-afaf-42010a8c0006_e2e-tests-pods-1245p_3b8dfb4e-046f-11e7-b2fb-42010a8c0006_d04ab55f)] failed to umount root volume E0309 02:24:19.852997 3965 container.go:64] ContainerRemove failed: not a directory E0309 02:24:19.857037 3965 info.go:20] GetPodInfo error: pod not ready E0309 02:24:19.857294 3965 info.go:20] GetPodInfo error: pod not ready E0309 02:24:19.857810 3965 info.go:20] GetPodInfo error: pod not ready E0309 02:24:19.858153 3965 info.go:20] GetPodInfo error: pod not ready E0309 02:24:19.858748 3965 info.go:20] GetPodInfo error: pod not ready E0309 02:24:19.859055 3965 info.go:20] GetPodInfo error: pod not ready E0309 02:24:19.859667 3965 info.go:20] GetPodInfo error: pod not ready E0309 02:24:19.859947 3965 info.go:20] GetPodInfo error: pod not ready E0309 02:24:19.860475 3965 info.go:20] GetPodInfo error: pod not ready E0309 02:24:19.860766 3965 info.go:20] GetPodInfo error: pod not ready E0309 02:24:19.861365 3965 info.go:20] GetPodInfo error: pod not ready E0309 02:24:19.861616 3965 info.go:20] GetPodInfo error: pod not ready E0309 02:24:19.862122 3965 info.go:20] GetPodInfo error: Can not get Pod info with pod ID(k8s_POD.0_annotationupdate65416a50-fe61-11e6-a2ed-42010a8c0003_e2e-tests-downward-api-wqdkc_654180c7-fe61-11e6-aa08-42010a8c0003_78092a35) E0309 02:24:19.862450 3965 info.go:20] GetPodInfo error: Can not get Pod info with pod ID(k8s_POD.0_annotationupdate65416a50-fe61-11e6-a2ed-42010a8c0003_e2e-tests-downward-api-wqdkc_654180c7-fe61-11e6-aa08-42010a8c0003_78092a35) E0309 02:24:19.863008 3965 info.go:20] GetPodInfo error: Can not get Pod info with pod ID(k8s_POD.0_downward-api-883505a4-fe61-11e6-a2ed-42010a8c0003_e2e-tests-downward-api-f14wk_88351c69-fe61-11e6-aa08-42010a8c0003_4c0d9a56) E0309 02:24:19.863310 3965 info.go:20] GetPodInfo error: Can not get Pod info with pod ID(k8s_POD.0_downward-api-883505a4-fe61-11e6-a2ed-42010a8c0003_e2e-tests-downward-api-f14wk_88351c69-fe61-11e6-aa08-42010a8c0003_4c0d9a56) E0309 02:24:19.864182 3965 info.go:20] GetPodInfo error: Can not get Pod info with pod ID(k8s_POD.0_downward-api-9dbbe239-fe61-11e6-a2ed-42010a8c0003_e2e-tests-downward-api-plj2n_9dbbf3b3-fe61-11e6-aa08-42010a8c0003_f9a96f4a) E0309 02:24:19.864486 3965 info.go:20] GetPodInfo error: Can not get Pod info with pod ID(k8s_POD.0_downward-api-9dbbe239-fe61-11e6-a2ed-42010a8c0003_e2e-tests-downward-api-plj2n_9dbbf3b3-fe61-11e6-aa08-42010a8c0003_f9a96f4a) E0309 02:24:19.865106 3965 info.go:20] GetPodInfo error: Can not get Pod info with pod ID(k8s_POD.0_downward-api-ba75cd42-fe61-11e6-a2ed-42010a8c0003_e2e-tests-downward-api-l1lbb_ba75e3ff-fe61-11e6-aa08-42010a8c0003_c9e202ff) E0309 02:24:19.865394 3965 info.go:20] GetPodInfo error: Can not get Pod info with pod ID(k8s_POD.0_downward-api-ba75cd42-fe61-11e6-a2ed-42010a8c0003_e2e-tests-downward-api-l1lbb_ba75e3ff-fe61-11e6-aa08-42010a8c0003_c9e202ff) E0309 02:24:19.865960 3965 info.go:20] GetPodInfo error: Can not get Pod info with pod ID(k8s_POD.0_downward-api-cd9aad52-fe61-11e6-a2ed-42010a8c0003_e2e-tests-downward-api-r2hhw_cd9abf98-fe61-11e6-aa08-42010a8c0003_ae5b25f5) E0309 02:24:19.866243 3965 info.go:20] GetPodInfo error: Can not get Pod info with pod ID(k8s_POD.0_downward-api-cd9aad52-fe61-11e6-a2ed-42010a8c0003_e2e-tests-downward-api-r2hhw_cd9abf98-fe61-11e6-aa08-42010a8c0003_ae5b25f5) E0309 02:24:19.866818 3965 info.go:20] GetPodInfo error: Can not get Pod info with pod ID(k8s_POD.0_downwardapi-volume-39e8a03d-fe61-11e6-a2ed-42010a8c0003_e2e-tests-downward-api-qndjb_39e9014c-fe61-11e6-aa08-42010a8c0003_9acb0442) E0309 02:24:19.867100 3965 info.go:20] GetPodInfo error: Can not get Pod info with pod ID(k8s_POD.0_downwardapi-volume-39e8a03d-fe61-11e6-a2ed-42010a8c0003_e2e-tests-downward-api-qndjb_39e9014c-fe61-11e6-aa08-42010a8c0003_9acb0442) E0309 02:24:19.867622 3965 info.go:20] GetPodInfo error: Can not get Pod info with pod ID(k8s_POD.0_downwardapi-volume-5bac73f3-fe61-11e6-a2ed-42010a8c0003_e2e-tests-downward-api-xh9cd_5bac8a72-fe61-11e6-aa08-42010a8c0003_83e4f98d) E0309 02:24:19.867950 3965 info.go:20] GetPodInfo error: Can not get Pod info with pod ID(k8s_POD.0_downwardapi-volume-5bac73f3-fe61-11e6-a2ed-42010a8c0003_e2e-tests-downward-api-xh9cd_5bac8a72-fe61-11e6-aa08-42010a8c0003_83e4f98d) E0309 02:24:19.868488 3965 info.go:20] GetPodInfo error: Can not get Pod info with pod ID(k8s_POD.0_downwardapi-volume-7d703bf3-fe61-11e6-a2ed-42010a8c0003_e2e-tests-downward-api-2395t_7d705580-fe61-11e6-aa08-42010a8c0003_34040e1e) E0309 02:24:19.868748 3965 info.go:20] GetPodInfo error: Can not get Pod info with pod ID(k8s_POD.0_downwardapi-volume-7d703bf3-fe61-11e6-a2ed-42010a8c0003_e2e-tests-downward-api-2395t_7d705580-fe61-11e6-aa08-42010a8c0003_34040e1e) E0309 02:24:19.869367 3965 info.go:20] GetPodInfo error: Can not get Pod info with pod ID(k8s_POD.0_downwardapi-volume-92f84116-fe61-11e6-a2ed-42010a8c0003_e2e-tests-downward-api-438d9_92f8540d-fe61-11e6-aa08-42010a8c0003_b25c2fef) E0309 02:24:19.869694 3965 info.go:20] GetPodInfo error: Can not get Pod info with pod ID(k8s_POD.0_downwardapi-volume-92f84116-fe61-11e6-a2ed-42010a8c0003_e2e-tests-downward-api-438d9_92f8540d-fe61-11e6-aa08-42010a8c0003_b25c2fef) E0309 02:24:19.870227 3965 info.go:20] GetPodInfo error: Can not get Pod info with pod ID(k8s_POD.0_downwardapi-volume-a74ee1b5-fe61-11e6-a2ed-42010a8c0003_e2e-tests-downward-api-75th1_a74ef600-fe61-11e6-aa08-42010a8c0003_b12885fa) E0309 02:24:19.870527 3965 info.go:20] GetPodInfo error: Can not get Pod info with pod ID(k8s_POD.0_downwardapi-volume-a74ee1b5-fe61-11e6-a2ed-42010a8c0003_e2e-tests-downward-api-75th1_a74ef600-fe61-11e6-aa08-42010a8c0003_b12885fa) E0309 02:24:19.871181 3965 info.go:20] GetPodInfo error: Can not get Pod info with pod ID(k8s_POD.0_downwardapi-volume-b0e19487-fe61-11e6-a2ed-42010a8c0003_e2e-tests-downward-api-mbknk_b0e1aded-fe61-11e6-aa08-42010a8c0003_474c4687) E0309 02:24:19.871461 3965 info.go:20] GetPodInfo error: Can not get Pod info with pod ID(k8s_POD.0_downwardapi-volume-b0e19487-fe61-11e6-a2ed-42010a8c0003_e2e-tests-downward-api-mbknk_b0e1aded-fe61-11e6-aa08-42010a8c0003_474c4687) E0309 02:24:19.871980 3965 info.go:20] GetPodInfo error: Can not get Pod info with pod ID(k8s_POD.0_downwardapi-volume-c4078abc-fe61-11e6-a2ed-42010a8c0003_e2e-tests-downward-api-fzx8v_c407a44c-fe61-11e6-aa08-42010a8c0003_e4f4e3ea) E0309 02:24:19.872305 3965 info.go:20] GetPodInfo error: Can not get Pod info with pod ID(k8s_POD.0_downwardapi-volume-c4078abc-fe61-11e6-a2ed-42010a8c0003_e2e-tests-downward-api-fzx8v_c407a44c-fe61-11e6-aa08-42010a8c0003_e4f4e3ea) E0309 02:24:19.873207 3965 info.go:20] GetPodInfo error: Can not get Pod info with pod ID(k8s_POD.0_downwardapi-volume-e0c0cef9-fe61-11e6-a2ed-42010a8c0003_e2e-tests-downward-api-dlwrr_e0c0e868-fe61-11e6-aa08-42010a8c0003_e5a143d2) E0309 02:24:19.873523 3965 info.go:20] GetPodInfo error: Can not get Pod info with pod ID(k8s_POD.0_downwardapi-volume-e0c0cef9-fe61-11e6-a2ed-42010a8c0003_e2e-tests-downward-api-dlwrr_e0c0e868-fe61-11e6-aa08-42010a8c0003_e5a143d2) E0309 02:24:19.874092 3965 info.go:20] GetPodInfo error: Can not get Pod info with pod ID(k8s_POD.0_downwardapi-volume-f51aca3b-fe61-11e6-a2ed-42010a8c0003_e2e-tests-downward-api-dc7n0_f51ae3e2-fe61-11e6-aa08-42010a8c0003_b5fe0392) E0309 02:24:19.874369 3965 info.go:20] GetPodInfo error: Can not get Pod info with pod ID(k8s_POD.0_downwardapi-volume-f51aca3b-fe61-11e6-a2ed-42010a8c0003_e2e-tests-downward-api-dc7n0_f51ae3e2-fe61-11e6-aa08-42010a8c0003_b5fe0392) E0309 02:24:19.875048 3965 info.go:20] GetPodInfo error: Can not get Pod info with pod ID(k8s_POD.0_labelsupdate44ad6f33-fe61-11e6-a2ed-42010a8c0003_e2e-tests-downward-api-5rvdw_44ad820c-fe61-11e6-aa08-42010a8c0003_afd3a30c) E0309 02:24:19.875317 3965 info.go:20] GetPodInfo error: Can not get Pod info with pod ID(k8s_POD.0_labelsupdate44ad6f33-fe61-11e6-a2ed-42010a8c0003_e2e-tests-downward-api-5rvdw_44ad820c-fe61-11e6-aa08-42010a8c0003_afd3a30c) E0309 02:24:19.875849 3965 info.go:20] GetPodInfo error: Can not get Pod info with pod ID(k8s_POD.0_metadata-volume-d72d4ec0-fe61-11e6-a2ed-42010a8c0003_e2e-tests-downward-api-2rblm_d72d653b-fe61-11e6-aa08-42010a8c0003_bde9d03d) E0309 02:24:19.876252 3965 info.go:20] GetPodInfo error: Can not get Pod info with pod ID(k8s_POD.0_metadata-volume-d72d4ec0-fe61-11e6-a2ed-42010a8c0003_e2e-tests-downward-api-2rblm_d72d653b-fe61-11e6-aa08-42010a8c0003_bde9d03d) E0309 02:24:19.876798 3965 info.go:20] GetPodInfo error: Can not get Pod info with pod ID(k8s_POD.0_metadata-volume-eb878879-fe61-11e6-a2ed-42010a8c0003_e2e-tests-downward-api-d14fj_eb87a428-fe61-11e6-aa08-42010a8c0003_7e3e8dd0) E0309 02:24:19.877051 3965 info.go:20] GetPodInfo error: Can not get Pod info with pod ID(k8s_POD.0_metadata-volume-eb878879-fe61-11e6-a2ed-42010a8c0003_e2e-tests-downward-api-d14fj_eb87a428-fe61-11e6-aa08-42010a8c0003_7e3e8dd0) E0309 02:25:01.351304 3965 decommission.go:383] Pod[k8s_POD.0_liveness-http_e2e-tests-container-probe-dkzrg_4dc65ba0-046f-11e7-b2fb-42010a8c0006_78092a35] only alived pod could be stopped, current 5 E0309 02:25:01.351316 3965 stop.go:20] only alived pod could be stopped, current 5 E0309 02:25:01.402502 3965 serial.go:337] read tty data failed E0309 02:25:01.402681 3965 vm_states.go:177] Shutting down because of an exception: Destroy pod failed panic: send on closed channel

goroutine 2558 [running]: panic(0x11a8fa0, 0xc4210bc590) /usr/local/go/src/runtime/panic.go:500 +0x1a1 github.com/hyperhq/hyperd/vendor/github.com/hyperhq/runv/hypervisor/qemu.(QemuContext).Close(0xc42129f560) /root/code/kubernetes/src/github.com/hyperhq/hyperd/vendor/github.com/hyperhq/runv/hypervisor/qemu/qemu.go:180 +0x5e github.com/hyperhq/hyperd/vendor/github.com/hyperhq/runv/hypervisor.(VmContext).Close(0xc420994360) /root/code/kubernetes/src/github.com/hyperhq/hyperd/vendor/github.com/hyperhq/runv/hypervisor/context.go:240 +0x131 github.com/hyperhq/hyperd/vendor/github.com/hyperhq/runv/hypervisor.(*VmContext).shutdownVM(0xc420994360) /root/code/kubernetes/src/github.com/hyperhq/hyperd/vendor/github.com/hyperhq/runv/hypervisor/vm_states.go:171 +0x117 created by github.com/hyperhq/hyperd/vendor/github.com/hyperhq/runv/hypervisor.stateRunning /root/code/kubernetes/src/github.com/hyperhq/hyperd/vendor/github.com/hyperhq/runv/hypervisor/vm_states.go:215 +0x392

And the of hyperd, runv, hyperstart is as follows:

hyperd commit: 6b83ed830a1d166c5d6f33e238ea9101e313a531 runv commit: 956133b34d4fd509908c1b65b477159c32a19075 hyperstart commit: 544e11f8e93801d90271792ab0e2ba8dba5b9293

feiskyer commented 7 years ago

Encoutered hyperd crash on removing containers/pods:

Mar 14 07:42:21 ubuntu-0 hyperd[24575]: E0314 07:42:21.798543   24575 persist.go:102] Pod[k8s_POD.1_kube-dns-1178038448-z0v4p_kube-system_55c4bc75-
Mar 14 07:42:21 ubuntu-0 hyperd[24575]: E0314 07:42:21.798704   24575 persist.go:102] Pod[k8s_POD.1_kube-dns-847575035-p8fcw_kube-system_55c8a2d0-0
Mar 14 07:42:21 ubuntu-0 hyperd[24575]: E0314 07:42:21.798866   24575 persist.go:102] Pod[k8s_POD.2_kube-dns-514368535-kj73m_kube-system_f04cf2da-0
Mar 14 07:42:21 ubuntu-0 hyperd[24575]: E0314 07:42:21.799097   24575 persist.go:102] Pod[k8s_POD.2_kube-dns-806549836-1pjqc_kube-system_d96d7910-0
Mar 14 07:42:21 ubuntu-0 hyperd[24575]: E0314 07:42:21.799343   24575 persist.go:102] Pod[k8s_POD.2_kube-dns-806549836-8w1b9_kube-system_5f5cbb0a-0
Mar 14 07:42:21 ubuntu-0 hyperd[24575]: E0314 07:42:21.799548   24575 persist.go:102] Pod[k8s_POD.3_kube-dns-806549836-gxjx1_kube-system_117c4a2e-0
Mar 14 07:42:21 ubuntu-0 hyperd[24575]: E0314 07:42:21.799716   24575 persist.go:102] Pod[k8s_POD.4_kube-dns-2233971047-8b83r_kube-system_bc555f07-
Mar 14 07:43:10 ubuntu-0 hyperd[24575]: panic: runtime error: invalid memory address or nil pointer dereference
Mar 14 07:43:10 ubuntu-0 hyperd[24575]: [signal SIGSEGV: segmentation violation code=0x1 addr=0x10 pc=0x698f79]
Mar 14 07:43:10 ubuntu-0 hyperd[24575]: goroutine 369 [running]:
Mar 14 07:43:10 ubuntu-0 hyperd[24575]: panic(0x11d5a80, 0xc420010040)
Mar 14 07:43:10 ubuntu-0 hyperd[24575]:         /usr/local/go/src/runtime/panic.go:500 +0x1a1
Mar 14 07:43:10 ubuntu-0 hyperd[24575]: github.com/hyperhq/hyperd/vendor/github.com/hyperhq/runv/hypervisor.(*Vm).RemoveVolume(0x0, 0xc421057460, 0
Mar 14 07:43:10 ubuntu-0 hyperd[24575]:         /go/src/github.com/hyperhq/hyperd/vendor/github.com/hyperhq/runv/hypervisor/vm.go:571 +0x49
Mar 14 07:43:10 ubuntu-0 hyperd[24575]: github.com/hyperhq/hyperd/daemon/pod.(*Volume).tryRemoveFromSandbox(0xc4202a1f90, 0xc4206a9950, 0xc4210579a
Mar 14 07:43:10 ubuntu-0 hyperd[24575]:         /go/src/github.com/hyperhq/hyperd/daemon/pod/volume.go:138 +0x69
Mar 14 07:43:10 ubuntu-0 hyperd[24575]: github.com/hyperhq/hyperd/daemon/pod.(*XPod).RemoveContainer(0xc4201cf1e0, 0xc420a7e480, 0x40, 0x0, 0x0)
Mar 14 07:43:10 ubuntu-0 hyperd[24575]:         /go/src/github.com/hyperhq/hyperd/daemon/pod/decommission.go:286 +0x3ed
Mar 14 07:43:10 ubuntu-0 hyperd[24575]: github.com/hyperhq/hyperd/daemon.(*Daemon).RemoveContainer(0xc4206a40e0, 0xc420a7e480, 0x40, 0x2, 0xc420a6d
Mar 14 07:43:10 ubuntu-0 hyperd[24575]:         /go/src/github.com/hyperhq/hyperd/daemon/rm.go:47 +0x7e
Mar 14 07:43:10 ubuntu-0 hyperd[24575]: github.com/hyperhq/hyperd/serverrpc.(*ServerRPC).ContainerRemove(0xc420695300, 0x7f5f24a99b28, 0xc420a717a0
Mar 14 07:43:10 ubuntu-0 hyperd[24575]:         /go/src/github.com/hyperhq/hyperd/serverrpc/container.go:62 +0x173
Mar 14 07:43:10 ubuntu-0 hyperd[24575]: github.com/hyperhq/hyperd/types._PublicAPI_ContainerRemove_Handler(0x1354780, 0xc420695300, 0x7f5f24a99b28,
Mar 14 07:43:10 ubuntu-0 hyperd[24575]:         /go/src/github.com/hyperhq/hyperd/types/types.pb.go:3146 +0xdd
Mar 14 07:43:10 ubuntu-0 hyperd[24575]: github.com/hyperhq/hyperd/vendor/google.golang.org/grpc.(*Server).processUnaryRPC(0xc4206e9800, 0x1c99460,
Mar 14 07:43:10 ubuntu-0 hyperd[24575]:         /go/src/github.com/hyperhq/hyperd/vendor/google.golang.org/grpc/server.go:425 +0x9ca
Mar 14 07:43:10 ubuntu-0 hyperd[24575]: github.com/hyperhq/hyperd/vendor/google.golang.org/grpc.(*Server).handleStream(0xc4206e9800, 0x1c99460, 0xc
Mar 14 07:43:10 ubuntu-0 hyperd[24575]:         /go/src/github.com/hyperhq/hyperd/vendor/google.golang.org/grpc/server.go:574 +0x6ad
Mar 14 07:43:10 ubuntu-0 hyperd[24575]: github.com/hyperhq/hyperd/vendor/google.golang.org/grpc.(*Server).serveStreams.func1.1(0xc4207912f0, 0xc420
Mar 14 07:43:10 ubuntu-0 hyperd[24575]:         /go/src/github.com/hyperhq/hyperd/vendor/google.golang.org/grpc/server.go:296 +0xab
Mar 14 07:43:10 ubuntu-0 hyperd[24575]: created by github.com/hyperhq/hyperd/vendor/google.golang.org/grpc.(*Server).serveStreams.func1
Mar 14 07:43:10 ubuntu-0 hyperd[24575]:         /go/src/github.com/hyperhq/hyperd/vendor/google.golang.org/grpc/server.go:297 +0xa3
Mar 14 07:43:10 ubuntu-0 systemd[1]: hyperd.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
Mar 14 07:43:10 ubuntu-0 systemd[1]: hyperd.service: Unit entered failed state.

@YaoZengzeng @laijs Any updates on this.

gnawux commented 7 years ago

looks like RemoveVolume is called after VM was set to nil

Crazykev commented 7 years ago

@feiskyer I meet this too, should be fixed with #542

gnawux commented 7 years ago

fixed by #542 and #545