openshift / openshift-sdn

Apache License 2.0
69 stars 63 forks source link

panic parsing container id in newSDNPod #222

Closed liggitt closed 8 years ago

liggitt commented 8 years ago

From user report, panic in newSDNPod on this line:

        containerID = strings.Split(kPod.Status.ContainerStatuses[0].ContainerID, "://")[1]
Dec 05 19:14:46 prod-node-3 origin-node[13019]: panic: runtime error: index out of range
Dec 05 19:14:46 prod-node-3 origin-node[13019]: goroutine 57 [running]:
Dec 05 19:14:46 prod-node-3 origin-node[13019]: runtime.gopanic(0x2629f80, 0xc20802a000)
Dec 05 19:14:46 prod-node-3 origin-node[13019]: /usr/lib/golang/src/runtime/panic.go:425 +0x2a3 fp=0xc208789300 sp=0xc208789298
Dec 05 19:14:46 prod-node-3 origin-node[13019]: runtime.panicindex()
Dec 05 19:14:46 prod-node-3 origin-node[13019]: /usr/lib/golang/src/runtime/panic.go:12 +0x4e fp=0xc208789328 sp=0xc208789300
Dec 05 19:14:46 prod-node-3 origin-node[13019]: github.com/openshift/openshift-sdn/plugins/osdn.newSDNPod(0xc208789538, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
Dec 05 19:14:46 prod-node-3 origin-node[13019]: /builddir/build/BUILD/origin-git-0.a6bb7cd/_thirdpartyhacks/src/github.com/openshift/openshift-sdn/plugins/osdn/osdn.go:122 +0x151 fp=0xc2087893c0 sp=0xc208789328
Dec 05 19:14:46 prod-node-3 origin-node[13019]: github.com/openshift/openshift-sdn/plugins/osdn.(*OsdnRegistryInterface).GetPods(0xc20854b9a0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
Dec 05 19:14:46 prod-node-3 origin-node[13019]: /builddir/build/BUILD/origin-git-0.a6bb7cd/_thirdpartyhacks/src/github.com/openshift/openshift-sdn/plugins/osdn/osdn.go:142 +0x428 fp=0xc208789920 sp=0xc2087893c0
Dec 05 19:14:46 prod-node-3 origin-node[13019]: github.com/openshift/openshift-sdn/pkg/ovssubnet.(*OvsController).watchAndGetResource(0xc208418480, 0x2a65110, 0x3, 0x0, 0x0, 0x0, 0x0)
Dec 05 19:14:46 prod-node-3 origin-node[13019]: /builddir/build/BUILD/origin-git-0.a6bb7cd/_thirdpartyhacks/src/github.com/openshift/openshift-sdn/pkg/ovssubnet/common.go:798 +0x98c fp=0xc208789ab0 sp=0xc208789920
Dec 05 19:14:46 prod-node-3 origin-node[13019]: github.com/openshift/openshift-sdn/pkg/ovssubnet.(*OvsController).StartNode(0xc208418480, 0x5aa, 0x0, 0x0)
Dec 05 19:14:46 prod-node-3 origin-node[13019]: /builddir/build/BUILD/origin-git-0.a6bb7cd/_thirdpartyhacks/src/github.com/openshift/openshift-sdn/pkg/ovssubnet/common.go:487 +0xf69 fp=0xc208789ed0 sp=0xc208789ab0
Dec 05 19:14:46 prod-node-3 origin-node[13019]: github.com/openshift/openshift-sdn/plugins/osdn/multitenant.Node(0xc20854b9a0, 0xc2084d2120, 0x2a, 0x0, 0x0, 0xc2086ed380, 0x7f64c44a7110, 0xc20854ea20, 0x5aa)
Dec 05 19:14:46 prod-node-3 origin-node[13019]: /builddir/build/BUILD/origin-git-0.a6bb7cd/_thirdpartyhacks/src/github.com/openshift/openshift-sdn/plugins/osdn/multitenant/multitenant.go:47 +0x254 fp=0xc208789f98 sp=0xc208789ed0
Dec 05 19:14:46 prod-node-3 origin-node[13019]: runtime.goexit()
Dec 05 19:14:46 prod-node-3 origin-node[13019]: /usr/lib/golang/src/runtime/asm_amd64.s:2232 +0x1 fp=0xc208789fa0 sp=0xc208789f98
Dec 05 19:14:46 prod-node-3 origin-node[13019]: created by github.com/openshift/origin/pkg/cmd/server/start.RunSDNController
Dec 05 19:14:46 prod-node-3 origin-node[13019]: /builddir/build/BUILD/origin-git-0.a6bb7cd/_build/src/github.com/openshift/origin/pkg/cmd/server/start/start_node.go:270 +0x60b
Dec 05 19:14:46 prod-node-3 origin-node[13019]: goroutine 1 [select (no cases)]:
Dec 05 19:14:46 prod-node-3 origin-node[13019]: runtime.gopark(0x0, 0x0, 0x2c56e70, 0x11)
Dec 05 19:14:46 prod-node-3 origin-node[13019]: /usr/lib/golang/src/runtime/proc.go:130 +0x105 fp=0xc208053ae8 sp=0xc208053ab8
Dec 05 19:14:46 prod-node-3 origin-node[13019]: runtime.block()
Dec 05 19:14:46 prod-node-3 origin-node[13019]: /usr/lib/golang/src/runtime/select.go:176 +0x46 fp=0xc208053b10 sp=0xc208053ae8
Dec 05 19:14:46 prod-node-3 origin-node[13019]: github.com/openshift/origin/pkg/cmd/server/start.NodeOptions.StartNode(0xc20834a790, 0x7fff56ec2ef7, 0x21, 0x7f64c4483870, 0xc20802e008, 0x0, 0x0)
Dec 05 19:14:46 prod-node-3 origin-node[13019]: /builddir/build/BUILD/origin-git-0.a6bb7cd/_build/src/github.com/openshift/origin/pkg/cmd/server/start/start_node.go:136 +0xdc fp=0xc208053b60 sp=0xc208053b10
Dec 05 19:14:46 prod-node-3 origin-node[13019]: github.com/openshift/origin/pkg/cmd/server/start.func·011(0xc20815d2c0, 0xc2084a5f20, 0x0, 0x2)
Dec 05 19:14:46 prod-node-3 origin-node[13019]: /builddir/build/BUILD/origin-git-0.a6bb7cd/_build/src/github.com/openshift/origin/pkg/cmd/server/start/start_node.go:66 +0x435 fp=0xc208053d10 sp=0xc208053b60
Dec 05 19:14:46 prod-node-3 origin-node[13019]: github.com/spf13/cobra.(*Command).execute(0xc20815d2c0, 0xc2084a5d40, 0x2, 0x2, 0x0, 0x0)
Dec 05 19:14:46 prod-node-3 origin-node[13019]: /builddir/build/BUILD/origin-git-0.a6bb7cd/_thirdpartyhacks/src/github.com/spf13/cobra/command.go:547 +0x82f fp=0xc208053e28 sp=0xc208053d10
Dec 05 19:14:46 prod-node-3 origin-node[13019]: github.com/spf13/cobra.(*Command).Execute(0xc20815c5a0, 0x0, 0x0)
Dec 05 19:14:46 prod-node-3 origin-node[13019]: /builddir/build/BUILD/origin-git-0.a6bb7cd/_thirdpartyhacks/src/github.com/spf13/cobra/command.go:630 +0x414 fp=0xc208053f30 sp=0xc208053e28
Dec 05 19:14:46 prod-node-3 origin-node[13019]: main.main()
Dec 05 19:14:46 prod-node-3 origin-node[13019]: /builddir/build/BUILD/origin-git-0.a6bb7cd/_build/src/github.com/openshift/origin/cmd/openshift/openshift.go:22 +0x175 fp=0xc208053f98 sp=0xc208053f30
Dec 05 19:14:46 prod-node-3 origin-node[13019]: runtime.main()
Dec 05 19:14:46 prod-node-3 origin-node[13019]: /usr/lib/golang/src/runtime/proc.go:63 +0xf3 fp=0xc208053fe0 sp=0xc208053f98
Dec 05 19:14:46 prod-node-3 origin-node[13019]: runtime.goexit()
Dec 05 19:14:46 prod-node-3 origin-node[13019]: /usr/lib/golang/src/runtime/asm_amd64.s:2232 +0x1 fp=0xc208053fe8 sp=0xc208053fe0
Dec 05 19:14:46 prod-node-3 origin-node[13019]: goroutine 2 [force gc (idle)]:
Dec 05 19:14:46 prod-node-3 origin-node[13019]: runtime.gopark(0x435b10, 0x4ad9700, 0x2b00710, 0xf)
Dec 05 19:14:46 prod-node-3 origin-node[13019]: /usr/lib/golang/src/runtime/proc.go:130 +0x105 fp=0xc20801a798 sp=0xc20801a768
Dec 05 19:14:46 prod-node-3 origin-node[13019]: runtime.goparkunlock(0x4ad9700, 0x2b00710, 0xf)
Dec 05 19:14:46 prod-node-3 origin-node[13019]: /usr/lib/golang/src/runtime/proc.go:136 +0x48 fp=0xc20801a7c0 sp=0xc20801a798
Dec 05 19:14:46 prod-node-3 origin-node[13019]: runtime.forcegchelper()
Dec 05 19:14:46 prod-node-3 origin-node[13019]: /usr/lib/golang/src/runtime/proc.go:99 +0xce fp=0xc20801a7e0 sp=0xc20801a7c0
Dec 05 19:14:46 prod-node-3 origin-node[13019]: runtime.goexit()
Dec 05 19:14:46 prod-node-3 origin-node[13019]: /usr/lib/golang/src/runtime/asm_amd64.s:2232 +0x1 fp=0xc20801a7e8 sp=0xc20801a7e0
Dec 05 19:14:46 prod-node-3 origin-node[13019]: created by runtime.init·4
Dec 05 19:14:46 prod-node-3 origin-node[13019]: /usr/lib/golang/src/runtime/proc.go:87 +0x25
Dec 05 19:14:46 prod-node-3 origin-node[13019]: goroutine 3 [GC sweep wait]:
Dec 05 19:14:46 prod-node-3 origin-node[13019]: runtime.gopark(0x435b10, 0x4aec618, 0x2abc910, 0xd)
Dec 05 19:14:46 prod-node-3 origin-node[13019]: /usr/lib/golang/src/runtime/proc.go:130 +0x105 fp=0xc20801df98 sp=0xc20801df68
Dec 05 19:14:46 prod-node-3 origin-node[13019]: runtime.goparkunlock(0x4aec618, 0x2abc910, 0xd)
Dec 05 19:14:46 prod-node-3 origin-node[13019]: /usr/lib/golang/src/runtime/proc.go:136 +0x48 fp=0xc20801dfc0 sp=0xc20801df98
Dec 05 19:14:46 prod-node-3 origin-node[13019]: /usr/lib/golang/src/runtime/proc.go:136 +0x48 fp=0xc20801dfc0 sp=0xc20801df98
Dec 05 19:14:46 prod-node-3 origin-node[13019]: runtime.bgsweep()
Dec 05 19:14:46 prod-node-3 origin-node[13019]: /usr/lib/golang/src/runtime/mgc0.go:98 +0xbc fp=0xc20801dfe0 sp=0xc20801dfc0
Dec 05 19:14:46 prod-node-3 origin-node[13019]: runtime.goexit()
Dec 05 19:14:46 prod-node-3 origin-node[13019]: /usr/lib/golang/src/runtime/asm_amd64.s:2232 +0x1 fp=0xc20801dfe8 sp=0xc20801dfe0
Dec 05 19:14:46 prod-node-3 origin-node[13019]: created by gc
Dec 05 19:14:46 prod-node-3 origin-node[13019]: /usr/lib/golang/src/runtime/mgc0.c:1386
Dec 05 19:14:46 prod-node-3 origin-node[13019]: goroutine 4 [finalizer wait]:
Dec 05 19:14:46 prod-node-3 origin-node[13019]: runtime.gopark(0x435b10, 0x4aec610, 0x2aff790, 0xe)
Dec 05 19:14:46 prod-node-3 origin-node[13019]: /usr/lib/golang/src/runtime/proc.go:130 +0x105 fp=0xc208018f30 sp=0xc208018f00
Dec 05 19:14:46 prod-node-3 origin-node[13019]: runtime.goparkunlock(0x4aec610, 0x2aff790, 0xe)
Dec 05 19:14:46 prod-node-3 origin-node[13019]: /usr/lib/golang/src/runtime/proc.go:136 +0x48 fp=0xc208018f58 sp=0xc208018f30
Dec 05 19:14:46 prod-node-3 origin-node[13019]: runtime.runfinq()
Dec 05 19:14:46 prod-node-3 origin-node[13019]: /usr/lib/golang/src/runtime/malloc.go:727 +0xba fp=0xc208018fe0 sp=0xc208018f58
Dec 05 19:14:46 prod-node-3 origin-node[13019]: runtime.goexit()
Dec 05 19:14:46 prod-node-3 origin-node[13019]: /usr/lib/golang/src/runtime/asm_amd64.s:2232 +0x1 fp=0xc208018fe8 sp=0xc208018fe0
Dec 05 19:14:46 prod-node-3 origin-node[13019]: created by runtime.createfing
Dec 05 19:14:46 prod-node-3 origin-node[13019]: /usr/lib/golang/src/runtime/malloc.go:707 +0x5e
Dec 05 19:14:46 prod-node-3 origin-node[13019]: goroutine 5 [syscall]:
Dec 05 19:14:46 prod-node-3 origin-node[13019]: runtime.notetsleepg(0x4af45c0, 0xffffffffffffffff, 0x1)
Dec 05 19:14:46 prod-node-3 origin-node[13019]: /usr/lib/golang/src/runtime/lock_futex.go:201 +0x52 fp=0xc208019768 sp=0xc208019740
Dec 05 19:14:46 prod-node-3 origin-node[13019]: runtime.signal_recv(0x0)
Dec 05 19:14:46 prod-node-3 origin-node[13019]: /usr/lib/golang/src/runtime/sigqueue.go:109 +0x135 fp=0xc2080197a0 sp=0xc208019768
Dec 05 19:14:46 prod-node-3 origin-node[13019]: os/signal.loop()
Dec 05 19:14:46 prod-node-3 origin-node[13019]: /usr/lib/golang/src/os/signal/signal_unix.go:21 +0x1f fp=0xc2080197e0 sp=0xc2080197a0
Dec 05 19:14:46 prod-node-3 origin-node[13019]: runtime.goexit()
Dec 05 19:14:46 prod-node-3 origin-node[13019]: /usr/lib/golang/src/runtime/asm_amd64.s:2232 +0x1 fp=0xc2080197e8 sp=0xc2080197e0
Dec 05 19:14:46 prod-node-3 origin-node[13019]: created by os/signal.init·1
Dec 05 19:14:46 prod-node-3 origin-node[13019]: /usr/lib/golang/src/os/signal/signal_unix.go:27 +0x35
Dec 05 19:14:46 prod-node-3 origin-node[13019]: goroutine 10 [chan receive]:
Dec 05 19:14:46 prod-node-3 origin-node[13019]: runtime.gopark(0x435b10, 0xc20800c838, 0x2af1790, 0xc)
Dec 05 19:14:46 prod-node-3 origin-node[13019]: /usr/lib/golang/src/runtime/proc.go:130 +0x105 fp=0xc20801c698 sp=0xc20801c668
Dec 05 19:14:46 prod-node-3 origin-node[13019]: runtime.goparkunlock(0xc20800c838, 0x2af1790, 0xc)
Dec 05 19:14:46 prod-node-3 origin-node[13019]: /usr/lib/golang/src/runtime/proc.go:136 +0x48 fp=0xc20801c6c0 sp=0xc20801c698
Dec 05 19:14:46 prod-node-3 origin-node[13019]: runtime.chanrecv(0x1f79ea0, 0xc20800c7e0, 0xc20801c7b8, 0xbc3301, 0x450000)
Dec 05 19:14:46 prod-node-3 origin-node[13019]: /usr/lib/golang/src/runtime/chan.go:467 +0x82c fp=0xc20801c760 sp=0xc20801c6c0
Dec 05 19:14:46 prod-node-3 origin-node[13019]: runtime.chanrecv2(0x1f79ea0, 0xc20800c7e0, 0xc20801c7b8, 0xbc560b)
Dec 05 19:14:46 prod-node-3 origin-node[13019]: /usr/lib/golang/src/runtime/chan.go:316 +0x2b fp=0xc20801c790 sp=0xc20801c760
Dec 05 19:14:46 prod-node-3 origin-node[13019]: github.com/golang/glog.(*loggingT).flushDaemon(0x4ada7c0)
Dec 05 19:14:46 prod-node-3 origin-node[13019]: /builddir/build/BUILD/origin-git-0.a6bb7cd/_thirdpartyhacks/src/github.com/golang/glog/glog.go:879 +0x78 fp=0xc20801c7d8 sp=0xc20801c790
Dec 05 19:14:46 prod-node-3 origin-node[13019]: runtime.goexit()
Dec 05 19:14:46 prod-node-3 origin-node[13019]: /usr/lib/golang/src/runtime/asm_amd64.s:2232 +0x1 fp=0xc20801c7e0 sp=0xc20801c7d8
Dec 05 19:14:46 prod-node-3 origin-node[13019]: created by github.com/golang/glog.init·1
Dec 05 19:14:46 prod-node-3 origin-node[13019]: /builddir/build/BUILD/origin-git-0.a6bb7cd/_thirdpartyhacks/src/github.com/golang/glog/glog.go:410 +0x2a7
Dec 05 19:14:46 prod-node-3 origin-node[13019]: goroutine 17 [syscall, locked to thread]:
Dec 05 19:14:46 prod-node-3 origin-node[13019]: runtime.goexit()
Dec 05 19:14:46 prod-node-3 origin-node[13019]: /usr/lib/golang/src/runtime/asm_amd64.s:2232 +0x1 fp=0xc208052fe8 sp=0xc208052fe0
Dec 05 19:14:46 prod-node-3 origin-node[13019]: goroutine 11 [syscall]:
Dec 05 19:14:46 prod-node-3 origin-node[13019]: runtime.notetsleepg(0x4ada158, 0x5810bf154, 0x0)
Dec 05 19:14:46 prod-node-3 origin-node[13019]: /usr/lib/golang/src/runtime/lock_futex.go:201 +0x52 fp=0xc208017f68 sp=0xc208017f40
Dec 05 19:14:46 prod-node-3 origin-node[13019]: runtime.timerproc()
Dec 05 19:14:46 prod-node-3 origin-node[13019]: /usr/lib/golang/src/runtime/time.go:207 +0xfa fp=0xc208017fe0 sp=0xc208017f68
Dec 05 19:14:46 prod-node-3 origin-node[13019]: runtime.goexit()
Dec 05 19:14:46 prod-node-3 origin-node[13019]: /usr/lib/golang/src/runtime/asm_amd64.s:2232 +0x1 fp=0xc208017fe8 sp=0xc208017fe0
Dec 05 19:14:46 prod-node-3 origin-node[13019]: created by runtime.addtimerLocked
Dec 05 19:14:46 prod-node-3 origin-node[13019]: /usr/lib/golang/src/runtime/time.go:113 +0x1ba
Dec 05 19:14:46 prod-node-3 origin-node[13019]: goroutine 68 [select]:
Dec 05 19:14:46 prod-node-3 origin-node[13019]: runtime.gopark(0x418330, 0xc208019f40, 0x2a8a110, 0x6)
Dec 05 19:14:46 prod-node-3 origin-node[13019]: /usr/lib/golang/src/runtime/proc.go:130 +0x105 fp=0xc208019cb8 sp=0xc208019c88
Dec 05 19:14:46 prod-node-3 origin-node[13019]: runtime.selectgoImpl(0xc208019f40, 0x0, 0x18)
Dec 05 19:14:46 prod-node-3 origin-node[13019]: /usr/lib/golang/src/runtime/select.go:366 +0xb2c fp=0xc208019e50 sp=0xc208019cb8
Dec 05 19:14:46 prod-node-3 origin-node[13019]: runtime.selectgo(0xc208019f40)
Dec 05 19:14:46 prod-node-3 origin-node[13019]: /usr/lib/golang/src/runtime/select.go:183 +0x12 fp=0xc208019e70 sp=0xc208019e50
Dec 05 19:14:46 prod-node-3 origin-node[13019]: net/http.(*persistConn).writeLoop(0xc2082d8000)
Dec 05 19:14:46 prod-node-3 origin-node[13019]: /usr/lib/golang/src/net/http/transport.go:945 +0x41d fp=0xc208019fd8 sp=0xc208019e70
Dec 05 19:14:46 prod-node-3 origin-node[13019]: runtime.goexit()
Dec 05 19:14:46 prod-node-3 origin-node[13019]: /usr/lib/golang/src/runtime/asm_amd64.s:2232 +0x1 fp=0xc208019fe0 sp=0xc208019fd8
Dec 05 19:14:46 prod-node-3 origin-node[13019]: created by net/http.(*Transport).dialConn
Dec 05 19:14:46 prod-node-3 origin-node[13019]: /usr/lib/golang/src/net/http/transport.go:661 +0xcbc
Dec 05 19:14:46 prod-node-3 origin-node[13019]: goroutine 71 [select]:
Dec 05 19:14:47 prod-node-3 origin-node[13019]: runtime.gopark(0x418330, 0xc20801af40, 0x2a8a110, 0x6)
Dec 05 19:14:47 prod-node-3 origin-node[13019]: /usr/lib/golang/src/runtime/proc.go:130 +0x105 fp=0xc20801acb8 sp=0xc20801ac88
Dec 05 19:14:47 prod-node-3 origin-node[13019]: runtime.selectgoImpl(0xc20801af40, 0x0, 0x18)
Dec 05 19:14:47 prod-node-3 origin-node[13019]: /usr/lib/golang/src/runtime/select.go:366 +0xb2c fp=0xc20801ae50 sp=0xc20801acb8
liggitt commented 8 years ago

It is likely the container status exists, but no container id is set in the status, which would mean the split call returns []{""}

gravis commented 8 years ago

Observed on 1.1.0, after a node reboot. The node has been unscheduled first, and the pods evacuated before reboot.

gravis commented 8 years ago

Following advice from Jordan @liggitt:

oc get pods --all-namespaces -o json | jq '.items[] | select(.status.containerStatuses | (length > 0 and .[0].containerID == null)) | {namespace:.metadata.namespace,name:.metadata.name}'

returns

{
  "namespace": "default",
  "name": "docker-registry-1-kg2kt"
}

And here's the container status:

status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: 2015-12-05T18:08:08Z
    message: 'containers with unready status: [registry]'
    reason: ContainersNotReady
    status: "False"
    type: Ready
  containerStatuses:
  - image: openshift/origin-docker-registry:v1.1
    imageID: ""
    lastState: {}
    name: registry
    ready: false
    restartCount: 0
    state:
      waiting:
        message: 'Image: openshift/origin-docker-registry:v1.1 is ready, container
          is creating'
        reason: ContainerCreating
  hostIP: 172.29.100.140
  phase: Pending
  startTime: 2015-12-05T18:08:08Z

The container was failing because glusterfs endpoints have disappeared (https://github.com/openshift/origin/issues/6070).

Once the endpoints recreated, and registry started automatically, and the panic was gone.

liggitt commented 8 years ago

From an SDN perspective, this means any node that hits this when any pod is in a state like this will panic. Needs to be fixed asap

gravis commented 8 years ago

Can't agree more :)

smarterclayton commented 8 years ago

@rajatchopra @pravisankar @dcbw this is must-fix for 1.1.1/3.1.1

bmeng commented 8 years ago

@liggitt Do you think your issue should have been fixed by this patch? https://github.com/openshift/openshift-sdn/pull/214/

bmeng commented 8 years ago

I can reproduce the panic error without the fix in #214 and the issue gone after the fix merged.

0xmichalis commented 8 years ago

This is fixed by #214 but we still need to pull it in Origin.

0xmichalis commented 8 years ago

Actually it's also fixed in Origin after https://github.com/openshift/origin/pull/6060

liggitt commented 8 years ago

yes, that's the same issue