lyft / cni-ipvlan-vpc-k8s

AWS VPC Kubernetes CNI driver using IPvlan
Apache License 2.0
360 stars 58 forks source link

unnumbered-ptp: fix DEL code #83

Closed tomwans closed 4 years ago

tomwans commented 4 years ago

The major logic issue stemmed from a panic being caused by "inconsistent mutex state." Stack trace from this panic is below.

I didn't dig too far into why this happens, but my understanding is this:

  1. cni process used init() to LockOSThread the main goroutine to one process.
  2. eventually, we make a call to netns.Do(), which brings us into a funky locked thread to enter the network namespace.
  3. We call TeardownIPMasq, which eventually spawns a process, which depends on some mutex, which gets into an inconsistent state.

I haven't looked into exactly why this happens, but we don't need to really look into it. Do() does all the thread locking we need under the hood, there's no need to do it in init(). So double checked our usages and determined we can safely remove it. This removes the panic.

Panic Stacktrace:

Apr 14 16:59:26 ip-10-44-171-83.ec2.internal crio[2422]: fatal error: sync: inconsistent mutex state
Apr 14 16:59:26 ip-10-44-171-83.ec2.internal crio[2422]: goroutine 1 [running, locked to thread]:
Apr 14 16:59:26 ip-10-44-171-83.ec2.internal crio[2422]: runtime.throw(0x766c05, 0x1e)
Apr 14 16:59:26 ip-10-44-171-83.ec2.internal crio[2422]:         /usr/lib/golang/src/runtime/panic.go:617 +0x72 fp=0xc00024b708 sp=0xc00024b6d8 pc=0x42d172
Apr 14 16:59:26 ip-10-44-171-83.ec2.internal crio[2422]: sync.throw(0x766c05, 0x1e)
Apr 14 16:59:26 ip-10-44-171-83.ec2.internal crio[2422]:         /usr/lib/golang/src/runtime/panic.go:603 +0x35 fp=0xc00024b728 sp=0xc00024b708 pc=0x42d0f5
Apr 14 16:59:26 ip-10-44-171-83.ec2.internal crio[2422]: sync.(*Mutex).Lock(0xc00036e164)
Apr 14 16:59:26 ip-10-44-171-83.ec2.internal crio[2422]:         /usr/lib/golang/src/sync/mutex.go:143 +0x165 fp=0xc00024b768 sp=0xc00024b728 pc=0x4614a5
Apr 14 16:59:26 ip-10-44-171-83.ec2.internal crio[2422]: sync.(*RWMutex).Lock(0xc00036e164)
Apr 14 16:59:26 ip-10-44-171-83.ec2.internal crio[2422]:         /usr/lib/golang/src/sync/rwmutex.go:93 +0x2d fp=0xc00024b788 sp=0xc00024b768 pc=0x46225d
Apr 14 16:59:26 ip-10-44-171-83.ec2.internal crio[2422]: os.(*Process).wait(0xc00036e150, 0x7745e8, 0x7745f0, 0x7745e0)
Apr 14 16:59:26 ip-10-44-171-83.ec2.internal crio[2422]:         /usr/lib/golang/src/os/exec_unix.go:32 +0x20e fp=0xc00024b800 sp=0xc00024b788 pc=0x4b911e
Apr 14 16:59:26 ip-10-44-171-83.ec2.internal crio[2422]: os.(*Process).Wait(...)
Apr 14 16:59:26 ip-10-44-171-83.ec2.internal crio[2422]:         /usr/lib/golang/src/os/exec.go:125
Apr 14 16:59:26 ip-10-44-171-83.ec2.internal crio[2422]: os/exec.(*Cmd).Wait(0xc0003aa000, 0x0, 0x0)
Apr 14 16:59:26 ip-10-44-171-83.ec2.internal crio[2422]:         /usr/lib/golang/src/os/exec/exec.go:474 +0x61 fp=0xc00024b878 sp=0xc00024b800 pc=0x56be41
Apr 14 16:59:26 ip-10-44-171-83.ec2.internal crio[2422]: os/exec.(*Cmd).Run(0xc0003aa000, 0xc0003aa000, 0x5)
Apr 14 16:59:26 ip-10-44-171-83.ec2.internal crio[2422]:         /usr/lib/golang/src/os/exec/exec.go:318 +0x5c fp=0xc00024b8a0 sp=0xc00024b878 pc=0x56b36c
Apr 14 16:59:26 ip-10-44-171-83.ec2.internal crio[2422]: github.com/coreos/go-iptables/iptables.(*IPTables).runWithOutput(0xc000336380, 0xc000354340, 0x6, 0xa, 0x0, 0x0, 0x0, 0x0)
Apr 14 16:59:26 ip-10-44-171-83.ec2.internal crio[2422]:         /go/pkg/mod/github.com/coreos/go-iptables@v0.4.0/iptables/iptables.go:385 +0x295 fp=0xc00024ba80 sp=0xc00024b8a0 pc=0x58b555
Apr 14 16:59:26 ip-10-44-171-83.ec2.internal crio[2422]: github.com/coreos/go-iptables/iptables.(*IPTables).run(...)
Apr 14 16:59:26 ip-10-44-171-83.ec2.internal crio[2422]:         /go/pkg/mod/github.com/coreos/go-iptables@v0.4.0/iptables/iptables.go:356
Apr 14 16:59:26 ip-10-44-171-83.ec2.internal crio[2422]: github.com/coreos/go-iptables/iptables.(*IPTables).DeleteChain(...)
Apr 14 16:59:26 ip-10-44-171-83.ec2.internal crio[2422]:         /go/pkg/mod/github.com/coreos/go-iptables@v0.4.0/iptables/iptables.go:335
Apr 14 16:59:26 ip-10-44-171-83.ec2.internal crio[2422]: github.com/containernetworking/plugins/pkg/ip.TeardownIPMasq(0xc00024bc58, 0xc0002682d0, 0x1c, 0xc000142240, 0x56, 0xf90, 0x56)
Apr 14 16:59:26 ip-10-44-171-83.ec2.internal crio[2422]:         /go/pkg/mod/github.com/containernetworking/plugins@v0.7.4/pkg/ip/ipmasq_linux.go:100 +0x4d3 fp=0xc00024bb88 sp=0xc00024ba80 pc=0x59c003
Apr 14 16:59:26 ip-10-44-171-83.ec2.internal crio[2422]: main.cmdDel(0xc00033da40, 0xc000306438, 0x5)
Apr 14 16:59:26 ip-10-44-171-83.ec2.internal crio[2422]:         /code/cni-ipvlan-vpc-k8s-private/cni-ipvlan-vpc-k8s/plugin/unnumbered-ptp/unnumbered-ptp.go:592 +0x417 fp=0xc00024bdd0 sp=0xc00024bb88 pc=0x6b68e7
Apr 14 16:59:26 ip-10-44-171-83.ec2.internal crio[2422]: github.com/containernetworking/cni/pkg/skel.(*dispatcher).checkVersionAndCall(0xc00024bf18, 0xc00033da40, 0x7cbd00, 0xc0001204e0, 0x774320, 0x0, 0xc0002a6ac0)
Apr 14 16:59:26 ip-10-44-171-83.ec2.internal crio[2422]:         /go/pkg/mod/github.com/containernetworking/cni@v0.6.0/pkg/skel/skel.go:162 +0x259 fp=0xc00024be60 sp=0xc00024bdd0 pc=0x528e09
Apr 14 16:59:26 ip-10-44-171-83.ec2.internal crio[2422]: github.com/containernetworking/cni/pkg/skel.(*dispatcher).pluginMain(0xc00024bf18, 0x774318, 0x774320, 0x7cbd00, 0xc0001204e0, 0x70e960)
Apr 14 16:59:26 ip-10-44-171-83.ec2.internal crio[2422]:         /go/pkg/mod/github.com/containernetworking/cni@v0.6.0/pkg/skel/skel.go:175 +0x2e8 fp=0xc00024bec8 sp=0xc00024be60 pc=0x529148
Apr 14 16:59:26 ip-10-44-171-83.ec2.internal crio[2422]: github.com/containernetworking/cni/pkg/skel.PluginMainWithError(...)
Apr 14 16:59:26 ip-10-44-171-83.ec2.internal crio[2422]:         /go/pkg/mod/github.com/containernetworking/cni@v0.6.0/pkg/skel/skel.go:210
Apr 14 16:59:26 ip-10-44-171-83.ec2.internal crio[2422]: github.com/containernetworking/cni/pkg/skel.PluginMain(0x774318, 0x774320, 0x7cbd00, 0xc0001204e0)
Apr 14 16:59:26 ip-10-44-171-83.ec2.internal crio[2422]:         /go/pkg/mod/github.com/containernetworking/cni@v0.6.0/pkg/skel/skel.go:222 +0xf3 fp=0xc00024bf68 sp=0xc00024bec8 pc=0x529353
Apr 14 16:59:26 ip-10-44-171-83.ec2.internal crio[2422]: main.main()
Apr 14 16:59:26 ip-10-44-171-83.ec2.internal crio[2422]:         /code/cni-ipvlan-vpc-k8s-private/cni-ipvlan-vpc-k8s/plugin/unnumbered-ptp/unnumbered-ptp.go:614 +0xbf fp=0xc00024bf98 sp=0xc00024bf68 pc=0x6b6bef
Apr 14 16:59:26 ip-10-44-171-83.ec2.internal crio[2422]: runtime.main()
Apr 14 16:59:26 ip-10-44-171-83.ec2.internal crio[2422]:         /usr/lib/golang/src/runtime/proc.go:200 +0x20c fp=0xc00024bfe0 sp=0xc00024bf98 pc=0x42eadc
Apr 14 16:59:26 ip-10-44-171-83.ec2.internal crio[2422]: runtime.goexit()
Apr 14 16:59:26 ip-10-44-171-83.ec2.internal crio[2422]:         /usr/lib/golang/src/runtime/asm_amd64.s:1337 +0x1 fp=0xc00024bfe8 sp=0xc00024bfe0 pc=0x458f41
Apr 14 16:59:26 ip-10-44-171-83.ec2.internal crio[2422]: goroutine 2 [force gc (idle)]:
Apr 14 16:59:26 ip-10-44-171-83.ec2.internal crio[2422]: runtime.gopark(0x774948, 0x9e59a0, 0x1410, 0x1)
Apr 14 16:59:26 ip-10-44-171-83.ec2.internal crio[2422]:         /usr/lib/golang/src/runtime/proc.go:301 +0xef fp=0xc0000d4fb0 sp=0xc0000d4f90 pc=0x42eecf
Apr 14 16:59:26 ip-10-44-171-83.ec2.internal crio[2422]: runtime.goparkunlock(...)
Apr 14 16:59:26 ip-10-44-171-83.ec2.internal crio[2422]:         /usr/lib/golang/src/runtime/proc.go:307
Apr 14 16:59:26 ip-10-44-171-83.ec2.internal crio[2422]: runtime.forcegchelper()
Apr 14 16:59:26 ip-10-44-171-83.ec2.internal crio[2422]:         /usr/lib/golang/src/runtime/proc.go:250 +0xb7 fp=0xc0000d4fe0 sp=0xc0000d4fb0 pc=0x42ed77
Apr 14 16:59:26 ip-10-44-171-83.ec2.internal crio[2422]: runtime.goexit()
Apr 14 16:59:26 ip-10-44-171-83.ec2.internal crio[2422]:         /usr/lib/golang/src/runtime/asm_amd64.s:1337 +0x1 fp=0xc0000d4fe8 sp=0xc0000d4fe0 pc=0x458f41
Apr 14 16:59:26 ip-10-44-171-83.ec2.internal crio[2422]: created by runtime.init.5
Apr 14 16:59:26 ip-10-44-171-83.ec2.internal crio[2422]:         /usr/lib/golang/src/runtime/proc.go:239 +0x35
Apr 14 16:59:26 ip-10-44-171-83.ec2.internal crio[2422]: goroutine 3 [GC sweep wait]:
Apr 14 16:59:26 ip-10-44-171-83.ec2.internal crio[2422]: runtime.gopark(0x774948, 0x9e5ae0, 0x140c, 0x1)
Apr 14 16:59:26 ip-10-44-171-83.ec2.internal crio[2422]:         /usr/lib/golang/src/runtime/proc.go:301 +0xef fp=0xc0000d57a8 sp=0xc0000d5788 pc=0x42eecf
Apr 14 16:59:26 ip-10-44-171-83.ec2.internal crio[2422]: runtime.goparkunlock(...)
Apr 14 16:59:26 ip-10-44-171-83.ec2.internal crio[2422]:         /usr/lib/golang/src/runtime/proc.go:307
Apr 14 16:59:26 ip-10-44-171-83.ec2.internal crio[2422]: runtime.bgsweep(0xc00001a070)
Apr 14 16:59:26 ip-10-44-171-83.ec2.internal crio[2422]:         /usr/lib/golang/src/runtime/mgcsweep.go:70 +0x9c fp=0xc0000d57d8 sp=0xc0000d57a8 pc=0x42257c
Apr 14 16:59:26 ip-10-44-171-83.ec2.internal crio[2422]: runtime.goexit()
Apr 14 16:59:26 ip-10-44-171-83.ec2.internal crio[2422]:         /usr/lib/golang/src/runtime/asm_amd64.s:1337 +0x1 fp=0xc0000d57e0 sp=0xc0000d57d8 pc=0x458f41
Apr 14 16:59:26 ip-10-44-171-83.ec2.internal crio[2422]: created by runtime.gcenable
Apr 14 16:59:26 ip-10-44-171-83.ec2.internal crio[2422]:         /usr/lib/golang/src/runtime/mgc.go:208 +0x58
Apr 14 16:59:26 ip-10-44-171-83.ec2.internal crio[2422]: goroutine 18 [finalizer wait]:
Apr 14 16:59:26 ip-10-44-171-83.ec2.internal crio[2422]: runtime.gopark(0x774948, 0xa028a0, 0xc0000d140f, 0x1)
Apr 14 16:59:26 ip-10-44-171-83.ec2.internal crio[2422]:         /usr/lib/golang/src/runtime/proc.go:301 +0xef fp=0xc0000d4758 sp=0xc0000d4738 pc=0x42eecf
Apr 14 16:59:26 ip-10-44-171-83.ec2.internal crio[2422]: runtime.goparkunlock(...)
Apr 14 16:59:26 ip-10-44-171-83.ec2.internal crio[2422]:         /usr/lib/golang/src/runtime/proc.go:307
Apr 14 16:59:26 ip-10-44-171-83.ec2.internal crio[2422]: runtime.runfinq()
Apr 14 16:59:26 ip-10-44-171-83.ec2.internal crio[2422]:         /usr/lib/golang/src/runtime/mfinal.go:175 +0xa6 fp=0xc0000d47e0 sp=0xc0000d4758 pc=0x4193c6
Apr 14 16:59:26 ip-10-44-171-83.ec2.internal crio[2422]: runtime.goexit()
Apr 14 16:59:26 ip-10-44-171-83.ec2.internal crio[2422]:         /usr/lib/golang/src/runtime/asm_amd64.s:1337 +0x1 fp=0xc0000d47e8 sp=0xc0000d47e0 pc=0x458f41
Apr 14 16:59:26 ip-10-44-171-83.ec2.internal crio[2422]: created by runtime.createfing
Apr 14 16:59:26 ip-10-44-171-83.ec2.internal crio[2422]:         /usr/lib/golang/src/runtime/mfinal.go:156 +0x61
Apr 14 16:59:26 ip-10-44-171-83.ec2.internal crio[2422]: time="2020-04-14 16:59:26.662730329Z" level=error msg="Error deleting network: netplugin failed but error parsing its diagnostic message "": unexpected end of JSON input"
Apr 14 16:59:26 ip-10-44-171-83.ec2.internal crio[2422]: time="2020-04-14 16:59:26.662964240Z" level=error msg="Error while removing pod from CNI network "paulnet": netplugin failed but error parsing its diagnostic message "": unexpected end of JSON input"