lyft / cni-ipvlan-vpc-k8s

AWS VPC Kubernetes CNI driver using IPvlan
Apache License 2.0

Update netlink library to support 4.20+ kernels #88

Closed lbernail closed 4 years ago

lbernail commented 4 years ago

This PR updates the netlink library to pick up a patch required for kernels 4.20+. See this PR for more detail: https://github.com/vishvananda/netlink/pull/498

Symptoms of the problem:

lbernail commented 4 years ago

Closing for now, as I've found a regression in another call. Will reopen when I'm sure I have a patch.

theatrus commented 4 years ago

Thanks. We are running this on 5.5/5.6/5.7 kernels with no issues so curious what you end up seeing.

lbernail commented 4 years ago

@theatrus interesting. For us, CNI Delete calls fail on kernel 5.3 with the delete code from v0.6.1.

Steps to reproduce:

sudo ip netns add testing
sudo CNI_PATH=/opt/cni/bin cnitool add cni-ipvlan-vpc-k8s /var/run/netns/testing
sudo CNI_PATH=/opt/cni/bin cnitool del cni-ipvlan-vpc-k8s /var/run/netns/testing

The del command hangs and never returns. The exact behavior actually depends on the Go version used for compilation, which I found very misleading until I saw the patch from Daniel, which fixes a stack corruption in vethPeerIndex (https://github.com/lyft/cni-ipvlan-vpc-k8s/blob/v0.6.1/plugin/unnumbered-ptp/unnumbered-ptp.go#L579).
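Since the failure mode is a hang rather than an error, a time limit makes the repro easier to script. A minimal sketch, assuming GNU coreutils `timeout` is available (it exits with status 124 when the limit is hit); `run_with_timeout` is a hypothetical helper, not part of this project or of cnitool:

```shell
#!/bin/sh
# Hypothetical helper: run a command under a time limit and report
# whether it completed, failed, or hung. Relies on GNU coreutils
# `timeout`, which exits with 124 when the limit expires.
run_with_timeout() {
    limit="$1"
    shift
    status=0
    timeout "$limit" "$@" || status=$?
    if [ "$status" -eq 124 ]; then
        echo "HUNG after ${limit}s: $*"
    elif [ "$status" -ne 0 ]; then
        echo "FAILED (exit $status): $*"
    else
        echo "OK: $*"
    fi
    return "$status"
}

# Usage against the repro steps (needs root, cnitool, and the plugin installed);
# the 30 second limit is an arbitrary choice:
# run_with_timeout 30 sudo CNI_PATH=/opt/cni/bin cnitool del cni-ipvlan-vpc-k8s /var/run/netns/testing
```

A `HUNG` result on the del step would match the behavior described here, while `FAILED` would point to a different problem.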

Note that the version we run does not include https://github.com/lyft/cni-ipvlan-vpc-k8s/pull/83, which means it calls the buggy netlink.vethPeerIndex, which master doesn't use anymore (see: https://github.com/containernetworking/plugins/blob/master/pkg/ip/link_linux.go#L258).

So I think the issue is only present in versions up to and including v0.6.1.

We are currently testing the netlink update with an additional small patch, which I can PR. Upgrading the library should be good in any case, as we are all moving to much more recent kernels.