weaveworks / weave

Simple, resilient multi-host containers networking and more.
https://www.weave.works
Apache License 2.0
6.62k stars, 670 forks

Weave-net CNI can not work on containerd=1.6.4 #3942

Open fuzhibo opened 2 years ago

fuzhibo commented 2 years ago

What you expected to happen?

Using Weave-net CNI for kubernetes=v1.20.1

What happened?

Weave Net CNI does not work with kubernetes=v1.20.1 and containerd.io=1.6.4: the veth interfaces for the containers cannot be created.

How to reproduce it?

Upgrade containerd.io to 1.6.4

Anything else we need to know?

When I downgrade to containerd.io=1.5.11, everything works.

Versions:

```console
$ weave version
weave 2.8.1

$ docker version
Client: Docker Engine - Community
 Version:           20.10.16
 API version:       1.41
 Go version:        go1.17.10
 Git commit:        aa7e414
 Built:             Thu May 12 09:17:28 2022
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server: Docker Engine - Community
 Engine:
  Version:          20.10.16
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.17.10
  Git commit:       f756502
  Built:            Thu May 12 09:15:33 2022
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.5.11
  GitCommit:        3df54a852345ae127d1fa3092b95168e4a88e2f8
 runc:
  Version:          1.0.3
  GitCommit:        v1.0.3-0-gf46b6ba
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

$ uname -a
Linux k8s-master 4.15.0-177-generic #186-Ubuntu SMP Thu Apr 14 20:23:07 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.1", GitCommit:"c4d752765b3bbac2237bf87cf0b1c2e307844666", GitTreeState:"clean", BuildDate:"2020-12-18T12:09:25Z", GoVersion:"go1.15.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.1", GitCommit:"c4d752765b3bbac2237bf87cf0b1c2e307844666", GitTreeState:"clean", BuildDate:"2020-12-18T12:00:47Z", GoVersion:"go1.15.5", Compiler:"gc", Platform:"linux/amd64"}
```

## Logs:

$ docker logs weave

or, if using Kubernetes:

$ kubectl logs -n kube-system weave

Anything interesting or unusual output by the below, potentially relevant, commands?

```console
$ journalctl -u docker.service --no-pager
$ journalctl -u kubelet --no-pager
May 12 14:59:51 k8s-master containerd[2087]: time="2022-05-12T14:59:51.590299653+08:00" level=info msg="RunPodSandbox for &PodSandboxMetadata{Name:fluentd-k26wt,Uid:cbe02001-06b7-4b8c-872e-7d559b61395c,Namespace:kube-system,Attempt:0,}"
May 12 14:59:51 k8s-master kernel: [11607.056092] weave: port 2(vethwepl73cc41b) entered blocking state
May 12 14:59:51 k8s-master kernel: [11607.056096] weave: port 2(vethwepl73cc41b) entered disabled state
May 12 14:59:51 k8s-master kernel: [11607.056207] device vethwepl73cc41b entered promiscuous mode
May 12 14:59:51 k8s-master systemd-udevd[22597]: link_config: autonegotiation is unset or enabled, the speed and duplex are not writable.
May 12 14:59:51 k8s-master systemd-udevd[22598]: link_config: autonegotiation is unset or enabled, the speed and duplex are not writable.
May 12 14:59:51 k8s-master networkd-dispatcher[1866]: WARNING:Unknown index 5167 seen, reloading interface list
May 12 14:59:51 k8s-master systemd-udevd[22598]: Could not generate persistent MAC address for vethwepl73cc41b: No such file or directory
May 12 14:59:51 k8s-master kernel: [11607.085723] eth0: renamed from vethwepg73cc41b
May 12 14:59:51 k8s-master systemd-udevd[22597]: link_config: could not get ethtool features for vethwepg73cc41b
May 12 14:59:51 k8s-master systemd-udevd[22597]: Could not set offload features of vethwepg73cc41b: No such device
May 12 14:59:51 k8s-master networkd-dispatcher[1866]: ERROR:Unknown interface index 5167 seen even after reload
May 12 14:59:51 k8s-master libvirtd[2076]: 2022-05-12 06:59:51.985+0000: 2831: error : virFileReadAll:1420 : Failed to open file '/sys/class/net/vethwepg73cc41b/operstate': No such file or directory
May 12 14:59:51 k8s-master libvirtd[2076]: 2022-05-12 06:59:51.985+0000: 2831: error : virNetDevGetLinkInfo:2530 : unable to read: /sys/class/net/vethwepg73cc41b/operstate: No such file or directory
May 12 14:59:52 k8s-master systemd-networkd[1618]: vethwepl73cc41b: Link UP
May 12 14:59:52 k8s-master kernel: [11607.219944] IPv6: ADDRCONF(NETDEV_UP): vethwepl73cc41b: link is not ready
May 12 14:59:52 k8s-master systemd-networkd[1618]: vethwepl73cc41b: Gained carrier
May 12 14:59:52 k8s-master kernel: [11607.226183] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
May 12 14:59:52 k8s-master kernel: [11607.226199] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
May 12 14:59:52 k8s-master kernel: [11607.226237] IPv6: ADDRCONF(NETDEV_CHANGE): vethwepl73cc41b: link becomes ready
May 12 14:59:52 k8s-master kernel: [11607.226293] weave: port 2(vethwepl73cc41b) entered blocking state
May 12 14:59:52 k8s-master kernel: [11607.226295] weave: port 2(vethwepl73cc41b) entered forwarding state
May 12 14:59:52 k8s-master systemd-networkd[1618]: vethwepl73cc41b: Link DOWN
May 12 14:59:52 k8s-master systemd-networkd[1618]: vethwepl73cc41b: Lost carrier
May 12 14:59:52 k8s-master kernel: [11607.299911] weave: port 2(vethwepl73cc41b) entered disabled state
May 12 14:59:52 k8s-master kernel: [11607.303493] device vethwepl73cc41b left promiscuous mode
May 12 14:59:52 k8s-master kernel: [11607.303495] weave: port 2(vethwepl73cc41b) entered disabled state
May 12 14:59:52 k8s-master containerd[2087]: time="2022-05-12T14:59:52.223538455+08:00" level=error msg="RunPodSandbox for &PodSandboxMetadata{Name:fluentd-k26wt,Uid:cbe02001-06b7-4b8c-872e-7d559b61395c,Namespace:kube-system,Attempt:0,} failed, error" error="failed to setup network for sandbox \"73cc41b331bf40599cbd3fa813203c8e6d6478c753f298606421b1ae7a15e993\": failed to find network info for sandbox \"73cc41b331bf40599cbd3fa813203c8e6d6478c753f298606421b1ae7a15e993\""
May 12 14:59:52 k8s-master kubelet[2063]: E0512 14:59:52.224570    2063 remote_runtime.go:116] RunPodSandbox from runtime service failed: rpc error: code = Unknown desc = failed to setup network for sandbox "73cc41b331bf40599cbd3fa813203c8e6d6478c753f298606421b1ae7a15e993": failed to find network info for sandbox "73cc41b331bf40599cbd3fa813203c8e6d6478c753f298606421b1ae7a15e993"
May 12 14:59:52 k8s-master kubelet[2063]: E0512 14:59:52.224657    2063 kuberuntime_sandbox.go:70] CreatePodSandbox for pod "fluentd-k26wt_kube-system(cbe02001-06b7-4b8c-872e-7d559b61395c)" failed: rpc error: code = Unknown desc = failed to setup network for sandbox "73cc41b331bf40599cbd3fa813203c8e6d6478c753f298606421b1ae7a15e993": failed to find network info for sandbox "73cc41b331bf40599cbd3fa813203c8e6d6478c753f298606421b1ae7a15e993"
May 12 14:59:52 k8s-master kubelet[2063]: E0512 14:59:52.224683    2063 kuberuntime_manager.go:755] createPodSandbox for pod "fluentd-k26wt_kube-system(cbe02001-06b7-4b8c-872e-7d559b61395c)" failed: rpc error: code = Unknown desc = failed to setup network for sandbox "73cc41b331bf40599cbd3fa813203c8e6d6478c753f298606421b1ae7a15e993": failed to find network info for sandbox "73cc41b331bf40599cbd3fa813203c8e6d6478c753f298606421b1ae7a15e993"
May 12 14:59:52 k8s-master kubelet[2063]: E0512 14:59:52.224767    2063 pod_workers.go:191] Error syncing pod cbe02001-06b7-4b8c-872e-7d559b61395c ("fluentd-k26wt_kube-system(cbe02001-06b7-4b8c-872e-7d559b61395c)"), skipping: failed to "CreatePodSandbox" for "fluentd-k26wt_kube-system(cbe02001-06b7-4b8c-872e-7d559b61395c)" with CreatePodSandboxError: "CreatePodSandbox for pod \"fluentd-k26wt_kube-system(cbe02001-06b7-4b8c-872e-7d559b61395c)\" failed: rpc error: code = Unknown desc = failed to setup network for sandbox \"73cc41b331bf40599cbd3fa813203c8e6d6478c753f298606421b1ae7a15e993\": failed to find network info for sandbox \"73cc41b331bf40599cbd3fa813203c8e6d6478c753f298606421b1ae7a15e993\""
May 12 15:00:01 k8s-master kernel: [11616.961093] sh (22763): drop_caches: 3
May 12 15:00:01 k8s-master kernel: [11616.962742] sh (22760): drop_caches: 3
May 12 15:00:01 k8s-master kernel: [11616.964162] sh (22759): drop_caches: 3
$ kubectl get events
```
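
The failure signature in the log excerpt above is the containerd message `failed to find network info for sandbox`, which kubelet then repeats. Here is a minimal sketch, assuming the journal has been saved as text, for pulling the affected sandbox IDs and pod names out of such output. The function name `find_sandbox_failures` and both regular expressions are illustrative and are not part of weave, containerd, or kubelet.

```python
import re

# Matches the failure message as it appears in the excerpt above. The quotes
# around the sandbox ID are backslash-escaped in containerd's own log line and
# plain in most kubelet lines, so the backslash is optional.
FAILURE_RE = re.compile(
    r'failed to find network info for sandbox \\?"(?P<sandbox>[0-9a-f]+)\\?"'
)
# Pod name as printed by containerd in PodSandboxMetadata{Name:...,Uid:...}.
POD_NAME_RE = re.compile(r'PodSandboxMetadata\{Name:(?P<pod>[^,]+),')


def find_sandbox_failures(lines):
    """Return a sorted list of (sandbox_id, pod_name) pairs for failed sandboxes.

    pod_name is None when no matching line carries PodSandboxMetadata
    (kubelet lines, for example, do not).
    """
    failures = {}
    for line in lines:
        match = FAILURE_RE.search(line)
        if match is None:
            continue
        sandbox = match.group("sandbox")
        pod = POD_NAME_RE.search(line)
        name = pod.group("pod") if pod else None
        # Record the sandbox, and fill in the pod name once a line provides it.
        if failures.get(sandbox) is None:
            failures[sandbox] = name
    return sorted(failures.items())
```

The function accepts any iterable of lines, so an open file object for the saved journal can be passed in directly. Lines that do not contain the failure message, such as the kernel and systemd-udevd entries, are ignored.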

## Network:
<!-- If your problem has anything to do with one network endpoint not being able to contact another, please run the following commands -->

```console
$ ip route
$ ip -4 -o addr
$ sudo iptables-save
```

busyboy77 commented 2 years ago

Bumping this issue.

jpetazzo commented 2 years ago

For more details, see:

I'm not sure if it's weave's fault or containerd's fault; if it's weave's I guess they'll want to fix it; if it's containerd's they'll want to advocate and explain why 😅

boskoop commented 2 years ago

I can confirm this as well on Kubernetes v1.24.1 with weave 2.8.1; downgrading to containerd.io=1.5.11-1 solved the issue there too.

busyboy77 commented 2 years ago

And I am sure that by the time they work out whose fault it was, people will have forgotten about weave on Kubernetes altogether. That is my guess after seeing that weave was last updated about a year ago.

kingdonb commented 2 years ago

If there is someone who has the chops and wants to see Weave net maintained, send your PRs @ me and I will help you to try and get them merged.

My understanding from https://github.com/containerd/containerd/issues/6921#issuecomment-1146680225 is that this all works again, thanks to a change from upstream which has resolved the backwards-incompatible changes in CNI.

Which means, of course, that people can install weave net again (and they might be at risk with no maintainers actively pushing out releases).

(Edit: the discussion in https://github.com/weaveworks/weave/pull/3939 is a good place to start if you haven't seen it yet.)

rajch commented 2 years ago

Meanwhile, I can confirm that weave net works as-is on containerd 1.6.6, with Kubernetes 1.24, 1.23 and 1.22.
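
The reports collected in this thread can be summarized as data. This is only a sketch: the table holds nothing beyond what commenters above reported, the names `REPORTS` and `reported_status` are illustrative, and a containerd version that nobody in the thread tested is treated as unknown rather than as working or broken.

```python
# containerd version -> outcome with weave net, as reported in this thread.
REPORTS = {
    "1.5.11": "works",   # fuzhibo (Kubernetes v1.20.1), boskoop (Kubernetes v1.24.1)
    "1.6.4": "broken",   # fuzhibo (Kubernetes v1.20.1)
    "1.6.6": "works",    # rajch (Kubernetes 1.22, 1.23 and 1.24)
}


def reported_status(containerd_version):
    """Look up what this thread reported for a containerd version string."""
    # Drop a package revision suffix such as the "-1" in "1.5.11-1".
    base = containerd_version.split("-", 1)[0]
    return REPORTS.get(base, "unknown")
```

For example, `reported_status("1.5.11-1")` looks up the entry for 1.5.11, while a version with no report in the thread returns `"unknown"`.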