rajch / weave

Simple, resilient multi-host containers networking and more.
https://rajch.github.io/weave/
Apache License 2.0

Error when deploying the weave plugin on k8s #16

Open. zz0350 opened this issue 1 week ago

zz0350 commented 1 week ago

Expected Outcome: every weave-net pod reaches 2/2 Running, for example: pod/weave-net-lxzfx 2/2 Running 7 (26h ago) 3d3h

Actual Outcome:

cyx@node2:/$ kubectl get all -n kube-system
NAME                                READY   STATUS             RESTARTS         AGE
pod/coredns-7db6d8ff4d-b2v92        1/1     Running            34 (26h ago)     32d
pod/coredns-7db6d8ff4d-fkffx        1/1     Running            34 (26h ago)     32d
pod/etcd-node2                      1/1     Running            62 (26h ago)     32d
pod/kube-apiserver-node2            1/1     Running            58 (26h ago)     32d
pod/kube-controller-manager-node2   1/1     Running            38 (26h ago)     32d
pod/kube-proxy-22bmm                1/1     Running            0                54m
pod/kube-proxy-gzwbw                1/1     Running            3 (26h ago)      3d3h
pod/kube-proxy-kbw84                1/1     Running            1 (35m ago)      61m
pod/kube-proxy-phjgc                1/1     Running            0                55m
pod/kube-proxy-pq6rx                1/1     Running            0                54m
pod/kube-proxy-rtplg                1/1     Running            0                54m
pod/kube-proxy-vbdtp                1/1     Running            5 (2d3h ago)     4d23h
pod/kube-proxy-wghts                1/1     Running            35 (26h ago)     32d
pod/kube-scheduler-node2            1/1     Running            39 (26h ago)     32d
pod/weave-net-5rfjt                 2/2     Running            72 (26h ago)     32d
pod/weave-net-7cz9p                 1/2     CrashLoopBackOff   15 (2m31s ago)   54m
pod/weave-net-cswgw                 1/2     CrashLoopBackOff   15 (118s ago)    54m
pod/weave-net-fr6s9                 1/2     CrashLoopBackOff   20 (4m20s ago)   61m
pod/weave-net-lxzfx                 2/2     Running            7 (26h ago)      3d3h
pod/weave-net-mcrzw                 2/2     Running            16 (2d3h ago)    4d23h
pod/weave-net-q59j8                 1/2     CrashLoopBackOff   15 (2m34s ago)   54m
pod/weave-net-wgh9w                 1/2     CrashLoopBackOff   15 (2m12s ago)   55m

Reproduction: In my cluster, there were already 2 worker nodes (with the weave plugin working correctly) and one control plane (also with weave working correctly). However, the weave plugin on the 5 worker nodes added afterwards went into CrashLoopBackOff on all of them.

Hardware: The 5 worker nodes added later are all Raspberry Pi 4 Model B; the first 3 successful nodes also included one Raspberry Pi 4 Model B.

Weave Configuration: Applied directly using kubectl apply -f net.yaml. Kubernetes version is 1.30.

Weave version:

INFO: 2024/11/09 13:32:15.456124 weave 2.8.9

Docker: For the first 3 successful nodes: Docker 27.0.3, installed from a package. For the 5 failing nodes: Docker 27.3.1, installed using the apt repository.

System Information:

uname -a:

(failing) Linux node5 5.15.0-1061-raspi #64-Ubuntu SMP PREEMPT Wed Aug 7 14:41:30 UTC 2024 aarch64 aarch64 aarch64 GNU/Linux
(working) Linux node2 6.8.0-1013-raspi #14-Ubuntu SMP PREEMPT_DYNAMIC Wed Oct 2 15:14:53 UTC 2024 aarch64 aarch64 aarch64 GNU/Linux
(working) Linux node3 6.8.0-1013-raspi #14-Ubuntu SMP PREEMPT_DYNAMIC Wed Oct 2 15:14:53 UTC 2024 aarch64 aarch64 aarch64 GNU/Linux
(failing) Linux node11 5.15.0-1061-raspi #64-Ubuntu SMP PREEMPT Wed Aug 7 14:41:30 UTC 2024 aarch64 aarch64 aarch64 GNU/Linux

Kubectl version:

cyx@node2:/$ kubectl version
Client Version: v1.30.2
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.30.0

cyx@node3:/proc/sys/net/core$ kubectl version
Client Version: v1.30.6
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
error: unable to parse the server version: invalid character '<' looking for beginning of value

cyx@node5:/etc$ kubectl version
Client Version: v1.30.6
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
error: unable to parse the server version: invalid character '<' looking for beginning of value

cyx@node11:~/docker_images$ kubectl version
Client Version: v1.30.6
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
error: unable to parse the server version: invalid character '<' looking for beginning of value

Weave Logs:

cyx@node2:/$ kubectl logs weave-net-7cz9p -n kube-system
Defaulted container "weave" out of: weave, weave-npc, weave-init (init)
iptables backend mode: nft
DEBU: 2024/11/09 13:47:28.706109 [kube-peers] Checking peer "ee:78:5a:ee:17:61" against list &{[{4e:ea:7b:ce:a4:2d node2} {6a:b7:c0:3c:ac:fb node4} {6e:1c:d3:1c:2a:0a node3}]}
Peer not in list; removing persisted data
INFO: 2024/11/09 13:47:29.528736 Command line options: map[conn-limit:200 datapath:datapath db-prefix:/weavedb/weave-net docker-api: expect-npc:true http-addr:127.0.0.1:6784 ipalloc-init:consensus=8 ipalloc-range:10.32.0.0/12 metrics-addr:0.0.0.0:6782 name:ee:78:5a:ee:17:61 nickname:node10 no-dns:true no-masq-local:true port:6783]
INFO: 2024/11/09 13:47:29.528893 weave 2.8.9
FATA: 2024/11/09 13:47:29.847512 creating dummy interface: operation not supported

cyx@node2:/$ kubectl logs weave-net-7cz9p -n kube-system weave
iptables backend mode: nft
DEBU: 2024/11/09 13:47:28.706109 [kube-peers] Checking peer "ee:78:5a:ee:17:61" against list &{[{4e:ea:7b:ce:a4:2d node2} {6a:b7:c0:3c:ac:fb node4} {6e:1c:d3:1c:2a:0a node3}]}
Peer not in list; removing persisted data
INFO: 2024/11/09 13:47:29.528736 Command line options: map[conn-limit:200 datapath:datapath db-prefix:/weavedb/weave-net docker-api: expect-npc:true http-addr:127.0.0.1:6784 ipalloc-init:consensus=8 ipalloc-range:10.32.0.0/12 metrics-addr:0.0.0.0:6782 name:ee:78:5a:ee:17:61 nickname:node10 no-dns:true no-masq-local:true port:6783]
INFO: 202

rajch commented 1 week ago

From what I can make out, weave is crashing on five nodes while trying to add a dummy link for setting the bridge MTU. Before we try to diagnose the root cause, please help me confirm the problem.

  1. Please run kubectl get pods -n kube-system -o wide and paste the results here. This will show the nodes where the weave pods are failing.
  2. Next, choose any two failing weave pods, and run kubectl logs -n kube-system <pod name> weave on both of them, and paste the results here.
  3. On the two nodes whose weave logs you pasted in the last step, please run lsmod, and paste the results here, indicating which node is which.
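
As an aside, the failing operation can also be reproduced by hand on an affected node. This is only a sketch, assuming the iproute2 ip tool is present there; test-dummy0 is just a throwaway link name:

    # Try the same operation that weave fails on ("creating dummy interface"):
    sudo ip link add name test-dummy0 type dummy
    # On an affected node this should fail with "Operation not supported".
    # Check whether the dummy module is shipped for the running kernel:
    find /lib/modules/$(uname -r) -name 'dummy.ko*'
    # If the link was created after all, clean it up:
    sudo ip link del test-dummy0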

Thanks in advance, and sorry for the trouble.

zz0350 commented 1 week ago

1. kubectl get pods -n kube-system -o wide:

NAME                            READY   STATUS             RESTARTS         AGE     IP              NODE     NOMINATED NODE   READINESS GATES
coredns-7db6d8ff4d-b2v92        1/1     Running            35 (36h ago)     33d     10.32.0.8       node2
coredns-7db6d8ff4d-fkffx        1/1     Running            35 (36h ago)     33d     10.32.0.7       node2
etcd-node2                      1/1     Running            63 (36h ago)     33d     192.168.3.60    node2
kube-apiserver-node2            1/1     Running            59 (36h ago)     33d     192.168.3.60    node2
kube-controller-manager-node2   1/1     Running            39 (36h ago)     33d     192.168.3.60    node2
kube-proxy-22bmm                1/1     Running            1 (13m ago)      38h     192.168.3.109   node7
kube-proxy-gzwbw                1/1     Running            4 (36h ago)      4d16h   192.168.3.71    node3
kube-proxy-kbw84                1/1     Running            2 (13m ago)      38h     192.168.3.104   node5
kube-proxy-phjgc                1/1     Running            1 (13m ago)      38h     192.168.3.106   node11
kube-proxy-pq6rx                1/1     Running            1 (13m ago)      38h     192.168.3.107   node10
kube-proxy-rtplg                1/1     Running            1 (13m ago)      38h     192.168.3.110   node12
kube-proxy-vbdtp                1/1     Running            5 (3d16h ago)    6d13h   192.168.3.70    node4
kube-proxy-wghts                1/1     Running            36 (36h ago)     33d     192.168.3.60    node2
kube-scheduler-node2            1/1     Running            40 (36h ago)     33d     192.168.3.60    node2
weave-net-5rfjt                 2/2     Running            74 (36h ago)     33d     192.168.3.60    node2
weave-net-7cz9p                 1/2     CrashLoopBackOff   29 (116s ago)    38h     192.168.3.107   node10
weave-net-cswgw                 1/2     CrashLoopBackOff   28 (2m52s ago)   38h     192.168.3.110   node12
weave-net-fr6s9                 1/2     CrashLoopBackOff   34 (112s ago)    38h     192.168.3.104   node5
weave-net-lxzfx                 2/2     Running            10 (26m ago)     4d16h   192.168.3.71    node3
weave-net-mcrzw                 2/2     Running            16 (3d16h ago)   6d13h   192.168.3.70    node4
weave-net-q59j8                 1/2     CrashLoopBackOff   28 (2m5s ago)    38h     192.168.3.109   node7
weave-net-wgh9w                 1/2     CrashLoopBackOff   28 (114s ago)    38h     192.168.3.106   node11

2.1: cyx@node2:~/kubernetes-pod-config/prometheus_grafana$ kubectl logs weave-net-7cz9p -n kube-system weave
iptables backend mode: nft
DEBU: 2024/11/11 02:40:32.150834 [kube-peers] Checking peer "f6:c8:44:17:68:ab" against list &{[{4e:ea:7b:ce:a4:2d node2} {6a:b7:c0:3c:ac:fb node4} {6e:1c:d3:1c:2a:0a node3}]}
Peer not in list; removing persisted data
INFO: 2024/11/11 02:40:32.753009 Command line options: map[conn-limit:200 datapath:datapath db-prefix:/weavedb/weave-net docker-api: expect-npc:true http-addr:127.0.0.1:6784 ipalloc-init:consensus=8 ipalloc-range:10.32.0.0/12 metrics-addr:0.0.0.0:6782 name:f6:c8:44:17:68:ab nickname:node10 no-dns:true no-masq-local:true port:6783]
INFO: 2024/11/11 02:40:32.753151 weave 2.8.9
FATA: 2024/11/11 02:40:33.082800 creating dummy interface: operation not supported

2.2: cyx@node2:~/kubernetes-pod-config/prometheus_grafana$ kubectl logs weave-net-q59j8 -n kube-system weave
iptables backend mode: nft
DEBU: 2024/11/11 02:40:24.011697 [kube-peers] Checking peer "de:9d:0f:2e:82:e7" against list &{[{4e:ea:7b:ce:a4:2d node2} {6a:b7:c0:3c:ac:fb node4} {6e:1c:d3:1c:2a:0a node3}]}
Peer not in list; removing persisted data
INFO: 2024/11/11 02:40:24.444343 Command line options: map[conn-limit:200 datapath:datapath db-prefix:/weavedb/weave-net docker-api: expect-npc:true http-addr:127.0.0.1:6784 ipalloc-init:consensus=8 ipalloc-range:10.32.0.0/12 metrics-addr:0.0.0.0:6782 name:de:9d:0f:2e:82:e7 nickname:node7 no-dns:true no-masq-local:true port:6783]
INFO: 2024/11/11 02:40:24.444485 weave 2.8.9
FATA: 2024/11/11 02:40:24.756510 creating dummy interface: operation not supported

3.1: cyx@node10:~$ lsmod
Module Size Used by
rpcsec_gss_krb5 40960 0
auth_rpcgss 151552 1 rpcsec_gss_krb5
nfsv4 872448 1
nfs 425984 2 nfsv4
lockd 110592 1 nfs
grace 16384 1 lockd
fscache 417792 1 nfs
netfs 49152 1 fscache
vport_vxlan 16384 0
vxlan 81920 1 vport_vxlan
ip6_udp_tunnel 20480 1 vxlan
udp_tunnel 28672 1 vxlan
openvswitch 176128 2 vport_vxlan
nsh 16384 1 openvswitch
nf_conncount 24576 1 openvswitch
ip_set_hash_net 57344 1
xt_physdev 16384 2
nfnetlink_log 24576 1
xt_statistic 20480 3
xt_nat 16384 11
xt_tcpudp 20480 11
xt_mark 16384 5
ip_set_hash_ip 49152 21
iptable_filter 16384 0
bpfilter 16384 0
xt_set 20480 21
ip_set 57344 3 ip_set_hash_ip,xt_set,ip_set_hash_net
br_netfilter 32768 0
xt_comment 16384 107
xt_conntrack 16384 26
nft_chain_nat 16384 6
xt_MASQUERADE 20480 3
nf_nat 49152 4 xt_nat,openvswitch,nft_chain_nat,xt_MASQUERADE
bridge 319488 1 br_netfilter
stp 20480 1 bridge
llc 20480 2 bridge,stp
nf_conntrack_netlink 53248 0
nf_conntrack 184320 7 xt_conntrack,nf_nat,xt_nat,openvswitch,nf_conntrack_netlink,nf_conncount,xt_MASQUERADE
nf_defrag_ipv6 24576 2 nf_conntrack,openvswitch
nf_defrag_ipv4 16384 1 nf_conntrack
nft_counter 16384 154
xt_addrtype 16384 5
nft_compat 20480 194
nf_tables 258048 361 nft_compat,nft_counter,nft_chain_nat
nfnetlink 20480 7 nft_compat,nf_conntrack_netlink,nf_tables,ip_set,nfnetlink_log
cmac 16384 3
algif_hash 24576 1
algif_skcipher 20480 1
af_alg 32768 6 algif_hash,algif_skcipher
bnep 32768 2
hci_uart 155648 1
btqca 24576 1 hci_uart
btrtl 24576 1 hci_uart
btbcm 28672 1 hci_uart
btintel 45056 1 hci_uart
overlay 155648 10
sunrpc 622592 9 nfsv4,auth_rpcgss,lockd,rpcsec_gss_krb5,nfs
binfmt_misc 24576 1
btsdio 20480 0
bluetooth 716800 30 btrtl,btqca,btsdio,btintel,hci_uart,btbcm,bnep
ecdh_generic 16384 2 bluetooth
ecc 36864 1 ecdh_generic
bcm2835_codec 53248 0
bcm2835_isp 36864 0
bcm2835_v4l2 49152 0
v4l2_mem2mem 45056 1 bcm2835_codec
bcm2835_mmal_vchiq 40960 3 bcm2835_codec,bcm2835_v4l2,bcm2835_isp
videobuf2_dma_contig 24576 2 bcm2835_codec,bcm2835_isp
videobuf2_vmalloc 20480 1 bcm2835_v4l2
videobuf2_memops 20480 2 videobuf2_vmalloc,videobuf2_dma_contig
videobuf2_v4l2 32768 4 bcm2835_codec,bcm2835_v4l2,v4l2_mem2mem,bcm2835_isp
brcmfmac 417792 0
videobuf2_common 81920 8 bcm2835_codec,videobuf2_vmalloc,videobuf2_dma_contig,videobuf2_v4l2,bcm2835_v4l2,v4l2_mem2mem,videobuf2_memops,bcm2835_isp
snd_bcm2835 36864 0
brcmutil 28672 1 brcmfmac
videodev 282624 6 bcm2835_codec,videobuf2_v4l2,bcm2835_v4l2,videobuf2_common,v4l2_mem2mem,bcm2835_isp
cfg80211 966656 1 brcmfmac
snd_pcm 163840 1 snd_bcm2835
snd_timer 45056 1 snd_pcm
mc 73728 6 videodev,bcm2835_codec,videobuf2_v4l2,videobuf2_common,v4l2_mem2mem,bcm2835_isp
raspberrypi_hwmon 16384 0
snd 126976 3 snd_bcm2835,snd_timer,snd_pcm
vc_sm_cma 40960 2 bcm2835_mmal_vchiq,bcm2835_isp
bcm2835_gpiomem 16384 0
rpivid_mem 16384 0
nvmem_rmem 16384 0
uio_pdrv_genirq 20480 0
uio 24576 1 uio_pdrv_genirq
sch_fq_codel 20480 2
drm 647168 0
ip_tables 36864 1 iptable_filter
x_tables 57344 13 xt_conntrack,xt_statistic,iptable_filter,nft_compat,xt_tcpudp,xt_addrtype,xt_physdev,xt_nat,xt_comment,xt_set,ip_tables,xt_MASQUERADE,xt_mark
autofs4 49152 2
btrfs 1613824 0
blake2b_generic 24576 0
zstd_compress 229376 1 btrfs
raid10 73728 0
raid456 196608 0
async_raid6_recov 24576 1 raid456
async_memcpy 20480 2 raid456,async_raid6_recov
async_pq 20480 2 raid456,async_raid6_recov
async_xor 20480 3 async_pq,raid456,async_raid6_recov
async_tx 20480 5 async_pq,async_memcpy,async_xor,raid456,async_raid6_recov
xor 20480 2 async_xor,btrfs
xor_neon 16384 1 xor
raid6_pq 114688 4 async_pq,btrfs,raid456,async_raid6_recov
libcrc32c 16384 6 nf_conntrack,nf_nat,openvswitch,btrfs,nf_tables,raid456
raid1 53248 0
raid0 24576 0
multipath 24576 0
linear 20480 0
spidev 24576 0
dwc2 315392 0
crct10dif_ce 20480 1
roles 20480 1 dwc2
udc_core 77824 1 dwc2
i2c_bcm2835 20480 0
spi_bcm2835 28672 0
xhci_pci 24576 0
xhci_pci_renesas 24576 1 xhci_pci
phy_generic 20480 1
aes_arm64 16384 3

3.2: cyx@node7:~$ lsmod
Module Size Used by
rpcsec_gss_krb5 40960 0
auth_rpcgss 151552 1 rpcsec_gss_krb5
nfsv4 872448 1
nfs 425984 2 nfsv4
lockd 110592 1 nfs
grace 16384 1 lockd
fscache 417792 1 nfs
netfs 49152 1 fscache
vport_vxlan 16384 0
vxlan 81920 1 vport_vxlan
ip6_udp_tunnel 20480 1 vxlan
udp_tunnel 28672 1 vxlan
openvswitch 176128 2 vport_vxlan
nsh 16384 1 openvswitch
nf_conncount 24576 1 openvswitch
ip_set_hash_net 57344 1
xt_physdev 16384 2
nfnetlink_log 24576 1
xt_statistic 20480 3
xt_nat 16384 11
xt_tcpudp 20480 11
xt_mark 16384 5
ip_set_hash_ip 49152 21
iptable_filter 16384 0
bpfilter 16384 0
xt_set 20480 21
ip_set 57344 3 ip_set_hash_ip,xt_set,ip_set_hash_net
br_netfilter 32768 0
xt_comment 16384 107
xt_conntrack 16384 26
nft_chain_nat 16384 6
xt_MASQUERADE 20480 3
nf_nat 49152 4 xt_nat,openvswitch,nft_chain_nat,xt_MASQUERADE
bridge 319488 1 br_netfilter
stp 20480 1 bridge
llc 20480 2 bridge,stp
nf_conntrack_netlink 53248 0
nf_conntrack 184320 7 xt_conntrack,nf_nat,xt_nat,openvswitch,nf_conntrack_netlink,nf_conncount,xt_MASQUERADE
nf_defrag_ipv6 24576 2 nf_conntrack,openvswitch
nf_defrag_ipv4 16384 1 nf_conntrack
nft_counter 16384 154
xt_addrtype 16384 5
nft_compat 20480 194
nf_tables 258048 361 nft_compat,nft_counter,nft_chain_nat
nfnetlink 20480 7 nft_compat,nf_conntrack_netlink,nf_tables,ip_set,nfnetlink_log
cmac 16384 3
algif_hash 24576 1
algif_skcipher 20480 1
af_alg 32768 6 algif_hash,algif_skcipher
bnep 32768 2
hci_uart 155648 1
btqca 24576 1 hci_uart
btrtl 24576 1 hci_uart
btbcm 28672 1 hci_uart
btintel 45056 1 hci_uart
overlay 155648 10
sunrpc 622592 9 nfsv4,auth_rpcgss,lockd,rpcsec_gss_krb5,nfs
binfmt_misc 24576 1
btsdio 20480 0
bluetooth 716800 30 btrtl,btqca,btsdio,btintel,hci_uart,btbcm,bnep
ecdh_generic 16384 2 bluetooth
ecc 36864 1 ecdh_generic
brcmfmac 417792 0
bcm2835_codec 53248 0
bcm2835_isp 36864 0
bcm2835_v4l2 49152 0
v4l2_mem2mem 45056 1 bcm2835_codec
brcmutil 28672 1 brcmfmac
bcm2835_mmal_vchiq 40960 3 bcm2835_codec,bcm2835_v4l2,bcm2835_isp
snd_bcm2835 36864 0
videobuf2_vmalloc 20480 1 bcm2835_v4l2
videobuf2_dma_contig 24576 2 bcm2835_codec,bcm2835_isp
videobuf2_memops 20480 2 videobuf2_vmalloc,videobuf2_dma_contig
videobuf2_v4l2 32768 4 bcm2835_codec,bcm2835_v4l2,v4l2_mem2mem,bcm2835_isp
cfg80211 966656 1 brcmfmac
snd_pcm 163840 1 snd_bcm2835
videobuf2_common 81920 8 bcm2835_codec,videobuf2_vmalloc,videobuf2_dma_contig,videobuf2_v4l2,bcm2835_v4l2,v4l2_mem2mem,videobuf2_memops,bcm2835_isp
snd_timer 45056 1 snd_pcm
videodev 282624 6 bcm2835_codec,videobuf2_v4l2,bcm2835_v4l2,videobuf2_common,v4l2_mem2mem,bcm2835_isp
raspberrypi_hwmon 16384 0
mc 73728 6 videodev,bcm2835_codec,videobuf2_v4l2,videobuf2_common,v4l2_mem2mem,bcm2835_isp
snd 126976 3 snd_bcm2835,snd_timer,snd_pcm
vc_sm_cma 40960 2 bcm2835_mmal_vchiq,bcm2835_isp
bcm2835_gpiomem 16384 0
rpivid_mem 16384 0
uio_pdrv_genirq 20480 0
nvmem_rmem 16384 0
uio 24576 1 uio_pdrv_genirq
sch_fq_codel 20480 2
drm 647168 0
ip_tables 36864 1 iptable_filter
x_tables 57344 13 xt_conntrack,xt_statistic,iptable_filter,nft_compat,xt_tcpudp,xt_addrtype,xt_physdev,xt_nat,xt_comment,xt_set,ip_tables,xt_MASQUERADE,xt_mark
autofs4 49152 2
btrfs 1613824 0
blake2b_generic 24576 0
zstd_compress 229376 1 btrfs
raid10 73728 0
raid456 196608 0
async_raid6_recov 24576 1 raid456
async_memcpy 20480 2 raid456,async_raid6_recov
async_pq 20480 2 raid456,async_raid6_recov
async_xor 20480 3 async_pq,raid456,async_raid6_recov
async_tx 20480 5 async_pq,async_memcpy,async_xor,raid456,async_raid6_recov
xor 20480 2 async_xor,btrfs
xor_neon 16384 1 xor
raid6_pq 114688 4 async_pq,btrfs,raid456,async_raid6_recov
libcrc32c 16384 6 nf_conntrack,nf_nat,openvswitch,btrfs,nf_tables,raid456
raid1 53248 0
raid0 24576 0
multipath 24576 0
linear 20480 0
spidev 24576 0
dwc2 315392 0
roles 20480 1 dwc2
udc_core 77824 1 dwc2
i2c_bcm2835 20480 0
crct10dif_ce 20480 1
spi_bcm2835 28672 0
xhci_pci 24576 0
xhci_pci_renesas 24576 1 xhci_pci
phy_generic 20480 1
aes_arm64 16384 3

zz0350 commented 1 week ago

About lsmod | grep -E "vxlan|veth|bridge":

node2 & node3:

veth 36864 0
vport_vxlan 12288 1
vxlan 139264 1 vport_vxlan
ip6_udp_tunnel 16384 1 vxlan
udp_tunnel 32768 1 vxlan
openvswitch 192512 3 vport_vxlan
bridge 372736 1 br_netfilter
stp 12288 1 bridge
llc 16384 2 bridge,stp

node5 & node11 & node10 & node7 & node12:

vport_vxlan 16384 0
vxlan 81920 1 vport_vxlan
ip6_udp_tunnel 20480 1 vxlan
udp_tunnel 28672 1 vxlan
openvswitch 176128 2 vport_vxlan
bridge 319488 1 br_netfilter
stp 20480 1 bridge
llc 20480 2 bridge,stp

rajch commented 1 week ago

Looking at your system information, the successful nodes are running Linux kernel 6.8.0, while the unsuccessful ones are running 5.15.0, and so some kernel modules are missing on the latter.

Try running sudo apt install linux-modules-extra-raspi on the failing nodes, and rebooting.

This apt package is not required from kernel 6.7.0 onwards, but is needed before that (see the last answer on this thread).
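
Once the package is installed and the node rebooted, a quick way to verify is to load the two modules weave needs here (dummy for the bridge-MTU step, veth for the pod interfaces) by hand. A minimal sketch:

    # Both should now succeed silently:
    sudo modprobe dummy
    sudo modprobe veth
    # And both should appear among the loaded modules:
    lsmod | grep -E '^(dummy|veth)'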

zz0350 commented 1 week ago

Thank you, but I have run "sudo apt update; sudo apt upgrade; sudo apt install linux-modules-extra-raspi" and it shows:

Taking backup of spi2-1cs.dtbo.
Installing new spi2-1cs.dtbo.
Taking backup of w5500.dtbo.
Installing new w5500.dtbo.
Taking backup of cap1106.dtbo.
Installing new cap1106.dtbo.
Taking backup of minipitft13.dtbo.
Installing new minipitft13.dtbo.
Taking backup of README.
Installing new README.
Scanning processes...
Scanning processor microcode...
Scanning linux images...

Failed to check for processor microcode upgrades.

No services need to be restarted.

No containers need to be restarted.

No user sessions are running outdated binaries.

No VM guests are running outdated hypervisor (qemu) binaries on this host.

Finally, I rebooted the system. It seems that the result of "uname -a" changed from "Linux node11 5.15.0-1064-raspi #67-Ubuntu SMP PREEMPT Tue Oct 1 19:46:40 UTC 2024 aarch64 aarch64 aarch64 GNU/Linux" to "Linux node5 5.15.0-1065-raspi #68-Ubuntu SMP PREEMPT Tue Oct 15 19:25:34 UTC 2024 aarch64 aarch64 aarch64 GNU/Linux". Maybe I have to reinstall the OS?

rajch commented 1 week ago

Okay, that should have worked. Have you tried sudo apt dist-upgrade or sudo apt full-upgrade?
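
For reference: full-upgrade (the newer spelling of dist-upgrade) will remove installed packages if that is needed to upgrade the system as a whole, which plain upgrade never does; that can matter for kernel and module packages. A sketch, assuming a standard Ubuntu apt setup:

    sudo apt update
    sudo apt full-upgrade   # may add or remove packages, e.g. new kernel/module packages
    sudo reboot
    # After the reboot, confirm which kernel is running:
    uname -r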

rajch commented 1 week ago

Specifically, it seems that the veth.ko module is missing on the non-functional nodes. It should have been installed when you installed linux-modules-extra-raspi. See this thread for an example.
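
A quick way to check this on a node, using the standard kmod tools:

    # Is veth available for the running kernel?
    modinfo veth                                    # "Module veth not found" means it is missing
    find /lib/modules/$(uname -r) -name 'veth.ko*'  # prints a path when the module is present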

Upgrading the OS should also help, because later kernel versions no longer need the separate package.

I hope either approach worked for you.