weaveworks / weave

Simple, resilient multi-host container networking and more.
https://www.weave.works
Apache License 2.0

kubernetes 1.19 / weave-net 2.7.0 : "Segmentation fault (core dumped)" on ARMv7 #3860

Open · grunlab opened this issue 4 years ago

grunlab commented 4 years ago

What you expected to happen?

Weave Net keeps working after upgrading it from 2.6.4 to 2.7.0 on Kubernetes 1.19 running on ARMv7 nodes.

What happened?

After the upgrade, the "weave" container in the "weave-net" pod is stuck in CrashLoopBackOff state.

How to reproduce it?

Anything else we need to know?

Versions:

Logs:

kubectl logs weave-net-2854t -c weave
Segmentation fault (core dumped)
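
Since the container is crash-looping, the log of the previous (crashed) instance can be pulled the same way with kubectl's --previous flag (pod name as above):

kubectl logs weave-net-2854t -c weave --previous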

Network:

iptables, ip6tables, arptables and ebtables are running in legacy mode:

update-alternatives --set iptables /usr/sbin/iptables-legacy
update-alternatives --set ip6tables /usr/sbin/ip6tables-legacy
update-alternatives --set arptables /usr/sbin/arptables-legacy
update-alternatives --set ebtables /usr/sbin/ebtables-legacy

This was already the case before attempting the weave-net upgrade.
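
A quick way to double-check which backend is actually active (assuming the Debian-style update-alternatives layout used above):

update-alternatives --display iptables | grep 'currently points to'
iptables --version

iptables --version prints "(legacy)" or "(nf_tables)" depending on the active backend.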

grunlab commented 4 years ago

Bumping this issue. Thank you

grunlab commented 3 years ago

Hi

I have done some updates since opening this issue:

But I'm still not able to upgrade weave from 2.6.5 to 2.7.0 :-(

kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')"
kubectl get pod -n kube-system | grep CrashLoopBackOff
weave-net-8xp6b                     1/2     CrashLoopBackOff   5          4m22s
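
The per-container state can be checked along these lines (the pod name obviously differs on every rollout):

kubectl describe pod weave-net-8xp6b -n kube-system
kubectl logs weave-net-8xp6b -n kube-system -c weave
kubectl logs weave-net-8xp6b -n kube-system -c weave-npc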

The status of the two containers inside the weave-net pod is the following:

Do you need additional information? I would be very happy to help. Thank you

thewilli commented 3 years ago

The same happens for me on an Asus Tinker Board S (ARMv7 as well). v2.6.4 works, while v2.7.0 ends in Segmentation fault (core dumped). Nodes running on x86-64 are not affected by the issue.

byte13 commented 3 years ago

Hi,

Same issue with weave-kube 2.7.0 on:

$ uname -a
Linux host1 5.4.0-1022-raspi #25-Ubuntu SMP PREEMPT Thu Oct 15 14:22:53 UTC 2020 armv7l armv7l armv7l GNU/Linux

Ubuntu 20.04 LTS is said to be certified on Raspberry Pi.

Weave 2.6.5 seems to be running:

$ kubectl get pods -A
NAMESPACE     NAME                                              READY   STATUS    RESTARTS   AGE
kube-system   coredns-66bff467f8-pknkx                          1/1     Running   4          27d
kube-system   coredns-66bff467f8-rsdwn                          1/1     Running   4          27d
kube-system   etcd-winuxpi3.lab.byte13.org                      1/1     Running   52         5d2h
kube-system   kube-apiserver-winuxpi3.lab.byte13.org            1/1     Running   48         5d2h
kube-system   kube-controller-manager-winuxpi3.lab.byte13.org   1/1     Running   52         5d2h
kube-system   kube-proxy-vlpbw                                  1/1     Running   4          27d
kube-system   kube-scheduler-winuxpi3.lab.byte13.org            1/1     Running   53         5d2h
kube-system   weave-net-dzcfn                                   2/2     Running   0          9m53s

grunlab commented 3 years ago

I've just upgraded to 2.8.1 ... no more issue with this version on ARMv7.

sudo podman images | grep weave
docker.io/weaveworks/weave-npc            2.8.1            7f92d556d4ff  9 hours ago    39.7 MB
docker.io/weaveworks/weave-kube           2.8.1            df29c0a4002c  9 hours ago    89.8 MB
kubectl get pod -n kube-system -o wide | grep weave
weave-net-4gwzl                     2/2     Running   0          62m   192.168.0.105   worker-02   <none>           <none>
weave-net-5255p                     2/2     Running   0          58m   192.168.0.103   master-03   <none>           <none>
weave-net-5n9sw                     2/2     Running   0          55m   192.168.0.101   master-01   <none>           <none>
weave-net-6fgl8                     2/2     Running   0          63m   192.168.0.106   worker-03   <none>           <none>
weave-net-7vqrh                     2/2     Running   0          56m   192.168.0.104   worker-01   <none>           <none>
weave-net-bkc7v                     2/2     Running   0          57m   192.168.0.102   master-02   <none>           <none>
weave-net-bpbgr                     2/2     Running   0          60m   192.168.0.107   worker-04   <none>           <none>
weave-net-r25f5                     2/2     Running   0          61m   192.168.0.108   worker-05   <none>           <none>
weave-net-vmhwk                     2/2     Running   0          65m   192.168.0.113   worker-10   <none>           <none>
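
For anyone who wants to pin the fixed version explicitly rather than rely on the cloud.weave.works URL, applying the manifest attached to the GitHub release should also work (asset name assumed from the v2.8.1 release page):

kubectl apply -f https://github.com/weaveworks/weave/releases/download/v2.8.1/weave-daemonset-k8s.yaml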

You can close this issue.