Closed: carlosedp closed this issue 6 years ago.
I am trying to do a clean installation using the same commands. The result is
% kubectl logs -n kube-system -c weave weave-net-gkltc
Failed to get peers
Now I am trying a different version.
@zetaab I had similar problems deploying Kubernetes where I needed to reset everything between installs. I made a gist with the commands to reset it all: https://gist.github.com/carlosedp/5040f4a1b2c97c1fa260a3409b5f14f9
Start at line 10 for resetting Weave.
This has worked for about a year, but now the Ansible configuration is broken :(
@carlosedp Thanks for the issue.
The error FATA: 2018/03/15 15:38:37.870839 Existing bridge type "bridge" is different than requested "bridged_fastdp". Please do 'weave reset' and try again
indicates that before the update you were running Weave with fastdp disabled. Did you do that intentionally? If not, do you have logs from weave-kube before the update?
I did a weave reset on all nodes; the pods went to Running, but all communication to my application pods was lost (although weave status showed that the nodes were connected).
It's expected that existing connections to pods are lost after you reset Weave Net. However, rebooting the machine should not be required to re-connect. Do your client applications implement any connection-retry mechanism?
@zetaab Could you please open a separate issue?
@brb, I haven't disabled fastdp; I just deployed using the default manifests. They connect in sleeve mode by default:
$ weave status connections
<- 192.168.1.56:41907 established sleeve 86:82:1a:3d:5b:64(kubenode2) mtu=1438
<- 192.168.1.55:45352 established sleeve a6:b4:8e:f4:d3:1c(kubenode1) mtu=1438
-> 192.168.1.50:6783 failed cannot connect to ourself, retry: never
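Connections stuck in sleeve can be spotted mechanically from that output. A minimal sketch, with two of the status lines above inlined as sample data (on a live node you would pipe `weave status connections` instead):

```shell
# Sketch: count peers connected in sleeve rather than fastdp mode.
# The status text is inlined sample data for illustration.
status='<- 192.168.1.56:41907  established sleeve 86:82:1a:3d:5b:64(kubenode2) mtu=1438
<- 192.168.1.55:45352  established sleeve a6:b4:8e:f4:d3:1c(kubenode1) mtu=1438'
sleeve=$(printf '%s\n' "$status" | grep -c ' sleeve ')
echo "peers in sleeve mode: $sleeve"
```

This assumes the `established sleeve`/`established fastdp` field layout stays stable across Weave versions.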
Maybe a parameter is missing in my kernel? I'm running on an ARM64 platform:
$ uname -a
Linux kubemaster1 4.4.77-rockchip-ayufan-136 #1 SMP Thu Oct 12 09:14:48 UTC 2017 aarch64 GNU/Linux
I found that my kernel does not have the OVS modules compiled. I will build them and try again to see if fastdp gets enabled.
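For anyone checking the same thing, fastdp needs the openvswitch kernel module, i.e. CONFIG_OPENVSWITCH enabled in the kernel config. A sketch with a sample config fragment inlined (on a real node you would read /proc/config.gz or /boot/config-$(uname -r) instead; those paths are an assumption, not from this thread):

```shell
# Sketch: check whether CONFIG_OPENVSWITCH is built in (=y) or a
# module (=m). The config text is inlined sample data.
config='CONFIG_VXLAN=m
CONFIG_OPENVSWITCH=m'
if printf '%s\n' "$config" | grep -qE '^CONFIG_OPENVSWITCH=(y|m)$'; then
  echo "openvswitch available: fastdp possible"
else
  echo "openvswitch missing: Weave falls back to sleeve"
fi
```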
Just updated the kernel on my nodes to one that contains the openvswitch module, and now the Weave nodes connect in fastdp mode:
rock64@kubemaster1:~ (kubearm:kube-system) $ weave status connections
-> 192.168.1.55:6783 established fastdp a6:b4:8e:f4:d3:1c(kubenode1) mtu=1376
<- 192.168.1.56:34049 established fastdp 86:82:1a:3d:5b:64(kubenode2) mtu=1376
-> 192.168.1.50:6783 failed cannot connect to ourself, retry: never
I will follow up if the problem happens again. Will close the issue.
@carlosedp Thanks for the update.
I find it strange that Weave previously created a bridge of the bridged_fastdp type even though your machine didn't have the required OVS modules.
@carlosedp Which kernel did you update to?
@jmreicha I compiled kernel 4.4.114 myself for Pine64/Rock64 boards from Ayufan's repository (https://github.com/ayufan-rock64/linux-build) and added the modules to the config. It's been pretty stable for a while: no hangs, freezes, or dumps. Maybe his latest versions are just as stable, but I don't want to change what's working :)
$ uname -a
Linux kubemaster1 4.4.114-rockchip-ayufan-1 #1 SMP Thu Mar 22 16:02:29 UTC 2018 aarch64 GNU/Linux
$ uptime
11:27:12 up 40 days, 14:59, 1 user, load average: 4.22, 4.73, 5.60
@carlosedp Interesting, what version of Kubernetes are you running? I was having some stability issues on Kubernetes 1.10.x and newer kernels.
K8s 1.9.7. I tried to update but hit some timeout problems with kubeadm. I had stability issues and kernel dumps before, but since this kernel it's been very stable. Not sure what fixed it, though. In case you want to try, I've uploaded the files to: https://we.tl/ivPbhswOY2
@carlosedp Awesome thanks 🎉
What you expected to happen?
To have the Weave Net pods updated.
What happened?
I fetched the newest manifest to update from 2.2.0 to 2.2.1 using:
kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')"
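For reference, the query parameter in that URL is just the `kubectl version` report, base64-encoded with newlines stripped, so the service can serve a manifest matching the cluster version. A sketch with made-up sample version text inlined (on a live cluster it would come from `kubectl version`):

```shell
# Sketch of how the manifest URL's k8s-version parameter is built.
# The version text below is sample data, not a real cluster's output.
sample_version='Client Version: v1.9.7
Server Version: v1.9.7'
encoded=$(printf '%s' "$sample_version" | base64 | tr -d '\n')
echo "https://cloud.weave.works/k8s/net?k8s-version=${encoded}"
```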
After this, the Weave pods in the cluster restarted and went to CrashLoopBackOff. The logs asked for a Weave reset.
I did a weave reset on all nodes; the pods went to Running, but all communication to my application pods was lost (although weave status showed that the nodes were connected). I needed a full reboot of all nodes since the rules were wiped.
How to reproduce it?
This happened two times when updating Weave pods.
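The lost connectivity is consistent with the rules being wiped: weave reset removes Weave's iptables chains (names beginning with WEAVE, e.g. the WEAVE-NPC chain created by the network policy controller). A hedged sketch of how one could verify, with sample `iptables-save` output inlined for illustration (on a real node you would pipe the actual command):

```shell
# Sketch: after `weave reset`, chains named WEAVE-* should no longer
# appear in `iptables-save` output. Sample output inlined below.
rules=':KUBE-SERVICES - [0:0]
:WEAVE-NPC - [0:0]'
if printf '%s\n' "$rules" | grep -q '^:WEAVE'; then
  echo "Weave chains still present"
else
  echo "Weave chains wiped"
fi
```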
Versions:
Logs:
Pod description:
Pod Logs:
Network:
IPTables after update: