$ kubectl logs -n kube-system weave-net-4mkc5 weave
FATA: 2018/07/28 22:18:53.116928 [kube-peers] Could not get peers: Get https://10.96.0.1:443/api/v1/nodes: dial tcp 10.96.0.1:443: i/o timeout
Failed to get peers
The Weave pods are unable to reach the Kubernetes API server through the service proxy. Very likely you are running into routing issues and nothing related to Weave as such. Please take a look at the routes below and see whether this is what's causing the issue.
192.168.100.0/24 dev enp0s3 proto kernel scope link src 192.168.100.1
The routing table indicates the nodes are reachable through enp0s3.
default via 10.0.3.2 dev enp0s8 proto dhcp metric 100
Since there is no explicit route for the service IP range, traffic to 10.96.0.0/12 is routed through enp0s8. My guess is that when the Weave pods reach out to 10.96.0.1:443, the service proxy DNATs the destination IP to 192.168.100.1, which is then sent over enp0s3 and should result in packet drops on the master node.
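If you want to see that DNAT as kube-proxy programs it, one way (assuming iptables proxy mode, the kubeadm default) is to inspect the nat table on a node; the master address 192.168.100.1 below is the one from this setup:

# Rule that matches the kubernetes service cluster IP
sudo iptables -t nat -S KUBE-SERVICES | grep 10.96.0.1
# The KUBE-SVC-* chain referenced there jumps to a KUBE-SEP-* chain whose rule
# rewrites the destination to the API server's real address (typically port 6443)
sudo iptables -t nat -S | grep 'to-destination 192.168.100.1'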
Please see the documentation at https://kubernetes.io/docs/setup/independent/install-kubeadm/#check-network-adapters for the networking requirements; it describes a similar context for this kind of issue.
I think the problem is that the DNAT translation in the Linux kernel happens after the source IP address has been chosen for the packet. In your case, the master receives the request with the source IP set to the one used as a source towards the default gateway (ip route get 10.96.0.1), and therefore the response is sent via enp0s8 instead of enp0s3 (or just dropped if the rp_filter policy is enabled), so the packet is lost.
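A quick way to confirm this on a node, assuming the interface names from the routes above:

# Which route and source address does the kernel pick for the service IP?
ip route get 10.96.0.1
# Strict reverse-path filtering (rp_filter=1) on these interfaces will drop asymmetric replies
sysctl net.ipv4.conf.all.rp_filter net.ipv4.conf.enp0s3.rp_filter net.ipv4.conf.enp0s8.rp_filter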
To fix it, you can try adding a route on each worker node with ip route add 10.96.0.1/32 dev enp0s3 src $IP_ADDR_OF_enp0s3.
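Concretely, using node1's addressing from this issue (adjust the src address per node):

# node1: pin the API server's service IP to the host-only interface and its own address
sudo ip route add 10.96.0.1/32 dev enp0s3 src 192.168.100.2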
In our own Vagrant based devbox setup (Saltstack based, but that's just FYI) we've worked around a couple of quirks (which includes this issue) by passing --pod-network-cidr 10.32.0.0/12 to the kubeadm init command and using https://cloud.weave.works/k8s/net?k8s-version=v1.11.1&env.IPALLOC_RANGE=10.32.0.0/12 as the URL for fetching the weave.yaml, or applying it directly like you did. What you pass doesn't really matter as long as it's the same range in both cases. That'll fix the issue that the API server isn't reachable because kube-proxy cannot distinguish between the pod IPs and the service IPs.
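For reference, a sketch of the two pieces with a matching range (the advertise address is the master's from this issue; the version query string is the same one used further down):

# On the master: give kube-proxy the pod network range
kubeadm init --apiserver-advertise-address=192.168.100.1 --pod-network-cidr=10.32.0.0/12
# Install Weave with the same range
kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')&env.IPALLOC_RANGE=10.32.0.0/12"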
The next issue you'll run into with a Vagrant setup is probably that your nodes all report 10.0.2.15 as their internal IP to the API server (do a kubectl describe node and look at the output). That can be fixed by choosing predictable worker IPs (for example starting with 192.168.100.10) and adding that as an extra argument to the kubelet's systemd configuration. For us, Saltstack creates /etc/systemd/system/kubelet.service.d/20-kubelet-node-ip.conf and sets
[Service]
Environment="KUBELET_EXTRA_ARGS=--node-ip=192.168.100.10"
as the content. A reload of the unit is required but it depends on what and when the configuration is created.
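A sketch of applying that drop-in by hand on a node (the node name in the check is the one from this issue):

sudo systemctl daemon-reload
sudo systemctl restart kubelet
# The node should now report the host-only address as its InternalIP
kubectl describe node node1 | grep InternalIP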
It took us a lot of time to debug these issues but we finally have a stable and reproducible multi node Vagrant cluster.
A final note about debugging the Weave pods. We saved the weave.yaml file instead of applying it directly and replaced the livenessProbe with a readinessProbe in the weave-net DaemonSet to stop K8s from constantly reaping the Weave pods. It doesn't change the outcome (the node wasn't ready before and it's not ready after), but it makes debugging a lot easier.
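A rough sketch of that workflow, assuming the same manifest URL as above; the sed is a blunt way to rename the probe key and you may prefer editing the file by hand:

# Save the manifest instead of piping it straight into kubectl
curl -fsSL "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')" -o weave.yaml
# Turn the liveness probe into a readiness probe so failing pods aren't restarted constantly
sed -i 's/livenessProbe:/readinessProbe:/' weave.yaml
kubectl apply -f weave.yaml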
Specifying pod-network-cidr, which results in clusterCIDR getting specified for kube-proxy, may have been accidentally fixing the problem. Basically it's masking the problem by MASQUERADEing the traffic.
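If you want to check whether clusterCIDR actually ended up set (assuming the kube-proxy ConfigMap that kubeadm creates in kube-system):

kubectl -n kube-system get configmap kube-proxy -o yaml | grep clusterCIDR
# And the masquerade rules kube-proxy installs on each node
sudo iptables -t nat -S KUBE-POSTROUTING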
Weave Net runs in hostNetwork, so it is not going to get an IP from the pod CIDR anyway. In the case of multiple interfaces it's still important to establish proper routes so that traffic to the service VIPs gets routed properly.
You're right that this might be an accidental fix, but for now it's easier to do than making sure ip route add 10.96.0.1/32 … is done on all nodes and is reboot safe regardless of the OS used for the cluster. As this is a Vagrant setup used for development and it does work, I'm fine with it for now. Do you have a less brittle proposal?
Maybe change your Vagrant setup so the default route is also the route to the api-server?
There are some different suggestions in the lengthy thread at https://github.com/kubernetes/kubeadm/issues/102
I'm not aware of anyone attacking the fundamental issue that Linux doesn't reconsider its choice of source address after a DNAT.
Thanks for all the interesting input.
Although I am still trying to understand why I need this in my particular setup, adding a static route on my worker nodes with ip route add 10.96.0.1/32 dev enp0s3 fixed the problem and all nodes are up and running, including the weave-net pods.
How’d you make that reboot safe? The route will only be there temporarily.
I persisted the static route on the worker nodes via a network script, like so: /etc/sysconfig/network-scripts/route-enp0s3 with the content 10.96.0.1/32 via 192.168.100.1 dev enp0s3.
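For example, on a CentOS 7 worker (assuming the stock network-scripts are handling the interface):

# Persist the route
echo '10.96.0.1/32 via 192.168.100.1 dev enp0s3' | sudo tee /etc/sysconfig/network-scripts/route-enp0s3
# Apply it without rebooting
sudo ifdown enp0s3 && sudo ifup enp0s3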
In case this helps others: I had the exact same setup as described in the issue, but adding the route did not help. I fixed the problem by recreating the cluster, but this time I specified the pod CIDR block in the kubeadm init with --pod-network-cidr (I chose 10.244.0.0/16, but you should be able to pick anything that doesn't overlap the other interfaces), and it worked. Specifying the pod CIDR is sufficient.
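In other words, something along the lines of (advertise address as in the original setup; the CIDR is the one I picked):

kubeadm init --apiserver-advertise-address=192.168.100.1 --pod-network-cidr=10.244.0.0/16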
cc/ @annismckenzie
What you expected to happen?
After installation of a k8s cluster, Weave should be up and running on all nodes after kubeadm join.
What happened?
I ran kubeadm init --apiserver-advertise-address=192.168.100.1
then copied admin.conf to $HOME/.kube/config and applied Weave with kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')"
then I joined the 2 nodes, but the 2 worker nodes don't come up:
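For anyone reproducing this, the above boils down to roughly the following (the admin.conf copy lines are the standard kubeadm post-init steps; the exact kubeadm join command printed by kubeadm init isn't included here):

# On the master
kubeadm init --apiserver-advertise-address=192.168.100.1
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown "$(id -u):$(id -g)" $HOME/.kube/config
kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')"
# On each worker: run the kubeadm join command printed by kubeadm init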
How to reproduce it?
Host OS: macOS 10.13.6 using VirtualBox 5.2.16. Guest OS: 3 x CentOS 7.5
master: 192.168.100.1
node1: 192.168.100.2
node2: 192.168.100.3
Anything else we need to know?
I can provide anything - just let me know.
Versions:
Logs:
Network: