weaveworks / weave

Simple, resilient multi-host containers networking and more.
https://www.weave.works
Apache License 2.0

rarely some pod cannot reach some virtual ip ( happened 2 times ) // arp table incomplete ? #3626

Open fvigotti opened 5 years ago

fvigotti commented 5 years ago

weave 2.5.0, Kubernetes 1.12.2, Docker 18.03.1-ce (build 9ee9f40), Ubuntu 18.04, kernel 4.15.0-43-generic, cluster of ~15 nodes

After a couple of months of runtime, the whole production cluster went down some weeks ago. I was more concerned with bringing everything back up than with debugging, but a quick inspection showed that the cluster DNS service was unreachable. Inspecting pods (on the weave network) and hosts showed the same problem: nodes could ping each other, but some VIPs were unreachable from some nodes.

I restarted all weave pods and everything started working again. Now, while investigating issues on some zookeeper nodes, I've found it again: 2 nodes in the cluster cannot reach the cluster DNS VIP. I can ping the service VIP and the pod VIP from other nodes, but not from those two.

--- 10.34.40.31 ping statistics ---
501 packets transmitted, 0 received, +432 errors, 100% packet loss, time 511559ms

........ from another node in the cluster : 

--- 10.34.40.31 ping statistics ---
509 packets transmitted, 508 received, 0% packet loss, time 508111ms
rtt min/avg/max/mdev = 94.530/94.973/101.274/0.507 ms

The weave logs show nothing strange or different between hosts, and neither does the output of these commands:

weave status connections 
weave --local status connections
weave ps
weave status peers
weave status ipam
ip route
arp

iptables seems fine


On the failing node:

iptables -t nat -nvL | grep '10.34.40.31'
0  0 KUBE-MARK-MASQ  all  --  10.34.40.31   0.0.0.0/0
0  0 DNAT            tcp  --  0.0.0.0/0     0.0.0.0/0     tcp to:10.34.40.31:53
0  0 KUBE-MARK-MASQ  all  --  10.34.40.31   0.0.0.0/0
0  0 DNAT            udp  --  0.0.0.0/0     0.0.0.0/0     udp to:10.34.40.31:53


**IMPORTANT** I've tested other cluster IPs at random, and those work on all nodes. It seems that only this VIP fails, or possibly others I haven't found; I've only ping-tested ~10 pod VIPs. Un/fortunately it's the core-dns VIP, so the problem was immediately evident.

Now, if this is a bug, I think I can restart weave whenever it occurs. The only thing I need is a healthcheck test to automate the restart (obviously pinging everything from every weave pod is not feasible).

One thing I've found: while pinging that infamous VIP, the node that has problems shows this in its ARP table:

10.34.40.31 (incomplete) weave


while the working nodes show this instead:

10.34.40.31 ether a2:31:cb:dc:3e:98 C weave



This is the only evidence I've found for the issue.
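For anyone hitting the same symptom: the (incomplete) marker can be detected mechanically rather than by eye. A minimal sketch (the check_incomplete helper is my naming, not part of weave; it assumes the bridge interface is called weave, as in the arp output above):

```shell
# Hypothetical helper (not from weave): read `arp -n`-style output on stdin
# and print every IP whose hardware address shows as "(incomplete)" on the
# weave interface.
check_incomplete() {
  awk '/\(incomplete\)/ && $NF == "weave" { print $1 }'
}

# Usage:
#   arp -n | check_incomplete
```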

**URGENCY** This is a production cluster. I'm OK with keeping things down for a few hours while investigating further, but then I'll have to bring everything up again. I'm asking for help inspecting the issue: tell me which commands to run and I'll try them (just don't ask for too many logs, since I'll have to obfuscate public IPs/hostnames; this is a public production cluster).

thank you,
Francesco

**ERRATA:** I thought my weave version was 2.5.1, but I've checked: the client I used for inspection is at that version, while the pods actually run 2.5.0. So https://github.com/weaveworks/weave/pull/3442 should have been merged anyway...
fvigotti commented 5 years ago

Another element that may help: when I stop pinging the unreachable VIP from the non-working node, the ARP table clears the (incomplete) entry and shows the correct hardware address... :sob:


root@bca-1:~# arp | grep "40.31"
10.34.40.31              ether   a2:31:cb:dc:3e:98   C                     weave
root@bca-1:~# arp | grep "40.31"
10.34.40.31              ether   a2:31:cb:dc:3e:98   C                     weave
# --- now I start again to ping 10.34.40.31 .... and ... 
root@bca-1:~# arp | grep "40.31"
10.34.40.31                      (incomplete)                              weave
root@bca-1:~# arp | grep "40.31"
10.34.40.31                      (incomplete)                              weave
murali-reddy commented 5 years ago

> 2 nodes in the cluster cannot reach the cluster dns VIP, I can ping the service VIP or the pod VIP other nodes, but those two cannot

@fvigotti Are you able to access the pod IP of the DNS service directly from the nodes where you are seeing the problem, or does the problem only occur when accessing the service VIP?

From the problem node, does accessing any other service work fine?

fvigotti commented 5 years ago

@murali-reddy From the nodes where the problem occurs, and from the pods on those nodes, I cannot access the pod VIP and I cannot access the service VIP (maybe the problem is that the service redirects to that pod; I don't know how to check the connection to the service IP before the redirection takes place). I can access the public IP of the node where the coredns pod is deployed from every node, including those with the VIP problem.

IMPORTANT: further testing inspired by your message showed that ALL the pod VIPs on ONE specific node (the one hosting coredns) are unreachable from the two problematic nodes, and from all pods deployed on them (those are the nodes with the incomplete ARP entries).

fvigotti commented 5 years ago

Because there are two problematic nodes and only one "unreachable" node (unreachable meaning the pod VIPs on that node), I'll restart weave on one of the two problematic nodes and report whether the issue is fixed there; that should tell whether the problem is on the "weave server" or "weave client" side. But before that, I need help investigating the cause. Regarding detectability: I can ping the VIPs of a globally distributed pod/daemonset that doesn't use hostNetwork, and ping failures would reveal which weave instances are failing.

fvigotti commented 5 years ago

I want to add that new pods deployed on the "unreachable" node are also unreachable on their VIPs (from those two problematic nodes).

fvigotti commented 5 years ago

I had to restart those nodes to bring some services up (this is a production cluster). Anyway, I restarted the weave pod on one of the nodes that could not ping; after ~20 seconds it was able to reach the previously unreachable VIPs.

Then I restarted weave on the node whose VIPs were not reachable; after ~30 seconds, the non-restarted problematic node started working again. So the problem was fixed by restarting either the "src" OR the "dst" weave of the unreachable VIPs.
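For reference, the restart step can be scripted. A hedged sketch, assuming the stock weave-net daemonset manifest (the name=weave-net label) and working kubectl access; the restart_weave_on helper name is mine:

```shell
# Hypothetical helper: restart the weave pod running on a given node by
# deleting it; the weave-net daemonset then recreates it on the same node.
restart_weave_on() {
  node="$1"
  kubectl -n kube-system delete pod \
    -l name=weave-net \
    --field-selector "spec.nodeName=$node"
}

# Usage:
#   restart_weave_on bca-1
```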

murali-reddy commented 5 years ago

@fvigotti I assume your cluster is stable now, but if you run into the issue again, please check whether the nodes are able to establish connections by running weave status connections and weave status peers on both of the nodes that cannot talk to each other.

fvigotti commented 5 years ago

@murali-reddy I wouldn't call it stable, because this problem has happened twice (in 3 months of node uptime), I don't know the cause, and at the moment the auto-healing is a hacked crontab job on all the nodes, with something like this:

ips=$(kubectl get pods --namespace=kube-system -o=jsonpath="{..status.podIP}" -l name=weave-monitor-pong)
for ip in $ips; do
  echo "ip : $ip"
  # redirection order fixed: stderr is now discarded too (was: 2>&1 >/dev/null)
  if ping -c 1 "$ip" >/dev/null 2>&1; then
    echo ok
  else
    echo ERROR
  fi
done

to find problems; when one is found, a restart logic kicks in.
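The ping loop above can be turned into a reusable check that prints the failing IPs and returns non-zero, so the restart logic can key off the exit status. A sketch (check_vips is my naming; the -W 2 timeout is an assumption to keep the cron job from hanging on unreachable VIPs):

```shell
# Hypothetical healthcheck: ping each VIP once; print the ones that fail
# and return non-zero if any VIP is unreachable.
check_vips() {
  rc=0
  for ip in "$@"; do
    if ! ping -c 1 -W 2 "$ip" >/dev/null 2>&1; then
      echo "$ip"
      rc=1
    fi
  done
  return $rc
}

# Usage:
#   ips=$(kubectl get pods -n kube-system -o=jsonpath="{..status.podIP}" -l name=weave-monitor-pong)
#   bad=$(check_vips $ips) || echo "unreachable VIPs: $bad"
```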

As I said in my first post, those commands show no errors/issues (their output is the same then as it is now).
All connections were established encrypted fastdp, even to/from the node whose VIPs were unreachable.

...just one small difference: in weave status peers, the arrows from the problematic node used to point some to the right and some to the left; now they all point to the right.