Closed: Tatsu-Kishi closed this issue 4 years ago.
I see two issues. First, there are heartbeat misses; under normal circumstances this should not happen.
INFO: 2019/03/20 08:36:21.533420 overlay_switch ->[ae:3f:91:18:2e:cb(k8sm1)] sleeve timed out waiting for UDP heartbeat
INFO: 2019/03/20 08:36:21.534529 overlay_switch ->[9e:34:c5:bd:b8:9e(k8sw2)] sleeve timed out waiting for UDP heartbeat
Second, it looks like the fastdp forwarder is not being used, and the only functional forwarder, sleeve, is timing out, resulting in the connection shutting down.
INFO: 2019/03/20 08:36:21.533536 ->[192.168.100.72:6783|ae:3f:91:18:2e:cb(k8sm1)]: connection shutting down due to error: no working forwarders to ae:3f:91:18:2e:cb(k8sm1)
INFO: 2019/03/20 08:36:21.534565 ->[192.168.100.83:6783|9e:34:c5:bd:b8:9e(k8sw2)]: connection shutting down due to error: no working forwarders to 9e:34:c5:bd:b8:9e(k8sw2)
Could you please enable DEBUG logging and share the logs when you encounter this issue again?
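In case it is useful, here is a minimal sketch of enabling that for the Kubernetes addon, assuming the standard weave-net DaemonSet in kube-system and that EXTRA_ARGS is the environment variable the launch script reads for additional router flags:

# Turn on debug logging for the router container (the pods restart with the new setting)
kubectl set env daemonset/weave-net -n kube-system -c weave EXTRA_ARGS=--log-level=debug

# Collect the resulting logs
kubectl logs -n kube-system -l name=weave-net -c weave --tail=500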
It happened again, so here are the weave outputs in failure state:
# weave status
Version: 2.5.1 (up to date; next check at 2019/03/20 14:13:47)
Service: router
Protocol: weave 1..2
Name: f2:e1:3e:28:f1:80(k8sw1)
Encryption: disabled
PeerDiscovery: enabled
Targets: 3
Connections: 3 (1 established, 1 pending, 1 failed)
Peers: 3 (with 3 established, 3 pending connections)
TrustedSubnets: none
Service: ipam
Status: ready
Range: 10.32.0.0/12
DefaultSubnet: 10.32.0.0/12
# weave status peers
f2:e1:3e:28:f1:80(k8sw1)
<- 192.168.100.72:42967 ae:3f:91:18:2e:cb(k8sm1) pending
<- 192.168.100.83:50687 9e:34:c5:bd:b8:9e(k8sw2) pending
ae:3f:91:18:2e:cb(k8sm1)
-> 192.168.100.83:6783 9e:34:c5:bd:b8:9e(k8sw2) established
-> 192.168.100.79:6783 f2:e1:3e:28:f1:80(k8sw1) pending
9e:34:c5:bd:b8:9e(k8sw2)
<- 192.168.100.72:38745 ae:3f:91:18:2e:cb(k8sm1) established
-> 192.168.100.79:6783 f2:e1:3e:28:f1:80(k8sw1) pending
# weave status connections
<- 192.168.100.72:56677 pending fastdp ae:3f:91:18:2e:cb(k8sm1) mtu=1500
-> 192.168.100.83:6783 established sleeve 9e:34:c5:bd:b8:9e(k8sw2) mtu=1438
-> 192.168.100.79:6783 failed cannot connect to ourself, retry: never
Will enable debug logging now too.
Probably noteworthy is that there is heavy UDP traffic going on already in the cluster. Is there maybe a parameter to do the heartbeat over TCP?
Probably noteworthy is that there is heavy UDP traffic going on already in the cluster. Is there maybe a parameter to do the heartbeat over TCP?
It's over UDP only. Also, weave considers it a heartbeat failure only after consecutive misses (hard-coded to 6). DEBUG logs should help to see what's going on.
weave.log: here is the log of it happening again (the first 150 lines, plus a bit before it actually happened), and the weave status at the time:
# weave status
Version: 2.5.1 (up to date; next check at 2019/03/21 14:48:42)
Service: router
Protocol: weave 1..2
Name: f2:e1:3e:28:f1:80(k8sw1)
Encryption: disabled
PeerDiscovery: enabled
Targets: 3
Connections: 3 (1 established, 1 pending, 1 failed)
Peers: 3 (with 3 established, 3 pending connections)
TrustedSubnets: none
Service: ipam
Status: ready
Range: 10.32.0.0/12
DefaultSubnet: 10.32.0.0/12
# weave status peers
f2:e1:3e:28:f1:80(k8sw1)
<- 192.168.100.83:57165 9e:34:c5:bd:b8:9e(k8sw2) pending
-> 192.168.100.72:6783 ae:3f:91:18:2e:cb(k8sm1) established
ae:3f:91:18:2e:cb(k8sm1)
-> 192.168.100.83:6783 9e:34:c5:bd:b8:9e(k8sw2) established
<- 192.168.100.79:33711 f2:e1:3e:28:f1:80(k8sw1) pending
9e:34:c5:bd:b8:9e(k8sw2)
<- 192.168.100.72:50645 ae:3f:91:18:2e:cb(k8sm1) established
-> 192.168.100.79:6783 f2:e1:3e:28:f1:80(k8sw1) pending
I think we're hitting the same issue. Symptoms look identical.
We have nodes with these log chunks repeating over and over:
On weave-net-nq27f (3/3 Running 2 7d 10.186.0.157 kubernetes-kubernetes-cr2-2-1547767704):
INFO: 2019/03/26 10:43:19.850007 ->[10.186.0.187:32815] connection accepted
INFO: 2019/03/26 10:43:19.851079 ->[10.186.0.187:32815|f6:de:0b:60:9f:17(kubernetes-kubernetes-cr1-13-1547767704)]: connection ready; using protocol version 2
INFO: 2019/03/26 10:43:19.851282 overlay_switch ->[f6:de:0b:60:9f:17(kubernetes-kubernetes-cr1-13-1547767704)] using fastdp
INFO: 2019/03/26 10:43:19.851336 ->[10.186.0.187:32815|f6:de:0b:60:9f:17(kubernetes-kubernetes-cr1-13-1547767704)]: connection shutting down due to error: Multiple connections to f6:de:0b:60:9f:17(kubernetes-kubernetes-cr1-13-1547767704) added to 6e:f2:36:34:7d:df(kubernetes-kubernetes-cr2-2-1547767704)
On weave-net-hh9rj (3/3 Running 27 7d 10.186.0.187 kubernetes-kubernetes-cr1-13-1547767704):
INFO: 2019/03/26 10:43:19.848875 ->[10.186.0.157:6783] attempting connection
INFO: 2019/03/26 10:43:19.850807 ->[10.186.0.157:6783|6e:f2:36:34:7d:df(kubernetes-kubernetes-cr2-2-1547767704)]: connection ready; using protocol version 2
INFO: 2019/03/26 10:43:19.850973 overlay_switch ->[6e:f2:36:34:7d:df(kubernetes-kubernetes-cr2-2-1547767704)] using fastdp
INFO: 2019/03/26 10:43:19.851002 ->[10.186.0.157:6783|6e:f2:36:34:7d:df(kubernetes-kubernetes-cr2-2-1547767704)]: connection added
INFO: 2019/03/26 10:43:19.853848 Setting up IPsec between f6:de:0b:60:9f:17(kubernetes-kubernetes-cr1-13-1547767704) and 6e:f2:36:34:7d:df(kubernetes-kubernetes-cr2-2-1547767704)
INFO: 2019/03/26 10:43:19.854323 ipsec: InitSALocal: 10.186.0.157 -> 10.186.0.187 :6784 0x73b96332
ERRO: 2019/03/26 10:43:19.876556 fastdp ->[10.186.0.157:6784|6e:f2:36:34:7d:df(kubernetes-kubernetes-cr2-2-1547767704)]: ipsec init SA local failed: send InitSARemote: write tcp4 10.186.0.187:32815->10.186.0.157:6783: write: broken pipe
INFO: 2019/03/26 10:43:19.876658 ->[10.186.0.157:6783|6e:f2:36:34:7d:df(kubernetes-kubernetes-cr2-2-1547767704)]: connection shutting down due to error: write tcp4 10.186.0.187:32815->10.186.0.157:6783: write: connection reset by peer
INFO: 2019/03/26 10:43:19.876719 ->[10.186.0.157:6783|6e:f2:36:34:7d:df(kubernetes-kubernetes-cr2-2-1547767704)]: connection deleted
INFO: 2019/03/26 10:43:19.876735 Destroying IPsec between f6:de:0b:60:9f:17(kubernetes-kubernetes-cr1-13-1547767704) and 6e:f2:36:34:7d:df(kubernetes-kubernetes-cr2-2-1547767704)
INFO: 2019/03/26 10:43:19.876802 ipsec: destroy: in 10.186.0.157 -> 10.186.0.187 0x73b96332
INFO: 2019/03/26 10:43:19.877741 overlay_switch ->[6e:f2:36:34:7d:df(kubernetes-kubernetes-cr2-2-1547767704)] fastdp send InitSARemote: write tcp4 10.186.0.187:32815->10.186.0.157:6783: write: broken pipe
INFO: 2019/03/26 10:43:19.877776 overlay_switch ->[6e:f2:36:34:7d:df(kubernetes-kubernetes-cr2-2-1547767704)] using sleeve
We've experienced the same error a couple of times now, each time hitting a node that hosts a Traefik load balancer instance. We're seeing the same pattern of this block repeating:
INFO: 2019/05/06 03:50:54.920197 ->[10.153.42.237:6783|4e:d9:88:c1:94:67(ip-10-153-42-237.us-east-2.compute.internal)]: connection ready; using protocol version 2
INFO: 2019/05/06 03:50:54.920259 overlay_switch ->[4e:d9:88:c1:94:67(ip-10-153-42-237.us-east-2.compute.internal)] using fastdp
INFO: 2019/05/06 03:50:54.920282 ->[10.153.42.237:6783|4e:d9:88:c1:94:67(ip-10-153-42-237.us-east-2.compute.internal)]: connection shutting down due to error: Multiple connections to 4e:d9:88:c1:94:67(ip-10-153-42-237.us-east-2.compute.internal) added to ca:c7:e5:c7:be:80(ip-10-153-17-191.us-east-2.compute.internal)
We're seeing the same log messages as @mikebryant. In each case rebooting/terminating the node resolves the issue, but restarting Weave on the node without doing that does not.
The symptoms are that Weave Net appears to be otherwise healthy, except that all connections are reported as "sleeve" (all good nodes report "fastdp") and all packets that would transit the Weave network are dropped with "no route to host". iptables rules appear normal and consistent with good nodes.
I've attached reports generated by both the unhealthy node and a healthy node for comparison. Hopefully they will help shed some light on this:
I also enabled debug logging so hopefully I have some more concrete information to report if this happens again.
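For reference, a sketch of how such a comparison can be pulled from the weave pods (the pod names weave-net-good and weave-net-bad are placeholders for the weave-net pods on a healthy and an unhealthy node):

# Compare which forwarder (fastdp vs sleeve) and which MTU each connection uses
kubectl exec -n kube-system weave-net-good -c weave -- /home/weave/weave --local status connections
kubectl exec -n kube-system weave-net-bad -c weave -- /home/weave/weave --local status connections

# Capture the full diagnostic report from the suspect node
kubectl exec -n kube-system weave-net-bad -c weave -- /home/weave/weave --local report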
Hello,
Is there a workaround or a version that is not affected by this bug?
The common theme across the shared logs is connection reset by peer. I tried to reproduce the scenario by blocking ports 6783/6784 and reopening them again, without much luck. I am able to reproduce connection shutting down due to error: Multiple connections, but the connections get re-established over fastdp. I will investigate further.
Is there a workaround or a version that is not affected by this bug?
@jntakpe We don't know the root cause yet, so we can't suggest a workaround.
We faced the same issue too. The logs repeat over and over again:
INFO: 2019/06/06 11:50:55.443926 ->[10.201.20.164:6783|96:d4:22:2a:65:00(ip-10-201-20-164.ec2.internal)]: connection deleted
INFO: 2019/06/06 11:50:55.446266 ->[10.201.20.164:6783] attempting connection
INFO: 2019/06/06 11:50:55.475459 ->[10.201.20.164:6783|96:d4:22:2a:65:00(ip-10-201-20-164.ec2.internal)]: connection ready; using protocol version 2
INFO: 2019/06/06 11:50:55.475560 overlay_switch ->[96:d4:22:2a:65:00(ip-10-201-20-164.ec2.internal)] using fastdp
INFO: 2019/06/06 11:50:55.475601 ->[10.201.20.164:6783|96:d4:22:2a:65:00(ip-10-201-20-164.ec2.internal)]: connection added
INFO: 2019/06/06 11:50:55.541047 overlay_switch ->[5a:68:a6:8e:36:f5(ip-10-201-20-113.ec2.internal)] sleeve timed out waiting for UDP heartbeat
INFO: 2019/06/06 11:50:55.553389 ->[10.201.20.164:27939] connection accepted
INFO: 2019/06/06 11:50:55.640593 ->[10.201.20.164:27939|96:d4:22:2a:65:00(ip-10-201-20-164.ec2.internal)]: connection ready; using protocol version 2
INFO: 2019/06/06 11:50:55.640704 overlay_switch ->[96:d4:22:2a:65:00(ip-10-201-20-164.ec2.internal)] using fastdp
INFO: 2019/06/06 11:50:55.640750 ->[10.201.20.164:27939|96:d4:22:2a:65:00(ip-10-201-20-164.ec2.internal)]: connection shutting down due to error: Multiple connections to 96:d4:22:2a:65:00(ip-10-201-20-164.ec2.internal) added to 72:7c:ed:95:b2:24(ip-10-201-21-112.ec2.internal)
INFO: 2019/06/06 11:50:56.079073 ->[10.201.20.164:6783|96:d4:22:2a:65:00(ip-10-201-20-164.ec2.internal)]: connection fully established
INFO: 2019/06/06 11:50:56.552727 EMSGSIZE on send, expecting PMTU update (IP packet was 60028 bytes, payload was 60020 bytes)
INFO: 2019/06/06 11:50:56.561926 sleeve ->[10.201.20.164:6783|96:d4:22:2a:65:00(ip-10-201-20-164.ec2.internal)]: Effective MTU verified at 8939
INFO: 2019/06/06 11:51:11.578086 Discovered remote MAC ba:38:97:c2:5a:48 at 46:9c:5b:6c:20:0f(ip-10-201-19-177.ec2.internal)
INFO: 2019/06/06 11:51:21.137827 ->[10.201.20.113:6783|5a:68:a6:8e:36:f5(ip-10-201-20-113.ec2.internal)]: connection shutting down due to error: Inconsistent entries for 100.109.64.0: owned by 3a:0f:18:e2:50:99 but incoming message says b6:fd:31:4c:3c:7c
INFO: 2019/06/06 11:51:21.137938 ->[10.201.20.113:6783|5a:68:a6:8e:36:f5(ip-10-201-20-113.ec2.internal)]: connection deleted
INFO: 2019/06/06 11:51:21.144054 ->[10.201.20.113:6783] attempting connection
INFO: 2019/06/06 11:51:21.154210 ->[10.201.20.113:6783|5a:68:a6:8e:36:f5(ip-10-201-20-113.ec2.internal)]: connection ready; using protocol version 2
INFO: 2019/06/06 11:51:21.154344 overlay_switch ->[5a:68:a6:8e:36:f5(ip-10-201-20-113.ec2.internal)] using fastdp
INFO: 2019/06/06 11:51:21.154382 ->[10.201.20.113:6783|5a:68:a6:8e:36:f5(ip-10-201-20-113.ec2.internal)]: connection added
INFO: 2019/06/06 11:51:22.132283 ->[10.201.20.113:52959] connection accepted
INFO: 2019/06/06 11:51:22.238391 ->[10.201.20.113:52959|5a:68:a6:8e:36:f5(ip-10-201-20-113.ec2.internal)]: connection ready; using protocol version 2
INFO: 2019/06/06 11:51:22.238521 overlay_switch ->[5a:68:a6:8e:36:f5(ip-10-201-20-113.ec2.internal)] using fastdp
INFO: 2019/06/06 11:51:22.238587 ->[10.201.20.113:52959|5a:68:a6:8e:36:f5(ip-10-201-20-113.ec2.internal)]: connection shutting down due to error: Multiple connections to 5a:68:a6:8e:36:f5(ip-10-201-20-113.ec2.internal) added to 72:7c:ed:95:b2:24(ip-10-201-21-112.ec2.internal)
INFO: 2019/06/06 11:51:22.817703 EMSGSIZE on send, expecting PMTU update (IP packet was 60028 bytes, payload was 60020 bytes)
INFO: 2019/06/06 11:51:22.817869 overlay_switch ->[5a:68:a6:8e:36:f5(ip-10-201-20-113.ec2.internal)] using sleeve
INFO: 2019/06/06 11:51:22.817913 ->[10.201.20.113:6783|5a:68:a6:8e:36:f5(ip-10-201-20-113.ec2.internal)]: connection fully established
INFO: 2019/06/06 11:51:22.824401 sleeve ->[10.201.20.113:6783|5a:68:a6:8e:36:f5(ip-10-201-20-113.ec2.internal)]: Effective MTU verified at 8939
INFO: 2019/06/06 11:51:22.838396 overlay_switch ->[5a:68:a6:8e:36:f5(ip-10-201-20-113.ec2.internal)] using fastdp
INFO: 2019/06/06 11:51:25.244635 ->[10.201.19.246:6783|f6:25:32:b6:96:86(ip-10-201-19-246.ec2.internal)]: connection shutting down due to error: read tcp4 10.201.21.112:33567->10.201.19.246:6783: read: connection reset by peer
Any workaround for this? I'm using Weave 2.5.2 on Kubernetes 1.11.10 and seeing these errors continuously. In addition, I'm also seeing these errors:
INFO: 2019/07/04 07:23:33.520512 EMSGSIZE on send, expecting PMTU update (IP packet was 60028 bytes, payload was 60020 bytes)
INFO: 2019/07/04 06:52:56.549384 sleeve ->[10.11.3.170:6783|ee:10:f8:5f:7a:b0(ip-10-11-3-170.ap-southeast-1.compute.internal)]: Effective MTU verified at 8939
I'm using the default MTU value of 8912. Does this mean my MTU configuration is incorrect? Do I need to update it?
I'm using weave 2.5.2 on kubernetes version 1.11.10 and seeing these errors continuously.
Please share your logs so we can reason about why you are seeing the error connection shutting down due to error: Multiple connections.
Note that the original issue is a case where no working forwarders were found and there were heartbeat timeouts. Please check whether you have similar symptoms; otherwise, open a new issue with relevant logs.
Does this mean my MTU configuration is incorrect? Do I need to update it?
Regarding MTU, are you using sleeve mode? In sleeve mode the MTU is dynamically negotiated. If you were using fastdp, you should check why the connection dropped to sleeve mode. Either way, sharing your logs would help.
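A quick way to check that is to look at the per-connection forwarder inside the weave container on the node (path as shown elsewhere in this thread); this is only a sketch:

# Shows, per peer, whether fastdp or sleeve is in use and the negotiated mtu=
/home/weave/weave --local status connections

# Connections that have fallen back from fastdp stand out with
/home/weave/weave --local status connections | grep sleeve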
@murali-reddy Sure, let me share the log lines. Also, I'm using fastdp mode only; let me know if you need any other logs/info.
INFO: 2019/07/04 07:23:33.520512 EMSGSIZE on send, expecting PMTU update (IP packet was 60028 bytes, payload was 60020 bytes)
INFO: 2019/07/04 06:52:56.549384 sleeve ->[10.11.3.170:6783|ee:10:f8:5f:7a:b0(ip-10-11-3-170.ap-southeast-1.compute.internal)]: Effective MTU verified at 8939
INFO: 2019/07/04 07:30:13.986939 ->[10.11.3.24:6783|8e:4a:6b:b5:b5:e0(ip-10-11-3-24.ap-southeast-1.compute.internal)]: connection shutting down due to error: cannot connect to ourself
INFO: 2019/07/04 07:20:25.694554 ->[10.11.4.147:6783] error during connection attempt: dial tcp4 :0->10.11.4.147:6783: connect: connection refused
INFO: 2019/07/04 07:22:47.667506 ->[10.11.4.147:6783] error during connection attempt: dial tcp4 :0->10.11.4.147:6783: connect: connection timed out
INFO: 2019/07/04 07:23:32.962780 ->[10.11.4.124:6783|be:25:ec:67:b2:53(ip-10-11-4-124.ap-southeast-1.compute.internal)]: connection shutting down due to error: Multiple connections to be:25:ec:67:b2:53(ip-10-11-4-124.ap-southeast-1.compute.internal) added to 8e:4a:6b:b5:b5:e0(ip-10-11-3-24.ap-southeast-1.compute.internal)
weave status
Use 'kubectl describe pod/weave-net-s7fhk -n kube-system' to see all of the containers in this pod.
/home/weave # ./weave --local status
Version: 2.5.2 (up to date; next check at 2019/07/04 13:56:41)
Service: router
Protocol: weave 1..2
Name: 8e:4a:6b:b5:b5:e0(ip-10-11-3-24.ap-southeast-1.compute.internal)
Encryption: disabled
PeerDiscovery: enabled
Targets: 54
Connections: 54 (53 established, 1 failed)
Peers: 54 (with 2862 established connections)
TrustedSubnets: none
Service: ipam
Status: ready
Range: 100.96.0.0/11
DefaultSubnet: 100.96.0.0/11
weave connections status
/home/weave # ./weave --local status connections
<- 10.11.3.186:42093 established fastdp de:4e:e2:52:09:5e(ip-10-11-3-186.ap-southeast-1.compute.internal) mtu=8912
-> 10.11.3.199:6783 established fastdp 2e:07:5a:86:47:db(ip-10-11-3-199.ap-southeast-1.compute.internal) mtu=8912
<- 10.11.4.242:16539 established fastdp 4e:57:b6:df:bf:2e(ip-10-11-4-242.ap-southeast-1.compute.internal) mtu=8912
-> 10.11.3.193:6783 established fastdp 7a:00:13:3d:7f:37(ip-10-11-3-193.ap-southeast-1.compute.internal) mtu=8912
<- 10.11.4.76:55588 established fastdp f6:45:18:0f:2f:9b(ip-10-11-4-76.ap-southeast-1.compute.internal) mtu=8912
<- 10.11.3.177:37937 established fastdp be:78:1e:60:68:c6(ip-10-11-3-177.ap-southeast-1.compute.internal) mtu=8912
<- 10.11.4.116:20028 established fastdp 7e:d4:ca:29:3d:b6(ip-10-11-4-116.ap-southeast-1.compute.internal) mtu=8912
<- 10.11.4.96:47457 established fastdp 6e:40:11:fd:69:b7(ip-10-11-4-96.ap-southeast-1.compute.internal) mtu=8912
<- 10.11.3.137:63090 established fastdp 7a:34:9d:f6:4a:92(ip-10-11-3-137.ap-southeast-1.compute.internal) mtu=8912
<- 10.11.3.139:17730 established fastdp 06:3c:c0:54:34:bc(ip-10-11-3-139.ap-southeast-1.compute.internal) mtu=8912
<- 10.11.4.225:18892 established fastdp ca:00:4e:48:ed:7c(ip-10-11-4-225.ap-southeast-1.compute.internal) mtu=8912
<- 10.11.3.197:58850 established fastdp c6:45:7e:43:1e:06(ip-10-11-3-197.ap-southeast-1.compute.internal) mtu=8912
<- 10.11.4.103:43112 established fastdp ca:ff:3f:f2:d5:c2(ip-10-11-4-103.ap-southeast-1.compute.internal) mtu=8912
<- 10.11.3.142:18248 established fastdp 62:27:64:05:d2:8a(ip-10-11-3-142.ap-southeast-1.compute.internal) mtu=8912
What OS are you guys using?
We've encountered tons of connection reset by peer issues, and I've finally traced it back to networkd usage on CoreOS (2079.5.1 as of this writing, but we have been running various permutations since ~600). K8s 1.11.10, Weave 2.5.2 at this time.
See point #3 in: https://www.weave.works/blog/running-a-weave-network-on-coreos/
Since I'm using Kops, adding a drop-in to install this file:
## /etc/systemd/network/10-weave.network
[Match]
Name=weave datapath vxlan-6784 dummy0
[Network]
Description=Network interfaces managed by weave
[Link]
Unmanaged=true
Seems to have helped.
[edit] Made the example drop-in more explicit
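For anyone who wants to try this without wiring it into kops first, a hand-applied sketch of the same thing (assumes systemd-networkd is what manages the interfaces, as on CoreOS):

# Write the drop-in shown above, then restart networkd so it stops managing the weave interfaces
sudo tee /etc/systemd/network/10-weave.network >/dev/null <<'EOF'
[Match]
Name=weave datapath vxlan-6784 dummy0
[Network]
Description=Network interfaces managed by weave
[Link]
Unmanaged=true
EOF
sudo systemctl restart systemd-networkd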
@mars64 Thanks for the input, but this resolution is for the connection reset by peer error. What about the following errors? Also, how do I determine the correct MTU configuration for weave? There are multiple interfaces on my host (docker0, eth0, a number of vethwep*) with different MTU values.
INFO: 2019/07/04 07:30:13.986939 ->[10.11.3.24:6783|8e:4a:6b:b5:b5:e0(ip-10-11-3-24.ap-southeast-1.compute.internal)]: connection shutting down due to error: cannot connect to ourself
INFO: 2019/07/04 07:23:32.962780 ->[10.11.4.124:6783|be:25:ec:67:b2:53(ip-10-11-4-124.ap-southeast-1.compute.internal)]: connection shutting down due to error: Multiple connections to be:25:ec:67:b2:53(ip-10-11-4-124.ap-southeast-1.compute.internal) added to 8e:4a:6b:b5:b5:e0(ip-10-11-3-24.ap-southeast-1.compute.internal)
I'm saying that I encountered many connection reset by peer errors that were tough to track down, and my solution above seems to be helping.
Re: MTU, please see @murali-reddy's comment on your other issue.
We have the same issue: Kubernetes server version v1.14.2, Weave 2.5.2. We use Ubuntu 18.04 with a WireGuard VPN between the nodes; it worked fine with 3 nodes, but now that we have 7 nodes this issue happens almost every day.
Logs:
INFO: 2019/07/17 08:00:00.460067 Sending ICMP 3,4 (10.42.0.0 -> 10.47.0.20): PMTU=1358
INFO: 2019/07/17 08:00:30.040359 Sending ICMP 3,4 (10.36.0.18 -> 10.47.0.2): PMTU=1358
INFO: 2019/07/17 08:00:30.040614 Sending ICMP 3,4 (10.36.0.18 -> 10.47.0.2): PMTU=1358
INFO: 2019/07/17 08:00:30.040948 Sending ICMP 3,4 (10.36.0.18 -> 10.47.0.2): PMTU=1358
INFO: 2019/07/17 08:00:30.041070 Sending ICMP 3,4 (10.36.0.18 -> 10.47.0.2): PMTU=1358
INFO: 2019/07/17 08:00:31.643604 Discovered remote MAC ce:e1:74:78:e9:99 at aa:4f:55:0d:c9:e8(kube1)
INFO: 2019/07/17 07:59:50.232678 ->[10.0.1.4:42333|a6:a3:89:97:87:9b(kube4)]: connection ready; using protocol version 2
INFO: 2019/07/17 07:59:50.232777 overlay_switch ->[a6:a3:89:97:87:9b(kube4)] using fastdp
INFO: 2019/07/17 07:59:50.232799 ->[10.0.1.4:42333|a6:a3:89:97:87:9b(kube4)]: connection added (new peer)
INFO: 2019/07/17 07:59:50.233947 ->[10.0.1.4:42333|a6:a3:89:97:87:9b(kube4)]: connection shutting down due to error: read tcp4 10.0.1.3:6783->10.0.1.4:42333: read: connection reset by peer
INFO: 2019/07/17 07:59:50.234049 ->[10.0.1.4:42333|a6:a3:89:97:87:9b(kube4)]: connection deleted
INFO: 2019/07/17 07:59:50.234168 ->[10.0.1.4:6783|a6:a3:89:97:87:9b(kube4)]: connection ready; using protocol version 2
INFO: 2019/07/17 07:59:50.234457 overlay_switch ->[a6:a3:89:97:87:9b(kube4)] using fastdp
INFO: 2019/07/17 07:59:50.234505 ->[10.0.1.4:6783|a6:a3:89:97:87:9b(kube4)]: connection added (new peer)
INFO: 2019/07/17 08:00:00.401502 Discovered remote MAC 46:5d:b8:73:d2:f8 at ee:8c:04:f1:3d:5b(kube7)
INFO: 2019/07/17 08:00:31.649026 Discovered remote MAC ce:e1:74:78:e9:99 at aa:4f:55:0d:c9:e8(kube1)
INFO: 2019/07/17 07:59:53.678243 ->[10.0.1.4:6783] attempting connection
INFO: 2019/07/17 07:59:53.684373 ->[10.0.1.4:58107|a6:a3:89:97:87:9b(kube4)]: connection ready; using protocol version 2
INFO: 2019/07/17 07:59:53.684468 overlay_switch ->[a6:a3:89:97:87:9b(kube4)] using fastdp
INFO: 2019/07/17 07:59:53.684498 ->[10.0.1.4:58107|a6:a3:89:97:87:9b(kube4)]: connection added (new peer)
INFO: 2019/07/17 07:59:53.707558 ->[10.0.1.4:6783|a6:a3:89:97:87:9b(kube4)]: connection ready; using protocol version 2
INFO: 2019/07/17 07:59:53.708448 overlay_switch ->[a6:a3:89:97:87:9b(kube4)] using fastdp
INFO: 2019/07/17 07:59:53.708495 ->[10.0.1.4:58107|a6:a3:89:97:87:9b(kube4)]: connection deleted
INFO: 2019/07/17 07:59:53.708609 ->[10.0.1.4:6783|a6:a3:89:97:87:9b(kube4)]: connection added (new peer)
INFO: 2019/07/17 07:59:53.709189 ->[10.0.1.4:58107|a6:a3:89:97:87:9b(kube4)]: connection shutting down due to error: Multiple connections to a6:a3:89:97:87:9b(kube4) added to aa:4f:55:0d:c9:e8(kube1)
ERRO: 2019/07/17 08:00:00.400693 Captured frame from MAC (02:f0:18:50:dd:f7) to (46:5d:b8:73:d2:f8) associated with another peer 02:f0:18:50:dd:f7(kube5)
INFO: 2019/07/17 07:59:54.783518 ->[10.0.1.4:48885|a6:a3:89:97:87:9b(kube4)]: connection shutting down due to error: Multiple connections to a6:a3:89:97:87:9b(kube4) added to 22:49:f6:fc:cc:cd(kube6)
ERRO: 2019/07/17 08:00:00.400535 Captured frame from MAC (02:f0:18:50:dd:f7) to (46:5d:b8:73:d2:f8) associated with another peer 02:f0:18:50:dd:f7(kube5)
INFO: 2019/07/17 08:00:05.039007 Sending ICMP 3,4 (10.42.0.0 -> 10.43.128.21): PMTU=1358
INFO: 2019/07/17 08:00:31.649188 Discovered remote MAC ce:e1:74:78:e9:99 at aa:4f:55:0d:c9:e8(kube1)
INFO: 2019/07/17 08:00:34.145086 Sending ICMP 3,4 (10.40.0.12 -> 10.43.128.2): PMTU=1358
INFO: 2019/07/17 08:00:34.145652 Sending ICMP 3,4 (10.40.0.12 -> 10.43.128.2): PMTU=1358
INFO: 2019/07/17 08:00:34.145809 Sending ICMP 3,4 (10.40.0.12 -> 10.43.128.2): PMTU=1358
INFO: 2019/07/17 08:00:34.145912 Sending ICMP 3,4 (10.40.0.12 -> 10.43.128.2): PMTU=1358
INFO: 2019/07/17 07:59:58.323908 ->[10.0.1.4:6783|a6:a3:89:97:87:9b(kube4)]: connection ready; using protocol version 2
INFO: 2019/07/17 07:59:58.324035 overlay_switch ->[a6:a3:89:97:87:9b(kube4)] using fastdp
INFO: 2019/07/17 08:00:34.145993 Sending ICMP 3,4 (10.40.0.12 -> 10.43.128.2): PMTU=1358
INFO: 2019/07/17 08:00:34.146112 Sending ICMP 3,4 (10.40.0.12 -> 10.43.128.2): PMTU=1358
INFO: 2019/07/17 07:59:58.324066 ->[10.0.1.4:54443|a6:a3:89:97:87:9b(kube4)]: connection ready; using protocol version 2
INFO: 2019/07/17 07:59:58.324152 overlay_switch ->[a6:a3:89:97:87:9b(kube4)] using fastdp
INFO: 2019/07/17 07:59:58.324180 ->[10.0.1.4:6783|a6:a3:89:97:87:9b(kube4)]: connection added (new peer)
INFO: 2019/07/17 07:59:58.324860 ->[10.0.1.4:6783|a6:a3:89:97:87:9b(kube4)]: connection deleted
INFO: 2019/07/17 07:59:58.325021 ->[10.0.1.4:6783|a6:a3:89:97:87:9b(kube4)]: connection shutting down due to error: Multiple connections to a6:a3:89:97:87:9b(kube4) added to 02:f0:18:50:dd:f7(kube5)
INFO: 2019/07/17 07:59:58.325317 ->[10.0.1.4:54443|a6:a3:89:97:87:9b(kube4)]: connection added (new peer)
INFO: 2019/07/17 08:00:00.399926 Discovered remote MAC 46:5d:b8:73:d2:f8 at ee:8c:04:f1:3d:5b(kube7)
INFO: 2019/07/17 08:00:31.646942 Discovered remote MAC ce:e1:74:78:e9:99 at aa:4f:55:0d:c9:e8(kube1)
INFO: 2019/07/17 07:59:58.325698 overlay_switch ->[02:f0:18:50:dd:f7(kube5)] using fastdp
INFO: 2019/07/17 07:59:58.325746 overlay_switch ->[02:f0:18:50:dd:f7(kube5)] using fastdp
INFO: 2019/07/17 07:59:58.325763 ->[10.0.1.5:44479|02:f0:18:50:dd:f7(kube5)]: connection added (new peer)
INFO: 2019/07/17 07:59:58.325992 ->[10.0.1.5:44479|02:f0:18:50:dd:f7(kube5)]: connection deleted
INFO: 2019/07/17 07:59:58.326093 ->[10.0.1.5:44479|02:f0:18:50:dd:f7(kube5)]: connection shutting down due to error: Multiple connections to 02:f0:18:50:dd:f7(kube5) added to a6:a3:89:97:87:9b(kube4)
INFO: 2019/07/17 07:59:58.326866 ->[10.0.1.5:6783|02:f0:18:50:dd:f7(kube5)]: connection added (new peer)
INFO: 2019/07/17 07:59:58.331304 overlay_switch ->[02:f0:18:50:dd:f7(kube5)] using sleeve
INFO: 2019/07/17 07:59:58.331743 EMSGSIZE on send, expecting PMTU update (IP packet was 60028 bytes, payload was 60020 bytes)
INFO: 2019/07/17 07:59:58.331905 ->[10.0.1.5:6783|02:f0:18:50:dd:f7(kube5)]: connection fully established
INFO: 2019/07/17 07:59:58.334000 sleeve ->[10.0.1.5:6783|02:f0:18:50:dd:f7(kube5)]: Effective MTU verified at 1358
INFO: 2019/07/17 07:59:50.216976 ->[10.0.1.4:6783] attempting connection
INFO: 2019/07/17 07:59:50.218344 ->[10.0.1.4:46457] connection accepted
INFO: 2019/07/17 07:59:50.219728 ->[10.0.1.4:6783|a6:a3:89:97:87:9b(kube4)]: connection ready; using protocol version 2
INFO: 2019/07/17 07:59:50.220004 overlay_switch ->[a6:a3:89:97:87:9b(kube4)] using fastdp
INFO: 2019/07/17 07:59:50.220073 ->[10.0.1.4:6783|a6:a3:89:97:87:9b(kube4)]: connection added (new peer)
INFO: 2019/07/17 07:59:50.220176 ->[10.0.1.4:46457|a6:a3:89:97:87:9b(kube4)]: connection ready; using protocol version 2
INFO: 2019/07/17 07:59:50.220294 overlay_switch ->[a6:a3:89:97:87:9b(kube4)] using fastdp
INFO: 2019/07/17 07:59:50.220727 ->[10.0.1.4:46457|a6:a3:89:97:87:9b(kube4)]: connection shutting down due to error: Multiple connections to a6:a3:89:97:87:9b(kube4) added to 86:0a:f7:72:aa:a2(kube2)
ERRO: 2019/07/17 08:00:00.400779 Captured frame from MAC (02:f0:18:50:dd:f7) to (46:5d:b8:73:d2:f8) associated with another peer 02:f0:18:50:dd:f7(kube5)
INFO: 2019/07/17 08:00:31.648688 Discovered remote MAC ce:e1:74:78:e9:99 at aa:4f:55:0d:c9:e8(kube1)
INFO: 2019/07/17 08:00:42.857008 Discovered remote MAC e6:3b:5b:d9:e6:a4 at 02:f0:18:50:dd:f7(kube5)
INFO: 2019/07/17 08:00:42.862008 Discovered remote MAC e6:3b:5b:d9:e6:a4 at 02:f0:18:50:dd:f7(kube5)
INFO: 2019/07/17 08:00:42.862272 Discovered remote MAC e6:3b:5b:d9:e6:a4 at 02:f0:18:50:dd:f7(kube5)
INFO: 2019/07/17 08:00:42.862243 Discovered remote MAC e6:3b:5b:d9:e6:a4 at 02:f0:18:50:dd:f7(kube5)
INFO: 2019/07/17 08:00:48.395014 Sending ICMP 3,4 (10.42.0.22 -> 10.47.0.3): PMTU=1358
INFO: 2019/07/17 08:00:48.395202 Sending ICMP 3,4 (10.42.0.22 -> 10.47.0.3): PMTU=1358
INFO: 2019/07/17 08:00:50.221401 ->[10.0.1.4:6783|a6:a3:89:97:87:9b(kube4)]: connection shutting down due to error: write tcp4 10.0.1.2:51739->10.0.1.4:6783: write: connection reset by peer
INFO: 2019/07/17 08:00:50.220229 overlay_switch ->[86:0a:f7:72:aa:a2(kube2)] using sleeve
INFO: 2019/07/17 08:00:50.220413 overlay_switch ->[86:0a:f7:72:aa:a2(kube2)] sleeve timed out waiting for UDP heartbeat
INFO: 2019/07/17 08:00:50.220505 ->[10.0.1.2:51739|86:0a:f7:72:aa:a2(kube2)]: connection shutting down due to error: no working forwarders to 86:0a:f7:72:aa:a2(kube2)
INFO: 2019/07/17 08:00:50.220673 ->[10.0.1.2:51739|86:0a:f7:72:aa:a2(kube2)]: connection deleted
INFO: 2019/07/17 08:00:50.221292 ->[10.0.1.2:6783] attempting connection
INFO: 2019/07/17 08:00:50.221449 overlay_switch ->[a6:a3:89:97:87:9b(kube4)] using sleeve
INFO: 2019/07/17 08:00:50.221847 overlay_switch ->[a6:a3:89:97:87:9b(kube4)] sleeve timed out waiting for UDP heartbeat
INFO: 2019/07/17 08:00:50.222207 ->[10.0.1.4:6783|a6:a3:89:97:87:9b(kube4)]: connection deleted
INFO: 2019/07/17 08:00:50.227493 ->[10.0.1.4:6783] attempting connection
INFO: 2019/07/17 08:00:50.235191 overlay_switch ->[a6:a3:89:97:87:9b(kube4)] using sleeve
INFO: 2019/07/17 08:00:50.235592 overlay_switch ->[a6:a3:89:97:87:9b(kube4)] sleeve timed out waiting for UDP heartbeat
INFO: 2019/07/17 08:00:50.235999 ->[10.0.1.4:6783|a6:a3:89:97:87:9b(kube4)]: connection shutting down due to error: no working forwarders to a6:a3:89:97:87:9b(kube4)
INFO: 2019/07/17 08:00:50.233175 ->[10.0.1.2:42041] connection accepted
INFO: 2019/07/17 08:00:50.235406 overlay_switch ->[6e:1f:12:00:ea:d2(kube3)] sleeve timed out waiting for UDP heartbeat
INFO: 2019/07/17 08:00:50.235492 ->[10.0.1.3:36577|6e:1f:12:00:ea:d2(kube3)]: connection shutting down due to error: no working forwarders to 6e:1f:12:00:ea:d2(kube3)
INFO: 2019/07/17 08:00:50.235611 ->[10.0.1.3:36577|6e:1f:12:00:ea:d2(kube3)]: connection deleted
INFO: 2019/07/17 08:00:50.235855 ->[10.0.1.3:6783] attempting connection
INFO: 2019/07/17 08:00:50.239181 ->[10.0.1.3:35709] connection accepted
INFO: 2019/07/17 08:00:50.239438 ->[10.0.1.4:36293] connection accepted
INFO: 2019/07/17 08:00:50.236275 ->[10.0.1.4:6783|a6:a3:89:97:87:9b(kube4)]: connection deleted
INFO: 2019/07/17 08:00:50.236856 ->[10.0.1.4:6783] attempting connection
INFO: 2019/07/17 08:00:50.238397 ->[10.0.1.4:60201] connection accepted
INFO: 2019/07/17 08:00:50.239553 ->[10.0.1.4:60201|a6:a3:89:97:87:9b(kube4)]: connection ready; using protocol version 2
INFO: 2019/07/17 08:00:50.239654 overlay_switch ->[a6:a3:89:97:87:9b(kube4)] using fastdp
INFO: 2019/07/17 08:00:50.239697 ->[10.0.1.4:60201|a6:a3:89:97:87:9b(kube4)]: connection added (new peer)
INFO: 2019/07/17 08:00:50.243002 ->[10.0.1.4:6783|a6:a3:89:97:87:9b(kube4)]: connection ready; using protocol version 2
INFO: 2019/07/17 08:00:50.243155 overlay_switch ->[a6:a3:89:97:87:9b(kube4)] using fastdp
INFO: 2019/07/17 08:00:50.243783 ->[10.0.1.4:60201|a6:a3:89:97:87:9b(kube4)]: connection deleted
INFO: 2019/07/17 08:00:50.243841 ->[10.0.1.4:60201|a6:a3:89:97:87:9b(kube4)]: connection shutting down due to error: Multiple connections to a6:a3:89:97:87:9b(kube4) added to 6e:1f:12:00:ea:d2(kube3)
INFO: 2019/07/17 08:00:50.244937 ->[10.0.1.4:6783|a6:a3:89:97:87:9b(kube4)]: connection added (new peer)
INFO: 2019/07/17 08:00:50.241887 ->[10.0.1.4:36293|a6:a3:89:97:87:9b(kube4)]: connection ready; using protocol version 2
INFO: 2019/07/17 08:00:50.241982 overlay_switch ->[a6:a3:89:97:87:9b(kube4)] using fastdp
INFO: 2019/07/17 08:00:50.242011 ->[10.0.1.4:36293|a6:a3:89:97:87:9b(kube4)]: connection added (new peer)
INFO: 2019/07/17 08:00:50.239771 ->[10.0.1.3:6783|6e:1f:12:00:ea:d2(kube3)]: connection ready; using protocol version 2
INFO: 2019/07/17 08:00:50.239928 overlay_switch ->[6e:1f:12:00:ea:d2(kube3)] using fastdp
INFO: 2019/07/17 08:00:50.239957 ->[10.0.1.3:6783|6e:1f:12:00:ea:d2(kube3)]: connection added (new peer)
INFO: 2019/07/17 08:00:50.241573 ->[10.0.1.3:35709|6e:1f:12:00:ea:d2(kube3)]: connection ready; using protocol version 2
INFO: 2019/07/17 08:00:50.241667 overlay_switch ->[6e:1f:12:00:ea:d2(kube3)] using fastdp
INFO: 2019/07/17 08:00:50.241697 ->[10.0.1.3:6783|6e:1f:12:00:ea:d2(kube3)]: connection deleted
INFO: 2019/07/17 08:00:50.241829 ->[10.0.1.3:6783|6e:1f:12:00:ea:d2(kube3)]: connection shutting down due to error: Multiple connections to 6e:1f:12:00:ea:d2(kube3) added to a6:a3:89:97:87:9b(kube4)
INFO: 2019/07/17 08:00:50.241910 ->[10.0.1.3:35709|6e:1f:12:00:ea:d2(kube3)]: connection added (new peer)
INFO: 2019/07/17 08:00:50.243326 ->[10.0.1.2:42041|86:0a:f7:72:aa:a2(kube2)]: connection ready; using protocol version 2
INFO: 2019/07/17 08:00:50.243436 ->[10.0.1.2:6783|86:0a:f7:72:aa:a2(kube2)]: connection ready; using protocol version 2
INFO: 2019/07/17 08:00:50.243483 overlay_switch ->[86:0a:f7:72:aa:a2(kube2)] using fastdp
INFO: 2019/07/17 08:00:50.243511 ->[10.0.1.2:42041|86:0a:f7:72:aa:a2(kube2)]: connection added (new peer)
INFO: 2019/07/17 08:00:50.243529 overlay_switch ->[86:0a:f7:72:aa:a2(kube2)] using fastdp
INFO: 2019/07/17 08:00:50.243641 ->[10.0.1.2:6783|86:0a:f7:72:aa:a2(kube2)]: connection shutting down due to error: Multiple connections to 86:0a:f7:72:aa:a2(kube2) added to a6:a3:89:97:87:9b(kube4)
INFO: 2019/07/17 08:00:50.243571 ->[10.0.1.4:6783|a6:a3:89:97:87:9b(kube4)]: connection ready; using protocol version 2
INFO: 2019/07/17 08:00:50.243692 overlay_switch ->[a6:a3:89:97:87:9b(kube4)] using fastdp
INFO: 2019/07/17 08:00:50.243733 ->[10.0.1.4:36293|a6:a3:89:97:87:9b(kube4)]: connection deleted
INFO: 2019/07/17 08:00:50.243819 ->[10.0.1.4:36293|a6:a3:89:97:87:9b(kube4)]: connection shutting down due to error: Multiple connections to a6:a3:89:97:87:9b(kube4) added to 86:0a:f7:72:aa:a2(kube2)
INFO: 2019/07/17 08:00:50.243897 ->[10.0.1.4:6783|a6:a3:89:97:87:9b(kube4)]: connection added (new peer)
INFO: 2019/07/17 08:00:53.709085 overlay_switch ->[a6:a3:89:97:87:9b(kube4)] using sleeve
INFO: 2019/07/17 08:00:53.710034 overlay_switch ->[aa:4f:55:0d:c9:e8(kube1)] sleeve timed out waiting for UDP heartbeat
INFO: 2019/07/17 08:00:53.710180 ->[10.0.1.1:59301|aa:4f:55:0d:c9:e8(kube1)]: connection shutting down due to error: no working forwarders to aa:4f:55:0d:c9:e8(kube1)
INFO: 2019/07/17 08:00:53.710324 ->[10.0.1.1:59301|aa:4f:55:0d:c9:e8(kube1)]: connection deleted
INFO: 2019/07/17 08:00:53.709944 overlay_switch ->[a6:a3:89:97:87:9b(kube4)] sleeve timed out waiting for UDP heartbeat
INFO: 2019/07/17 08:00:53.710019 ->[10.0.1.4:6783|a6:a3:89:97:87:9b(kube4)]: connection shutting down due to error: no working forwarders to a6:a3:89:97:87:9b(kube4)
INFO: 2019/07/17 08:00:53.710177 ->[10.0.1.4:6783|a6:a3:89:97:87:9b(kube4)]: connection deleted
INFO: 2019/07/17 08:00:53.710537 ->[10.0.1.4:6783] attempting connection
INFO: 2019/07/17 08:00:53.712846 ->[10.0.1.4:50223] connection accepted
INFO: 2019/07/17 08:00:53.714692 ->[10.0.1.4:6783|a6:a3:89:97:87:9b(kube4)]: connection ready; using protocol version 2
INFO: 2019/07/17 08:00:53.710938 ->[10.0.1.1:6783] attempting connection
INFO: 2019/07/17 08:00:53.712821 ->[10.0.1.1:59375] connection accepted
INFO: 2019/07/17 08:00:53.714244 ->[10.0.1.1:6783|aa:4f:55:0d:c9:e8(kube1)]: connection ready; using protocol version 2
INFO: 2019/07/17 08:00:53.715008 ->[10.0.1.4:50223|a6:a3:89:97:87:9b(kube4)]: connection ready; using protocol version 2
INFO: 2019/07/17 08:00:53.715126 overlay_switch ->[a6:a3:89:97:87:9b(kube4)] using fastdp
INFO: 2019/07/17 08:00:53.715197 ->[10.0.1.4:50223|a6:a3:89:97:87:9b(kube4)]: connection added (new peer)
INFO: 2019/07/17 08:00:53.715572 overlay_switch ->[a6:a3:89:97:87:9b(kube4)] using fastdp
INFO: 2019/07/17 08:00:53.715722 ->[10.0.1.4:6783|a6:a3:89:97:87:9b(kube4)]: connection shutting down due to error: Multiple connections to a6:a3:89:97:87:9b(kube4) added to aa:4f:55:0d:c9:e8(kube1)
INFO: 2019/07/17 08:00:53.714418 overlay_switch ->[aa:4f:55:0d:c9:e8(kube1)] using fastdp
INFO: 2019/07/17 08:00:53.714524 ->[10.0.1.1:6783|aa:4f:55:0d:c9:e8(kube1)]: connection added (new peer)
INFO: 2019/07/17 08:00:53.714549 ->[10.0.1.1:59375|aa:4f:55:0d:c9:e8(kube1)]: connection ready; using protocol version 2
INFO: 2019/07/17 08:00:53.714669 overlay_switch ->[aa:4f:55:0d:c9:e8(kube1)] using fastdp
INFO: 2019/07/17 08:00:53.714819 ->[10.0.1.1:59375|aa:4f:55:0d:c9:e8(kube1)]: connection shutting down due to error: Multiple connections to aa:4f:55:0d:c9:e8(kube1) added to a6:a3:89:97:87:9b(kube4)
INFO: 2019/07/17 08:00:53.717910 overlay_switch ->[aa:4f:55:0d:c9:e8(kube1)] using sleeve
INFO: 2019/07/17 08:00:53.717963 EMSGSIZE on send, expecting PMTU update (IP packet was 60028 bytes, payload was 60020 bytes)
INFO: 2019/07/17 08:00:53.717984 ->[10.0.1.1:6783|aa:4f:55:0d:c9:e8(kube1)]: connection fully established
INFO: 2019/07/17 08:00:53.720227 sleeve ->[10.0.1.1:6783|aa:4f:55:0d:c9:e8(kube1)]: Effective MTU verified at 1358
INFO: 2019/07/17 08:00:54.782490 overlay_switch ->[22:49:f6:fc:cc:cd(kube6)] using sleeve
INFO: 2019/07/17 08:00:54.782578 overlay_switch ->[22:49:f6:fc:cc:cd(kube6)] sleeve timed out waiting for UDP heartbeat
INFO: 2019/07/17 08:00:54.782607 ->[10.0.1.6:54783|22:49:f6:fc:cc:cd(kube6)]: connection shutting down due to error: no working forwarders to 22:49:f6:fc:cc:cd(kube6)
INFO: 2019/07/17 08:00:54.782728 ->[10.0.1.6:54783|22:49:f6:fc:cc:cd(kube6)]: connection deleted
INFO: 2019/07/17 08:00:54.783201 ->[10.0.1.6:6783] attempting connection
INFO: 2019/07/17 08:00:54.783921 ->[10.0.1.4:6783|a6:a3:89:97:87:9b(kube4)]: connection shutting down due to error: read tcp4 10.0.1.6:54783->10.0.1.4:6783: read: connection reset by peer
INFO: 2019/07/17 08:00:54.784291 overlay_switch ->[a6:a3:89:97:87:9b(kube4)] using sleeve
INFO: 2019/07/17 08:00:54.791787 overlay_switch ->[a6:a3:89:97:87:9b(kube4)] sleeve timed out waiting for UDP heartbeat
INFO: 2019/07/17 08:00:54.791867 ->[10.0.1.4:6783|a6:a3:89:97:87:9b(kube4)]: connection deleted
INFO: 2019/07/17 08:00:54.802761 ->[10.0.1.4:6783] attempting connection
INFO: 2019/07/17 08:00:54.804503 ->[10.0.1.4:49943] connection accepted
INFO: 2019/07/17 08:00:54.806602 ->[10.0.1.4:49943|a6:a3:89:97:87:9b(kube4)]: connection ready; using protocol version 2
INFO: 2019/07/17 08:00:54.806766 overlay_switch ->[a6:a3:89:97:87:9b(kube4)] using fastdp
INFO: 2019/07/17 08:00:54.806808 ->[10.0.1.4:49943|a6:a3:89:97:87:9b(kube4)]: connection added (new peer)
INFO: 2019/07/17 08:00:54.807706 ->[10.0.1.4:6783|a6:a3:89:97:87:9b(kube4)]: connection ready; using protocol version 2
INFO: 2019/07/17 08:00:54.807807 overlay_switch ->[a6:a3:89:97:87:9b(kube4)] using fastdp
INFO: 2019/07/17 08:00:54.807843 ->[10.0.1.4:6783|a6:a3:89:97:87:9b(kube4)]: connection shutting down due to error: Multiple connections to a6:a3:89:97:87:9b(kube4) added to 22:49:f6:fc:cc:cd(kube6)
INFO: 2019/07/17 08:00:54.805306 ->[10.0.1.6:40613] connection accepted
INFO: 2019/07/17 08:00:54.807246 ->[10.0.1.6:40613|22:49:f6:fc:cc:cd(kube6)]: connection ready; using protocol version 2
INFO: 2019/07/17 08:00:54.807432 overlay_switch ->[22:49:f6:fc:cc:cd(kube6)] using fastdp
INFO: 2019/07/17 08:00:54.807464 ->[10.0.1.6:6783|22:49:f6:fc:cc:cd(kube6)]: connection ready; using protocol version 2
INFO: 2019/07/17 08:00:54.807607 overlay_switch ->[22:49:f6:fc:cc:cd(kube6)] using fastdp
INFO: 2019/07/17 08:00:54.807622 ->[10.0.1.6:40613|22:49:f6:fc:cc:cd(kube6)]: connection added (new peer)
INFO: 2019/07/17 08:00:54.807887 ->[10.0.1.6:40613|22:49:f6:fc:cc:cd(kube6)]: connection deleted
INFO: 2019/07/17 08:00:54.807976 ->[10.0.1.6:40613|22:49:f6:fc:cc:cd(kube6)]: connection shutting down due to error: Multiple connections to 22:49:f6:fc:cc:cd(kube6) added to a6:a3:89:97:87:9b(kube4)
INFO: 2019/07/17 08:00:54.808353 ->[10.0.1.6:6783|22:49:f6:fc:cc:cd(kube6)]: connection added (new peer)
INFO: 2019/07/17 08:00:54.812050 overlay_switch ->[22:49:f6:fc:cc:cd(kube6)] using sleeve
INFO: 2019/07/17 08:00:54.812164 EMSGSIZE on send, expecting PMTU update (IP packet was 60028 bytes, payload was 60020 bytes)
INFO: 2019/07/17 08:00:54.812167 ->[10.0.1.6:6783|22:49:f6:fc:cc:cd(kube6)]: connection fully established
INFO: 2019/07/17 08:00:54.814045 sleeve ->[10.0.1.6:6783|22:49:f6:fc:cc:cd(kube6)]: Effective MTU verified at 1358
INFO: 2019/07/17 08:00:57.638657 overlay_switch ->[a6:a3:89:97:87:9b(kube4)] using sleeve
INFO: 2019/07/17 08:00:57.643321 overlay_switch ->[ee:8c:04:f1:3d:5b(kube7)] using sleeve
INFO: 2019/07/17 08:00:57.643405 overlay_switch ->[ee:8c:04:f1:3d:5b(kube7)] sleeve timed out waiting for UDP heartbeat
INFO: 2019/07/17 08:00:57.643446 ->[10.0.1.7:33687|ee:8c:04:f1:3d:5b(kube7)]: connection shutting down due to error: no working forwarders to ee:8c:04:f1:3d:5b(kube7)
INFO: 2019/07/17 08:00:57.643579 ->[10.0.1.7:33687|ee:8c:04:f1:3d:5b(kube7)]: connection deleted
INFO: 2019/07/17 08:00:57.639063 overlay_switch ->[a6:a3:89:97:87:9b(kube4)] sleeve timed out waiting for UDP heartbeat
INFO: 2019/07/17 08:00:57.639107 ->[10.0.1.4:6783|a6:a3:89:97:87:9b(kube4)]: connection shutting down due to error: no working forwarders to a6:a3:89:97:87:9b(kube4)
INFO: 2019/07/17 08:00:57.639170 ->[10.0.1.4:6783|a6:a3:89:97:87:9b(kube4)]: connection deleted
INFO: 2019/07/17 08:00:57.639650 ->[10.0.1.4:6783] attempting connection
INFO: 2019/07/17 08:00:57.640546 ->[10.0.1.4:43675] connection accepted
INFO: 2019/07/17 08:00:57.641988 ->[10.0.1.4:43675|a6:a3:89:97:87:9b(kube4)]: connection ready; using protocol version 2
INFO: 2019/07/17 08:00:57.642145 overlay_switch ->[a6:a3:89:97:87:9b(kube4)] using fastdp
INFO: 2019/07/17 08:00:57.642243 ->[10.0.1.4:43675|a6:a3:89:97:87:9b(kube4)]: connection added (new peer)
INFO: 2019/07/17 08:00:57.642908 ->[10.0.1.4:6783|a6:a3:89:97:87:9b(kube4)]: connection ready; using protocol version 2
INFO: 2019/07/17 08:00:57.642986 overlay_switch ->[a6:a3:89:97:87:9b(kube4)] using fastdp
INFO: 2019/07/17 08:00:57.643014 ->[10.0.1.4:6783|a6:a3:89:97:87:9b(kube4)]: connection shutting down due to error: Multiple connections to a6:a3:89:97:87:9b(kube4) added to ee:8c:04:f1:3d:5b(kube7)
INFO: 2019/07/17 08:00:57.644105 ->[10.0.1.7:6783] attempting connection
INFO: 2019/07/17 08:00:57.645832 ->[10.0.1.7:57153] connection accepted
INFO: 2019/07/17 08:00:57.647364 ->[10.0.1.7:6783|ee:8c:04:f1:3d:5b(kube7)]: connection ready; using protocol version 2
INFO: 2019/07/17 08:00:57.647461 overlay_switch ->[ee:8c:04:f1:3d:5b(kube7)] using fastdp
INFO: 2019/07/17 08:00:57.647494 ->[10.0.1.7:6783|ee:8c:04:f1:3d:5b(kube7)]: connection added (new peer)
INFO: 2019/07/17 08:00:57.648990 overlay_switch ->[ee:8c:04:f1:3d:5b(kube7)] using sleeve
INFO: 2019/07/17 08:00:57.649055 ->[10.0.1.7:6783|ee:8c:04:f1:3d:5b(kube7)]: connection fully established
INFO: 2019/07/17 08:00:57.649082 EMSGSIZE on send, expecting PMTU update (IP packet was 60028 bytes, payload was 60020 bytes)
INFO: 2019/07/17 08:00:57.649537 ->[10.0.1.7:57153|ee:8c:04:f1:3d:5b(kube7)]: connection ready; using protocol version 2
INFO: 2019/07/17 08:00:57.649621 overlay_switch ->[ee:8c:04:f1:3d:5b(kube7)] using fastdp
INFO: 2019/07/17 08:00:57.649653 ->[10.0.1.7:57153|ee:8c:04:f1:3d:5b(kube7)]: connection shutting down due to error: Multiple connections to ee:8c:04:f1:3d:5b(kube7) added to a6:a3:89:97:87:9b(kube4)
INFO: 2019/07/17 08:00:57.652522 sleeve ->[10.0.1.7:6783|ee:8c:04:f1:3d:5b(kube7)]: Effective MTU verified at 1358
INFO: 2019/07/17 08:00:58.326640 overlay_switch ->[a6:a3:89:97:87:9b(kube4)] using sleeve
INFO: 2019/07/17 08:00:58.328993 overlay_switch ->[02:f0:18:50:dd:f7(kube5)] sleeve timed out waiting for UDP heartbeat
INFO: 2019/07/17 08:00:58.329058 ->[10.0.1.5:6783|02:f0:18:50:dd:f7(kube5)]: connection shutting down due to error: no working forwarders to 02:f0:18:50:dd:f7(kube5)
INFO: 2019/07/17 08:00:58.329164 ->[10.0.1.5:6783|02:f0:18:50:dd:f7(kube5)]: connection deleted
INFO: 2019/07/17 08:00:58.329509 ->[10.0.1.5:6783] attempting connection
INFO: 2019/07/17 08:00:58.327961 ->[10.0.1.4:54443|a6:a3:89:97:87:9b(kube4)]: connection shutting down due to error: read tcp4 10.0.1.5:6783->10.0.1.4:54443: read: connection reset by peer
INFO: 2019/07/17 08:00:58.328079 ->[10.0.1.4:54443|a6:a3:89:97:87:9b(kube4)]: connection deleted
INFO: 2019/07/17 08:00:58.328594 ->[10.0.1.4:6783] attempting connection
INFO: 2019/07/17 08:00:58.330977 ->[10.0.1.4:58473] connection accepted
INFO: 2019/07/17 08:00:58.336319 ->[10.0.1.5:55475] connection accepted
INFO: 2019/07/17 08:00:58.334913 ->[10.0.1.4:58473|a6:a3:89:97:87:9b(kube4)]: connection ready; using protocol version 2
INFO: 2019/07/17 08:00:58.335043 overlay_switch ->[a6:a3:89:97:87:9b(kube4)] using fastdp
INFO: 2019/07/17 08:00:58.335073 ->[10.0.1.4:58473|a6:a3:89:97:87:9b(kube4)]: connection added (new peer)
INFO: 2019/07/17 08:00:58.336778 ->[10.0.1.5:6783|02:f0:18:50:dd:f7(kube5)]: connection ready; using protocol version 2
INFO: 2019/07/17 08:00:58.336868 overlay_switch ->[02:f0:18:50:dd:f7(kube5)] using fastdp
INFO: 2019/07/17 08:00:58.336895 ->[10.0.1.5:6783|02:f0:18:50:dd:f7(kube5)]: connection added (new peer)
INFO: 2019/07/17 08:00:58.337942 ->[10.0.1.5:55475|02:f0:18:50:dd:f7(kube5)]: connection ready; using protocol version 2
INFO: 2019/07/17 08:00:58.338056 overlay_switch ->[02:f0:18:50:dd:f7(kube5)] using fastdp
INFO: 2019/07/17 08:00:58.336858 ->[10.0.1.4:6783|a6:a3:89:97:87:9b(kube4)]: connection ready; using protocol version 2
INFO: 2019/07/17 08:00:58.337021 overlay_switch ->[a6:a3:89:97:87:9b(kube4)] using fastdp
INFO: 2019/07/17 08:00:58.337104 ->[10.0.1.4:6783|a6:a3:89:97:87:9b(kube4)]: connection shutting down due to error: Multiple connections to a6:a3:89:97:87:9b(kube4) added to 02:f0:18:50:dd:f7(kube5)
INFO: 2019/07/17 08:00:58.338099 ->[10.0.1.5:55475|02:f0:18:50:dd:f7(kube5)]: connection shutting down due to error: Multiple connections to 02:f0:18:50:dd:f7(kube5) added to a6:a3:89:97:87:9b(kube4)
INFO: 2019/07/17 08:00:58.340614 overlay_switch ->[02:f0:18:50:dd:f7(kube5)] using sleeve
INFO: 2019/07/17 08:00:58.341034 EMSGSIZE on send, expecting PMTU update (IP packet was 60028 bytes, payload was 60020 bytes)
INFO: 2019/07/17 08:00:58.341043 ->[10.0.1.5:6783|02:f0:18:50:dd:f7(kube5)]: connection fully established
INFO: 2019/07/17 08:00:58.342250 sleeve ->[10.0.1.5:6783|02:f0:18:50:dd:f7(kube5)]: Effective MTU verified at 1358
INFO: 2019/07/17 08:01:34.828438 Sending ICMP 3,4 (10.42.0.28 -> 10.47.0.13): PMTU=1358
INFO: 2019/07/17 08:01:34.828624 Sending ICMP 3,4 (10.42.0.28 -> 10.47.0.13): PMTU=1358
After the last issue I reported, I enabled DEBUG logging, and it happened again. Please check the attached log: weawe_error.log
And as usual, the issue was fixed after restarting all weave pods.
Same issue on K8s 1.16 and CoreOS stable (2303.3.0).
It correlated exactly with memory pressure caused by a pod running on the affected nodes. Weave started logging what others have posted here when the node hit around 150MB to 0MB of free memory, as reported by the Prometheus node-exporter. So far, I have been able to correlate 3 incidents with memory pressure.
Is anyone else seeing this correlation? EDIT: Seems to be related to #3695 in my case.
This is the same as my issue, I think:
https://github.com/weaveworks/weave/issues/3757
Any help is appreciated.
The original trail is long-dead and it's too confusing to respond to multiple threads of conversation in a GitHub issue; please open a new issue rather than commenting on this. The new issue template will request info that is essential to debug.
Note that "multiple connections" is not really a problem; just a transient condition that gets reported in passing. People commenting here were having other issues.
I faced the same issue. If you see the message "IP allocation was seeded by different peers", this means that some Weave Net peers were initialized into one cluster and some into another cluster; Weave Net cannot operate in this state. To recover, you need to eliminate the IPAM data from the affected nodes and restart. If you installed via the Kubernetes addon, this data will be in a file under /var/lib/weave on the node. This solution worked for me.
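For what it's worth, a sketch of that recovery on one affected node; the exact filename under /var/lib/weave and the pod name are assumptions, so check what is actually present on the node:

# On the affected node: remove the persisted IPAM/peer data (this discards the node's IP allocation state; filename may differ by version)
sudo rm /var/lib/weave/weave-netdata.db

# Then delete the weave pod on that node so the DaemonSet recreates it with clean state (placeholder pod name)
kubectl delete pod -n kube-system weave-net-xxxxx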
What you expected to happen?
Inter-node cluster-internal traffic to work
What happened?
At random times, one node's pod network becomes unreachable / can't connect to other nodes' pod networks. Node-internal traffic still works.
Deleting the pod fixes the issue temporarily.
Anything else we need to know?
Baremetal deployment with 3 nodes (1 master, 2 workers), metallb in L2 mode, WEAVE_MTU set to 1500 and NO_MASQ_LOCAL set to 1 (see the sketch below for how these are applied).
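For reference, a sketch of how these settings might be applied to the addon DaemonSet (environment variable names as above; the values are the ones used in this deployment):

kubectl set env daemonset/weave-net -n kube-system -c weave WEAVE_MTU=1500 NO_MASQ_LOCAL=1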
Versions:
Logs:
Error occurred around 08:36 server time
weave status in working state:
I will try to get the weave outputs during the failure state, but I didn't have the weave script installed at the time; I looked up weave troubleshooting and had to get the issue fixed ASAP.