weaveworks / weave

Simple, resilient multi-host containers networking and more.
https://www.weave.works
Apache License 2.0

"dial tcp: i/o timeout" issues with FastDP and Kubernetes Services. #3605

Open HackToHell opened 5 years ago

HackToHell commented 5 years ago

What you expected to happen?

When a pod comes up, it should be able to reach any internal service consistently (kubernetes.default.svc in the case of this issue).

What happened?

Getting error: couldn't get deployment devicebroker-12: Get https://172.30.0.1:443/api/v1/namespaces/v14-rapyuta-core/replicationcontrollers/devicebroker-12: dial tcp 172.30.0.1:443: i/o timeout

How to reproduce it?

It happens with OpenShift deployer pods, which bring up new pods when a deployment is rolled out. It is somewhat sporadic but can be reproduced easily.
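Because the failure is sporadic, it may help to measure it with a repeated connection probe run from inside an affected pod. The following is a minimal sketch and not part of the original report: the function names `probe` and `probe_many` and the default timeout and interval values are illustrative assumptions.

```python
import socket
import time


def probe(host, port, timeout=3.0):
    """Try one TCP connect; return (ok, elapsed_seconds, error_text)."""
    start = time.monotonic()
    try:
        # create_connection raises OSError (including timeouts) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True, time.monotonic() - start, ""
    except OSError as exc:
        return False, time.monotonic() - start, str(exc)


def probe_many(host, port, attempts=20, interval=0.5, timeout=3.0):
    """Run several probes in a row and collect the results."""
    results = []
    for _ in range(attempts):
        results.append(probe(host, port, timeout))
        time.sleep(interval)
    return results
```

Calling `probe_many("172.30.0.1", 443)` targets the service address from the error above; the share of failed attempts and their elapsed times give a rough measure of how often the timeout occurs.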

Anything else we need to know?

Looks like it was an issue in Openshift SDN and was fixed by changing a flow: https://github.com/openshift/origin/issues/5796 The fix (?): https://github.com/openshift/openshift-sdn/pull/236/files Since Weave also uses OVS, could it be related?

Network policies are enabled for some namespaces in the cluster.

Versions:

Openshift 3.9 running on Microsoft Azure.

Openshift version

oc v3.9.0+ba7faec-1
kubernetes v1.9.1+a0ce1bc657
features: Basic-Auth GSSAPI Kerberos SPNEGO

Uname

Linux oc-master-0 3.10.0-957.1.3.el7.x86_64 #1 SMP Thu Nov 29 14:49:43 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

Weave

/home/weave # ./weave --local status

    Version: 2.5.1 (up to date; next check at 2019/02/28 18:27:07)

    Service: router
   Protocol: weave 1..2
       Name: 52:cd:da:04:c9:04(oc-master-0)
 Encryption: disabled

PeerDiscovery: enabled
    Targets: 9
Connections: 18 (17 established, 1 failed)
      Peers: 18 (with 306 established connections)
TrustedSubnets: none

    Service: ipam
     Status: ready
      Range: 10.32.0.0/16

DefaultSubnet: 10.32.0.0/16

Docker

Client:
 Version:         1.13.1
 API version:     1.26
 Package version: docker-1.13.1-88.git07f3374.el7.centos.x86_64
 Go version:      go1.9.4
 Git commit:      07f3374/1.13.1
 Built:           Fri Dec 7 16:13:51 2018
 OS/Arch:         linux/amd64

Server:
 Version:         1.13.1
 API version:     1.26 (minimum version 1.12)
 Package version: docker-1.13.1-88.git07f3374.el7.centos.x86_64
 Go version:      go1.9.4
 Git commit:      07f3374/1.13.1
 Built:           Fri Dec 7 16:13:51 2018
 OS/Arch:         linux/amd64
 Experimental:    false

Logs:

There are a lot of "Vetoed installation of hairpin flow" messages.

Complete logs for weave at https://gist.github.com/HackToHell/5ffc79ca73f0bbf83c7697857fb34395

Startup logs for kubelet at https://gist.github.com/HackToHell/97605144a51a5850c8828ef5f45cb745

Network:

Routes

default via 10.2.0.1 dev eth0 proto dhcp metric 100
10.2.0.0/16 dev eth0 proto kernel scope link src 10.2.0.8 metric 100
10.32.0.0/16 dev weave proto kernel scope link src 10.32.16.0
168.63.129.16 via 10.2.0.1 dev eth0 proto dhcp metric 100
169.254.169.254 via 10.2.0.1 dev eth0 proto dhcp metric 100
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1

Addrs

1: lo inet 127.0.0.1/8 scope host lo\ valid_lft forever preferred_lft forever
2: eth0 inet 10.2.0.8/16 brd 10.2.255.255 scope global noprefixroute eth0\ valid_lft forever preferred_lft forever
3: docker0 inet 172.17.0.1/16 scope global docker0\ valid_lft forever preferred_lft forever
6: weave inet 10.32.16.0/16 brd 10.32.255.255 scope global weave\ valid_lft forever preferred_lft forever

IP Tables Save

https://gist.github.com/HackToHell/73a249b0ab9818703905d976d41ab262

Output of ip a

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:0d:3a:51:bf:8d brd ff:ff:ff:ff:ff:ff
    inet 10.2.0.8/16 brd 10.2.255.255 scope global noprefixroute eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::20d:3aff:fe51:bf8d/64 scope link 
       valid_lft forever preferred_lft forever
3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default 
    link/ether 02:42:4b:f5:95:d8 brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 scope global docker0
       valid_lft forever preferred_lft forever
4: datapath: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1376 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether 9e:72:47:4d:ac:5e brd ff:ff:ff:ff:ff:ff
    inet6 fe80::9c72:47ff:fe4d:ac5e/64 scope link 
       valid_lft forever preferred_lft forever
6: weave: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1376 qdisc noqueue state UP group default qlen 1000
    link/ether 02:b4:b4:7a:22:00 brd ff:ff:ff:ff:ff:ff
    inet 10.32.16.0/16 brd 10.32.255.255 scope global weave
       valid_lft forever preferred_lft forever
    inet6 fe80::b4:b4ff:fe7a:2200/64 scope link 
       valid_lft forever preferred_lft forever
7: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 7e:22:fb:01:7d:d0 brd ff:ff:ff:ff:ff:ff
9: vethwe-datapath@vethwe-bridge: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1376 qdisc noqueue master datapath state UP group default 
    link/ether 7e:3c:f1:2a:96:7d brd ff:ff:ff:ff:ff:ff
    inet6 fe80::7c3c:f1ff:fe2a:967d/64 scope link 
       valid_lft forever preferred_lft forever
10: vethwe-bridge@vethwe-datapath: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1376 qdisc noqueue master weave state UP group default 
    link/ether 5e:f6:e2:48:73:b7 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::5cf6:e2ff:fe48:73b7/64 scope link 
       valid_lft forever preferred_lft forever
11: vxlan-6784: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65535 qdisc noqueue master datapath state UNKNOWN group default qlen 1000
    link/ether ee:01:91:0a:21:17 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::ec01:91ff:fe0a:2117/64 scope link 
       valid_lft forever preferred_lft forever
13: vethwepl9b236b6@if12: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1376 qdisc noqueue master weave state UP group default 
    link/ether 26:29:6d:5d:db:ca brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet6 fe80::2429:6dff:fe5d:dbca/64 scope link 
       valid_lft forever preferred_lft forever
19: vethwepl411c7cf@if18: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1376 qdisc noqueue master weave state UP group default 
    link/ether ba:6e:f9:08:01:34 brd ff:ff:ff:ff:ff:ff link-netnsid 2
    inet6 fe80::b86e:f9ff:fe08:134/64 scope link 
       valid_lft forever preferred_lft forever
21: vethweplac55f1b@if20: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1376 qdisc noqueue master weave state UP group default 
    link/ether 8e:5f:09:d8:4f:9d brd ff:ff:ff:ff:ff:ff link-netnsid 3
    inet6 fe80::8c5f:9ff:fed8:4f9d/64 scope link 
       valid_lft forever preferred_lft forever
793: vethwepl7ad654f@if792: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1376 qdisc noqueue master weave state UP group default 
    link/ether 56:d4:cd:ef:c9:13 brd ff:ff:ff:ff:ff:ff link-netnsid 35
    inet6 fe80::54d4:cdff:feef:c913/64 scope link 
       valid_lft forever preferred_lft forever
801: vethweplc8d8a5f@if800: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1376 qdisc noqueue master weave state UP group default 
    link/ether 5a:f7:4d:cd:d1:2e brd ff:ff:ff:ff:ff:ff link-netnsid 38
    inet6 fe80::58f7:4dff:fecd:d12e/64 scope link 
       valid_lft forever preferred_lft forever
33: vethwepl29b981c@if32: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1376 qdisc noqueue master weave state UP group default 
    link/ether f2:9d:d5:e1:ab:c7 brd ff:ff:ff:ff:ff:ff link-netnsid 6
    inet6 fe80::f09d:d5ff:fee1:abc7/64 scope link 
       valid_lft forever preferred_lft forever
803: vethweple401049@if802: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1376 qdisc noqueue master weave state UP group default 
    link/ether 96:cc:1a:35:26:cb brd ff:ff:ff:ff:ff:ff link-netnsid 39
    inet6 fe80::94cc:1aff:fe35:26cb/64 scope link 
       valid_lft forever preferred_lft forever
805: vethwepl59253a9@if804: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1376 qdisc noqueue master weave state UP group default 
    link/ether da:e6:da:ac:69:5b brd ff:ff:ff:ff:ff:ff link-netnsid 41
    inet6 fe80::d8e6:daff:feac:695b/64 scope link 
       valid_lft forever preferred_lft forever
807: vethwepl6b5d5c0@if806: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1376 qdisc noqueue master weave state UP group default 
    link/ether aa:2e:3b:90:e9:ce brd ff:ff:ff:ff:ff:ff link-netnsid 42
    inet6 fe80::a82e:3bff:fe90:e9ce/64 scope link 
       valid_lft forever preferred_lft forever
39: vethwepla63a82e@if38: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1376 qdisc noqueue master weave state UP group default 
    link/ether 7e:72:fa:54:0b:df brd ff:ff:ff:ff:ff:ff link-netnsid 7
    inet6 fe80::7c72:faff:fe54:bdf/64 scope link 
       valid_lft forever preferred_lft forever
809: vethweple879f6f@if808: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1376 qdisc noqueue master weave state UP group default 
    link/ether 0e:8f:2b:2f:86:42 brd ff:ff:ff:ff:ff:ff link-netnsid 4
    inet6 fe80::c8f:2bff:fe2f:8642/64 scope link 
       valid_lft forever preferred_lft forever
319: vethwepl29f512b@if318: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1376 qdisc noqueue master weave state UP group default 
    link/ether 02:de:89:f1:2a:38 brd ff:ff:ff:ff:ff:ff link-netnsid 11
    inet6 fe80::de:89ff:fef1:2a38/64 scope link 
       valid_lft forever preferred_lft forever
63: vethwepl3005a70@if62: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1376 qdisc noqueue master weave state UP group default 
    link/ether ce:8d:70:e5:73:7c brd ff:ff:ff:ff:ff:ff link-netnsid 9
    inet6 fe80::cc8d:70ff:fee5:737c/64 scope link 
       valid_lft forever preferred_lft forever
67: vethwepla0ca9b4@if66: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1376 qdisc noqueue master weave state UP group default 
    link/ether 1e:bb:d1:5c:13:55 brd ff:ff:ff:ff:ff:ff link-netnsid 8
    inet6 fe80::1cbb:d1ff:fe5c:1355/64 scope link 
       valid_lft forever preferred_lft forever
583: vethweplb9f2578@if582: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1376 qdisc noqueue master weave state UP group default 
    link/ether 9a:f7:31:3e:e2:91 brd ff:ff:ff:ff:ff:ff link-netnsid 40
    inet6 fe80::98f7:31ff:fe3e:e291/64 scope link 
       valid_lft forever preferred_lft forever
597: vethweplb770871@if596: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1376 qdisc noqueue master weave state UP group default 
    link/ether ea:df:1e:24:2d:9c brd ff:ff:ff:ff:ff:ff link-netnsid 25
    inet6 fe80::e8df:1eff:fe24:2d9c/64 scope link 
       valid_lft forever preferred_lft forever
613: vethweplf66e001@if612: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1376 qdisc noqueue master weave state UP group default 
    link/ether 0e:2e:71:21:14:56 brd ff:ff:ff:ff:ff:ff link-netnsid 5
    inet6 fe80::c2e:71ff:fe21:1456/64 scope link 
       valid_lft forever preferred_lft forever
615: vethweplf825a9c@if614: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1376 qdisc noqueue master weave state UP group default 
    link/ether 5a:08:47:0e:3c:7e brd ff:ff:ff:ff:ff:ff link-netnsid 23
    inet6 fe80::5808:47ff:fe0e:3c7e/64 scope link 
       valid_lft forever preferred_lft forever
653: vethwepl6e43f09@if652: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1376 qdisc noqueue master weave state UP group default 
    link/ether 9e:8f:db:c5:fc:f2 brd ff:ff:ff:ff:ff:ff link-netnsid 16
    inet6 fe80::9c8f:dbff:fec5:fcf2/64 scope link 
       valid_lft forever preferred_lft forever
397: vethwepl6ddfe2d@if396: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1376 qdisc noqueue master weave state UP group default 
    link/ether 2a:73:00:cf:0c:12 brd ff:ff:ff:ff:ff:ff link-netnsid 20
    inet6 fe80::2873:ff:fecf:c12/64 scope link 
       valid_lft forever preferred_lft forever
399: vethwepl500477b@if398: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1376 qdisc noqueue master weave state UP group default 
    link/ether 5a:ab:4e:d1:91:a4 brd ff:ff:ff:ff:ff:ff link-netnsid 21
    inet6 fe80::58ab:4eff:fed1:91a4/64 scope link 
       valid_lft forever preferred_lft forever
401: vethwepl2ad9217@if400: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1376 qdisc noqueue master weave state UP group default 
    link/ether aa:90:1e:9e:5f:3a brd ff:ff:ff:ff:ff:ff link-netnsid 22
    inet6 fe80::a890:1eff:fe9e:5f3a/64 scope link 
       valid_lft forever preferred_lft forever
403: vethwepl7454f0c@if402: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1376 qdisc noqueue master weave state UP group default 
    link/ether 0e:0c:5f:4a:35:c4 brd ff:ff:ff:ff:ff:ff link-netnsid 10
    inet6 fe80::c0c:5fff:fe4a:35c4/64 scope link 
       valid_lft forever preferred_lft forever
405: vethwepl47c69fd@if404: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1376 qdisc noqueue master weave state UP group default 
    link/ether be:8a:12:f4:3d:d9 brd ff:ff:ff:ff:ff:ff link-netnsid 13
    inet6 fe80::bc8a:12ff:fef4:3dd9/64 scope link 
       valid_lft forever preferred_lft forever
407: vethweplaecd2a1@if406: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1376 qdisc noqueue master weave state UP group default 
    link/ether d6:80:63:f2:09:ff brd ff:ff:ff:ff:ff:ff link-netnsid 15
    inet6 fe80::d480:63ff:fef2:9ff/64 scope link 
       valid_lft forever preferred_lft forever
409: vethwepl404e076@if408: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1376 qdisc noqueue master weave state UP group default 
    link/ether 86:81:5b:a9:97:14 brd ff:ff:ff:ff:ff:ff link-netnsid 18
    inet6 fe80::8481:5bff:fea9:9714/64 scope link 
       valid_lft forever preferred_lft forever
413: vethwepl800aff5@if412: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1376 qdisc noqueue master weave state UP group default 
    link/ether 3e:50:23:71:8f:c4 brd ff:ff:ff:ff:ff:ff link-netnsid 12
    inet6 fe80::3c50:23ff:fe71:8fc4/64 scope link 
       valid_lft forever preferred_lft forever
671: vethwepl56ff739@if670: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1376 qdisc noqueue master weave state UP group default 
    link/ether 66:4a:cd:2c:ea:9f brd ff:ff:ff:ff:ff:ff link-netnsid 29
    inet6 fe80::644a:cdff:fe2c:ea9f/64 scope link 
       valid_lft forever preferred_lft forever
415: vethwepl7ec21db@if414: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1376 qdisc noqueue master weave state UP group default 
    link/ether 4e:fb:0b:d8:df:2b brd ff:ff:ff:ff:ff:ff link-netnsid 14
    inet6 fe80::4cfb:bff:fed8:df2b/64 scope link 
       valid_lft forever preferred_lft forever
675: vethwepld7a5f6c@if674: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1376 qdisc noqueue master weave state UP group default 
    link/ether b2:db:c8:77:f7:0f brd ff:ff:ff:ff:ff:ff link-netnsid 31
    inet6 fe80::b0db:c8ff:fe77:f70f/64 scope link 
       valid_lft forever preferred_lft forever
421: vethwepl24f363d@if420: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1376 qdisc noqueue master weave state UP group default 
    link/ether 5e:6c:76:a2:07:6a brd ff:ff:ff:ff:ff:ff link-netnsid 19
    inet6 fe80::5c6c:76ff:fea2:76a/64 scope link 
       valid_lft forever preferred_lft forever
435: vethweplaa8ac0a@if434: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1376 qdisc noqueue master weave state UP group default 
    link/ether a6:a3:8c:25:1a:d9 brd ff:ff:ff:ff:ff:ff link-netnsid 1
    inet6 fe80::a4a3:8cff:fe25:1ad9/64 scope link 
       valid_lft forever preferred_lft forever
727: vethwepl11c1a72@if726: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1376 qdisc noqueue master weave state UP group default 
    link/ether 0a:92:2d:0b:ab:e6 brd ff:ff:ff:ff:ff:ff link-netnsid 24
    inet6 fe80::892:2dff:fe0b:abe6/64 scope link 
       valid_lft forever preferred_lft forever
737: vethwepl4088937@if736: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1376 qdisc noqueue master weave state UP group default 
    link/ether 72:b7:fe:31:08:bc brd ff:ff:ff:ff:ff:ff link-netnsid 17
    inet6 fe80::70b7:feff:fe31:8bc/64 scope link 
       valid_lft forever preferred_lft forever
739: vethweplb6344dc@if738: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1376 qdisc noqueue master weave state UP group default 
    link/ether 62:ce:2f:ba:c7:9f brd ff:ff:ff:ff:ff:ff link-netnsid 26
    inet6 fe80::60ce:2fff:feba:c79f/64 scope link 
       valid_lft forever preferred_lft forever
741: vethwepl39e5939@if740: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1376 qdisc noqueue master weave state UP group default 
    link/ether 22:12:c9:42:8f:7b brd ff:ff:ff:ff:ff:ff link-netnsid 27
    inet6 fe80::2012:c9ff:fe42:8f7b/64 scope link 
       valid_lft forever preferred_lft forever
HackToHell commented 5 years ago

I was able to isolate the logs better. I'm getting:

WARN: 2019/02/28 16:09:51.536184 Vetoed installation of hairpin flow FlowSpec{keys: [InPortFlowKey{vport: 1} EthernetFlowKey{src: 12:cc:f5:35:61:28, dst: 66:2f:0f:7c:15:6d}], actions: [OutputAction{vport: 1}]}
WARN: 2019/02/28 16:09:52.536743 Vetoed installation of hairpin flow FlowSpec{keys: [EthernetFlowKey{src: 12:cc:f5:35:61:28, dst: 66:2f:0f:7c:15:6d} InPortFlowKey{vport: 1}], actions: [OutputAction{vport: 1}]}
WARN: 2019/02/28 16:09:54.540654 Vetoed installation of hairpin flow FlowSpec{keys: [EthernetFlowKey{src: 12:cc:f5:35:61:28, dst: 66:2f:0f:7c:15:6d} InPortFlowKey{vport: 1}], actions: [OutputAction{vport: 1}]}
WARN: 2019/02/28 16:09:58.548697 Vetoed installation of hairpin flow FlowSpec{keys: [EthernetFlowKey{src: 12:cc:f5:35:61:28, dst: 66:2f:0f:7c:15:6d} InPortFlowKey{vport: 1}], actions: [OutputAction{vport: 1}]}
WARN: 2019/02/28 16:10:06.570492 Vetoed installation of hairpin flow FlowSpec{keys: [EthernetFlowKey{src: 12:cc:f5:35:61:28, dst: 66:2f:0f:7c:15:6d} InPortFlowKey{vport: 1}], actions: [OutputAction{vport: 1}]}
WARN: 2019/02/28 16:10:08.568584 Vetoed installation of hairpin flow FlowSpec{keys: [EthernetFlowKey{src: 12:cc:f5:35:61:28, dst: 66:2f:0f:7c:15:6d} InPortFlowKey{vport: 1}], actions: [OutputAction{vport: 1}]}
IN

12:cc:f5:35:61:28 is the MAC address of the weave bridge:

6: weave: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1376 qdisc noqueue state UP group default qlen 1000
    link/ether 12:cc:f5:35:61:28 brd ff:ff:ff:ff:ff:ff
    inet 10.32.200.0/16 brd 10.32.255.255 scope global weave
       valid_lft forever preferred_lft forever
    inet6 fe80::10cc:f5ff:fe35:6128/64 scope link 
       valid_lft forever preferred_lft forever
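To see which MAC pairs the vetoed flows involve without reading every line, the warnings can be tallied. This is a rough sketch that assumes the log format shown in the lines above; `count_vetoed_flows` is a hypothetical helper, not part of Weave Net. It accepts the flow keys in either order, since both orders appear in the log.

```python
import re
from collections import Counter

# Patterns follow the WARN lines quoted in this issue.
ETH_RE = re.compile(
    r"EthernetFlowKey\{src: ([0-9a-f:]{17}), dst: ([0-9a-f:]{17})\}"
)
PORT_RE = re.compile(r"InPortFlowKey\{vport: (\d+)\}")


def count_vetoed_flows(lines):
    """Count vetoed hairpin flows per (src MAC, dst MAC, vport)."""
    counts = Counter()
    for line in lines:
        if "Vetoed installation of hairpin flow" not in line:
            continue
        eth = ETH_RE.search(line)
        port = PORT_RE.search(line)
        if eth and port:
            key = (eth.group(1), eth.group(2), int(port.group(1)))
            counts[key] += 1
    return counts
```

Feeding it the full weave log (for example `count_vetoed_flows(open("weave.log"))`, where the file name is an assumption) shows whether the warnings concentrate on one source MAC such as the bridge address.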
bboreham commented 5 years ago

Looks like it was an issue in Openshift SDN

OpenShift and Weave Net are implemented in completely different ways; there is no reason to suppose that a similar symptom in one would be related to another.

Vetoed installation of hairpin flow

This can happen in harmful and non-harmful cases - see #2808

From the Weave logs, this is a sign of something definitely wrong:

INFO: 2019/03/01 05:30:40.299239 overlay_switch ->[12:e6:b4:29:a8:6e(ocg2vfett)] sleeve timed out waiting for UDP heartbeat
INFO: 2019/03/01 05:30:40.299244 overlay_switch ->[32:2b:7e:8b:3d:ed(ocg2bupgd)] sleeve timed out waiting for UDP heartbeat
INFO: 2019/03/01 05:30:40.299474 overlay_switch ->[c6:29:85:f4:63:a9(ocbuild-g2smppi)] sleeve timed out waiting for UDP heartbeat
INFO: 2019/03/01 05:30:40.302852 overlay_switch ->[42:77:f6:7f:39:1c(ocg2ogfsz)] sleeve timed out waiting for UDP heartbeat
INFO: 2019/03/01 05:30:40.307102 overlay_switch ->[be:e2:0b:28:c4:be(oc-infra-1)] sleeve timed out waiting for UDP heartbeat
INFO: 2019/03/01 05:30:40.310343 overlay_switch ->[12:cc:f5:35:61:28(ocg2xpkdi)] sleeve timed out waiting for UDP heartbeat
INFO: 2019/03/01 05:30:40.314442 overlay_switch ->[7e:ec:c5:cd:ee:bc(oc-infra-0)] sleeve timed out waiting for UDP heartbeat
INFO: 2019/03/01 05:30:40.317610 overlay_switch ->[76:e3:6d:b1:9b:0a(ocbuild-g2ijddl)] sleeve timed out waiting for UDP heartbeat
INFO: 2019/03/01 05:30:40.317910 overlay_switch ->[42:81:a1:34:2e:da(oc-master-1)] sleeve timed out waiting for UDP heartbeat
INFO: 2019/03/01 05:30:40.318902 overlay_switch ->[b2:2e:c7:e3:fe:80(oc-node-2)] sleeve timed out waiting for UDP heartbeat

This means that several heartbeats were missed. Yet those nodes continue to appear in later log messages.

These connections seem to be working better (there is some overlap with the set above):

INFO: 2019/03/01 05:29:40.351383 sleeve ->[10.2.0.16:6783|82:b8:f1:6f:61:e6(ocg2kgxea)]: Effective MTU verified at 1438
INFO: 2019/03/01 05:29:40.353232 sleeve ->[10.2.0.7:6783|92:ba:d1:8d:5f:b4(ocg2bbyqo)]: Effective MTU verified at 1438
INFO: 2019/03/01 05:29:40.800847 sleeve ->[10.1.0.5:6783|96:d2:39:f2:ee:dd(oc-infra-2)]: Effective MTU verified at 1438
INFO: 2019/03/01 05:29:40.807056 sleeve ->[10.1.0.9:6783|1a:a4:db:93:9b:fb(oc-master-2)]: Effective MTU verified at 1438
INFO: 2019/03/01 05:29:40.811124 sleeve ->[10.2.0.6:6783|3a:0e:f0:a3:94:35(ocg2cshhl)]: Effective MTU verified at 1438
INFO: 2019/03/01 05:29:40.812061 sleeve ->[10.1.0.8:6783|52:cd:da:04:c9:04(oc-master-0)]: Effective MTU verified at 1438
INFO: 2019/03/01 05:30:40.819752 sleeve ->[10.1.0.7:6783|be:e2:0b:28:c4:be(oc-infra-1)]: Effective MTU verified at 1438

Is there any correlation between pods that work and don't work, and those two sets of nodes above?
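One way to build those two sets from a full log is sketched below. It assumes the two message formats quoted above and groups peers by the name in parentheses; `classify_peers` is a hypothetical helper written for this thread, not a Weave Net tool.

```python
import re

# Patterns follow the INFO lines quoted above.
TIMEOUT_RE = re.compile(
    r"overlay_switch ->\[([0-9a-f:]{17})\((.+?)\)\] "
    r"sleeve timed out waiting for UDP heartbeat"
)
MTU_RE = re.compile(
    r"sleeve ->\[[^|\]]+\|([0-9a-f:]{17})\((.+?)\)\]: "
    r"Effective MTU verified at \d+"
)


def classify_peers(lines):
    """Return (timed_out, verified): sets of peer names seen in each message."""
    timed_out, verified = set(), set()
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            timed_out.add(match.group(2))
            continue
        match = MTU_RE.search(line)
        if match:
            verified.add(match.group(2))
    return timed_out, verified
```

With `timed_out, verified = classify_peers(lines)`, the expression `timed_out & verified` lists peers that appear in both sets, and `timed_out - verified` lists peers that only ever timed out. Those lists can then be compared with the nodes hosting the failing pods.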

I understand from Slack your symptoms are intermittent, so it would be useful to know any correlation between when the problem hit and the time in the logs.
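For the timing question, a small filter can pull out the Weave log lines close to the moment a failure was observed. This is a sketch under two assumptions: log lines start with a level and a timestamp in the format shown above, and the failure time is expressed in the same time zone as the log. `lines_near` is a hypothetical helper name.

```python
import re
from datetime import datetime, timedelta

# Matches the "LEVEL: YYYY/MM/DD HH:MM:SS.ffffff" prefix seen in this thread.
TS_RE = re.compile(r"^[A-Z]+: (\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.\d{6}) ")


def lines_near(lines, when, window_seconds=30):
    """Return log lines whose timestamp is within window_seconds of `when`."""
    window = timedelta(seconds=window_seconds)
    hits = []
    for line in lines:
        match = TS_RE.match(line)
        if not match:
            continue  # skip lines without a recognizable timestamp
        stamp = datetime.strptime(match.group(1), "%Y/%m/%d %H:%M:%S.%f")
        if abs(stamp - when) <= window:
            hits.append(line)
    return hits
```

For example, `lines_near(lines, datetime(2019, 3, 1, 5, 30, 45))` returns the entries logged within 30 seconds of 05:30:45, which makes it easier to see whether heartbeat timeouts coincide with a failed request.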