kontena / pharos-cluster

Pharos - The Kubernetes Distribution
https://k8spharos.dev/
Apache License 2.0

pharos-cluster reset leaves networking routes/rules in place #469

Open · SpComb opened this issue 6 years ago

SpComb commented 6 years ago

After a pharos-cluster reset for a calico cluster, the hosts will still have all of their calico IPIP interfaces, routes and iptables rules in place.

root@terom-pharos-worker1:~# ip addr show tunl0
5: tunl0@NONE: <NOARP,UP,LOWER_UP> mtu 1440 qdisc noqueue state UNKNOWN group default qlen 1
    link/ipip 0.0.0.0 brd 0.0.0.0
    inet 10.32.2.1/32 brd 10.32.2.1 scope global tunl0
       valid_lft forever preferred_lft forever
root@terom-pharos-worker1:~# ip route
default via 188.166.64.1 dev eth0 onlink 
10.18.0.0/16 dev eth0  proto kernel  scope link  src 10.18.0.11 
10.32.0.0/24 via 167.99.39.233 dev tunl0  proto bird onlink 
10.32.1.0/24 via 206.189.0.173 dev tunl0  proto bird onlink 
blackhole 10.32.2.0/24  proto bird 
10.32.3.0/24 via 167.99.139.236 dev tunl0  proto bird onlink 
172.17.0.0/16 dev docker0  proto kernel  scope link  src 172.17.0.1 linkdown 
188.166.64.0/18 dev eth0  proto kernel  scope link  src 188.166.118.151 
root@terom-pharos-worker1:~# sudo iptables -nvL
Chain INPUT (policy ACCEPT 508 packets, 364K bytes)
 pkts bytes target     prot opt in     out     source               destination         
  11M 6816M cali-INPUT  all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* cali:Cz_u1IQiXIMmKD4c */
 235K   14M KUBE-EXTERNAL-SERVICES  all  --  *      *       0.0.0.0/0            0.0.0.0/0            ctstate NEW /* kubernetes externally-visible service portals */
9724K 6690M KUBE-FIREWALL  all  --  *      *       0.0.0.0/0            0.0.0.0/0           

Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination         
3546K 3420M cali-FORWARD  all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* cali:wUHhoiAYhphO9Mso */
    2   120 KUBE-FORWARD  all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* kubernetes forwarding rules */

Chain OUTPUT (policy ACCEPT 590 packets, 114K bytes)
 pkts bytes target     prot opt in     out     source               destination         
  10M 2207M cali-OUTPUT  all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* cali:tVnHkvAo15HuiPy0 */
 301K   18M KUBE-SERVICES  all  --  *      *       0.0.0.0/0            0.0.0.0/0            ctstate NEW /* kubernetes service portals */
  10M 2203M KUBE-FIREWALL  all  --  *      *       0.0.0.0/0            0.0.0.0/0           

...
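
Manually, something along these lines gets a host back to a clean slate after the reset (only a sketch of what a cleanup step could do; it assumes nothing else on the host uses the ipip module or bird-installed routes, and a reboot achieves the same result):

# drop the pod routes that calico/bird programmed
ip route flush proto bird
# remove the tunnel address and unload the module that owns tunl0
ip addr flush dev tunl0
ip link set tunl0 down
modprobe -r ipip
# blunt, but strips every cali-* chain and rule across all iptables tables
iptables-save | grep -v -i cali | iptables-restore
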
SpComb commented 6 years ago

These leftovers will cause issues when using pharos-cluster reset to switch network.provider without rebooting in between... in the calico -> weave case, the weave pods are left in a crash loop:

root@terom-pharos-master:~# kubectl -n kube-system logs weave-net-9h7xc -c weave
Network 10.32.0.0/12 overlaps with existing route 10.32.0.0/24 on host
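
On the Kubernetes side this shows up as the weave-net pods sitting in CrashLoopBackOff; assuming the stock weave-net DaemonSet labels, a quick way to see it:

kubectl -n kube-system get pods -l name=weave-net -o wide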

#455 will catch this situation at pharos-cluster up time:

==> Validate hosts @ 167.99.39.233 188.166.118.151 206.189.0.173 167.99.139.236
    [167.99.39.233] Validating current role matches ...
    [206.189.0.173] Validating current role matches ...
    [206.189.0.173] Validating distro and version ...
    [206.189.0.173] Validating host configuration ...
    [167.99.39.233] Validating distro and version ...
    [206.189.0.173] Validating hostname uniqueness ...
    [206.189.0.173] Validating host routes ...
 [Validate hosts @ 206.189.0.173] RuntimeError: Overlapping host routes for .network.pod_network_cidr=10.32.0.0/12: 10.32.0.0/24 via 167.99.39.233 dev tunl0 proto bird onlink; blackhole 10.32.1.0/24 proto bird; 10.32.2.0/24 via 188.166.118.151 dev tunl0 proto bird onlink; 10.32.3.0/24 via 167.99.139.236 dev tunl0 proto bird onlink
    [167.99.39.233] Validating host configuration ...
    [167.99.39.233] Validating hostname uniqueness ...
    [167.99.39.233] Validating host routes ...
 [Validate hosts @ 167.99.39.233] RuntimeError: Overlapping host routes for .network.pod_network_cidr=10.32.0.0/12: blackhole 10.32.0.0/24  proto bird; 10.32.1.0/24 via 206.189.0.173 dev tunl0  proto bird onlink; 10.32.2.0/24 via 188.166.118.151 dev tunl0  proto bird onlink; 10.32.3.0/24 via 167.99.139.236 dev tunl0  proto bird onlink

    [188.166.118.151] Validating current role matches ...
    [167.99.139.236] Validating current role matches ...
    [167.99.139.236] Validating distro and version ...
    [167.99.139.236] Validating host configuration ...
    [167.99.139.236] Validating hostname uniqueness ...
    [188.166.118.151] Validating distro and version ...
    [188.166.118.151] Validating host configuration ...
    [188.166.118.151] Validating hostname uniqueness ...
    [188.166.118.151] Validating host routes ...
    [167.99.139.236] Validating host routes ...
 [Validate hosts @ 167.99.139.236] RuntimeError: Overlapping host routes for .network.pod_network_cidr=10.32.0.0/12: 10.32.0.0/24 via 167.99.39.233 dev tunl0 proto bird onlink; 10.32.1.0/24 via 206.189.0.173 dev tunl0 proto bird onlink; 10.32.2.0/24 via 188.166.118.151 dev tunl0 proto bird onlink; blackhole 10.32.3.0/24 proto bird
 [Validate hosts @ 188.166.118.151] RuntimeError: Overlapping host routes for .network.pod_network_cidr=10.32.0.0/12: 10.32.0.0/24 via 167.99.39.233 dev tunl0  proto bird onlink; 10.32.1.0/24 via 206.189.0.173 dev tunl0  proto bird onlink; blackhole 10.32.2.0/24  proto bird; 10.32.3.0/24 via 167.99.139.236 dev tunl0  proto bird onlink
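
The same check is easy to run by hand with iproute2's route selectors, here against the .network.pod_network_cidr=10.32.0.0/12 from the errors above:

# routes contained inside the pod CIDR (the leftover tunl0/blackhole routes)
ip route list root 10.32.0.0/12
# broader routes covering the pod CIDR (includes the default route, so expect some noise)
ip route list match 10.32.0.0/12
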
SpComb commented 6 years ago

Same thing with weave... and switching from weave -> calico without rebooting leaves you with an unholy combination of both weave and calico interfaces/routes/rules on the host at the same time, because calico doesn't validate overlapping routes...

root@terom-pharos-master:~# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 76:d4:58:03:5d:f6 brd ff:ff:ff:ff:ff:ff
    inet 167.99.39.233/20 brd 167.99.47.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet 10.18.0.13/16 brd 10.18.255.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::74d4:58ff:fe03:5df6/64 scope link 
       valid_lft forever preferred_lft forever
3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default 
    link/ether 02:42:63:2d:50:5e brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 scope global docker0
       valid_lft forever preferred_lft forever
4: datapath: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1376 qdisc noqueue state UNKNOWN group default qlen 1
    link/ether 9e:9d:7e:45:11:6b brd ff:ff:ff:ff:ff:ff
    inet6 fe80::9c9d:7eff:fe45:116b/64 scope link 
       valid_lft forever preferred_lft forever
6: weave: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1376 qdisc noqueue state UP group default qlen 1000
    link/ether 4a:56:90:63:60:f2 brd ff:ff:ff:ff:ff:ff
    inet 10.40.0.0/12 brd 10.47.255.255 scope global weave
       valid_lft forever preferred_lft forever
    inet6 fe80::4856:90ff:fe63:60f2/64 scope link 
       valid_lft forever preferred_lft forever
7: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 62:69:dd:2e:ab:10 brd ff:ff:ff:ff:ff:ff
9: vethwe-datapath@vethwe-bridge: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1376 qdisc noqueue master datapath state UP group default 
    link/ether 6e:58:86:d8:ad:69 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::6c58:86ff:fed8:ad69/64 scope link 
       valid_lft forever preferred_lft forever
10: vethwe-bridge@vethwe-datapath: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1376 qdisc noqueue master weave state UP group default 
    link/ether 32:f3:1d:fd:33:36 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::30f3:1dff:fefd:3336/64 scope link 
       valid_lft forever preferred_lft forever
11: vxlan-6784: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65485 qdisc noqueue master datapath state UNKNOWN group default qlen 1000
    link/ether 76:44:7f:07:d3:e2 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::7444:7fff:fe07:d3e2/64 scope link 
       valid_lft forever preferred_lft forever
root@terom-pharos-master:~# ip route
default via 167.99.32.1 dev eth0 onlink 
10.18.0.0/16 dev eth0  proto kernel  scope link  src 10.18.0.13 
10.32.0.0/12 dev weave  proto kernel  scope link  src 10.40.0.0 
167.99.32.0/20 dev eth0  proto kernel  scope link  src 167.99.39.233 
172.17.0.0/16 dev docker0  proto kernel  scope link  src 172.17.0.1 linkdown 
root@terom-pharos-master:~# sudo iptables -nvL
Chain INPUT (policy ACCEPT 229 packets, 14275 bytes)
 pkts bytes target     prot opt in     out     source               destination         
 3452  846K KUBE-EXTERNAL-SERVICES  all  --  *      *       0.0.0.0/0            0.0.0.0/0            ctstate NEW /* kubernetes externally-visible service portals */
 278K  135M KUBE-FIREWALL  all  --  *      *       0.0.0.0/0            0.0.0.0/0           
 252K  122M WEAVE-IPSEC-IN  all  --  *      *       0.0.0.0/0            0.0.0.0/0           

Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination         
  826 1626K WEAVE-NPC  all  --  *      weave   0.0.0.0/0            0.0.0.0/0            /* NOTE: this must go before '-j KUBE-FORWARD' */
    0     0 NFLOG      all  --  *      weave   0.0.0.0/0            0.0.0.0/0            state NEW nflog-group 86
    0     0 DROP       all  --  *      weave   0.0.0.0/0            0.0.0.0/0           
  487  102K ACCEPT     all  --  weave  !weave  0.0.0.0/0            0.0.0.0/0           
    0     0 ACCEPT     all  --  *      weave   0.0.0.0/0            0.0.0.0/0            ctstate RELATED,ESTABLISHED
    0     0 KUBE-FORWARD  all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* kubernetes forwarding rules */

Chain OUTPUT (policy ACCEPT 303 packets, 49296 bytes)
 pkts bytes target     prot opt in     out     source               destination         
 3795  873K KUBE-SERVICES  all  --  *      *       0.0.0.0/0            0.0.0.0/0            ctstate NEW /* kubernetes service portals */
 278K  121M KUBE-FIREWALL  all  --  *      *       0.0.0.0/0            0.0.0.0/0           
    0     0 DROP      !esp  --  *      *       0.0.0.0/0            0.0.0.0/0            policy match dir out pol none mark match 0x20000/0x20000

Chain KUBE-EXTERNAL-SERVICES (1 references)
 pkts bytes target     prot opt in     out     source               destination         

Chain KUBE-FIREWALL (2 references)
 pkts bytes target     prot opt in     out     source               destination         
    0     0 DROP       all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* kubernetes firewall for dropping marked packets */ mark match 0x8000/0x8000

Chain KUBE-FORWARD (1 references)
 pkts bytes target     prot opt in     out     source               destination         
    0     0 ACCEPT     all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* kubernetes forwarding rules */ mark match 0x4000/0x4000
    0     0 ACCEPT     all  --  *      *       10.32.0.0/12         0.0.0.0/0            /* kubernetes forwarding conntrack pod source rule */ ctstate RELATED,ESTABLISHED
    0     0 ACCEPT     all  --  *      *       0.0.0.0/0            10.32.0.0/12         /* kubernetes forwarding conntrack pod destination rule */ ctstate RELATED,ESTABLISHED

Chain KUBE-SERVICES (1 references)
 pkts bytes target     prot opt in     out     source               destination         

Chain WEAVE-IPSEC-IN (1 references)
 pkts bytes target     prot opt in     out     source               destination         
    0     0 DROP       udp  --  *      *       167.99.139.236       167.99.39.233        udp dpt:6784 mark match ! 0x20000/0x20000

Chain WEAVE-NPC (1 references)
 pkts bytes target     prot opt in     out     source               destination         
  641 1610K ACCEPT     all  --  *      *       0.0.0.0/0            0.0.0.0/0            state RELATED,ESTABLISHED
    0     0 ACCEPT     all  --  *      *       0.0.0.0/0            224.0.0.0/4         
  185 16446 WEAVE-NPC-DEFAULT  all  --  *      *       0.0.0.0/0            0.0.0.0/0            state NEW
    0     0 WEAVE-NPC-INGRESS  all  --  *      *       0.0.0.0/0            0.0.0.0/0            state NEW
    0     0 ACCEPT     all  --  *      *       0.0.0.0/0            0.0.0.0/0            ! match-set weave-local-pods dst

Chain WEAVE-NPC-DEFAULT (1 references)
 pkts bytes target     prot opt in     out     source               destination         
    0     0 ACCEPT     all  --  *      *       0.0.0.0/0            0.0.0.0/0            match-set weave-E.1.0W^NGSp]0_t5WwH/]gX@L dst /* DefaultAllow isolation for namespace: default */
    0     0 ACCEPT     all  --  *      *       0.0.0.0/0            0.0.0.0/0            match-set weave-(OFih%RP%?c@0BPw;F;Cvf!oG dst /* DefaultAllow isolation for namespace: ingress-nginx */
    0     0 ACCEPT     all  --  *      *       0.0.0.0/0            0.0.0.0/0            match-set weave-0EHD/vdN#O4]V?o4Tx7kS;APH dst /* DefaultAllow isolation for namespace: kube-public */
  185 16446 ACCEPT     all  --  *      *       0.0.0.0/0            0.0.0.0/0            match-set weave-?b%zl9GIe0AET1(QI^7NWe*fO dst /* DefaultAllow isolation for namespace: kube-system */

Chain WEAVE-NPC-INGRESS (1 references)
 pkts bytes target     prot opt in     out     source               destination         

Surprisingly, calico pod networking seems to somehow work in that situation.
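
For the weave side of that mixed state, a rough manual cleanup looks something like the following (only a sketch; the weave script's own reset subcommand, or a reboot, is the safer route, and the openvswitch-managed pieces may need one of those anyway):

# strip the WEAVE-* chains and rules first, so the ipsets become unreferenced
iptables-save | grep -v -i weave | iptables-restore
# drop the weave-* ipsets used by the network policy rules
ipset -n list | grep '^weave' | xargs -r -n1 ipset destroy
# the weave bridge and veth pair are plain netdevs and can just be deleted
ip link delete vethwe-bridge
ip link delete weave
# the datapath device and its vxlan-6784 vport belong to the openvswitch module;
# `weave reset` (or a reboot) is the cleaner way to get rid of those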