rancher / k3os

Purpose-built OS for Kubernetes, fully managed by Kubernetes.
https://k3os.io
Apache License 2.0

fresh install with v0.20.6-k3s1r0 amd64 multiple issues (DNS, Gateway/routing) #703

Open a10sbraun opened 3 years ago

a10sbraun commented 3 years ago

Version (k3OS / kernel) 0.20.6-k3s1r0 5.4.0-72.80-rancher1

Architecture x86_64

Describe the bug Where to start, where to stop... My DNS servers are not added to resolv.conf, so even if the default gateway worked, I still could not resolve anything. The default gateway does not work either: ip route shows no route via the configured gateway, so I have to add the route manually, change the default device, and so on. The default gateway from the config is simply not there.

k3os-master [~]$ ip route
default dev eth2 scope link 
10.0.1.0/24 dev eth2 proto kernel scope link src 10.0.1.10 
10.10.10.0/24 dev eth0 proto kernel scope link src 10.10.10.10 
127.0.0.0/8 dev lo scope host 
192.168.108.0/24 dev eth1 proto kernel scope link src 192.168.108.40 
192.168.108.1 dev eth1 scope link 
k3os-master [~]$ cat /etc/resolv.conf
(empty, no DNS server)
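
The symptom in the routing table above is a default route without a gateway (default dev eth2 scope link). A minimal sketch of a check for that condition follows; the function name is hypothetical and it only inspects text in the format that ip route prints above.

```shell
#!/bin/sh
# Reads `ip route` output on stdin and prints the device of every
# default route that has no "via <gateway>" part.
default_routes_without_gateway() {
    awk '$1 == "default" {
        dev = ""; via = 0
        for (i = 2; i <= NF; i++) {
            if ($i == "via") via = 1
            if ($i == "dev") dev = $(i + 1)
        }
        if (!via) print dev
    }'
}

# Possible usage on a running node:
#   ip route | default_routes_without_gateway
```

For the table in this report the function would print eth2, the interface that should not carry the default route at all.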

Also, DHCP is not working in the boot image, so I have to do all the network configuration by hand to be able to install the system.

As a workaround I added:

- content: |-
    nameserver 9.9.9.9
  path: /etc/resolv.conf
run_cmd:
- "/usr/sbin/route add default gw 192.168.108.1"

Fun fact: this does not always work on all VMs. To get it working 100% of the time, I have to do the following: k3os cfg --dump > /var/lib/rancher/k3os/config.yaml

To Reproduce Install K3OS

Expected behavior The fundamental basics (connectivity) work.

Actual behavior No DNS, no routing...

Additional context My YAML to install the Master

write_files:
- path: /var/lib/rancher/k3os/ssh/sshd_config
  content: |-
    # See https://man.openbsd.org/sshd_config for
    # details on these and other parameters.

    AllowTcpForwarding      no
    GatewayPorts            no
    PasswordAuthentication  yes
    X11Forwarding           no
    PermitRootLogin         no
    LoginGraceTime          30s
    MaxAuthTries            5

    Ciphers chacha20-poly1305@openssh.com,aes256-gcm@openssh.com,aes128-gcm@openssh.com,aes256-ctr,aes192-ctr,aes128-ctr
    MACs hmac-sha2-512-etm@openssh.com,hmac-sha2-256-etm@openssh.com,umac-128-etm@openssh.com,hmac-sha2-512,hmac-sha2-256,umac-128@openssh.com
    KexAlgorithms curve25519-sha256@libssh.org,diffie-hellman-group-exchange-sha256

    Subsystem   sftp    internal-sftp
- path: /etc/hosts
  content: |-
    127.0.0.1       localhost localhost.localdomain
    127.0.1.1 k3os-master
    10.10.10.10 k3os-master
    10.10.10.20 k3os-node1
    10.10.10.30 k3os-node2

    ::1     ip6-localhost ip6-loopback
    fe00::0 ip6-localnet
    ff00::0 ip6-mcastprefix
    ff02::1 ip6-allnodes
    ff02::2 ip6-allrouters
- path: /var/lib/connman/default.config
  content: |-
    [service_eth0]
    Type=ethernet
    IPv4=10.10.10.10/255.255.255.0
    IPv6=off
    MAC=52:54:00:71:ce:88
    [service_eth1]
    Type=ethernet
    IPv4=192.168.108.40/255.255.255.0/192.168.108.1
    IPv6=off
    MAC=52:54:00:9a:cb:6f
    [service_eth2]
    Type=ethernet
    IPv4=10.0.1.10/255.255.255.0
    IPv6=off
    MAC=52:54:00:c7:ad:de
hostname: k3os-master
k3os:
  password: "<password>"
  token: "<token>"
  dns_nameservers:
  - 8.8.8.8
  - 1.1.1.1
  ntp_servers:
  - 0.de.pool.ntp.org
  - 1.de.pool.ntp.org
  k3s_args:
    - server
    - "--node-name=k3os-master"
    - "--bind-address=10.10.10.10"
    - "--advertise-address=10.10.10.10"
    - "--flannel-backend=ipsec"
    - "--flannel-iface=eth0"
    - "--node-ip=10.10.10.10"
    - "--node-external-ip=10.0.1.10"

Config dump after install:

k3os-master [~]$ sudo k3os cfg --dump
hostname: k3os-master
k3os:
  dns_nameservers:
  - 8.8.8.8
  - 1.1.1.1
  k3s_args:
  - server
  - --node-name=k3os-master
  - --bind-address=10.10.10.10
  - --advertise-address=10.10.10.10
  - --flannel-backend=ipsec
  - --flannel-iface=eth0
  - --node-ip=10.10.10.10
  - --node-external-ip=10.0.1.10
  ntp_servers:
  - 0.de.pool.ntp.org
  - 1.de.pool.ntp.org
  password: <password>
  token: <token>
write_files:
- content: |-
    # See https://man.openbsd.org/sshd_config for
    # details on these and other parameters.

    AllowTcpForwarding      no
    GatewayPorts            no
    PasswordAuthentication  yes
    X11Forwarding           no
    PermitRootLogin         no
    LoginGraceTime          30s
    MaxAuthTries            5

    Ciphers chacha20-poly1305@openssh.com,aes256-gcm@openssh.com,aes128-gcm@openssh.com,aes256-ctr,aes192-ctr,aes128-ctr
    MACs hmac-sha2-512-etm@openssh.com,hmac-sha2-256-etm@openssh.com,umac-128-etm@openssh.com,hmac-sha2-512,hmac-sha2-256,umac-128@openssh.com
    KexAlgorithms curve25519-sha256@libssh.org,diffie-hellman-group-exchange-sha256

    Subsystem   sftp    internal-sftp
  encoding: ""
  owner: ""
  path: /var/lib/rancher/k3os/ssh/sshd_config
  permissions: ""
- content: |-
    127.0.0.1       localhost localhost.localdomain
    127.0.1.1 k3os-master
    10.10.10.10 k3os-master
    10.10.10.20 k3os-node1
    10.10.10.30 k3os-node2

    ::1     ip6-localhost ip6-loopback
    fe00::0 ip6-localnet
    ff00::0 ip6-mcastprefix
    ff02::1 ip6-allnodes
    ff02::2 ip6-allrouters
  encoding: ""
  owner: ""
  path: /etc/hosts
  permissions: ""
- content: |-
    [service_eth0]
    Type=ethernet
    IPv4=10.10.10.10/255.255.255.0
    IPv6=off
    MAC=52:54:00:71:ce:88
    [service_eth1]
    Type=ethernet
    IPv4=192.168.108.40/255.255.255.0/192.168.108.1
    IPv6=off
    MAC=52:54:00:9a:cb:6f
    [service_eth2]
    Type=ethernet
    IPv4=10.0.1.10/255.255.255.0
    IPv6=off
    MAC=52:54:00:c7:ad:de
  encoding: ""
  owner: ""
  path: /var/lib/connman/default.config
  permissions: ""
k3os-master [~]$ 

connman settings:

k3os-master [~]# cat /var/lib/connman/ethernet_52540071ce88_cable/settings 
[ethernet_52540071ce88_cable]
Name=Wired
AutoConnect=true
Modified=2021-05-19T12:35:23Z
IPv4.method=fixed
IPv4.netmask_prefixlen=24
IPv4.local_address=10.10.10.10
IPv6.method=off
IPv6.privacy=disabled
Config.file=default
Config.ident=service_eth0
k3os-master [~]# cat /var/lib/connman/ethernet_5254009acb6f_cable/settings 
[ethernet_5254009acb6f_cable]
Name=Wired
AutoConnect=true
Modified=2021-05-19T12:35:23Z
IPv4.method=fixed
IPv4.netmask_prefixlen=24
IPv4.local_address=192.168.108.40
IPv4.gateway=192.168.108.1
IPv6.method=off
IPv6.privacy=disabled
Config.file=default
Config.ident=service_eth1
k3os-master [~]# cat /var/lib/connman/ethernet_525400c7adde_cable/settings 
[ethernet_525400c7adde_cable]
Name=Wired
AutoConnect=true
Modified=2021-05-19T12:35:23Z
IPv4.method=fixed
IPv4.netmask_prefixlen=24
IPv4.local_address=10.0.1.10
IPv6.method=off
IPv6.privacy=disabled
Config.file=default
Config.ident=service_eth2
k3os-master [~]# cat /var/lib/connman/default.config 
[service_eth0]
Type=ethernet
IPv4=10.10.10.10/255.255.255.0
IPv6=off
MAC=52:54:00:71:ce:88
[service_eth1]
Type=ethernet
IPv4=192.168.108.40/255.255.255.0/192.168.108.1
IPv6=off
MAC=52:54:00:9a:cb:6f
[service_eth2]
Type=ethernet
IPv4=10.0.1.10/255.255.255.0
IPv6=off
MAC=52:54:00:c7:ad:de
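
In the provisioning file above only service_eth1 carries a gateway, as the third slash-separated field of its IPv4= line. A small sketch, assuming that address/netmask/gateway layout and using a hypothetical function name, lists which sections declare a gateway so the file can be compared against what ip route actually shows:

```shell
#!/bin/sh
# Reads a connman provisioning file on stdin and prints
# "<section> <gateway>" for every IPv4= line with a third field.
services_with_gateway() {
    awk -F= '
        /^\[.*\]$/ { section = substr($0, 2, length($0) - 2) }
        $1 == "IPv4" {
            n = split($2, parts, "/")
            if (n >= 3 && parts[3] != "") print section, parts[3]
        }
    '
}

# Possible usage:
#   services_with_gateway < /var/lib/connman/default.config
```

For the file shown here this would report a single entry, service_eth1 with 192.168.108.1, which is the route that is missing from the kernel table.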

Fix of the gateway:

k3os-master [~]# route add default gw 192.168.108.1
k3os-master [~]# ping 8.8.8.8
PING 8.8.8.8 (8.8.8.8): 56 data bytes
64 bytes from 8.8.8.8: seq=0 ttl=118 time=14.386 ms
64 bytes from 8.8.8.8: seq=1 ttl=118 time=13.954 ms
^C
--- 8.8.8.8 ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max = 13.954/14.170/14.386 ms
k3os-master [~]# ip route
default via 192.168.108.1 dev eth1 
default dev eth2 scope link 
10.0.1.0/24 dev eth2 proto kernel scope link src 10.0.1.10 
10.10.10.0/24 dev eth0 proto kernel scope link src 10.10.10.10 
127.0.0.0/8 dev lo scope host 
192.168.108.0/24 dev eth1 proto kernel scope link src 192.168.108.40 
192.168.108.1 dev eth1 scope link 

to fix DNS:

k3os-master [~]# ping google.de
^C
k3os-master [~]# vim /etc/resolv.conf 
k3os-master [~]# ping google.de
PING google.de (142.250.185.195): 56 data bytes
64 bytes from 142.250.185.195: seq=0 ttl=118 time=13.686 ms
^C
--- google.de ping statistics ---
2 packets transmitted, 1 packets received, 50% packet loss
round-trip min/avg/max = 13.686/13.686/13.686 ms
k3os-master [~]# cat /etc/resolv.conf 
nameserver 8.8.8.8
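
The manual edit of /etc/resolv.conf can also be scripted. This is only a sketch with a hypothetical helper name; it prints the file content for the addresses it is given and leaves the redirect to the caller.

```shell
#!/bin/sh
# Prints one "nameserver <address>" line per argument, in order.
render_resolv_conf() {
    for ns in "$@"; do
        printf 'nameserver %s\n' "$ns"
    done
}

# Possible usage (as root):
#   render_resolv_conf 8.8.8.8 1.1.1.1 > /etc/resolv.conf
```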

dweomer commented 3 years ago

This all sounds like a DHCP issue to me. Is it functional on your network/subnet?

a10sbraun commented 3 years ago

DHCP is working well, and it works and worked perfectly fine with the older ISO image.

The bigger and more annoying issue is that the static configuration is not applied reliably via cloud-init; it only works reliably once I dump exactly the same config into /var/lib/rancher/k3os/config.yaml. So somehow the config is not processed correctly until it is also present as config.yaml, which seems very strange to me...

mstarostik commented 3 years ago

@a10sbraun connman configures this: default dev eth2 scope link, which is basically what I get as well when adding a second interface (eth1 here). In my case this prevents k3OS from downloading the userdata (Hetzner Cloud). When I detach the internal network from the server and reboot, it gets the correct default route on the public interface (eth0) and the userdata download works.

I'm currently looking for a way to make connman not add a default route to eth1. Instead, I'd like to add static (non-default) routes to eth1. I'm trying to do so via run_cmd, but I'm not sure what happens when connman renews the lease.

As the internal IP is known, I'm now considering just injecting it into the userdata, doing the static config with plain iproute2 and using connman only for the public interface.

I really dig k3OS, but connman is the one piece that keeps annoying me in most setups. Anything that's not totally trivial (like...a 2nd netdev oh my) and this thing just falls apart.

mstarostik commented 3 years ago

Alright, after some more digging into this and experimenting with various network setups, I guess I made my peace with connman. Just to make this clear: for those cases it's intended for, it does a good job. Problems only arise when you need more control.

Just FYI, for everyone who may end up here looking for information about routing setup or anything else involving multiple interfaces: until now I was only aware of options to keep connman from touching secondary interfaces and then include some *_cmd entries with iproute2 etc. to set things up manually. TIL that busybox as included in k3OS comes with support for ifupdown, so you can also do things like this:

boot_cmd:
- |
  echo 'auto eth0
  iface eth0 inet dhcp
  auto eth1
  iface eth1 inet dhcp' > /etc/network/interfaces
- rc-update del connman boot
- rc-update add networking boot
- rc-update add ntpd default

The example doesn't do much, but it already gets you a defined and well-known routing priority, as eth0 has a predictably lower metric than eth1. You can include scripts as well (and use write_files, but owing to my particular setup this was not an option). Maybe this was all too obvious, but at least to me the presence of ifupdown was a great realization: there actually is something between connman and completely manual setup.
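
For more than two interfaces, the /etc/network/interfaces content from the boot_cmd above can be generated instead of typed out. A minimal sketch, assuming every interface should use DHCP as in that example (the function name is hypothetical):

```shell
#!/bin/sh
# Prints an "auto" + "iface ... inet dhcp" stanza for each interface
# name given, in the order given.
render_interfaces() {
    for iface in "$@"; do
        printf 'auto %s\niface %s inet dhcp\n' "$iface" "$iface"
    done
}

# Possible usage:
#   render_interfaces eth0 eth1 > /etc/network/interfaces
```

The order of the arguments is kept in the output, so the interface that should win the default route goes first, matching the eth0-before-eth1 ordering described above.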

DuncanvR commented 3 years ago

I solved the issue of connman picking the wrong interface for the default route by adapting this comment and creating a script that forces the correct order on start-up:

run_cmd:
  - /etc/scripts/fix-connman-service-order.sh

write_files:
  - path: /etc/scripts/fix-connman-service-order.sh
    # With Hetzner servers, eth0 is the main interface connected to the internet; eth1 is connected to the virtual network
    # if connman does not consider eth0 to come before eth1, the generated route table will use the virtual network as default route, effectively disabling all public in- & outbound connections
    content: |
      #!/bin/bash
      _log() {
          echo "$1"
          logger -t 'fix-connman-service-order' "$1"
      }
      _eth0=$(connmanctl services | awk '{ print $3 }' | while read -r _s; do connmanctl services "$_s" | grep -q 'eth0' && echo "$_s"; done)
      _eth1=$(connmanctl services | awk '{ print $3 }' | while read -r _s; do connmanctl services "$_s" | grep -q 'eth1' && echo "$_s"; done)
      _log "Found connman service for eth0: $_eth0"
      _log "Found connman service for eth1: $_eth1"
      connmanctl move-before "$_eth0" "$_eth1"
      _log "Done"
    owner: 'root:root'
    permissions: '0755'

Edit: while this script does re-enable connections over the primary interface, the routing table is left with no entries for the secondary interface. So now I can't connect to anything on my virtual network. I'm still trying to define additional routes from the script, but no luck so far. :disappointed:
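
The two near-identical lookup pipelines in the script above could be folded into one helper. This is a sketch rather than a tested replacement: the function name is hypothetical, and it assumes the service id is always the last column of connmanctl services, which avoids depending on the flags column being present.

```shell
#!/bin/sh
# Prints the connman service id whose detail output mentions the given
# interface name as a whole word (so eth1 does not match eth10).
find_service_for_iface() {
    iface=$1
    connmanctl services | awk 'NF { print $NF }' | while read -r svc; do
        # </dev/null keeps the inner call from consuming the loop input
        if connmanctl services "$svc" </dev/null | grep -qw -- "$iface"; then
            echo "$svc"
        fi
    done
}

# Possible usage, mirroring the script above:
#   connmanctl move-before "$(find_service_for_iface eth0)" \
#                          "$(find_service_for_iface eth1)"
```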

mysticaltech commented 2 years ago

@mstarostik I have been banging my head on my desk for the last 48h! Your solution worked... Finally! Thank you so much for sharing!

Just to give other people searching for the solution some context: if you install k3OS on Hetzner and can't connect to the machine via ssh or even ping it, this is the reason why. Somehow DHCP does not complete correctly, and the default route to the Hetzner gateway is not added on the default interface eth0. The ifupdown-based boot_cmd solution from @mstarostik's comment above fixes this!

a10sbraun commented 2 years ago

I was able to fix my issues as well, by using the scripts in a different way. As I only work with static configurations, I now do it the following way:

run_cmd:
- /usr/sbin/route add default gw 192.168.108.1
- /usr/bin/connmanctl disconnect $(connmanctl services | awk '{ print $3 }' | while read -r _s; do connmanctl services "$_s" | grep -q 'eth0' && echo "$_s"; done)
- /usr/bin/connmanctl connect $(connmanctl services | awk '{ print $3 }' | while read -r _s; do connmanctl services "$_s" | grep -q 'eth0' && echo "$_s"; done)
- /usr/bin/connmanctl disconnect $(connmanctl services | awk '{ print $3 }' | while read -r _s; do connmanctl services "$_s" | grep -q 'eth2' && echo "$_s"; done)
- /usr/bin/connmanctl connect $(connmanctl services | awk '{ print $3 }' | while read -r _s; do connmanctl services "$_s" | grep -q 'eth2' && echo "$_s"; done)

This disconnects and reconnects only the interfaces without a default gateway (eth0 and eth2), which leaves eth1 untouched in my case, so the routing is and stays fine.

I have never in my life come across such crap as connman... it is really the worst experience ever!!! Please change it so that I can just use my super reliable ifcfg files under /etc/sysconfig/network-scripts/, or at least, like Debian, files under /etc/network/interfaces.d/

connman is really unacceptable and unusable...