k0sproject / k0sctl

A bootstrapping and management tool for k0s clusters.

Node FQDN not part of default ClusterConfig.spec.api.sans #782

Closed: pschichtel closed this 2 weeks ago

pschichtel commented 3 weeks ago

Platform

Linux 6.1.0-26-amd64 k0sproject/k0s#1 SMP PREEMPT_DYNAMIC Debian 6.1.112-1 (2024-09-30) x86_64 GNU/Linux
PRETTY_NAME="Debian GNU/Linux 12 (bookworm)"
NAME="Debian GNU/Linux"
VERSION_ID="12"
VERSION="12 (bookworm)"
VERSION_CODENAME=bookworm
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"

Version

v1.31.1+k0s.1

Sysinfo

`k0s sysinfo`
Total memory: 15.6 GiB (pass)
File system of /var/kubernetes/k0s: ext4 (pass)
Disk space available for /var/kubernetes/k0s: 35.5 GiB (pass)
Relative disk space available for /var/kubernetes/k0s: 73% (pass)
Name resolution: localhost: [127.0.0.1] (pass)
Operating system: Linux (pass)
  Linux kernel release: 6.1.0-26-amd64 (pass)
  Max. file descriptors per process: current: 1048576 / max: 1048576 (pass)
  AppArmor: unavailable (pass)
  Executable in PATH: modprobe: /usr/sbin/modprobe (pass)
  Executable in PATH: mount: /usr/bin/mount (pass)
  Executable in PATH: umount: /usr/bin/umount (pass)
  /proc file system: mounted (0x9fa0) (pass)
  Control Groups: version 2 (pass)
    cgroup controller "cpu": available (is a listed root controller) (pass)
    cgroup controller "cpuacct": available (via cpu in version 2) (pass)
    cgroup controller "cpuset": available (is a listed root controller) (pass)
    cgroup controller "memory": available (is a listed root controller) (pass)
    cgroup controller "devices": available (device filters attachable) (pass)
    cgroup controller "freezer": available (cgroup.freeze exists) (pass)
    cgroup controller "pids": available (is a listed root controller) (pass)
    cgroup controller "hugetlb": available (is a listed root controller) (pass)
    cgroup controller "blkio": available (via io in version 2) (pass)
  CONFIG_CGROUPS: Control Group support: built-in (pass)
    CONFIG_CGROUP_FREEZER: Freezer cgroup subsystem: built-in (pass)
    CONFIG_CGROUP_PIDS: PIDs cgroup subsystem: built-in (pass)
    CONFIG_CGROUP_DEVICE: Device controller for cgroups: built-in (pass)
    CONFIG_CPUSETS: Cpuset support: built-in (pass)
    CONFIG_CGROUP_CPUACCT: Simple CPU accounting cgroup subsystem: built-in (pass)
    CONFIG_MEMCG: Memory Resource Controller for Control Groups: built-in (pass)
    CONFIG_CGROUP_HUGETLB: HugeTLB Resource Controller for Control Groups: built-in (pass)
    CONFIG_CGROUP_SCHED: Group CPU scheduler: built-in (pass)
      CONFIG_FAIR_GROUP_SCHED: Group scheduling for SCHED_OTHER: built-in (pass)
        CONFIG_CFS_BANDWIDTH: CPU bandwidth provisioning for FAIR_GROUP_SCHED: built-in (pass)
    CONFIG_BLK_CGROUP: Block IO controller: built-in (pass)
  CONFIG_NAMESPACES: Namespaces support: built-in (pass)
    CONFIG_UTS_NS: UTS namespace: built-in (pass)
    CONFIG_IPC_NS: IPC namespace: built-in (pass)
    CONFIG_PID_NS: PID namespace: built-in (pass)
    CONFIG_NET_NS: Network namespace: built-in (pass)
  CONFIG_NET: Networking support: built-in (pass)
    CONFIG_INET: TCP/IP networking: built-in (pass)
      CONFIG_IPV6: The IPv6 protocol: built-in (pass)
    CONFIG_NETFILTER: Network packet filtering framework (Netfilter): built-in (pass)
      CONFIG_NETFILTER_ADVANCED: Advanced netfilter configuration: built-in (pass)
      CONFIG_NF_CONNTRACK: Netfilter connection tracking support: module (pass)
      CONFIG_NETFILTER_XTABLES: Netfilter Xtables support: module (pass)
        CONFIG_NETFILTER_XT_TARGET_REDIRECT: REDIRECT target support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_COMMENT: "comment" match support: module (pass)
        CONFIG_NETFILTER_XT_MARK: nfmark target and match support: module (pass)
        CONFIG_NETFILTER_XT_SET: set target and match support: module (pass)
        CONFIG_NETFILTER_XT_TARGET_MASQUERADE: MASQUERADE target support: module (pass)
        CONFIG_NETFILTER_XT_NAT: "SNAT and DNAT" targets support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_ADDRTYPE: "addrtype" address type match support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_CONNTRACK: "conntrack" connection tracking match support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_MULTIPORT: "multiport" Multiple port match support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_RECENT: "recent" match support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_STATISTIC: "statistic" match support: module (pass)
      CONFIG_NETFILTER_NETLINK: module (pass)
      CONFIG_NF_NAT: module (pass)
      CONFIG_IP_SET: IP set support: module (pass)
        CONFIG_IP_SET_HASH_IP: hash:ip set support: module (pass)
        CONFIG_IP_SET_HASH_NET: hash:net set support: module (pass)
      CONFIG_IP_VS: IP virtual server support: module (pass)
        CONFIG_IP_VS_NFCT: Netfilter connection tracking: built-in (pass)
        CONFIG_IP_VS_SH: Source hashing scheduling: module (pass)
        CONFIG_IP_VS_RR: Round-robin scheduling: module (pass)
        CONFIG_IP_VS_WRR: Weighted round-robin scheduling: module (pass)
      CONFIG_NF_CONNTRACK_IPV4: IPv4 connection tracking support (required for NAT): unknown (warning)
      CONFIG_NF_REJECT_IPV4: IPv4 packet rejection: module (pass)
      CONFIG_NF_NAT_IPV4: IPv4 NAT: unknown (warning)
      CONFIG_IP_NF_IPTABLES: IP tables support: module (pass)
        CONFIG_IP_NF_FILTER: Packet filtering: module (pass)
          CONFIG_IP_NF_TARGET_REJECT: REJECT target support: module (pass)
        CONFIG_IP_NF_NAT: iptables NAT support: module (pass)
        CONFIG_IP_NF_MANGLE: Packet mangling: module (pass)
      CONFIG_NF_DEFRAG_IPV4: module (pass)
      CONFIG_NF_CONNTRACK_IPV6: IPv6 connection tracking support (required for NAT): unknown (warning)
      CONFIG_NF_NAT_IPV6: IPv6 NAT: unknown (warning)
      CONFIG_IP6_NF_IPTABLES: IP6 tables support: module (pass)
        CONFIG_IP6_NF_FILTER: Packet filtering: module (pass)
        CONFIG_IP6_NF_MANGLE: Packet mangling: module (pass)
        CONFIG_IP6_NF_NAT: ip6tables NAT support: module (pass)
      CONFIG_NF_DEFRAG_IPV6: module (pass)
    CONFIG_BRIDGE: 802.1d Ethernet Bridging: module (pass)
      CONFIG_LLC: module (pass)
      CONFIG_STP: module (pass)
  CONFIG_EXT4_FS: The Extended 4 (ext4) filesystem: module (pass)
  CONFIG_PROC_FS: /proc file system support: built-in (pass)

What happened?

I deployed a 3-node cluster (all nodes are controller+worker) with k0sctl. `k0s kubeconfig` produces a config that uses the node's IP, while `k0sctl kubeconfig` produces a config that contains the FQDN of the host.

I have 5 other clusters where the configs produced by both k0s and k0sctl work fine, but the new cluster does not seem to include the FQDN in the SANs, so the k0sctl-generated config is unusable and fails with:

Unable to connect to the server: tls: failed to verify certificate: x509: certificate is valid for kubernetes, kubernetes.default, kubernetes.default.svc, kubernetes.default.svc.cluster, kubernetes.svc.cluster.local, localhost, not node1.example.org
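
To make the difference concrete (addresses hypothetical), the two kubeconfigs differ only in the server endpoint; the node IP is in the certificate's SANs, while the FQDN is not:

```yaml
# Hypothetical kubeconfig fragments; only the server endpoint differs.
# From `k0s kubeconfig` (works: the node IP is in the certificate SANs)
clusters:
  - cluster:
      server: https://192.0.2.10:6443
---
# From `k0sctl kubeconfig` (fails: the FQDN is not in the SANs)
clusters:
  - cluster:
      server: https://node1.example.org:6443
```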

My k0sctl config for reference:

apiVersion: k0sctl.k0sproject.io/v1beta1
kind: Cluster
metadata:
  name: name
spec:
  hosts:
  - openSSH:
      address: node1.example.org
    role: controller+worker
    noTaints: true
    dataDir: /var/kubernetes/k0s
  - openSSH:
      address: node2.example.org
    role: controller+worker
    noTaints: true
    dataDir: /var/kubernetes/k0s
  - openSSH:
      address: node3.example.org
    role: controller+worker
    noTaints: true
    dataDir: /var/kubernetes/k0s
  k0s:
    version: v1.31.1+k0s.1
    dynamicConfig: false
    config:
      apiVersion: k0s.k0sproject.io/v1beta1
      kind: ClusterConfig
      metadata:
        name: name
      spec:
        api:
          k0sApiPort: 9443
          port: 6443
        extensions:
          storage:
            create_default_storage_class: false
            type: external_storage
        installConfig:
          users:
            etcdUser: etcd
            kineUser: kube-apiserver
            konnectivityUser: konnectivity-server
            kubeAPIserverUser: kube-apiserver
            kubeSchedulerUser: kube-scheduler
        konnectivity:
          adminPort: 8133
          agentPort: 8132
        network:
          provider: calico
          calico:
            mode: ipip
            overlay: Always
          clusterDomain: cluster.local
          dualStack: {}
          kubeProxy:
            iptables:
              masqueradeAll: true
              minSyncPeriod: 0s
              syncPeriod: 0s
            ipvs:
              minSyncPeriod: 0s
              syncPeriod: 0s
              tcpFinTimeout: 0s
              tcpTimeout: 0s
              udpTimeout: 0s
            metricsBindAddress: 0.0.0.0:10249
            mode: iptables
          kuberouter:
            autoMTU: true
            hairpin: Enabled
            ipMasq: false
            mtu: 0
          podCIDR: 10.244.0.0/16
          serviceCIDR: 10.96.0.0/12
        scheduler: {}
        storage:
          type: etcd
        telemetry:
          enabled: true

The k0sctl version is 0.19.2.

Steps to reproduce

  1. Deploy a cluster
  2. Try accessing nodes via their FQDN

Expected behavior

Access via FQDN should work.

Actual behavior

Unable to connect to the server: tls: failed to verify certificate: x509: certificate is valid for kubernetes, kubernetes.default, kubernetes.default.svc, kubernetes.default.svc.cluster, kubernetes.svc.cluster.local, localhost, not node1.example.org

Screenshots and logs

No response

Additional context

I'm not sure if this is a k0sctl issue or a k0s issue; I went for k0s.

pschichtel commented 3 weeks ago

I just checked the certificates as delivered by the API server of the first node:

DNS:kubernetes, DNS:kubernetes.default, DNS:kubernetes.default.svc, DNS:kubernetes.default.svc.cluster, DNS:kubernetes.svc.cluster.local, DNS:localhost, IP Address:127.0.0.1, IP Address:<controller1 ip>, IP Address:10.96.0.1

The other nodes look identical, except that they have their own IP in there.

In a good cluster it looks like this:

DNS:kubernetes, DNS:kubernetes.default, DNS:kubernetes.default.svc, DNS:kubernetes.default.svc.cluster, DNS:kubernetes.svc.cluster.local, DNS:localhost, DNS:controller1.example.org, DNS:controller2.example.org, DNS:controller3.example.org, IP Address:127.0.0.1, IP Address:<controller1 ip>, IP Address:<controller2 ip>, IP Address:<controller3 ip>, IP Address:10.96.0.1

pschichtel commented 3 weeks ago

OK, the missing IP addresses were caused by not having NLLB (node-local load balancing) enabled. After enabling it, the certificate contains the IPs of all controllers, but still no FQDNs.
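
For reference, enabling NLLB is a small addition to the k0s config section; a minimal sketch based on the documented nodeLocalLoadBalancing field:

```yaml
spec:
  network:
    nodeLocalLoadBalancing:
      enabled: true      # off by default
      type: EnvoyProxy   # the default type
```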

twz123 commented 3 weeks ago

Is adding the FQDNs manually to spec.api.sans not an acceptable workaround for you?

Not sure what would make sense here from k0s/k0sctl's point of view. There's some rewriting involved, though.
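
As a sketch of that workaround (hostnames taken from the config above), the FQDNs would go under spec.api.sans in the k0sctl config:

```yaml
spec:
  k0s:
    config:
      spec:
        api:
          sans:
            - node1.example.org
            - node2.example.org
            - node3.example.org
```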

pschichtel commented 3 weeks ago

My problem with that is that I'd also need to make sure to include all the other names/IPs that k8s/k0s/etcd seem to expect (see https://github.com/k0sproject/k0s/issues/4493#issuecomment-2444919451). If there was something like "extraSans" that adds entries in addition to the automatically detected values, I'd be more fine with that.

But the workaround aside, the burning question for me is: why are the FQDNs included in 5 of 6 clusters? I'd expect 0/6 or 6/6. The machines are built from the same VM template, they all run in the same flat network with the same DNS servers, and they are all installed using k0sctl.

jnummelin commented 3 weeks ago

> If there was something like "extraSans" that adds entries in addition to the automatically detected values, I'd be more fine with that.

That is how the sans field already behaves in k0s internally: https://github.com/k0sproject/k0s/blob/main/pkg/apis/k0s/v1beta1/api.go#L110. So basically k0s adds all the detected addresses, the cluster-internal names, and the sans given in the config.
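
Schematically (an illustration of that behavior, not the actual code), user-supplied entries are appended to the detected set rather than replacing it:

```yaml
spec:
  api:
    sans:
      - node1.example.org   # user-supplied entry...
# ...which ends up in the certificate alongside the auto-detected SANs:
#   kubernetes, kubernetes.default, kubernetes.default.svc, ...,
#   localhost, 127.0.0.1, the detected node IPs, and the
#   in-cluster service IP (10.96.0.1 here)
```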

> Why are the FQDNs included in 5 of 6 clusters? I'd expect 0/6 or 6/6.

k0s itself does NOT even detect FQDNs unless told to do so. So I'd assume there is some config difference between the clusters: the one that does not use FQDNs and the ones that do.

pschichtel commented 3 weeks ago

> k0s itself does NOT even detect FQDNs unless told to do so.

Then I wonder: where do the FQDNs in the certificate come from? I've never configured them anywhere.

jnummelin commented 3 weeks ago

> Then I wonder: where do the FQDNs in the certificate come from? I've never configured them anywhere.

That's a good question...

@kke If one configures the SSH address as node1.example.com in the k0sctl YAML, is that copied over to the k0s.yaml on the nodes?

@pschichtel Maybe check the k0sctl-managed /etc/k0s/k0s.yaml on the nodes to see if the hostnames are there?

pschichtel commented 3 weeks ago

Yep, they're in there, both IPs and FQDNs. So k0sctl seems to be gathering the entries for sans.
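
For concreteness, the sans list in the managed /etc/k0s/k0s.yaml looks roughly like this (IPs hypothetical):

```yaml
spec:
  api:
    sans:
      - node1.example.org   # FQDNs taken from the k0sctl host addresses
      - node2.example.org
      - node3.example.org
      - 192.0.2.10          # hypothetical controller IPs
      - 192.0.2.11
      - 192.0.2.12
```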

pschichtel commented 3 weeks ago

Since I recently updated from k0sctl 0.18.1 to 0.19.x, I checked its history; this commit seems very applicable: https://github.com/k0sproject/k0sctl/commit/fd0ba50f48005339846446ffc2452db015e90272