k0sproject / k0sctl

A bootstrapping and management tool for k0s clusters.

CoreDNS is not being deployed with dynamicConfig disabled #613

Status: Open. CmdrSharp opened this issue 11 months ago.

CmdrSharp commented 11 months ago


Platform

Linux 5.15.142-flatcar #1 SMP Mon Dec 11 21:37:48 -00 2023 x86_64 GNU/Linux
NAME="Flatcar Container Linux by Kinvolk"
ID=flatcar
ID_LIKE=coreos
VERSION=3602.2.3
VERSION_ID=3602.2.3
BUILD_ID=2023-12-11-2204
SYSEXT_LEVEL=1.0
PRETTY_NAME="Flatcar Container Linux by Kinvolk 3602.2.3 (Oklo)"
ANSI_COLOR="38;5;75"
HOME_URL="https://flatcar.org/"
BUG_REPORT_URL="https://issues.flatcar.org"
FLATCAR_BOARD="amd64-usr"
CPE_NAME="cpe:2.3:o:flatcar-linux:flatcar_linux:3602.2.3:*:*:*:*:*:*:*"

Version

k0s: 1.28.4 and 1.27.8

k0sctl version: v0.16.0 (commit 7e8c272)

Sysinfo

`k0s sysinfo`
Machine ID: "555fbefc839e690070cea6790c165890ed90f324fd3d148c6003df4bc94402fd" (from machine) (pass)
Total memory: 3.8 GiB (pass)
Disk space available for /var/lib/k0s: 42.0 GiB (pass)
Name resolution: localhost: [::1 127.0.0.1] (pass)
Operating system: Linux (pass)
  Linux kernel release: 5.15.142-flatcar (pass)
  Max. file descriptors per process: current: 524288 / max: 524288 (pass)
  AppArmor: unavailable (pass)
  Executable in PATH: modprobe: /usr/sbin/modprobe (pass)
  Executable in PATH: mount: /usr/bin/mount (pass)
  Executable in PATH: umount: /usr/bin/umount (pass)
  /proc file system: mounted (0x9fa0) (pass)
  Control Groups: version 2 (pass)
    cgroup controller "cpu": available (pass)
    cgroup controller "cpuacct": available (via cpu in version 2) (pass)
    cgroup controller "cpuset": available (pass)
    cgroup controller "memory": available (pass)
    cgroup controller "devices": available (assumed) (pass)
    cgroup controller "freezer": available (assumed) (pass)
    cgroup controller "pids": available (pass)
    cgroup controller "hugetlb": available (pass)
    cgroup controller "blkio": available (via io in version 2) (pass)
  CONFIG_CGROUPS: Control Group support: built-in (pass)
    CONFIG_CGROUP_FREEZER: Freezer cgroup subsystem: built-in (pass)
    CONFIG_CGROUP_PIDS: PIDs cgroup subsystem: built-in (pass)
    CONFIG_CGROUP_DEVICE: Device controller for cgroups: built-in (pass)
    CONFIG_CPUSETS: Cpuset support: built-in (pass)
    CONFIG_CGROUP_CPUACCT: Simple CPU accounting cgroup subsystem: built-in (pass)
    CONFIG_MEMCG: Memory Resource Controller for Control Groups: built-in (pass)
    CONFIG_CGROUP_HUGETLB: HugeTLB Resource Controller for Control Groups: built-in (pass)
    CONFIG_CGROUP_SCHED: Group CPU scheduler: built-in (pass)
      CONFIG_FAIR_GROUP_SCHED: Group scheduling for SCHED_OTHER: built-in (pass)
        CONFIG_CFS_BANDWIDTH: CPU bandwidth provisioning for FAIR_GROUP_SCHED: built-in (pass)
    CONFIG_BLK_CGROUP: Block IO controller: built-in (pass)
  CONFIG_NAMESPACES: Namespaces support: built-in (pass)
    CONFIG_UTS_NS: UTS namespace: built-in (pass)
    CONFIG_IPC_NS: IPC namespace: built-in (pass)
    CONFIG_PID_NS: PID namespace: built-in (pass)
    CONFIG_NET_NS: Network namespace: built-in (pass)
  CONFIG_NET: Networking support: built-in (pass)
    CONFIG_INET: TCP/IP networking: built-in (pass)
      CONFIG_IPV6: The IPv6 protocol: built-in (pass)
    CONFIG_NETFILTER: Network packet filtering framework (Netfilter): built-in (pass)
      CONFIG_NETFILTER_ADVANCED: Advanced netfilter configuration: built-in (pass)
      CONFIG_NF_CONNTRACK: Netfilter connection tracking support: module (pass)
      CONFIG_NETFILTER_XTABLES: Netfilter Xtables support: built-in (pass)
        CONFIG_NETFILTER_XT_TARGET_REDIRECT: REDIRECT target support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_COMMENT: "comment" match support: module (pass)
        CONFIG_NETFILTER_XT_MARK: nfmark target and match support: module (pass)
        CONFIG_NETFILTER_XT_SET: set target and match support: module (pass)
        CONFIG_NETFILTER_XT_TARGET_MASQUERADE: MASQUERADE target support: module (pass)
        CONFIG_NETFILTER_XT_NAT: "SNAT and DNAT" targets support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_ADDRTYPE: "addrtype" address type match support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_CONNTRACK: "conntrack" connection tracking match support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_MULTIPORT: "multiport" Multiple port match support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_RECENT: "recent" match support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_STATISTIC: "statistic" match support: module (pass)
      CONFIG_NETFILTER_NETLINK: module (pass)
      CONFIG_NF_NAT: module (pass)
      CONFIG_IP_SET: IP set support: module (pass)
        CONFIG_IP_SET_HASH_IP: hash:ip set support: module (pass)
        CONFIG_IP_SET_HASH_NET: hash:net set support: module (pass)
      CONFIG_IP_VS: IP virtual server support: module (pass)
        CONFIG_IP_VS_NFCT: Netfilter connection tracking: built-in (pass)
        CONFIG_IP_VS_SH: Source hashing scheduling: module (pass)
        CONFIG_IP_VS_RR: Round-robin scheduling: module (pass)
        CONFIG_IP_VS_WRR: Weighted round-robin scheduling: module (pass)
      CONFIG_NF_CONNTRACK_IPV4: IPv4 connetion tracking support (required for NAT): unknown (warning)
      CONFIG_NF_REJECT_IPV4: IPv4 packet rejection: module (pass)
      CONFIG_NF_NAT_IPV4: IPv4 NAT: unknown (warning)
      CONFIG_IP_NF_IPTABLES: IP tables support: built-in (pass)
        CONFIG_IP_NF_FILTER: Packet filtering: module (pass)
          CONFIG_IP_NF_TARGET_REJECT: REJECT target support: module (pass)
        CONFIG_IP_NF_NAT: iptables NAT support: module (pass)
        CONFIG_IP_NF_MANGLE: Packet mangling: module (pass)
      CONFIG_NF_DEFRAG_IPV4: module (pass)
      CONFIG_NF_CONNTRACK_IPV6: IPv6 connetion tracking support (required for NAT): unknown (warning)
      CONFIG_NF_NAT_IPV6: IPv6 NAT: unknown (warning)
      CONFIG_IP6_NF_IPTABLES: IP6 tables support: module (pass)
        CONFIG_IP6_NF_FILTER: Packet filtering: module (pass)
        CONFIG_IP6_NF_MANGLE: Packet mangling: module (pass)
        CONFIG_IP6_NF_NAT: ip6tables NAT support: module (pass)
      CONFIG_NF_DEFRAG_IPV6: module (pass)
    CONFIG_BRIDGE: 802.1d Ethernet Bridging: module (pass)
      CONFIG_LLC: module (pass)
      CONFIG_STP: module (pass)
  CONFIG_EXT4_FS: The Extended 4 (ext4) filesystem: module (pass)
  CONFIG_PROC_FS: /proc file system support: built-in (pass)

What happened?

When installing a fresh cluster, I noticed that the CoreDNS pods were not being deployed. I SSH'd into each controller node and verified that the manifest exists under /var/lib/k0s/manifests/coredns/:

$ ls -lhA /var/lib/k0s/manifests/coredns/
total 8.0K
-rw-r--r--. 1 root root 4.4K Dec 23 12:47 coredns.yaml
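The per-controller check above can be scripted. This is a minimal sketch, assuming key-based SSH access and passwordless sudo on the controllers; the function name check_coredns_manifest is hypothetical and not part of k0s or k0sctl.

```shell
# Hypothetical helper: confirm the CoreDNS manifest exists on each controller.
# Assumes key-based SSH and passwordless sudo on every host passed in.
MANIFEST=/var/lib/k0s/manifests/coredns/coredns.yaml

check_coredns_manifest() {
  missing=0
  for host in "$@"; do
    # `test -f` exits non-zero on the remote side when the file is absent
    if ssh "$host" sudo test -f "$MANIFEST"; then
      echo "$host: present"
    else
      echo "$host: missing"
      missing=1
    fi
  done
  # Non-zero if any host lacks the manifest
  return "$missing"
}

# Example (controller addresses from the k0sctl.yaml in this issue):
# check_coredns_manifest user@172.30.2.2 user@172.30.2.3 user@172.30.2.4
```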

If I apply this manifest manually, CoreDNS spins up fine. The issue coincided with my disabling dynamicConfig, though. I did that because dynamicConfig failed to handle Helm deployments properly: updates to the list of deployments didn't take effect. After resetting and re-enabling dynamicConfig, the CoreDNS pods spin up fine.

I verified this behaviour on both 1.28.4 and 1.27.8. I also waited 12 hours to see whether the deployment was merely delayed, but CoreDNS never deployed.
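As a stopgap, the manual apply described above can be run on a controller. A sketch, assuming it runs as root on a controller node, where `k0s kubectl` talks to the cluster with the node's admin credentials; the function name and the KUBECTL override variable are hypothetical conveniences, not k0s settings.

```shell
# Hypothetical workaround helper: apply the CoreDNS manifest that k0s wrote
# to disk but did not apply. Run as root on a controller node.
COREDNS_MANIFEST=/var/lib/k0s/manifests/coredns/coredns.yaml

apply_coredns_manifest() {
  # KUBECTL is intentionally unquoted so the default "k0s kubectl"
  # splits into a command and its subcommand
  ${KUBECTL:-k0s kubectl} apply -f "$COREDNS_MANIFEST"
}

# Example: apply_coredns_manifest
```

This only works around the symptom; it does not explain why the stack is not applied when dynamicConfig is false.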

Pods with dynamicConfig: false:

NAMESPACE     NAME                                       READY   STATUS    RESTARTS   AGE
k0s-system    k0s-pushgateway-6c5d8c54cf-khmbv           1/1     Running   0          2m34s
kube-system   calico-kube-controllers-84c6cd5b85-mjllc   1/1     Running   0          2m34s
kube-system   calico-node-2z2nk                          1/1     Running   0          2m27s
kube-system   calico-node-4q2t8                          1/1     Running   0          2m22s
kube-system   calico-node-7mhgb                          1/1     Running   0          2m22s
kube-system   calico-node-hxwcq                          1/1     Running   0          2m27s
kube-system   konnectivity-agent-7tdwc                   1/1     Running   0          2m22s
kube-system   konnectivity-agent-kxkkn                   1/1     Running   0          2m27s
kube-system   konnectivity-agent-p8p9b                   1/1     Running   0          2m27s
kube-system   konnectivity-agent-xdbf2                   1/1     Running   0          2m22s
kube-system   kube-proxy-49756                           1/1     Running   0          2m27s
kube-system   kube-proxy-k2sdt                           1/1     Running   0          2m27s
kube-system   kube-proxy-kzpt5                           1/1     Running   0          2m22s
kube-system   kube-proxy-vrzxd                           1/1     Running   0          2m22s
kube-system   metrics-server-7556957bb7-88w9c            1/1     Running   0          2m34s
kube-system   nllb-sehar01-dev01-w01                     1/1     Running   0          76s
kube-system   nllb-sehar01-dev01-w02                     1/1     Running   0          79s
kube-system   nllb-sehar01-dev01-w03                     1/1     Running   0          72s
kube-system   nllb-sehar01-dev01-w04                     1/1     Running   0          79s
metallb       metallb-controller-5f9bb77dcd-kzp9j        1/1     Running   0          2m34s
metallb       metallb-speaker-46lhm                      4/4     Running   0          2m15s
metallb       metallb-speaker-dv2g6                      4/4     Running   0          2m9s
metallb       metallb-speaker-fp5nz                      4/4     Running   0          2m22s
metallb       metallb-speaker-xk89s                      4/4     Running   0          2m20s

Pods with dynamicConfig: true:

NAMESPACE     NAME                                       READY   STATUS    RESTARTS   AGE
k0s-system    k0s-pushgateway-6c5d8c54cf-qrtfd           1/1     Running   0          2m28s
kube-system   calico-kube-controllers-84c6cd5b85-7tn46   1/1     Running   0          2m28s
kube-system   calico-node-7xmqb                          1/1     Running   0          2m10s
kube-system   calico-node-9qwgt                          1/1     Running   0          2m10s
kube-system   calico-node-r2lcc                          1/1     Running   0          2m10s
kube-system   calico-node-vn2s4                          1/1     Running   0          2m5s
kube-system   coredns-85df575cdb-66wcw                   1/1     Running   0          2m28s
kube-system   coredns-85df575cdb-d2vh2                   1/1     Running   0          2m2s
kube-system   konnectivity-agent-dldcd                   1/1     Running   0          2m10s
kube-system   konnectivity-agent-qlm44                   1/1     Running   0          2m10s
kube-system   konnectivity-agent-sgxrb                   1/1     Running   0          2m5s
kube-system   konnectivity-agent-wt8mf                   1/1     Running   0          2m10s
kube-system   kube-proxy-8cz6f                           1/1     Running   0          2m10s
kube-system   kube-proxy-f2qqj                           1/1     Running   0          2m5s
kube-system   kube-proxy-hcnlj                           1/1     Running   0          2m10s
kube-system   kube-proxy-kphxv                           1/1     Running   0          2m10s
kube-system   metrics-server-7556957bb7-4pxwv            1/1     Running   0          2m20s
kube-system   nllb-sehar01-dev01-w01                     1/1     Running   0          48s
kube-system   nllb-sehar01-dev01-w02                     1/1     Running   0          58s
kube-system   nllb-sehar01-dev01-w03                     1/1     Running   0          57s
kube-system   nllb-sehar01-dev01-w04                     1/1     Running   0          40s
metallb       metallb-controller-5f9bb77dcd-8vfzk        1/1     Running   0          2m27s
metallb       metallb-speaker-9r2zt                      4/4     Running   0          2m5s
metallb       metallb-speaker-f78ft                      4/4     Running   0          2m3s
metallb       metallb-speaker-qdlp9                      4/4     Running   0          2m1s
metallb       metallb-speaker-sfbjp                      4/4     Running   0          117s
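The two listings differ only in the coredns-… pods in kube-system: two with dynamicConfig: true, none with dynamicConfig: false. A small filter makes that comparison mechanical. This sketch assumes the default `kubectl get pods -A` column layout (NAMESPACE NAME READY STATUS ...); the function name is hypothetical.

```shell
# Hypothetical helper: count Running CoreDNS pods in `kubectl get pods -A`
# output read from stdin (columns: NAMESPACE NAME READY STATUS ...).
count_coredns_pods() {
  awk '$1 == "kube-system" && $2 ~ /^coredns-/ && $4 == "Running" { n++ }
       END { print n + 0 }'
}

# Example: k0s kubectl get pods -A | count_coredns_pods
```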

Steps to reproduce

  1. Install a k0s cluster with dynamicConfig: false
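The reproduction can be sketched end to end with k0sctl, assuming the k0sctl.yaml under "Additional context" is in the current directory and kubectl is installed on the workstation. The function name is hypothetical; `k0sctl apply` and `k0sctl kubeconfig` are existing k0sctl subcommands.

```shell
# Hypothetical reproduction helper. Assumes ./k0sctl.yaml (with
# dynamicConfig: false) and a local kubectl binary.
reproduce_missing_coredns() {
  k0sctl apply --config k0sctl.yaml || return
  k0sctl kubeconfig --config k0sctl.yaml > kubeconfig || return
  # Exits non-zero (NotFound) when CoreDNS was never deployed
  kubectl --kubeconfig kubeconfig -n kube-system get deployment coredns
}
```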

Expected behavior

CoreDNS should spin up.

Actual behavior

CoreDNS does not deploy to the cluster.

Additional context

k0sctl.yaml

apiVersion: k0sctl.k0sproject.io/v1beta1
kind: Cluster
metadata:
  name: my-cluster-name
spec:
  hosts:
  - ssh:
      address: 172.30.2.2
      user: user
      port: 22
      keyPath: path
    role: controller
    privateInterface: ens192
    installFlags:
    - --enable-metrics-scraper
  - ssh:
      address: 172.30.2.3
      user: user
      port: 22
      keyPath: path
    role: controller
    privateInterface: ens192
    installFlags:
    - --enable-metrics-scraper
  - ssh:
      address: 172.30.2.4
      user: user
      port: 22
      keyPath: path
    role: controller
    privateInterface: ens192
    installFlags:
    - --enable-metrics-scraper
  - ssh:
      address: 172.30.2.130
      user: user
      port: 22
      keyPath: path
    role: worker
    privateInterface: ens192
  - ssh:
      address: 172.30.2.131
      user: user
      port: 22
      keyPath: path
    role: worker
    privateInterface: ens192
  - ssh:
      address: 172.30.2.132
      user: user
      port: 22
      keyPath: path
    role: worker
    privateInterface: ens192
  - ssh:
      address: 172.30.2.133
      user: user
      port: 22
      keyPath: path
    role: worker
    privateInterface: ens192
  k0s:
    version: 1.28.4+k0s.0
    dynamicConfig: false
    config:
      spec:
        extensions:
          helm:
            repositories:
            - name: metallb
              url: https://metallb.github.io/metallb
            charts:
            - name: metallb
              chartname: metallb/metallb
              namespace: metallb
              order: 0
              values: |
                speaker:
                  logLevel: warn
        network:
          nodeLocalLoadBalancing:
            enabled: true
          provider: calico
          calico:
            envVars:
              FELIX_FEATUREDETECTOVERRIDE: ChecksumOffloadBroken=true
twz123 commented 10 months ago

> Upon resetting and enabling dynamicConfig again, they spin up fine.

Did you reset the cluster before restarting it with dynamicConfig: false? Did you check the leading controller's logs for any errors when applying the CoreDNS stack? You should be able to re-trigger the apply process without restarting the controller by simply touching the coredns.yaml file.
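Both suggestions can be sketched as shell helpers, assuming a systemd-based install (k0s installs controllers as the k0scontroller service) and the default data directory /var/lib/k0s. Filtering the journal for lines that mention coredns is an assumption about how the relevant log lines are labelled, and K0S_DATA_DIR is a hypothetical override, not a k0s setting.

```shell
# Hypothetical diagnostics for the leading controller (run as root).
# Assumes systemd (k0scontroller service) and the default data dir.
show_coredns_log_lines() {
  # Assumption: relevant applier messages mention "coredns" somewhere
  journalctl -u k0scontroller --no-pager | grep -i coredns
}

retrigger_coredns_stack() {
  # K0S_DATA_DIR is a hypothetical override; the default is /var/lib/k0s
  manifest="${K0S_DATA_DIR:-/var/lib/k0s}/manifests/coredns/coredns.yaml"
  # Only touch an existing file; plain `touch` would create an empty one
  [ -f "$manifest" ] || { echo "missing: $manifest" >&2; return 1; }
  touch "$manifest"
}
```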

> it failed to handle helm deployments properly - updates to the list of deployments didn't take effect

Would you mind filing a separate issue about that?

CmdrSharp commented 10 months ago

@twz123 I did reset the cluster with each attempt. All the data I collected at the time is in the issue. I can find time to attempt to reproduce the issue within a week or so.

As for the other issue (related to helm) - since I switched off dynamicConfig, I haven't collected sufficient data to make a good error report.

twz123 commented 10 months ago

> I can find time to attempt to reproduce the issue within a week or so.

Cool!

> As for the other issue (related to helm) - since I switched off dynamicConfig, I haven't collected sufficient data to make a good error report.

Alright. Feel free to file another issue whenever it occurs again.