k0sproject / k0s

k0s - The Zero Friction Kubernetes
https://docs.k0sproject.io
Other
3.8k stars 368 forks source link

k0s doesn't fail gracefully on broken kine config #4693

Closed ianb-mp closed 5 days ago

ianb-mp commented 4 months ago

Before creating an issue, make sure you've checked the following:

Platform

Linux 6.1.0-13-cloud-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.55-1 (2023-09-29) x86_64 GNU/Linux
PRETTY_NAME="Debian GNU/Linux 12 (bookworm)"
NAME="Debian GNU/Linux"
VERSION_ID="12"
VERSION="12 (bookworm)"
VERSION_CODENAME=bookworm
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"

Version

v1.30.2+k0s.0

Sysinfo

`k0s sysinfo`
Total memory: 1.9 GiB (pass)
Disk space available for /var/lib/k0s: 4.4 GiB (pass)
Name resolution: localhost: [::1 127.0.0.1] (pass)
Operating system: Linux (pass)
  Linux kernel release: 6.1.0-13-cloud-amd64 (pass)
  Max. file descriptors per process: current: 1048576 / max: 1048576 (pass)
  AppArmor: active (pass)
  Executable in PATH: modprobe: /usr/sbin/modprobe (pass)
  Executable in PATH: mount: /usr/bin/mount (pass)
  Executable in PATH: umount: /usr/bin/umount (pass)
  /proc file system: mounted (0x9fa0) (pass)
  Control Groups: version 2 (pass)
    cgroup controller "cpu": available (is a listed root controller) (pass)
    cgroup controller "cpuacct": available (via cpu in version 2) (pass)
    cgroup controller "cpuset": available (is a listed root controller) (pass)
    cgroup controller "memory": available (is a listed root controller) (pass)
    cgroup controller "devices": available (device filters attachable) (pass)
    cgroup controller "freezer": available (cgroup.freeze exists) (pass)
    cgroup controller "pids": available (is a listed root controller) (pass)
    cgroup controller "hugetlb": available (is a listed root controller) (pass)
    cgroup controller "blkio": available (via io in version 2) (pass)
  CONFIG_CGROUPS: Control Group support: built-in (pass)
    CONFIG_CGROUP_FREEZER: Freezer cgroup subsystem: built-in (pass)
    CONFIG_CGROUP_PIDS: PIDs cgroup subsystem: built-in (pass)
    CONFIG_CGROUP_DEVICE: Device controller for cgroups: built-in (pass)
    CONFIG_CPUSETS: Cpuset support: built-in (pass)
    CONFIG_CGROUP_CPUACCT: Simple CPU accounting cgroup subsystem: built-in (pass)
    CONFIG_MEMCG: Memory Resource Controller for Control Groups: built-in (pass)
    CONFIG_CGROUP_HUGETLB: HugeTLB Resource Controller for Control Groups: built-in (pass)
    CONFIG_CGROUP_SCHED: Group CPU scheduler: built-in (pass)
      CONFIG_FAIR_GROUP_SCHED: Group scheduling for SCHED_OTHER: built-in (pass)
        CONFIG_CFS_BANDWIDTH: CPU bandwidth provisioning for FAIR_GROUP_SCHED: built-in (pass)
    CONFIG_BLK_CGROUP: Block IO controller: built-in (pass)
  CONFIG_NAMESPACES: Namespaces support: built-in (pass)
    CONFIG_UTS_NS: UTS namespace: built-in (pass)
    CONFIG_IPC_NS: IPC namespace: built-in (pass)
    CONFIG_PID_NS: PID namespace: built-in (pass)
    CONFIG_NET_NS: Network namespace: built-in (pass)
  CONFIG_NET: Networking support: built-in (pass)
    CONFIG_INET: TCP/IP networking: built-in (pass)
      CONFIG_IPV6: The IPv6 protocol: built-in (pass)
    CONFIG_NETFILTER: Network packet filtering framework (Netfilter): built-in (pass)
      CONFIG_NETFILTER_ADVANCED: Advanced netfilter configuration: built-in (pass)
      CONFIG_NF_CONNTRACK: Netfilter connection tracking support: module (pass)
      CONFIG_NETFILTER_XTABLES: Netfilter Xtables support: module (pass)
        CONFIG_NETFILTER_XT_TARGET_REDIRECT: REDIRECT target support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_COMMENT: "comment" match support: module (pass)
        CONFIG_NETFILTER_XT_MARK: nfmark target and match support: module (pass)
        CONFIG_NETFILTER_XT_SET: set target and match support: module (pass)
        CONFIG_NETFILTER_XT_TARGET_MASQUERADE: MASQUERADE target support: module (pass)
        CONFIG_NETFILTER_XT_NAT: "SNAT and DNAT" targets support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_ADDRTYPE: "addrtype" address type match support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_CONNTRACK: "conntrack" connection tracking match support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_MULTIPORT: "multiport" Multiple port match support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_RECENT: "recent" match support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_STATISTIC: "statistic" match support: module (pass)
      CONFIG_NETFILTER_NETLINK: module (pass)
      CONFIG_NF_NAT: module (pass)
      CONFIG_IP_SET: IP set support: module (pass)
        CONFIG_IP_SET_HASH_IP: hash:ip set support: module (pass)
        CONFIG_IP_SET_HASH_NET: hash:net set support: module (pass)
      CONFIG_IP_VS: IP virtual server support: module (pass)
        CONFIG_IP_VS_NFCT: Netfilter connection tracking: built-in (pass)
        CONFIG_IP_VS_SH: Source hashing scheduling: module (pass)
        CONFIG_IP_VS_RR: Round-robin scheduling: module (pass)
        CONFIG_IP_VS_WRR: Weighted round-robin scheduling: module (pass)
      CONFIG_NF_CONNTRACK_IPV4: IPv4 connetion tracking support (required for NAT): unknown (warning)
      CONFIG_NF_REJECT_IPV4: IPv4 packet rejection: module (pass)
      CONFIG_NF_NAT_IPV4: IPv4 NAT: unknown (warning)
      CONFIG_IP_NF_IPTABLES: IP tables support: module (pass)
        CONFIG_IP_NF_FILTER: Packet filtering: module (pass)
          CONFIG_IP_NF_TARGET_REJECT: REJECT target support: module (pass)
        CONFIG_IP_NF_NAT: iptables NAT support: module (pass)
        CONFIG_IP_NF_MANGLE: Packet mangling: module (pass)
      CONFIG_NF_DEFRAG_IPV4: module (pass)
      CONFIG_NF_CONNTRACK_IPV6: IPv6 connetion tracking support (required for NAT): unknown (warning)
      CONFIG_NF_NAT_IPV6: IPv6 NAT: unknown (warning)
      CONFIG_IP6_NF_IPTABLES: IP6 tables support: module (pass)
        CONFIG_IP6_NF_FILTER: Packet filtering: module (pass)
        CONFIG_IP6_NF_MANGLE: Packet mangling: module (pass)
        CONFIG_IP6_NF_NAT: ip6tables NAT support: module (pass)
      CONFIG_NF_DEFRAG_IPV6: module (pass)
    CONFIG_BRIDGE: 802.1d Ethernet Bridging: module (pass)
      CONFIG_LLC: module (pass)
      CONFIG_STP: module (pass)
  CONFIG_EXT4_FS: The Extended 4 (ext4) filesystem: built-in (pass)
  CONFIG_PROC_FS: /proc file system support: built-in (pass)

What happened?

I ran k0sctl apply with storage set to use kine, however messed up kine dataSource (I forgot to set environment variables to populate user,password,host parameters).

This caused a few issues:

The k0scontroller log shows kine trying to reach the backend. I see this log message repeated:

Jun 27 22:13:40 ip-172-30-3-34 k0s[32970]: time="2024-06-27 22:13:40" level=info msg="time=\"2024-06-27T22:13:40.440833676Z\" level=error msg=\"failed to ping connection: failed to connect to `host=/tmp user= database=kine`: dial error (dial unix /tmp/.s.PGSQL.5432: connect: no such file or directory)\"" component=kine stream=stderr

To recover, I took these steps:

It would be good for k0s to fail more gracefully e.g. could a smoke test be done with kine prior to deploying k0s?

Steps to reproduce

  1. Install k0s with kine backend configured, but supply broken dataSource e.g. dataSource: "postgres://:@:5432/kine?sslmode=prefer"

Expected behavior

No response

Actual behavior

No response

Screenshots and logs

k0scontroller log ``` Jun 28 01:17:14 ip-172-30-3-34 k0s[32970]: time="2024-06-28 01:17:14" level=info msg="time=\"2024-06-28T01:17:14.837648006Z\" level=error msg=\"failed to ping connection: failed to connect to `host=/tmp user= data base=kine`: dial error (dial unix /tmp/.s.PGSQL.5432: connect: no such file or directory)\"" component=kine stream=stderr Jun 28 01:17:15 ip-172-30-3-34 k0s[32970]: time="2024-06-28 01:17:15" level=info msg="time=\"2024-06-28T01:17:15.837970611Z\" level=error msg=\"failed to ping connection: failed to connect to `host=/tmp user= data base=kine`: dial error (dial unix /tmp/.s.PGSQL.5432: connect: no such file or directory)\"" component=kine stream=stderr Jun 28 01:17:16 ip-172-30-3-34 k0s[32970]: time="2024-06-28 01:17:16" level=info msg="time=\"2024-06-28T01:17:16.838286009Z\" level=error msg=\"failed to ping connection: failed to connect to `host=/tmp user= data base=kine`: dial error (dial unix /tmp/.s.PGSQL.5432: connect: no such file or directory)\"" component=kine stream=stderr Jun 28 01:17:17 ip-172-30-3-34 k0s[32970]: time="2024-06-28 01:17:17" level=info msg="time=\"2024-06-28T01:17:17.83857765Z\" level=error msg=\"failed to ping connection: failed to connect to `host=/tmp user= datab ase=kine`: dial error (dial unix /tmp/.s.PGSQL.5432: connect: no such file or directory)\"" component=kine stream=stderr Jun 28 01:17:18 ip-172-30-3-34 k0s[32970]: time="2024-06-28 01:17:18" level=info msg="time=\"2024-06-28T01:17:18.838912455Z\" level=error msg=\"failed to ping connection: failed to connect to `host=/tmp user= data base=kine`: dial error (dial unix /tmp/.s.PGSQL.5432: connect: no such file or directory)\"" component=kine stream=stderr Jun 28 01:17:19 ip-172-30-3-34 k0s[32970]: time="2024-06-28 01:17:19" level=info msg="time=\"2024-06-28T01:17:19.839053493Z\" level=info msg=\"Configuring pgx database connection pooling: maxIdleConns=2, maxOpenCo nns=0, connMaxLifetime=0s\"" component=kine stream=stderr Jun 28 01:17:19 ip-172-30-3-34 k0s[32970]: time="2024-06-28 01:17:19" level=info msg="panic: runtime error: invalid memory address or nil pointer dereference" component=kine stream=stderr Jun 28 01:17:19 ip-172-30-3-34 k0s[32970]: time="2024-06-28 01:17:19" level=info msg="[signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0x578460]" component=kine stream=stderr Jun 28 01:17:19 ip-172-30-3-34 k0s[32970]: time="2024-06-28 01:17:19" level=info component=kine stream=stderr Jun 28 01:17:19 ip-172-30-3-34 k0s[32970]: time="2024-06-28 01:17:19" level=info msg="goroutine 1 [running]:" component=kine stream=stderr Jun 28 01:17:19 ip-172-30-3-34 k0s[32970]: time="2024-06-28 01:17:19" level=info msg="database/sql.(*DB).SetMaxIdleConns(0x1c21ae0?, 0x4?)" component=kine stream=stderr Jun 28 01:17:19 ip-172-30-3-34 k0s[32970]: time="2024-06-28 01:17:19" level=info msg="\t/usr/local/go/src/database/sql/sql.go:981 +0x20" component=kine stream=stderr Jun 28 01:17:19 ip-172-30-3-34 k0s[32970]: time="2024-06-28 01:17:19" level=info msg="github.com/k3s-io/kine/pkg/drivers/generic.configureConnectionPooling({0xc0004ad550?, 0xc0004ad4b8?, 0x1284c7b?}, 0x0, {0x1263e d1?, 0x1?})" component=kine stream=stderr Jun 28 01:17:19 ip-172-30-3-34 k0s[32970]: time="2024-06-28 01:17:19" level=info msg="\t/kine/pkg/drivers/generic/generic.go:153 +0x119" component=kine stream=stderr Jun 28 01:17:19 ip-172-30-3-34 k0s[32970]: time="2024-06-28 01:17:19" level=info msg="github.com/k3s-io/kine/pkg/drivers/generic.Open({0x13fa828, 0xc000100fa0}, {0x1263ed1, 0x3}, {0xc000118500, 0x26}, {0xc00014136 0?, 0xc00046d750?, 0x458c49?}, {0x13e3de0, ...}, ...)" component=kine stream=stderr Jun 28 01:17:19 ip-172-30-3-34 k0s[32970]: time="2024-06-28 01:17:19" level=info msg="\t/kine/pkg/drivers/generic/generic.go:194 +0x279" component=kine stream=stderr Jun 28 01:17:19 ip-172-30-3-34 k0s[32970]: time="2024-06-28 01:17:19" level=info msg="github.com/k3s-io/kine/pkg/drivers/pgsql.New({0x13fa828, 0xc000100fa0}, {0x7ffcc82baea4?, 0x0?}, {{0x0, 0x0}, {0x0, 0x0}, {0x0, 0x0}}, ...)" component=kine stream=stderr Jun 28 01:17:19 ip-172-30-3-34 k0s[32970]: time="2024-06-28 01:17:19" level=info msg="\t/kine/pkg/drivers/pgsql/pgsql.go:66 +0x165" component=kine stream=stderr Jun 28 01:17:19 ip-172-30-3-34 k0s[32970]: time="2024-06-28 01:17:19" level=info msg="github.com/k3s-io/kine/pkg/endpoint.getKineStorageBackend({0x13fa828?, 0xc000100fa0?}, {0x7ffcc82bae99?, 0x477b72?}, {0x7ffcc82 baea4?, 0x44b62b?}, {0x0, {0x7ffcc82baed1, 0x23}, {0x7ffcc82bae99, ...}, ...})" component=kine stream=stderr Jun 28 01:17:19 ip-172-30-3-34 k0s[32970]: time="2024-06-28 01:17:19" level=info msg="\t/kine/pkg/endpoint/endpoint.go:220 +0x250" component=kine stream=stderr Jun 28 01:17:19 ip-172-30-3-34 k0s[32970]: time="2024-06-28 01:17:19" level=info msg="github.com/k3s-io/kine/pkg/endpoint.Listen({_, _}, {0x0, {0x7ffcc82baed1, 0x23}, {0x7ffcc82bae99, 0x26}, {0x0, 0x0, 0x0}, ...}) " component=kine stream=stderr Jun 28 01:17:19 ip-172-30-3-34 k0s[32970]: time="2024-06-28 01:17:19" level=info msg="\t/kine/pkg/endpoint/endpoint.go:67 +0x194" component=kine stream=stderr Jun 28 01:17:19 ip-172-30-3-34 k0s[32970]: time="2024-06-28 01:17:19" level=info msg="main.run(0xc0002b6500)" component=kine stream=stderr Jun 28 01:17:19 ip-172-30-3-34 k0s[32970]: time="2024-06-28 01:17:19" level=info msg="\t/kine/main.go:129 +0x2b2" component=kine stream=stderr Jun 28 01:17:19 ip-172-30-3-34 k0s[32970]: time="2024-06-28 01:17:19" level=info msg="github.com/urfave/cli/v2.(*Command).Run(0xc00017bb80, 0xc0002b6500, {0xc000134000, 0x4, 0x4})" component=kine stream=stderr Jun 28 01:17:19 ip-172-30-3-34 k0s[32970]: time="2024-06-28 01:17:19" level=info msg="\t/go/pkg/mod/github.com/urfave/cli/v2@v2.27.2/command.go:276 +0x97d" component=kine stream=stderr Jun 28 01:17:19 ip-172-30-3-34 k0s[32970]: time="2024-06-28 01:17:19" level=info msg="github.com/urfave/cli/v2.(*App).RunContext(0xc0000ae200, {0x13fa7b8, 0x1d0c9a0}, {0xc000134000, 0x4, 0x4})" component=kine stream=stderr Jun 28 01:17:19 ip-172-30-3-34 k0s[32970]: time="2024-06-28 01:17:19" level=info msg="\t/go/pkg/mod/github.com/urfave/cli/v2@v2.27.2/app.go:333 +0x58b" component=kine stream=stderr Jun 28 01:17:19 ip-172-30-3-34 k0s[32970]: time="2024-06-28 01:17:19" level=info msg="github.com/urfave/cli/v2.(*App).Run(...)" component=kine stream=stderr Jun 28 01:17:19 ip-172-30-3-34 k0s[32970]: time="2024-06-28 01:17:19" level=info msg="\t/go/pkg/mod/github.com/urfave/cli/v2@v2.27.2/app.go:307" component=kine stream=stderr Jun 28 01:17:19 ip-172-30-3-34 k0s[32970]: time="2024-06-28 01:17:19" level=info msg="main.main()" component=kine stream=stderr Jun 28 01:17:19 ip-172-30-3-34 k0s[32970]: time="2024-06-28 01:17:19" level=info msg="\t/kine/main.go:109 +0x793" component=kine stream=stderr Jun 28 01:17:19 ip-172-30-3-34 k0s[32970]: time="2024-06-28 01:17:19" level=warning msg="Failed to wait for process" component=kine error="exit status 2" Jun 28 01:17:19 ip-172-30-3-34 k0s[32970]: time="2024-06-28 01:17:19" level=info msg="respawning in 5s" component=kine Jun 28 01:17:24 ip-172-30-3-34 k0s[32970]: time="2024-06-28 01:17:24" level=debug msg="Probing components" component=prober Jun 28 01:17:24 ip-172-30-3-34 k0s[32970]: time="2024-06-28 01:17:24" level=debug msg=respawning component=kine Jun 28 01:17:24 ip-172-30-3-34 k0s[32970]: time="2024-06-28 01:17:24" level=info msg="Restarted (37)" component=kine Jun 28 01:17:24 ip-172-30-3-34 k0s[32970]: time="2024-06-28 01:17:24" level=info msg="time=\"2024-06-28T01:17:24.859282697Z\" level=info msg=\"metrics server is starting to listen at :2380\"" component=kine stream=stderr Jun 28 01:17:24 ip-172-30-3-34 k0s[32970]: time="2024-06-28 01:17:24" level=info msg="time=\"2024-06-28T01:17:24.860043135Z\" level=info msg=\"starting metrics server path /metrics\"" component=kine stream=stderr Jun 28 01:17:24 ip-172-30-3-34 k0s[32970]: time="2024-06-28 01:17:24" level=info msg="time=\"2024-06-28T01:17:24.860092685Z\" level=warning msg=\"failed to check existence of database kine, going to attempt create: failed to connect to `host=/tmp user= database=postgres`: dial error (dial unix /tmp/.s.PGSQL.5432: connect: no such file or directory)\"" component=kine stream=stderr Jun 28 01:17:24 ip-172-30-3-34 k0s[32970]: time="2024-06-28 01:17:24" level=info msg="time=\"2024-06-28T01:17:24.860311537Z\" level=warning msg=\"failed to create database kine: failed to connect to `host=/tmp user= database=postgres`: dial error (dial unix /tmp/.s.PGSQL.5432: connect: no such file or directory)\"" component=kine stream=stderr Jun 28 01:17:24 ip-172-30-3-34 k0s[32970]: time="2024-06-28 01:17:24" level=info msg="time=\"2024-06-28T01:17:24.860481155Z\" level=error msg=\"failed to ping connection: failed to connect to `host=/tmp user= database=kine`: dial error (dial unix /tmp/.s.PGSQL.5432: connect: no such file or directory)\"" component=kine stream=stderr Jun 28 01:17:25 ip-172-30-3-34 k0s[32970]: time="2024-06-28 01:17:25" level=info msg="time=\"2024-06-28T01:17:25.860850739Z\" level=error msg=\"failed to ping connection: failed to connect to `host=/tmp user= database=kine`: dial error (dial unix /tmp/.s.PGSQL.5432: connect: no such file or directory)\"" component=kine stream=stderr Jun 28 01:17:26 ip-172-30-3-34 k0s[32970]: time="2024-06-28 01:17:26" level=info msg="time=\"2024-06-28T01:17:26.86123451Z\" level=error msg=\"failed to ping connection: failed to connect to `host=/tmp user= database=kine`: dial error (dial unix /tmp/.s.PGSQL.5432: connect: no such file or directory)\"" component=kine stream=stderr ```
k0sctl apply output (partial) ``` NFO[0022] ==> Running phase: Initialize the k0s cluster INFO[0022] [ssh] 172.30.3.34:22: installing k0s controller DEBU[0022] [ssh] 172.30.3.34:22: executing `sudo -s -- /usr/local/bin/k0s install controller --debug --enable-dynamic-config --data-dir=/var/lib/k0s --config "/etc/k0s/k0s.yaml"` DEBU[0024] [ssh] 172.30.3.34:22: executing `sudo -s -- stat /run/systemd/system` DEBU[0024] [ssh] 172.30.3.34:22: File: /run/systemd/system DEBU[0024] [ssh] 172.30.3.34:22: Size: 80 Blocks: 0 IO Block: 4096 directory DEBU[0024] [ssh] 172.30.3.34:22: Device: 0,23 Inode: 224 Links: 3 DEBU[0024] [ssh] 172.30.3.34:22: Access: (0755/drwxr-xr-x) Uid: ( 0/ root) Gid: ( 0/ root) DEBU[0024] [ssh] 172.30.3.34:22: Access: 2024-06-27 22:09:09.914215486 +0000 DEBU[0024] [ssh] 172.30.3.34:22: Modify: 2024-06-27 22:09:09.858215122 +0000 DEBU[0024] [ssh] 172.30.3.34:22: Change: 2024-06-27 22:09:09.858215122 +0000 DEBU[0024] [ssh] 172.30.3.34:22: Birth: 2024-06-19 05:28:09.584000037 +0000 DEBU[0024] [ssh] 172.30.3.34:22: executing `sudo -s -- systemctl start k0scontroller 2> /dev/null` INFO[0024] [ssh] 172.30.3.34:22: waiting for the k0s service to start DEBU[0024] [ssh] 172.30.3.34:22: executing `sudo -s -- stat /run/systemd/system` DEBU[0024] [ssh] 10.2.5.21:22: executing command DEBU[0024] [ssh] 172.30.3.34:22: File: /run/systemd/system DEBU[0024] [ssh] 172.30.3.34:22: Size: 80 Blocks: 0 IO Block: 4096 directory DEBU[0024] [ssh] 172.30.3.34:22: Device: 0,23 Inode: 224 Links: 3 DEBU[0024] [ssh] 172.30.3.34:22: Access: (0755/drwxr-xr-x) Uid: ( 0/ root) Gid: ( 0/ root) DEBU[0024] [ssh] 172.30.3.34:22: Access: 2024-06-27 22:09:09.914215486 +0000 DEBU[0024] [ssh] 172.30.3.34:22: Modify: 2024-06-27 22:09:09.858215122 +0000 DEBU[0024] [ssh] 172.30.3.34:22: Change: 2024-06-27 22:09:09.858215122 +0000 DEBU[0024] [ssh] 172.30.3.34:22: Birth: 2024-06-19 05:28:09.584000037 +0000 DEBU[0024] [ssh] 172.30.4.164:22: executing command DEBU[0024] [ssh] 172.30.3.34:22: executing command DEBU[0024] [ssh] 10.2.5.20:22: executing command DEBU[0024] [ssh] 172.30.3.34:22: executing `sudo -s -- systemctl status k0scontroller 2> /dev/null | grep -q "(running)"` DEBU[0024] [ssh] 10.2.5.21:22: executing command DEBU[0024] [ssh] 10.2.5.20:22: executing command DEBU[0024] [ssh] 172.30.4.164:22: executing command DEBU[0024] [ssh] 172.30.3.34:22: executing command INFO[0024] [ssh] 172.30.3.34:22: waiting for kubernetes api to respond DEBU[0025] [ssh] 172.30.3.34:22: executing `curl -kso /dev/null --connect-timeout 20 -w "%{http_code}" "https://localhost:6443/version"` DEBU[0025] [ssh] 172.30.3.34:22: 000 DEBU[0030] retrying, attempt 1 - last error: command failed: client exec: ssh session wait: Process exited with status 7 DEBU[0030] [ssh] 172.30.3.34:22: executing `curl -kso /dev/null --connect-timeout 20 -w "%{http_code}" "https://localhost:6443/version"` DEBU[0030] [ssh] 172.30.3.34:22: 000 DEBU[0034] [ssh] 10.2.5.21:22: executing command DEBU[0034] [ssh] 172.30.3.34:22: executing command DEBU[0034] [ssh] 172.30.4.164:22: executing command DEBU[0034] [ssh] 10.2.5.20:22: executing command DEBU[0034] [ssh] 10.2.5.21:22: executing command DEBU[0034] [ssh] 10.2.5.20:22: executing command DEBU[0034] [ssh] 172.30.3.34:22: executing command DEBU[0034] [ssh] 172.30.4.164:22: executing command DEBU[0035] retrying, attempt 2 - last error: command failed: client exec: ssh session wait: Process exited with status 7 DEBU[0035] [ssh] 172.30.3.34:22: executing `curl -kso /dev/null --connect-timeout 20 -w "%{http_code}" "https://localhost:6443/version"` DEBU[0035] [ssh] 172.30.3.34:22: 000 DEBU[0040] retrying, attempt 3 - last error: command failed: client exec: ssh session wait: Process exited with status 7 DEBU[0040] [ssh] 172.30.3.34:22: executing `curl -kso /dev/null --connect-timeout 20 -w "%{http_code}" "https://localhost:6443/version"` DEBU[0040] [ssh] 172.30.3.34:22: 000 DEBU[0044] [ssh] 10.2.5.21:22: executing command DEBU[0044] [ssh] 172.30.4.164:22: executing command DEBU[0044] [ssh] 172.30.3.34:22: executing command DEBU[0044] [ssh] 10.2.5.20:22: executing command DEBU[0044] [ssh] 10.2.5.21:22: executing command DEBU[0044] [ssh] 10.2.5.20:22: executing command DEBU[0044] [ssh] 172.30.4.164:22: executing command DEBU[0045] [ssh] 172.30.3.34:22: executing command DEBU[0045] retrying, attempt 4 - last error: command failed: client exec: ssh session wait: Process exited with status 7 DEBU[0045] [ssh] 172.30.3.34:22: executing `curl -kso /dev/null --connect-timeout 20 -w "%{http_code}" "https://localhost:6443/version"` DEBU[0045] [ssh] 172.30.3.34:22: 000 DEBU[0050] retrying, attempt 5 - last error: command failed: client exec: ssh session wait: Process exited with status 7 DEBU[0050] [ssh] 172.30.3.34:22: executing `curl -kso /dev/null --connect-timeout 20 -w "%{http_code}" "https://localhost:6443/version"` DEBU[0050] [ssh] 172.30.3.34:22: 000 DEBU[0054] [ssh] 10.2.5.21:22: executing command DEBU[0054] [ssh] 172.30.3.34:22: executing command DEBU[0054] [ssh] 172.30.4.164:22: executing command DEBU[0054] [ssh] 10.2.5.20:22: executing command DEBU[0054] [ssh] 10.2.5.21:22: executing command DEBU[0054] [ssh] 10.2.5.20:22: executing command DEBU[0054] [ssh] 172.30.3.34:22: executing command DEBU[0054] [ssh] 172.30.4.164:22: executing command DEBU[0055] retrying, attempt 6 - last error: command failed: client exec: ssh session wait: Process exited with status 7 DEBU[0055] [ssh] 172.30.3.34:22: executing `curl -kso /dev/null --connect-timeout 20 -w "%{http_code}" "https://localhost:6443/version"` DEBU[0055] [ssh] 172.30.3.34:22: 000 DEBU[0060] retrying, attempt 7 - last error: command failed: client exec: ssh session wait: Process exited with status 7 DEBU[0060] [ssh] 172.30.3.34:22: executing `curl -kso /dev/null --connect-timeout 20 -w "%{http_code}" "https://localhost:6443/version"` DEBU[0060] [ssh] 172.30.3.34:22: 000 DEBU[0064] [ssh] 10.2.5.21:22: executing command DEBU[0064] [ssh] 172.30.4.164:22: executing command DEBU[0064] [ssh] 172.30.3.34:22: executing command DEBU[0064] [ssh] 10.2.5.20:22: executing command DEBU[0064] [ssh] 10.2.5.21:22: executing command DEBU[0064] [ssh] 10.2.5.20:22: executing command DEBU[0064] [ssh] 172.30.4.164:22: executing command DEBU[0064] [ssh] 172.30.3.34:22: executing command DEBU[0065] retrying, attempt 8 - last error: command failed: client exec: ssh session wait: Process exited with status 7 DEBU[0065] [ssh] 172.30.3.34:22: executing `curl -kso /dev/null --connect-timeout 20 -w "%{http_code}" "https://localhost:6443/version"` DEBU[0065] [ssh] 172.30.3.34:22: 000 DEBU[0070] retrying, attempt 9 - last error: command failed: client exec: ssh session wait: Process exited with status 7 DEBU[0070] [ssh] 172.30.3.34:22: executing `curl -kso /dev/null --connect-timeout 20 -w "%{http_code}" "https://localhost:6443/version"` DEBU[0070] [ssh] 172.30.3.34:22: 000 DEBU[0074] [ssh] 10.2.5.21:22: executing command DEBU[0074] [ssh] 172.30.3.34:22: executing command DEBU[0074] [ssh] 172.30.4.164:22: executing command DEBU[0074] [ssh] 10.2.5.20:22: executing command DEBU[0074] [ssh] 10.2.5.21:22: executing command DEBU[0074] [ssh] 10.2.5.20:22: executing command DEBU[0074] [ssh] 172.30.4.164:22: executing command DEBU[0074] [ssh] 172.30.3.34:22: executing command DEBU[0075] retrying, attempt 10 - last error: command failed: client exec: ssh session wait: Process exited with status 7 DEBU[0075] [ssh] 172.30.3.34:22: executing `curl -kso /dev/null --connect-timeout 20 -w "%{http_code}" "https://localhost:6443/version"` DEBU[0075] [ssh] 172.30.3.34:22: 000 DEBU[0080] retrying, attempt 11 - last error: command failed: client exec: ssh session wait: Process exited with status 7 DEBU[0080] [ssh] 172.30.3.34:22: executing `curl -kso /dev/null --connect-timeout 20 -w "%{http_code}" "https://localhost:6443/version"` DEBU[0080] [ssh] 172.30.3.34:22: 000 DEBU[0084] [ssh] 10.2.5.21:22: executing command DEBU[0084] [ssh] 172.30.3.34:22: executing command DEBU[0084] [ssh] 172.30.4.164:22: executing command DEBU[0084] [ssh] 10.2.5.20:22: executing command DEBU[0084] [ssh] 10.2.5.21:22: executing command DEBU[0084] [ssh] 10.2.5.20:22: executing command DEBU[0084] [ssh] 172.30.3.34:22: executing command DEBU[0084] [ssh] 172.30.4.164:22: executing command DEBU[0085] retrying, attempt 12 - last error: command failed: client exec: ssh session wait: Process exited with status 7 DEBU[0085] [ssh] 172.30.3.34:22: executing `curl -kso /dev/null --connect-timeout 20 -w "%{http_code}" "https://localhost:6443/version"` DEBU[0085] [ssh] 172.30.3.34:22: 000 DEBU[0090] retrying, attempt 13 - last error: command failed: client exec: ssh session wait: Process exited with status 7 DEBU[0090] [ssh] 172.30.3.34:22: executing `curl -kso /dev/null --connect-timeout 20 -w "%{http_code}" "https://localhost:6443/version"` DEBU[0090] [ssh] 172.30.3.34:22: 000 DEBU[0094] [ssh] 10.2.5.21:22: executing command DEBU[0094] [ssh] 172.30.3.34:22: executing command DEBU[0094] [ssh] 172.30.4.164:22: executing command DEBU[0094] [ssh] 10.2.5.20:22: executing command DEBU[0094] [ssh] 10.2.5.21:22: executing command DEBU[0094] [ssh] 10.2.5.20:22: executing command DEBU[0094] [ssh] 172.30.3.34:22: executing command DEBU[0094] [ssh] 172.30.4.164:22: executing command DEBU[0095] retrying, attempt 14 - last error: command failed: client exec: ssh session wait: Process exited with status 7 DEBU[0095] [ssh] 172.30.3.34:22: executing `curl -kso /dev/null --connect-timeout 20 -w "%{http_code}" "https://localhost:6443/version"` DEBU[0095] [ssh] 172.30.3.34:22: 000 DEBU[0100] retrying, attempt 15 - last error: command failed: client exec: ssh session wait: Process exited with status 7 DEBU[0100] [ssh] 172.30.3.34:22: executing `curl -kso /dev/null --connect-timeout 20 -w "%{http_code}" "https://localhost:6443/version"` DEBU[0100] [ssh] 172.30.3.34:22: 000 DEBU[0104] [ssh] 10.2.5.21:22: executing command DEBU[0104] [ssh] 172.30.3.34:22: executing command DEBU[0104] [ssh] 172.30.4.164:22: executing command DEBU[0104] [ssh] 10.2.5.20:22: executing command DEBU[0104] [ssh] 10.2.5.21:22: executing command DEBU[0104] [ssh] 10.2.5.20:22: executing command DEBU[0104] [ssh] 172.30.3.34:22: executing command DEBU[0104] [ssh] 172.30.4.164:22: executing command DEBU[0105] retrying, attempt 16 - last error: command failed: client exec: ssh session wait: Process exited with status 7 DEBU[0105] [ssh] 172.30.3.34:22: executing `curl -kso /dev/null --connect-timeout 20 -w "%{http_code}" "https://localhost:6443/version"` DEBU[0105] [ssh] 172.30.3.34:22: 000 DEBU[0110] retrying, attempt 17 - last error: command failed: client exec: ssh session wait: Process exited with status 7 DEBU[0110] [ssh] 172.30.3.34:22: executing `curl -kso /dev/null --connect-timeout 20 -w "%{http_code}" "https://localhost:6443/version"` DEBU[0110] [ssh] 172.30.3.34:22: 000 DEBU[0114] [ssh] 10.2.5.21:22: executing command DEBU[0114] [ssh] 172.30.4.164:22: executing command DEBU[0114] [ssh] 172.30.3.34:22: executing command DEBU[0114] [ssh] 10.2.5.20:22: executing command DEBU[0114] [ssh] 10.2.5.21:22: executing command DEBU[0114] [ssh] 10.2.5.20:22: executing command DEBU[0114] [ssh] 172.30.3.34:22: executing command DEBU[0114] [ssh] 172.30.4.164:22: executing command DEBU[0115] retrying, attempt 18 - last error: command failed: client exec: ssh session wait: Process exited with status 7 DEBU[0115] [ssh] 172.30.3.34:22: executing `curl -kso /dev/null --connect-timeout 20 -w "%{http_code}" "https://localhost:6443/version"` DEBU[0115] [ssh] 172.30.3.34:22: 000 DEBU[0120] retrying, attempt 19 - last error: command failed: client exec: ssh session wait: Process exited with status 7 DEBU[0120] [ssh] 172.30.3.34:22: executing `curl -kso /dev/null --connect-timeout 20 -w "%{http_code}" "https://localhost:6443/version"` DEBU[0120] [ssh] 172.30.3.34:22: 000 DEBU[0124] [ssh] 10.2.5.21:22: executing command DEBU[0124] [ssh] 172.30.3.34:22: executing command DEBU[0124] [ssh] 172.30.4.164:22: executing command DEBU[0124] [ssh] 10.2.5.20:22: executing command DEBU[0124] [ssh] 10.2.5.21:22: executing command DEBU[0124] [ssh] 10.2.5.20:22: executing command DEBU[0124] [ssh] 172.30.3.34:22: executing command DEBU[0124] [ssh] 172.30.4.164:22: executing command DEBU[0125] retrying, attempt 20 - last error: command failed: client exec: ssh session wait: Process exited with status 7 DEBU[0125] [ssh] 172.30.3.34:22: executing `curl -kso /dev/null --connect-timeout 20 -w "%{http_code}" "https://localhost:6443/version"` DEBU[0125] [ssh] 172.30.3.34:22: 000 DEBU[0130] retrying, attempt 21 - last error: command failed: client exec: ssh session wait: Process exited with status 7 DEBU[0130] [ssh] 172.30.3.34:22: executing `curl -kso /dev/null --connect-timeout 20 -w "%{http_code}" "https://localhost:6443/version"` DEBU[0130] [ssh] 172.30.3.34:22: 000 DEBU[0134] [ssh] 10.2.5.21:22: executing command DEBU[0134] [ssh] 172.30.3.34:22: executing command DEBU[0134] [ssh] 172.30.4.164:22: executing command DEBU[0134] [ssh] 10.2.5.20:22: executing command DEBU[0134] [ssh] 10.2.5.21:22: executing command DEBU[0134] [ssh] 10.2.5.20:22: executing command DEBU[0134] [ssh] 172.30.3.34:22: executing command DEBU[0134] [ssh] 172.30.4.164:22: executing command DEBU[0135] retrying, attempt 22 - last error: command failed: client exec: ssh session wait: Process exited with status 7 DEBU[0135] [ssh] 172.30.3.34:22: executing `curl -kso /dev/null --connect-timeout 20 -w "%{http_code}" "https://localhost:6443/version"` DEBU[0135] [ssh] 172.30.3.34:22: 000 DEBU[0140] retrying, attempt 23 - last error: command failed: client exec: ssh session wait: Process exited with status 7 DEBU[0140] [ssh] 172.30.3.34:22: executing `curl -kso /dev/null --connect-timeout 20 -w "%{http_code}" "https://localhost:6443/version"` DEBU[0140] [ssh] 172.30.3.34:22: 000 DEBU[0144] [ssh] 10.2.5.21:22: executing command DEBU[0144] [ssh] 172.30.4.164:22: executing command DEBU[0144] [ssh] 172.30.3.34:22: executing command DEBU[0144] [ssh] 10.2.5.20:22: executing command DEBU[0144] [ssh] 10.2.5.21:22: executing command DEBU[0144] [ssh] 10.2.5.20:22: executing command DEBU[0144] [ssh] 172.30.3.34:22: executing command DEBU[0144] [ssh] 172.30.4.164:22: executing command INFO[0144] * Running clean-up for phase: Acquire exclusive host lock DEBU[0145] [ssh] 10.2.5.21:22: executing `sudo -s -- rm -f -- /run/lock/k0sctl 2> /dev/null` DEBU[0145] [ssh] 172.30.3.34:22: executing `sudo -s -- rm -f -- /run/lock/k0sctl 2> /dev/null` DEBU[0145] [ssh] 172.30.4.164:22: executing `sudo -s -- rm -f -- /run/lock/k0sctl 2> /dev/null` DEBU[0145] [ssh] 10.2.5.20:22: executing `sudo -s -- rm -f -- /run/lock/k0sctl 2> /dev/null` INFO[0145] * Running clean-up for phase: Initialize the k0s cluster INFO[0145] [ssh] 172.30.3.34:22: cleaning up DEBU[0145] [ssh] 172.30.3.34:22: executing `sudo -s -- /usr/local/bin/k0s reset --data-dir=/var/lib/k0s` ^C ```

Additional context

No response

jnummelin commented 4 months ago

could a smoke test be done with kine prior to deploying k0s?

It has to be done by k0s itself. The only sufficient way I can think of is to utilize the etcd API kine provides to do some sort of ping-pong. Maybe try to write some arbitrary value and see it succeed/fail.

The challenge still would be how to convey that to k0sctl side. I mean k0sctl does not really know what k0s is doing per se, it just sees it getting up or not via systemd or other init system.

ianb-mp commented 4 months ago

In the controller log I notice the kine process is panicking. Does the kine process get restarted by k0s at this point? If so, could k0s do a simple check: is this the first time kine was run, and if so then don't restart but instead quit with an error.

jnummelin commented 4 months ago

In the controller log I notice the kine process is panicking. Does the kine process get killed and restarted by k0s at this point?

Yes, k0s acts as a watchdog for kine and all other child processes

could k0s do a simple check: is this the first time kine was run, and if so then don't restart but instead quit with an error.

I think in general if k0s does not get kine up-and-running, it should eventually give up and quit with error. But then again, k0s is usually watched by systemd or other init system and will get restarted itself.

github-actions[bot] commented 2 months ago

The issue is marked as stale since no activity has been recorded in 30 days

github-actions[bot] commented 1 month ago

The issue is marked as stale since no activity has been recorded in 30 days

github-actions[bot] commented 1 week ago

The issue is marked as stale since no activity has been recorded in 30 days