k0sproject / k0s

k0s - The Zero Friction Kubernetes
https://docs.k0sproject.io

API Service for Metric FailedDiscoveryCheck #4015

Closed · jasase closed this issue 9 months ago

jasase commented 9 months ago


Platform

Linux 5.15.0-92-generic #102-Ubuntu SMP Wed Jan 10 09:33:48 UTC 2024 x86_64 GNU/Linux
PRETTY_NAME="Ubuntu 22.04.3 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.3 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy

Version

v1.29.1+k0s.0

Sysinfo

`k0s sysinfo`
Machine ID: "be83ab96870a117b0a2a0da296895d2f823f8930d2da786089290ff13749baf7" (from machine) (pass)
Total memory: 3.7 GiB (pass)
Disk space available for /var/lib/k0s: 424.3 GiB (pass)
Name resolution: localhost: [127.0.0.1] (pass)
Operating system: Linux (pass)
  Linux kernel release: 5.15.0-92-generic (pass)
  Max. file descriptors per process: current: 1048576 / max: 1048576 (pass)
  AppArmor: active (pass)
  Executable in PATH: modprobe: /usr/sbin/modprobe (pass)
  Executable in PATH: mount: /usr/bin/mount (pass)
  Executable in PATH: umount: /usr/bin/umount (pass)
  /proc file system: mounted (0x9fa0) (pass)
  Control Groups: version 2 (pass)
    cgroup controller "cpu": available (is a listed root controller) (pass)
    cgroup controller "cpuacct": available (via cpu in version 2) (pass)
    cgroup controller "cpuset": available (is a listed root controller) (pass)
    cgroup controller "memory": available (is a listed root controller) (pass)
    cgroup controller "devices": available (device filters attachable) (pass)
    cgroup controller "freezer": available (cgroup.freeze exists) (pass)
    cgroup controller "pids": available (is a listed root controller) (pass)
    cgroup controller "hugetlb": available (is a listed root controller) (pass)
    cgroup controller "blkio": available (via io in version 2) (pass)
  CONFIG_CGROUPS: Control Group support: built-in (pass)
    CONFIG_CGROUP_FREEZER: Freezer cgroup subsystem: built-in (pass)
    CONFIG_CGROUP_PIDS: PIDs cgroup subsystem: built-in (pass)
    CONFIG_CGROUP_DEVICE: Device controller for cgroups: built-in (pass)
    CONFIG_CPUSETS: Cpuset support: built-in (pass)
    CONFIG_CGROUP_CPUACCT: Simple CPU accounting cgroup subsystem: built-in (pass)
    CONFIG_MEMCG: Memory Resource Controller for Control Groups: built-in (pass)
    CONFIG_CGROUP_HUGETLB: HugeTLB Resource Controller for Control Groups: built-in (pass)
    CONFIG_CGROUP_SCHED: Group CPU scheduler: built-in (pass)
      CONFIG_FAIR_GROUP_SCHED: Group scheduling for SCHED_OTHER: built-in (pass)
        CONFIG_CFS_BANDWIDTH: CPU bandwidth provisioning for FAIR_GROUP_SCHED: built-in (pass)
    CONFIG_BLK_CGROUP: Block IO controller: built-in (pass)
  CONFIG_NAMESPACES: Namespaces support: built-in (pass)
    CONFIG_UTS_NS: UTS namespace: built-in (pass)
    CONFIG_IPC_NS: IPC namespace: built-in (pass)
    CONFIG_PID_NS: PID namespace: built-in (pass)
    CONFIG_NET_NS: Network namespace: built-in (pass)
  CONFIG_NET: Networking support: built-in (pass)
    CONFIG_INET: TCP/IP networking: built-in (pass)
      CONFIG_IPV6: The IPv6 protocol: built-in (pass)
    CONFIG_NETFILTER: Network packet filtering framework (Netfilter): built-in (pass)
      CONFIG_NETFILTER_ADVANCED: Advanced netfilter configuration: built-in (pass)
      CONFIG_NF_CONNTRACK: Netfilter connection tracking support: module (pass)
      CONFIG_NETFILTER_XTABLES: Netfilter Xtables support: module (pass)
        CONFIG_NETFILTER_XT_TARGET_REDIRECT: REDIRECT target support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_COMMENT: "comment" match support: module (pass)
        CONFIG_NETFILTER_XT_MARK: nfmark target and match support: module (pass)
        CONFIG_NETFILTER_XT_SET: set target and match support: module (pass)
        CONFIG_NETFILTER_XT_TARGET_MASQUERADE: MASQUERADE target support: module (pass)
        CONFIG_NETFILTER_XT_NAT: "SNAT and DNAT" targets support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_ADDRTYPE: "addrtype" address type match support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_CONNTRACK: "conntrack" connection tracking match support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_MULTIPORT: "multiport" Multiple port match support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_RECENT: "recent" match support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_STATISTIC: "statistic" match support: module (pass)
      CONFIG_NETFILTER_NETLINK: module (pass)
      CONFIG_NF_NAT: module (pass)
      CONFIG_IP_SET: IP set support: module (pass)
        CONFIG_IP_SET_HASH_IP: hash:ip set support: module (pass)
        CONFIG_IP_SET_HASH_NET: hash:net set support: module (pass)
      CONFIG_IP_VS: IP virtual server support: module (pass)
        CONFIG_IP_VS_NFCT: Netfilter connection tracking: built-in (pass)
        CONFIG_IP_VS_SH: Source hashing scheduling: module (pass)
        CONFIG_IP_VS_RR: Round-robin scheduling: module (pass)
        CONFIG_IP_VS_WRR: Weighted round-robin scheduling: module (pass)
      CONFIG_NF_CONNTRACK_IPV4: IPv4 connetion tracking support (required for NAT): unknown (warning)
      CONFIG_NF_REJECT_IPV4: IPv4 packet rejection: module (pass)
      CONFIG_NF_NAT_IPV4: IPv4 NAT: unknown (warning)
      CONFIG_IP_NF_IPTABLES: IP tables support: module (pass)
        CONFIG_IP_NF_FILTER: Packet filtering: module (pass)
          CONFIG_IP_NF_TARGET_REJECT: REJECT target support: module (pass)
        CONFIG_IP_NF_NAT: iptables NAT support: module (pass)
        CONFIG_IP_NF_MANGLE: Packet mangling: module (pass)
      CONFIG_NF_DEFRAG_IPV4: module (pass)
      CONFIG_NF_CONNTRACK_IPV6: IPv6 connetion tracking support (required for NAT): unknown (warning)
      CONFIG_NF_NAT_IPV6: IPv6 NAT: unknown (warning)
      CONFIG_IP6_NF_IPTABLES: IP6 tables support: module (pass)
        CONFIG_IP6_NF_FILTER: Packet filtering: module (pass)
        CONFIG_IP6_NF_MANGLE: Packet mangling: module (pass)
        CONFIG_IP6_NF_NAT: ip6tables NAT support: module (pass)
      CONFIG_NF_DEFRAG_IPV6: module (pass)
    CONFIG_BRIDGE: 802.1d Ethernet Bridging: module (pass)
      CONFIG_LLC: module (pass)
      CONFIG_STP: module (pass)
  CONFIG_EXT4_FS: The Extended 4 (ext4) filesystem: built-in (pass)
  CONFIG_PROC_FS: /proc file system support: built-in (pass)

What happened?

Created a fresh 3-node cluster, but when checking the API services, the one for metrics is not available. Checked with kubectl get apiservice:

NAME                                   SERVICE                      AVAILABLE                      AGE
v1.                                    Local                        True                           12m
v1.admissionregistration.k8s.io        Local                        True                           12m
v1.apiextensions.k8s.io                Local                        True                           12m
v1.apps                                Local                        True                           12m
v1.authentication.k8s.io               Local                        True                           12m
v1.authorization.k8s.io                Local                        True                           12m
v1.autoscaling                         Local                        True                           12m
v1.batch                               Local                        True                           12m
v1.certificates.k8s.io                 Local                        True                           12m
v1.coordination.k8s.io                 Local                        True                           12m
v1.discovery.k8s.io                    Local                        True                           12m
v1.events.k8s.io                       Local                        True                           12m
v1.flowcontrol.apiserver.k8s.io        Local                        True                           12m
v1.networking.k8s.io                   Local                        True                           12m
v1.node.k8s.io                         Local                        True                           12m
v1.policy                              Local                        True                           12m
v1.rbac.authorization.k8s.io           Local                        True                           12m
v1.scheduling.k8s.io                   Local                        True                           12m
v1.storage.k8s.io                      Local                        True                           12m
v1beta1.helm.k0sproject.io             Local                        True                           12m
v1beta1.metrics.k8s.io                 kube-system/metrics-server   False (FailedDiscoveryCheck)   12m
v1beta2.autopilot.k0sproject.io        Local                        True                           12m
v1beta3.flowcontrol.apiserver.k8s.io   Local                        True                           12m
v2.autoscaling                         Local                        True                           12m

Steps to reproduce

  1. Created a fresh cluster with 3 nodes using k0sctl apply

Config that was used to create cluster:

apiVersion: k0sctl.k0sproject.io/v1beta1
kind: Cluster
metadata:
  name: k0s-cluster
spec:
  hosts:
    - ssh:
        address: n1-k0s.example.com
        user: xxx
        keyPath: xxx
      role: controller+worker
      noTaints: true
    - ssh:
        address: n2-k0s.example.com
        user: xxx
        keyPath: xxx
      role: controller+worker
      noTaints: true
    - ssh:
        address: n3-k0s.example.com
        user: xxx
        keyPath: xxx
      role: controller+worker
      noTaints: true
  k0s:
    version: 1.29.1+k0s.0
    config:
      apiVersion: k0s.k0sproject.io/v1beta1
      kind: Cluster
      metadata:
        name: k0s-cluster

Expected behavior

All Kubernetes API services should be available without limitations.

Actual behavior

The API service for metrics is not available.

Screenshots and logs

No response

Additional context

No response

jasase commented 9 months ago

Syslog is flooded with these messages:

Feb  2 21:45:09 n7-k0s k0s[1492]: time="2024-02-02 21:45:09" level=info msg="E0202 21:45:09.113702    1536 available_controller.go:460] v1beta1.metrics.k8s.io failed with: failing or missing response from https://10.109.122.144:443/apis/metrics.k8s.io/v1beta1: Get \"https://10.109.122.144:443/apis/metrics.k8s.io/v1beta1\": No agent available" component=kube-apiserver stream=stderr
Feb  2 21:45:09 n7-k0s k0s[1492]: time="2024-02-02 21:45:09" level=info msg="E0202 21:45:09.119664    2187 server.go:588] \"Failed to get a backend\" err=\"No agent available\" dialID=8934187525441318029" component=konnectivity stream=stderr
Feb  2 21:45:09 n7-k0s k0s[1492]: time="2024-02-02 21:45:09" level=info msg="E0202 21:45:09.120097    1536 controller.go:146] Error updating APIService \"v1beta1.metrics.k8s.io\" with err: failed to download v1beta1.metrics.k8s.io: failed to retrieve openAPI spec, http error: ResponseCode: 503, Body: error trying to reach service: No agent available" component=kube-apiserver stream=stderr
Feb  2 21:45:09 n7-k0s k0s[1492]: time="2024-02-02 21:45:09" level=info msg=", Header: map[Content-Type:[text/plain; charset=utf-8] X-Content-Type-Options:[nosniff]]" component=kube-apiserver stream=stderr
Feb  2 21:45:09 n7-k0s k0s[1492]: time="2024-02-02 21:45:09" level=info msg="E0202 21:45:09.120641    2187 server.go:588] \"Failed to get a backend\" err=\"No agent available\" dialID=5563258231046833100" component=konnectivity stream=stderr
Feb  2 21:45:09 n7-k0s k0s[1492]: time="2024-02-02 21:45:09" level=info msg="E0202 21:45:09.120741    2187 server.go:588] \"Failed to get a backend\" err=\"No agent available\" dialID=860611279571955039" component=konnectivity stream=stderr
Feb  2 21:45:09 n7-k0s k0s[1492]: time="2024-02-02 21:45:09" level=info msg="E0202 21:45:09.120793    2187 server.go:588] \"Failed to get a backend\" err=\"No agent available\" dialID=1922455302067266170" component=konnectivity stream=stderr
Feb  2 21:45:09 n7-k0s k0s[1492]: time="2024-02-02 21:45:09" level=info msg="E0202 21:45:09.120933    2187 server.go:588] \"Failed to get a backend\" err=\"No agent available\" dialID=2609454271578293223" component=konnectivity stream=stderr
Feb  2 21:45:09 n7-k0s k0s[1492]: time="2024-02-02 21:45:09" level=info msg="E0202 21:45:09.121002    2187 server.go:588] \"Failed to get a backend\" err=\"No agent available\" dialID=7407836390960505655" component=konnectivity stream=stderr
Feb  2 21:45:09 n7-k0s k0s[1492]: time="2024-02-02 21:45:09" level=info msg="E0202 21:45:09.121031    2187 server.go:588] \"Failed to get a backend\" err=\"No agent available\" dialID=7344548162480458632" component=konnectivity stream=stderr
twz123 commented 9 months ago

Do you have a load balancer in place? For HA setups (i.e. setups with more than one controller), you either need an external load balancer or enable node-local load balancing.

/xref k0sproject/k0sctl#475
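
For reference, node-local load balancing can be switched on in the k0sctl config, roughly like this (a minimal sketch of the relevant fragment; field names follow the k0s v1beta1 ClusterConfig, verify them against the k0s docs for your version):

```yaml
# k0sctl.yaml (fragment) - enable node-local load balancing so workers
# can reach any controller without an external TCP load balancer.
spec:
  k0s:
    version: 1.29.1+k0s.0
    config:
      apiVersion: k0s.k0sproject.io/v1beta1
      kind: ClusterConfig
      spec:
        network:
          nodeLocalLoadBalancing:
            enabled: true
            type: EnvoyProxy   # default NLLB implementation
```

Alternatively, point the workers at an external TCP load balancer via `spec.api.externalAddress` in the same config.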

jasase commented 9 months ago

Thanks for the hint, that solved my problem. But I didn't find any hint that this load balancer config is necessary for multi-controller setups.

twz123 commented 9 months ago

Where would you expect such a hint? The docs about control plane high availability state in their first sentence:

You can create high availability for the control plane by distributing the control plane across multiple nodes and installing a load balancer on top.

And further down on that page, in the "Load Balancer" section:

Control plane high availability requires a tcp load balancer, which acts as a single point of contact to access the controllers.

Happy to add a note about that in other parts of the docs, as well. Ideally, k0sctl could warn about such things (see k0sproject/k0sctl#475).