rancher / rke

Rancher Kubernetes Engine (RKE), an extremely simple, lightning fast Kubernetes distribution that runs entirely within containers.
Apache License 2.0

Metrics: unreachable kubernetes API (no logs) #3556

Closed GuillaumeDorschner closed 5 months ago

GuillaumeDorschner commented 5 months ago

RKE version: v1.5.6

While running the RKE setup we hit this problem (@athomsAF):

reconcile] host [192.168.137.41] is a control plane node without reachable Kubernetes API endpoint in the cluster

In addition, some pods intermittently show an error, and when they do we cannot see their logs (see the k9s output below).
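For anyone hitting the same reconcile error, a few hedged checks to confirm whether the API endpoint is actually reachable from the machine running rke (the commands are my own suggestion, not from the RKE docs; the IP and ssh user are taken from the error above and the cluster.yml below):

nc -zv 192.168.137.41 6443                      # is the kube-apiserver port open/reachable?
curl -ks https://192.168.137.41:6443/healthz    # any HTTP response (even 401) proves reachability
ssh k8s@192.168.137.41 docker logs --tail 20 kube-apiserver   # RKE runs the apiserver as this container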

Docker version: (docker version, docker info preferred)

docker version

Client: Docker Engine - Community
 Version:           26.0.0
 API version:       1.43 (downgraded from 1.45)
 Go version:        go1.21.8
 Git commit:        2ae903e
 Built:             Wed Mar 20 15:19:04 2024
 OS/Arch:           linux/amd64
 Context:           default

Server: Docker Engine - Community
 Engine:
  Version:          24.0.5
  API version:      1.43 (minimum version 1.12)
  Go version:       go1.20.6
  Git commit:       a61e2b4
  Built:            Fri Jul 21 20:35:32 2023
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.6.28
  GitCommit:        ae07eda36dd25f8a1b98dfbf587313b99c0190bb
 runc:
  Version:          1.1.12
  GitCommit:        v1.1.12-0-g51d5e94
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

docker info

Client: Docker Engine - Community
 Version:    26.0.0
 Context:    default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.13.1
    Path:     /usr/libexec/docker/cli-plugins/docker-buildx
  compose: Docker Compose (Docker Inc.)
    Version:  v2.25.0
    Path:     /usr/libexec/docker/cli-plugins/docker-compose

Server:
 Containers: 12
  Running: 8
  Paused: 0
  Stopped: 4
 Images: 13
 Server Version: 24.0.5
 Storage Driver: overlay2
  Backing Filesystem: xfs
  Supports d_type: true
  Using metacopy: false
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 1
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: ae07eda36dd25f8a1b98dfbf587313b99c0190bb
 runc version: v1.1.12-0-g51d5e94
 init version: de40ad0
 Security Options:
  seccomp
   Profile: builtin
 Kernel Version: 4.18.0-477.27.2.el8_8.x86_64
 Operating System: AlmaLinux 8.8 (Sapphire Caracal)
 OSType: linux
 Architecture: x86_64
 CPUs: 32
 Total Memory: 250.7GiB
 Name: localhost.localdomain
 ID: 2b020411-9c9e-4a17-a16c-2702b948f6eb
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Experimental: false
 Insecure Registries:
  192.168.137.50:8082
  repo.labo.bi:8082
  127.0.0.0/8
 Registry Mirrors:
  http://repo.labo.bi:8082/
  http://192.168.137.50:8082/
 Live Restore Enabled: false

Operating system and kernel: (cat /etc/os-release, uname -r preferred)

NAME="AlmaLinux"
VERSION="8.8 (Sapphire Caracal)"
ID="almalinux"
ID_LIKE="rhel centos fedora"
VERSION_ID="8.8"
PLATFORM_ID="platform:el8"
PRETTY_NAME="AlmaLinux 8.8 (Sapphire Caracal)"
ANSI_COLOR="0;34"
LOGO="fedora-logo-icon"
CPE_NAME="cpe:/o:almalinux:almalinux:8::baseos"
HOME_URL="https://almalinux.org/"
DOCUMENTATION_URL="https://wiki.almalinux.org/"
BUG_REPORT_URL="https://bugs.almalinux.org/"

ALMALINUX_MANTISBT_PROJECT="AlmaLinux-8"
ALMALINUX_MANTISBT_PROJECT_VERSION="8.8"
REDHAT_SUPPORT_PRODUCT="AlmaLinux"
REDHAT_SUPPORT_PRODUCT_VERSION="8.8"

Type/provider of hosts: (VirtualBox/Bare-metal/AWS/GCE/DO)

Bare-metal

cluster.yml file:

nodes:
  - address: ???
    hostname_override: master1
    user: k8s
    role: [controlplane, etcd]
  - address: ???
    hostname_override: master2
    user: k8s
    role: [controlplane, etcd]
  - address: ???
    hostname_override: master3
    user: k8s
    role: [controlplane, etcd]
  - address: ???
    hostname_override: worker1
    user: k8s
    role: [worker]
    labels:
      gpu: "true"
  - address: ???
    hostname_override: worker2
    user: k8s
    role: [worker]
    labels:
      gpu: "true"
  - address: ???
    hostname_override: worker3
    user: k8s
    role: [worker]
    labels:
      gpu: "true"
  - address: ???
    hostname_override: worker4
    user: k8s
    role: [worker]
    labels:
      gpu: "true"
  - address: ???
    hostname_override: worker5
    user: k8s
    role: [worker]

services:
  etcd:
    snapshot: true
    creation: 6h
    retention: 24h

network:
  plugin: flannel

ingress:
  provider: nginx
  network_mode: none

addons: |-
  ---
  # cert-manager

  ---
  # ingress-nginx
  apiVersion: v1
  kind: Service
  metadata:
    name: ingress-nginx-external
    namespace: ingress-nginx
  spec:
    ports:
    - name: https
      port: 443
      protocol: TCP
      targetPort: 443
    selector:
      app: ingress-nginx
      app.kubernetes.io/instance: ingress-nginx
    sessionAffinity: None
    type: LoadBalancer

Steps to Reproduce:

To do in the terminal

rm -rf ~/.kube/config
rke remove --config ./cluster.yml --force
rke up --config ./cluster.yml
cp kube_config_cluster.yml ~/.kube/config
k9s

Results: In k9s I get this

 Context: labo-cluster                             <0> all       <a>      Attach     <l>       Logs            <f> Show PortForward                                                                                                        ____  __.________
 Cluster: labo-cluster                             <1> default   <ctrl-d> Delete     <p>       Logs Previous   <t> Transfer                                                                                                               |    |/ _/   __   \______
 User:    kube-admin-labo-cluster                                <d>      Describe   <shift-f> Port-Forward    <y> YAML                                                                                                                   |      < \____    /  ___/
 K9s Rev: v0.32.4                                                <e>      Edit       <z>       Sanitize                                                                                                                                   |    |  \   /    /\___ \
 K8s Rev: v1.27.11                                               <?>      Help       <s>       Shell                                                                                                                                      |____|__ \ /____//____  >
 CPU:     n/a                                                    <ctrl-k> Kill       <o>       Show Node                                                                                                                                          \/            \/
 MEM:     n/a
┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── Pods(all)[25] ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ NAMESPACE↑                    NAME                                                    PF                READY                STATUS                                    RESTARTS IP                              NODE                     AGE                     │
│ ingress-nginx                 ingress-nginx-admission-create-hrlpz                    ●                 0/1                  Completed                                        0 10.42.4.5                       worker2                  2m1s                    │
│ ingress-nginx                 ingress-nginx-admission-patch-8f79j                     ●                 0/1                  Completed                                        1 10.42.6.2                       worker3                  2m1s                    │
│ ingress-nginx                 nginx-ingress-controller-4srzx                          ●                 0/1                  Running                                          0 10.42.7.3                       worker4                  2m1s                    │
│ ingress-nginx                 nginx-ingress-controller-gr95x                          ●                 1/1                  Running                                          0 10.42.5.2                       worker1                  2m1s                    │
│ ingress-nginx                 nginx-ingress-controller-h9plc                          ●                 1/1                  Running                                          0 10.42.3.3                       worker5                  2m1s                    │
│ ingress-nginx                 nginx-ingress-controller-mr4bc                          ●                 1/1                  Running                                          0 10.42.6.3                       worker3                  2m1s                    │
│ ingress-nginx                 nginx-ingress-controller-tghc5                          ●                 1/1                  Running                                          0 10.42.4.6                       worker2                  2m1s                    │
│ kube-system                   coredns-5848d49475-hcffz                                ●                 1/1                  Running                                          0 10.42.3.2                       worker5                  2m12s                   │
│ kube-system                   coredns-5848d49475-q7f6q                                ●                 1/1                  Running                                          0 10.42.7.2                       worker4                  2m10s                   │
│ kube-system                   coredns-5848d49475-qg4pf                                ●                 1/1                  Running                                          0 10.42.4.3                       worker2                  2m10s                   │
│ kube-system                   coredns-autoscaler-77f6844bcc-nz4ll                     ●                 1/1                  Running                                          0 10.42.4.2                       worker2                  2m12s                   │
│ kube-system                   kube-flannel-6wgg9                                      ●                 2/2                  Running                                          0 192.168.137.41                  master1                  2m17s                   │
│ kube-system                   kube-flannel-8h49j                                      ●                 2/2                  Running                                          0 192.168.137.64                  worker4                  2m17s                   │
│ kube-system                   kube-flannel-f8tqd                                      ●                 2/2                  Running                                          0 192.168.137.65                  worker5                  2m17s                   │
│ kube-system                   kube-flannel-gqfdn                                      ●                 2/2                  Running                                          0 192.168.137.62                  worker2                  2m17s                   │
│ kube-system                   kube-flannel-njj4h                                      ●                 2/2                  Running                                          0 192.168.137.63                  worker3                  2m17s                   │
│ kube-system                   kube-flannel-pczrp                                      ●                 2/2                  Running                                          1 192.168.137.42                  master2                  2m17s                   │
│ kube-system                   kube-flannel-qksnp                                      ●                 2/2                  Running                                          0 192.168.137.61                  worker1                  2m17s                   │
│ kube-system                   kube-flannel-sjrnb                                      ●                 2/2                  Running                                          0 192.168.137.43                  master3                  2m17s                   │
│ kube-system                   metrics-server-55fb56d747-mkzmw                         ●                 1/1                  Running                                          0 10.42.4.4                       worker2                  2m7s                    │
│ kube-system                   rke-coredns-addon-deploy-job-kjlll                      ●                 0/1                  Completed                                        0 192.168.137.41                  master1                  2m13s                   │
│ kube-system                   rke-ingress-controller-deploy-job-m6q5c                 ●                 0/1                  Completed                                        0 192.168.137.41                  master1                  2m3s                    │
│ kube-system                   rke-metrics-addon-deploy-job-w4l29                      ●                 0/1                  Completed                                        0 192.168.137.41                  master1                  2m8s                    │
│ kube-system                   rke-network-plugin-deploy-job-c5jm5                     ●                 0/1                  Completed                                        0 192.168.137.41                  master1                  2m18s                   │
│ kube-system                   rke-user-addon-deploy-job-g9rnr                         ●                 0/1                  Error                                            0 192.168.137.41                  master1                  118s                    │
└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
  <pod>

In the rke-user-addon-deploy-job pod I have this (very strange):

┌─────────────────────────────────────────────────────────────────────────────────────────── Logs(kube-system/rke-user-addon-deploy-job-g9rnr:rke-user-addon-pod)[tail] ───────────────────────────────────────────────────────────────────────────────────────────┐
│                                                                                                Autoscroll:On      FullScreen:Off     Timestamps:Off     Wrap:Off                                                                                                 │
│ error: You must be logged in to the server (the server has asked for the client to provide credentials)                                                                                                                                                          │
│ Stream closed EOF for kube-system/rke-user-addon-deploy-job-g9rnr (rke-user-addon-pod)                                                                                                                                                                           │
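
The "must be logged in" error comes from the kubectl run inside the addon job, so a hedged first sanity check is to confirm that the admin kubeconfig itself still works (assuming the kube_config_cluster.yml produced by rke up):

kubectl --kubeconfig kube_config_cluster.yml get nodes
kubectl --kubeconfig kube_config_cluster.yml -n kube-system get job rke-user-addon-deploy-job -o yaml | head -n 40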

And in the metrics-server logs I have this:


│ E0415 14:51:31.215141       1 scraper.go:140] "Failed to scrape node" err="Get \"https://192.168.137.42:10250/metrics/resource\": dial tcp 192.168.137.42:10250: connect: no route to host" node="master2"                                                       │
│ E0415 14:51:33.263196       1 scraper.go:140] "Failed to scrape node" err="Get \"https://192.168.137.63:10250/metrics/resource\": dial tcp 192.168.137.63:10250: connect: no route to host" node="worker3"                                                       │
│ E0415 14:51:33.263195       1 scraper.go:140] "Failed to scrape node" err="Get \"https://192.168.137.65:10250/metrics/resource\": dial tcp 192.168.137.65:10250: connect: no route to host" node="worker5"                                                       │
│ E0415 14:51:37.295226       1 scraper.go:140] "Failed to scrape node" err="Get \"https://192.168.137.41:10250/metrics/resource\": dial tcp 192.168.137.41:10250: connect: no route to host" node="master1"                                                       │
│ E0415 14:51:37.295302       1 scraper.go:140] "Failed to scrape node" err="Get \"https://192.168.137.64:10250/metrics/resource\": dial tcp 192.168.137.64:10250: connect: no route to host" node="worker4"                                                       │
│ E0415 14:51:37.359145       1 scraper.go:140] "Failed to scrape node" err="Get \"https://192.168.137.43:10250/metrics/resource\": dial tcp 192.168.137.43:10250: connect: no route to host" node="master3"                                                       │
│ E0415 14:51:37.359151       1 scraper.go:140] "Failed to scrape node" err="Get \"https://192.168.137.61:10250/metrics/resource\": dial tcp 192.168.137.61:10250: connect: no route to host" node="worker1"                                                       │
│ E0415 14:51:46.191171       1 scraper.go:140] "Failed to scrape node" err="Get \"https://192.168.137.43:10250/metrics/resource\": dial tcp 192.168.137.43:10250: connect: no route to host" node="master3"                                                       │
│ E0415 14:51:48.239178       1 scraper.go:140] "Failed to scrape node" err="Get \"https://192.168.137.63:10250/metrics/resource\": dial tcp 192.168.137.63:10250: connect: no route to host" node="worker3"                                                       │
│ E0415 14:51:48.239298       1 scraper.go:140] "Failed to scrape node" err="Get \"https://192.168.137.42:10250/metrics/resource\": dial tcp 192.168.137.42:10250: connect: no route to host" node="master2"                                                       │
│ E0415 14:51:52.335177       1 scraper.go:140] "Failed to scrape node" err="Get \"https://192.168.137.64:10250/metrics/resource\": dial tcp 192.168.137.64:10250: connect: no route to host" node="worker4"                                                       │
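
"no route to host" on port 10250 points at the network/host layer (the kubelet port being filtered or blocked) rather than at Kubernetes auth. A hedged way to confirm, using the IPs from the scrape errors above:

nc -zv 192.168.137.42 10250       # kubelet metrics port on master2, from any other node
sudo firewall-cmd --state         # on the target node: is firewalld running?
sudo firewall-cmd --list-ports    # which ports are currently open?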
GuillaumeDorschner commented 5 months ago

We found the problem: the firewall was blocking traffic on the required ports. I had followed the documentation, so maybe the docs need an update? For now, I will just disable the firewall.
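
For reference, instead of disabling the firewall entirely, one could open the ports RKE needs on each node. A rough firewalld sketch for AlmaLinux 8 (a subset of the ports listed in the RKE port-requirements docs; verify the full list against your setup before relying on it):

# Run on every node; ports per the RKE documentation (sketch, not exhaustive).
sudo firewall-cmd --permanent --add-port=6443/tcp        # kube-apiserver
sudo firewall-cmd --permanent --add-port=2379-2380/tcp   # etcd client/peer
sudo firewall-cmd --permanent --add-port=10250/tcp       # kubelet (metrics-server scrapes this)
sudo firewall-cmd --permanent --add-port=8472/udp        # flannel VXLAN overlay
sudo firewall-cmd --permanent --add-port=80/tcp --add-port=443/tcp   # nginx ingress
sudo firewall-cmd --reload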