sassoftware / viya4-iac-k8s

This project contains Terraform scripts to provision cloud infrastructure resources, when using vSphere, and Ansible to apply the needed elements of a Kubernetes cluster that are required to deploy SAS Viya platform product offerings.
Apache License 2.0

Problem metrics-server error: metrics not available yet #90

Closed eltan-ing closed 6 months ago

eltan-ing commented 9 months ago

I have an issue with a bare-metal cluster deployed using oss-k8s.sh. The deployment completes successfully, but when I run `kubectl top nodes`, I get the error message `error: metrics not available yet`.

Can you help me figure out what the issue might be? It could cause problems when deploying Pods that use HPA.

$ kubectl top nodes
error: metrics not available yet

$ kubectl get all -l app.kubernetes.io/name=metrics-server -n kube-system
NAME                                  READY   STATUS    RESTARTS   AGE
pod/metrics-server-84b8898677-mbjqt   1/1     Running   0          26m
pod/metrics-server-84b8898677-n8m5n   1/1     Running   0          26m
pod/metrics-server-84b8898677-zt6gr   1/1     Running   0          26m

NAME                     TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)   AGE
service/metrics-server   ClusterIP   10.96.71.122   <none>        443/TCP   26m

NAME                             READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/metrics-server   3/3     3            3           26m

NAME                                        DESIRED   CURRENT   READY   AGE
replicaset.apps/metrics-server-84b8898677   3         3         3       26m

$ kubectl get nodes -o wide
NAME                  STATUS   ROLES           AGE   VERSION   INTERNAL-IP     EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
k8s-sasnode-cp        Ready    control-plane   37m   v1.26.6   192.168.31.30   <none>        Ubuntu 20.04.6 LTS   5.4.0-164-generic   containerd://1.6.20
k8s-sasnode-system    Ready    <none>          37m   v1.26.6   192.168.31.35   <none>        Ubuntu 20.04.6 LTS   5.4.0-164-generic   containerd://1.6.20
k8s-sasnode-worker1   Ready    <none>          37m   v1.26.6   192.168.31.31   <none>        Ubuntu 20.04.6 LTS   5.4.0-164-generic   containerd://1.6.20
k8s-sasnode-worker2   Ready    <none>          37m   v1.26.6   192.168.31.32   <none>        Ubuntu 20.04.6 LTS   5.4.0-164-generic   containerd://1.6.20
k8s-sasnode-worker3   Ready    <none>          37m   v1.26.6   192.168.31.33   <none>        Ubuntu 20.04.6 LTS   5.4.0-164-generic   containerd://1.6.20
k8s-sasnode-worker4   Ready    <none>          37m   v1.26.6   192.168.31.34   <none>        Ubuntu 20.04.6 LTS   5.4.0-164-generic   containerd://1.6.20

: deploy

PLAY RECAP ***********************************************************************************************************************************************************************************************
k8s-sasnode-cp             : ok=53   changed=28   unreachable=0    failed=0    skipped=8    rescued=0    ignored=0
k8s-sasnode-nfs            : ok=15   changed=4    unreachable=0    failed=0    skipped=3    rescued=0    ignored=0
k8s-sasnode-system         : ok=46   changed=20   unreachable=0    failed=0    skipped=6    rescued=0    ignored=0
k8s-sasnode-worker1        : ok=48   changed=21   unreachable=0    failed=0    skipped=5    rescued=0    ignored=0
k8s-sasnode-worker2        : ok=48   changed=21   unreachable=0    failed=0    skipped=5    rescued=0    ignored=0
k8s-sasnode-worker3        : ok=48   changed=21   unreachable=0    failed=0    skipped=5    rescued=0    ignored=0
k8s-sasnode-worker4        : ok=48   changed=21   unreachable=0    failed=0    skipped=5    rescued=0    ignored=0
localhost                  : ok=7    changed=6    unreachable=0    failed=0    skipped=4    rescued=0    ignored=0

Playbook run took 0 days, 0 hours, 10 minutes, 44 seconds
Wednesday 18 October 2023  17:23:59 +0000 (0:00:00.099)       0:10:44.693 *****
===============================================================================
kubernetes/common : Reboot machines to enable added items like cgroup for cri, etc. ------------------------------------------------------------------------------------------------------------- 345.42s
kubernetes/control_plane/init/primary : Run kubeadm init ----------------------------------------------------------------------------------------------------------------------------------------- 46.35s
kubernetes/metrics/metrics-server : Deploy metrics-server ---------------------------------------------------------------------------------------------------------------------------------------- 37.51s
kubernetes/toolbox : Update apt package index and install kubelet, kubeadm, kubectl -------------------------------------------------------------------------------------------------------------- 34.38s
kubernetes/common : Update OS -------------------------------------------------------------------------------------------------------------------------------------------------------------------- 21.28s
kubernetes/node/init : Join compute nodes to the cluster ----------------------------------------------------------------------------------------------------------------------------------------- 17.11s
kubernetes/storage/nfs-subdir-external-provisioner : Setting up default storage for the cluster using nfs-subdir-external-provisioner ------------------------------------------------------------ 14.81s
kubernetes/cri/containerd : Installing containerd.io --------------------------------------------------------------------------------------------------------------------------------------------- 12.84s
kubernetes/cri/containerd : Uninstall old Docker/Containerd versions ----------------------------------------------------------------------------------------------------------------------------- 12.37s
kubernetes/storage/sig-storage-local-static-provisioner : Cloning sig-storage-local-static-provisioner ------------------------------------------------------------------------------------------- 10.71s
kubernetes/common : Update GRUB ------------------------------------------------------------------------------------------------------------------------------------------------------------------- 9.60s
kubernetes/storage/sig-storage-local-static-provisioner : Setting up local storage for the cluster using sig-storage-local-static-provisioner ----------------------------------------------------- 4.38s
kubernetes/toolbox : Download crictl -------------------------------------------------------------------------------------------------------------------------------------------------------------- 3.73s
Gathering Facts ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 3.66s
kubernetes/common : Execute helm installation script ---------------------------------------------------------------------------------------------------------------------------------------------- 3.66s
kubernetes/common : Update limits to support SAS software ----------------------------------------------------------------------------------------------------------------------------------------- 3.44s
kubernetes/cni/calico : Install Operator ---------------------------------------------------------------------------------------------------------------------------------------------------------- 3.31s
kubernetes/toolbox : Install crictl --------------------------------------------------------------------------------------------------------------------------------------------------------------- 2.82s
kubernetes/node/baseline : Install nfs-common for nfs-subdir-external-provisioner ----------------------------------------------------------------------------------------------------------------- 2.66s
kubernetes/common : Install required packages for every machine ----------------------------------------------------------------------------------------------------------------------------------- 2.11s

jarpat commented 7 months ago

Hey @eltan-ing,

Are you still running into this issue? I have not been able to reproduce it on my own setup using viya4-iac-k8s:3.6.0.

$ kubectl top nodes
NAME                             CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%   
jarpat-k1-oss-cas-01             64m          0%     1163Mi          0%        
jarpat-k1-oss-cas-02             62m          0%     1129Mi          0%        
jarpat-k1-oss-cas-03             52m          0%     1290Mi          1%        
jarpat-k1-oss-compute-01         66m          0%     1149Mi          0%        
jarpat-k1-oss-control-plane-01   271m         13%    1788Mi          46%       
jarpat-k1-oss-control-plane-02   208m         10%    1526Mi          40%       
jarpat-k1-oss-control-plane-03   171m         8%     1337Mi          35%       
jarpat-k1-oss-stateful-01        41m          0%     892Mi           2%        
jarpat-k1-oss-stateless-01       55m          0%     896Mi           2%        
jarpat-k1-oss-stateless-02       51m          0%     881Mi           2%        
jarpat-k1-oss-system-01          97m          1%     1074Mi          6%        

$ kubectl get all -l app.kubernetes.io/name=metrics-server -n kube-system
NAME                                  READY   STATUS    RESTARTS   AGE
pod/metrics-server-84b8898677-6flpb   1/1     Running   0          10m
pod/metrics-server-84b8898677-vdmzg   1/1     Running   0          10m
pod/metrics-server-84b8898677-wkhhn   1/1     Running   0          10m

NAME                     TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)   AGE
service/metrics-server   ClusterIP   10.43.94.64   <none>        443/TCP   10m

NAME                             READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/metrics-server   3/3     3            3           10m

NAME                                        DESIRED   CURRENT   READY   AGE
replicaset.apps/metrics-server-84b8898677   3         3         3       10m

$ kubectl get nodes -o wide
NAME                             STATUS   ROLES           AGE   VERSION   INTERNAL-IP    EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
jarpat-k1-oss-cas-01             Ready    <none>          11m   v1.26.7   10.12.32.239   <none>        Ubuntu 22.04.3 LTS   5.15.0-89-generic   containerd://1.6.20
jarpat-k1-oss-cas-02             Ready    <none>          10m   v1.26.7   10.12.34.153   <none>        Ubuntu 22.04.3 LTS   5.15.0-89-generic   containerd://1.6.20
jarpat-k1-oss-cas-03             Ready    <none>          11m   v1.26.7   10.12.38.230   <none>        Ubuntu 22.04.3 LTS   5.15.0-89-generic   containerd://1.6.20
jarpat-k1-oss-compute-01         Ready    <none>          10m   v1.26.7   10.12.33.151   <none>        Ubuntu 22.04.3 LTS   5.15.0-89-generic   containerd://1.6.20
jarpat-k1-oss-control-plane-01   Ready    control-plane   12m   v1.26.7   10.12.14.70    <none>        Ubuntu 22.04.3 LTS   5.15.0-89-generic   containerd://1.6.20
jarpat-k1-oss-control-plane-02   Ready    control-plane   11m   v1.26.7   10.12.34.22    <none>        Ubuntu 22.04.3 LTS   5.15.0-89-generic   containerd://1.6.20
jarpat-k1-oss-control-plane-03   Ready    control-plane   11m   v1.26.7   10.12.35.240   <none>        Ubuntu 22.04.3 LTS   5.15.0-89-generic   containerd://1.6.20
jarpat-k1-oss-stateful-01        Ready    <none>          11m   v1.26.7   10.12.14.138   <none>        Ubuntu 22.04.3 LTS   5.15.0-89-generic   containerd://1.6.20
jarpat-k1-oss-stateless-01       Ready    <none>          11m   v1.26.7   10.12.36.13    <none>        Ubuntu 22.04.3 LTS   5.15.0-89-generic   containerd://1.6.20
jarpat-k1-oss-stateless-02       Ready    <none>          10m   v1.26.7   10.12.14.73    <none>        Ubuntu 22.04.3 LTS   5.15.0-89-generic   containerd://1.6.20
jarpat-k1-oss-system-01          Ready    <none>          11m   v1.26.7   10.12.12.242   <none>        Ubuntu 22.04.3 LTS   5.15.0-89-generic   containerd://1.6.20
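
This thread does not identify a root cause. If the error persists while the metrics-server pods report Running, a few generic checks may help narrow it down. The sketch below is not part of viya4-iac-k8s. The function name `check_metrics_api` and the `KUBECTL` variable are hypothetical names chosen for illustration. The label selector is the one used in the commands above, and `v1beta1.metrics.k8s.io` is the APIService that metrics-server normally registers.

```shell
# Generic metrics-server checks; nothing here is specific to viya4-iac-k8s.
# KUBECTL is a hypothetical override, e.g. to point at a wrapper script.
KUBECTL="${KUBECTL:-kubectl}"

check_metrics_api() {
  # 1. Is the aggregated metrics API registered, and is it reported as Available?
  "$KUBECTL" get apiservice v1beta1.metrics.k8s.io

  # 2. Does the metrics API return node metrics when queried directly?
  "$KUBECTL" get --raw /apis/metrics.k8s.io/v1beta1/nodes

  # 3. What do the metrics-server pods log? Failures to scrape the kubelets
  #    would be expected to show up here.
  "$KUBECTL" logs -n kube-system -l app.kubernetes.io/name=metrics-server --tail=50
}
```

Calling `check_metrics_api` runs the three commands in order and continues even if one of them fails, so all of the output can be attached to the issue in one go.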

jarpat commented 6 months ago

Marking as stale/inactive. If there are further questions please open a new GitHub issue.