kurokobo / awx-on-k3s

An example implementation of AWX on single node K3s using AWX Operator, with easy-to-use simplified configuration with ownership of data and passwords.

Unable to manage deployment due to kubectl Unauthorized #352

Closed: 09cicada closed this issue 2 months ago

09cicada commented 2 months ago

Environment

K3S version: v1.25.4+k3s1 (0dc63334)

Description

Hello Mr Kurokobo. I have an odd issue in that I cannot manage the AWX namespace any longer. When I run kubectl, I get this error.

error: You must be logged in to the server (Unauthorized)

Steps to Reproduce

Any attempt to run kubectl -n awx get all results in the error, and any attempt to pull pod logs with kubectl fails the same way. I have read a few links that point to certificate expiry, but I am not sure how to troubleshoot this.

Some additional info. The AWX environment still functions although very slowly and crashes intermittently. I am hoping this is something simple.

Thank you

Logs

$ kubectl ...
...

Files

---
apiVersion: awx.ansible.com/v1beta1
kind: AWX
...
kurokobo commented 2 months ago

@09cicada Hi, could you provide the output from the following commands?

which kubectl
ls -l $(which kubectl)
ls -l ~/.kube/config

Also, please check whether the following command works.

kubectl --kubeconfig /etc/rancher/k3s/k3s.yaml get pods --all-namespaces

Some additional info. The AWX environment still functions although very slowly and crashes intermittently. I am hoping this is something simple.

That is not usual. The OS may be under heavy load, the storage may be strained, or there may be some other cause. Do the top, free, and df commands show any abnormalities in resource usage? Does the situation remain the same after restarting the OS?
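
For example, something like the following gives a quick snapshot (just a sketch; any equivalent checks are fine):

top -b -n 1 | head -n 15    # load average and the busiest processes, one batch-mode iteration
free -h                     # memory and swap usage
df -h                       # filesystem usage, including the volume backing K3s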

09cicada commented 2 months ago

Hello, please see my output below and thank you.

ansible ~]# ls -l `which kubectl`
-rwxr-xr-x. 1 root root 45015040 Nov 24 2022 /usr/local/bin/kubectl

ansible ~]# ls -l $(which kubectl)
-rwxr-xr-x. 1 root root 45015040 Nov 24 2022 /usr/local/bin/kubectl

ansible ~]# ls -l ~/.kube/config
-rw-------. 1 root root 2980 Nov 24 2022 /root/.kube/config

This is promising in comparison

ansible ~]# kubectl --kubeconfig /etc/rancher/k3s/k3s.yaml get pods --all-namespaces
NAMESPACE     NAME                                              READY   STATUS                   RESTARTS       AGE
kube-system   helm-install-traefik-crd-6vcw6                    0/1     Completed                0              522d
kube-system   helm-install-traefik-9df9g                        0/1     Completed                1              522d
awx           awx-task-5fbddc54d7-t662w                         0/4     ContainerStatusUnknown   8 (156d ago)   313d
awx           awx-task-5fbddc54d7-rk9qn                         0/4     ContainerStatusUnknown   4              153d
awx           awx-web-f89895997-mxpr6                           0/3     ContainerStatusUnknown   7 (156d ago)   313d
awx           awx-operator-controller-manager-d5c594f54-z5nbz   0/2     ContainerStatusUnknown   6 (151d ago)   313d
kube-system   local-path-provisioner-79f67d76f8-nkqtk           1/1     Running                  2 (156d ago)   522d
kube-system   svclb-traefik-27171d22-jgtlq                      2/2     Running                  0              151d
awx           awx-postgres-13-0                                 1/1     Running                  0              151d
awx           awx-task-5fbddc54d7-7t5c6                         4/4     Running                  0              151d
awx           awx-web-f89895997-7gbv2                           3/3     Running                  0              151d
kube-system   coredns-597584b69b-vflzd                          1/1     Running                  2 (156d ago)   522d
kube-system   traefik-bb69b68cd-8pt7r                           1/1     Running                  2 (156d ago)   522d
awx           awx-operator-controller-manager-d5c594f54-2trtw   2/2     Running                  14 (13d ago)   151d
kube-system   metrics-server-5c8978b444-6fd22                   1/1     Running                  2 (156d ago)   522d

The system is not under heavy load, and the "kubectl Unauthorized" issue remains after a reboot. Could the problem be my ~/.kube/config file?

Thank you

09cicada commented 2 months ago

Hello again,

As a test, I backed up the ~/.kube/config file. I then took the certificate data from the /etc/rancher/k3s/k3s.yaml file and replaced the .kube/config certificate data with it. Now my kubectl commands work fine again. This seems to have fixed the issue, although I am not sure whether it is an appropriate approach.
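
For reference, one equivalent way to do this is to replace the whole file rather than editing only the certificate fields (a sketch using the paths from this thread, assuming ~/.kube/config started out as a copy of /etc/rancher/k3s/k3s.yaml; not necessarily the exact commands used here):

# Back up the stale kubeconfig, then overwrite it with the kubeconfig
# that K3s has regenerated with rotated certificates.
cp ~/.kube/config ~/.kube/config.bak
cp /etc/rancher/k3s/k3s.yaml ~/.kube/config
chmod 600 ~/.kube/config
kubectl -n awx get pods    # should no longer return "Unauthorized"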

Thank you

kurokobo commented 2 months ago

@09cicada Hi, thanks for updating.

I then took the certificate data from /etc/rancher/k3s/k3s.yaml file and replaced the .kube/config certificate data with it.

This is almost the correct approach.

Technical background: the client certificate embedded in the kubeconfig that K3s generates is valid for one year. When it is close to or past expiry, K3s rotates the certificates on the next restart of the service and rewrites /etc/rancher/k3s/k3s.yaml with the new certificate data, but a copy of that file kept as ~/.kube/config is never updated. kubectl then keeps presenting the expired client certificate, and the API server rejects it with "Unauthorized". Replacing the certificate data (or the whole file) with the current contents of /etc/rancher/k3s/k3s.yaml fixes the error until the next rotation; to avoid repeating this, you can make ~/.kube/config a symbolic link to /etc/rancher/k3s/k3s.yaml, or point the KUBECONFIG environment variable at that file, so kubectl always uses the rotated certificates.
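
To confirm that an expired certificate was the cause, you can compare the expiry dates of the client certificates in the two files, and optionally switch to the symlink approach (a sketch, assuming openssl is available and the default K3s paths):

# Expiry (notAfter) of the client certificate in the possibly stale copy:
grep client-certificate-data ~/.kube/config | awk '{print $2}' | base64 -d | openssl x509 -noout -enddate
# Expiry of the client certificate in the kubeconfig that K3s maintains:
grep client-certificate-data /etc/rancher/k3s/k3s.yaml | awk '{print $2}' | base64 -d | openssl x509 -noout -enddate
# Optional: replace the copy with a symlink so future rotations are picked up automatically.
mv ~/.kube/config ~/.kube/config.bak
ln -s /etc/rancher/k3s/k3s.yaml ~/.kube/config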

09cicada commented 2 months ago

@kurokobo, Excellent, thank you much.

In the past kubectl worked, so I must have updated or installed something along the way that caused the issue.
I think I will opt for the symbolic link method. I was also able to delete the rogue awx pods after you helped me to fix kubectl.

One last question, considering that we do not have a full Kubernetes environment. In your opinion, is running a full production AWX instance supporting 1000 or so hosts on K3s possible assuming we throw enough CPU/RAM at the underlying host?

Thank you

kurokobo commented 2 months ago

@09cicada Also, appending export KUBECONFIG=/etc/rancher/k3s/k3s.yaml to your .bashrc may be a possible solution for you. Refer to the official Kubernetes docs for details about kubeconfig files: https://kubernetes.io/docs/concepts/configuration/organize-cluster-access-kubeconfig/
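
For example (a minimal sketch for a root bash login, matching the environment shown earlier in this thread):

# Make kubectl read the kubeconfig that K3s itself maintains and rotates.
echo 'export KUBECONFIG=/etc/rancher/k3s/k3s.yaml' >> ~/.bashrc
source ~/.bashrc
kubectl get nodes    # should authenticate without copying any files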

One last question, considering that we do not have a full Kubernetes environment. In your opinion, is running a full production AWX instance supporting 1000 or so hosts on K3s possible assuming we throw enough CPU/RAM at the underlying host?

I have no concerns about choosing K3s, but for AWX, it depends. The frequency of job execution is a more demanding factor than the total number of hosts in the inventory. If jobs are not executed very frequently, single-node K3s may well work given sufficient compute resources; if jobs are executed frequently, it would be more stable to increase the number of replicas of the task pod in a multi-node K3s cluster.
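
As a rough illustration, if the AWX Operator version in use supports the task_replicas field in the AWX spec (please verify this against your operator version; treat the field name as an assumption here), scaling out the task pods on a multi-node cluster could look like this:

# Hypothetical example: ask the operator to run 3 task pods for the AWX
# resource named "awx" (the cluster must have capacity to schedule them).
kubectl -n awx patch awx awx --type merge -p '{"spec": {"task_replicas": 3}}'
kubectl -n awx get pods    # expect additional awx-task-* pods to appear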