kurokobo / awx-on-k3s

An example implementation of AWX on single node K3s using AWX Operator, with easy-to-use simplified configuration with ownership of data and passwords.
MIT License

AWX process slow to start when running a template #358

Closed 09cicada closed 2 months ago

09cicada commented 2 months ago

Environment

K3S version: v1.25.4+k3s1 (0dc63334)

Description

Hello Mr. Kurokobo. I have a question about how to troubleshoot slow startup when running a job/template. When I first deployed AWX, synchronizing projects and running templates was noticeably faster than it has been recently. When a template is run, a pod/automation-job container is created, and I have noticed that it sometimes takes 2 to 4 minutes before any output appears.

When this happens, I run kubectl get all -n awx and see the pod in the ContainerCreating state for long periods. For example:

pod/automation-job-3583-shdjp 0/1 ContainerCreating

Do you know how I can troubleshoot this specific issue? I looked at your troubleshooting guide but did not see anything specific to it. If I missed it, I apologize in advance.

Thank you

kurokobo commented 2 months ago

Hi, could you please gather the Events section from the kubectl describe output for your automation job pod while the issue is occurring?

kubectl -n awx describe pod automation-job-3583-shdjp

You should see some events at the bottom of the output, like this:

Events:
  Type    Reason     Age   From               Message
  ----    ------     ----  ----               -------
  Normal  Scheduled  37s   default-scheduler  Successfully assigned awx/automation-job-1-vc4c4 to kuro-c9s01.krkb.lab
  Normal  Pulling    37s   kubelet            Pulling image "quay.io/ansible/awx-ee:latest"
  Normal  Pulled     8s    kubelet            Successfully pulled image "quay.io/ansible/awx-ee:latest" in 29.325s (29.326s including waiting)
  Normal  Created    8s    kubelet            Created container worker
  Normal  Started    8s    kubelet            Started container worker

I suspect that your pod is spending most of that time pulling the image from the container registry. In the example above, pulling the image takes 29 seconds (see the Message column of the Pulled line, or calculate the difference between the Age values for Pulling (37s) and Pulled (8s)). In the same way, you can see which step takes the most time before the Created event is recorded.
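The Age arithmetic described above can be sketched in a few lines of Python. This is only an illustration, not part of the original advice: it parses the sample Events text from this comment, and the helper names age_seconds and event_ages are made up for the example. It assumes ages are printed in the compact form kubectl uses here (such as 37s or 2m7s).

```python
import re

# Sample Events lines copied from the kubectl describe output above.
EVENTS = """\
  Normal  Scheduled  37s   default-scheduler  Successfully assigned awx/automation-job-1-vc4c4 to kuro-c9s01.krkb.lab
  Normal  Pulling    37s   kubelet            Pulling image "quay.io/ansible/awx-ee:latest"
  Normal  Pulled     8s    kubelet            Successfully pulled image "quay.io/ansible/awx-ee:latest" in 29.325s (29.326s including waiting)
  Normal  Created    8s    kubelet            Created container worker
  Normal  Started    8s    kubelet            Started container worker
"""

# Seconds per unit for ages such as "37s", "2m7s", or "1h5m".
UNITS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def age_seconds(age):
    """Convert a compact age string such as "2m7s" to seconds."""
    return sum(int(n) * UNITS[u] for n, u in re.findall(r"(\d+)([smhd])", age))


def event_ages(text):
    """Map each event Reason to its Age in seconds (first occurrence wins)."""
    ages = {}
    for line in text.splitlines():
        fields = line.split(None, 4)  # Type, Reason, Age, From, Message
        if len(fields) == 5 and fields[0] in ("Normal", "Warning"):
            ages.setdefault(fields[1], age_seconds(fields[2]))
    return ages


ages = event_ages(EVENTS)
# Pulling happened 37s ago and Pulled 8s ago, so the pull took about 29s.
pull_seconds = ages["Pulling"] - ages["Pulled"]
print(f"image pull took about {pull_seconds}s")
```

Because Age is rounded to whole seconds, the result is approximate; the duration in the Pulled message is the more precise figure.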

If pulling the image is what takes a long time, there is not much that can be done.

09cicada commented 2 months ago

Hello Mr. Kurokobo, spot on. It was indeed the pulling of awx-ee:latest:

  Normal  Pulling  2m7s  kubelet  Pulling image "quay.io/ansible/awx-ee:latest"

I am going to change the pull policy to Missing. I will also add some space to /var/lib/rancher. I really appreciate the advice and help. I will close this issue; thank you once again.
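For readers who want to script the pull policy change, here is a minimal sketch. It rests on assumptions that are not stated in this thread: that the policy is exposed as the pull field of an execution environment in the AWX REST API (with values always, missing, and never), and that a token is used for authentication. The host name, the execution environment ID 1, and the token are hypothetical placeholders. The code only builds the request and does not send it.

```python
import json
import urllib.request


def build_pull_policy_request(base_url, ee_id, token, pull="missing"):
    """Build (but do not send) a PATCH request that sets an EE's pull policy.

    Assumption: the endpoint is /api/v2/execution_environments/<id>/ and the
    field is named "pull". Verify both against your AWX version first.
    """
    url = f"{base_url.rstrip('/')}/api/v2/execution_environments/{ee_id}/"
    body = json.dumps({"pull": pull}).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {token}",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="PATCH")


# Hypothetical values, for illustration only.
req = build_pull_policy_request("https://awx.example.com", 1, "REPLACE_WITH_TOKEN")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))

# To actually apply the change, send it with urllib.request.urlopen(req).
```

The same setting can also be changed in the AWX web UI on the execution environment's edit page, which is likely the simpler route for a single environment.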