Closed 09cicada closed 2 months ago
Hi, could you please gather the Events section from the kubectl describe output for your automation job pod at the time the issue occurs?
kubectl -n awx describe pod automation-job-3583-shdjp
Maybe you can get some events at the bottom of the output like this:
Events:
  Type    Reason     Age   From               Message
  ----    ------     ----  ----               -------
  Normal  Scheduled  37s   default-scheduler  Successfully assigned awx/automation-job-1-vc4c4 to kuro-c9s01.krkb.lab
  Normal  Pulling    37s   kubelet            Pulling image "quay.io/ansible/awx-ee:latest"
  Normal  Pulled     8s    kubelet            Successfully pulled image "quay.io/ansible/awx-ee:latest" in 29.325s (29.326s including waiting)
  Normal  Created    8s    kubelet            Created container worker
  Normal  Started    8s    kubelet            Started container worker
I suspect that your pod takes a long time to pull the image from the container registry. In the example above, pulling the image takes 29 seconds (see the Message column of the Pulled line, or calculate the difference between the Age values for Pulling (37s) and Pulled (8s)).
This way you can see which event takes a long time before the Created event is recorded.
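As a rough illustration of the Age arithmetic described above, here is a minimal Python sketch that takes the Reason and Age columns and prints the gap between consecutive events. The helper names parse_age and event_gaps are made up for this example and are not part of kubectl or AWX. The Age values are rounded, so the results are only approximate; the duration in the Pulled message is more precise when it is available.

```python
import re

# Seconds per unit in the compact durations kubectl prints in the Age column.
UNIT_SECONDS = {"d": 86400, "h": 3600, "m": 60, "s": 1}


def parse_age(age):
    """Convert an Age value such as '37s' or '2m7s' to whole seconds."""
    parts = re.findall(r"(\d+)([dhms])", age)
    if not parts or "".join(n + u for n, u in parts) != age:
        raise ValueError(f"unrecognized age: {age!r}")
    return sum(int(n) * UNIT_SECONDS[u] for n, u in parts)


def event_gaps(events):
    """Return (earlier reason, later reason, seconds between) per consecutive pair.

    `events` is a list of (reason, age) pairs in the order kubectl prints them,
    oldest first. An older event has a larger Age, so the gap is prev - next.
    """
    gaps = []
    for (prev_reason, prev_age), (next_reason, next_age) in zip(events, events[1:]):
        gaps.append((prev_reason, next_reason, parse_age(prev_age) - parse_age(next_age)))
    return gaps


# Reason and Age columns copied from the example Events output.
events = [
    ("Scheduled", "37s"),
    ("Pulling", "37s"),
    ("Pulled", "8s"),
    ("Created", "8s"),
    ("Started", "8s"),
]

for earlier, later, seconds in event_gaps(events):
    print(f"{earlier} -> {later}: {seconds}s")
```

With the example values, the only non-zero gap is the one between Pulling and Pulled, which points at the image pull as the slow step.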
If pulling the image is what takes so long, there is not much that can be done, but you can:
- Set the pull policy to Missing for the EE used by your Job Template
- Add more space to /var/lib/rancher to reduce the removal of cached images by kubelet garbage collection (see the official docs)
Hello Mr. Kurokobo, spot on, it was indeed the pulling of awx-ee:latest:
Normal Pulling 2m7s kubelet Pulling image "quay.io/ansible/awx-ee:latest"
I am going to change the pull policy to Missing. I will also add some space to /var/lib/rancher. I really appreciate the advice and help. I will close this, and thank you once again.
Environment
K3S version: v1.25.4+k3s1 (0dc63334)
Description
Hello Mr Kurokobo. Question on how to troubleshoot performance when starting a job/template. When I first deployed AWX, when synchronizing projects and running templates, the process ran relatively quickly in comparison to recently. When a template is run, a pod/automation-job container is created. I notice that this takes 2 to 4 minutes at times before any output occurs.
When this happens, I run kubectl get all -n awx and see the container in the ContainerCreating state for long periods. For example: pod/automation-job-3583-shdjp 0/1 ContainerCreating
Do you know how I can troubleshoot this specific issue? I looked at your troubleshooting guide but I did not see anything specific to this issue. If I missed that I apologize ahead of time.
Thank you