Closed: troyhebe closed this issue 1 year ago
All these pods should be in a Running state. If they are not, you have bigger issues than just traefik.
"Completed" is not a valid status for any of those pods as far as I'm aware, and the pods should be deletable without issues. The safeguard that prevents users from starting TrueCharts Charts on an unhealthy system seems to be working correctly here.
May I suggest that the error messages on the wait commands be modified to tell users to look for and delete any pod in the aforementioned namespaces that is not in the Running state, rather than just "metallb-system wait failed...", etc. This issue is not easy to spot right away, and at least that way the logs would give users a touch more information to work with. A quick diagnostic along those lines is sketched below.
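Something like the following could be pointed to in that message (a rough sketch only; the namespace list mirrors the ones the health check inspects, but the exact wording and command are up to the maintainers). Pods shown as Completed appear in this output because their phase is "Succeeded":

# Hypothetical diagnostic: list every pod in the checked namespaces
# that is not currently in the Running phase.
for ns in metallb-system cert-manager cnpg-system; do
  k3s kubectl get pods -n "$ns" --field-selector=status.phase!=Running
done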
AFAIK SCALE doesn't even show initContainer logs…
This issue is locked to prevent necro-posting on closed issues. Please create a new issue or contact staff on Discord if the problem persists.
App Name
traefik
SCALE Version
22.12.0
App Version
2.9.8_17.0.12
Application Events
Application Logs
Application Configuration
N/A
Describe the bug
The traefik install has a health check that requires ALL pods in the cnpg-system, cert-manager, and metallb-system namespaces to be in a Running state. It runs a health check bash script containing three kubectl wait commands, which can be seen with:
k3s kubectl describe pod/traefik-manifests-hgtvp -n ix-traefik
The issue/bug I encountered is that "kubectl wait" will block and eventually time out, causing this check to fail, if any pod in the namespace it is watching is in the Completed state.
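For illustration, a wait of this general shape (my reconstruction, not necessarily the exact line from the script) never returns once a finished Job pod sits in the namespace, because a Completed pod never reports the Ready condition:

# Sketch of the problematic pattern: waits for *every* pod in the
# namespace to become Ready, so a Completed Job pod blocks it until timeout.
k3s kubectl wait pod --all --namespace=cnpg-system --for=condition=ready --timeout=600s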
In my specific case I had a power outage which caused my TrueNAS to shut down. After the reboot traefik would not start, so I decided to simply re-deploy it. The re-deploy would ALWAYS fail because cnpg-system and cert-manager had pods in the "Completed" state:
Even though there are cnpg-system pods in the Running state, this wait will always block on the Completed pod:
The logical goal of this command seems to be to ensure that A SINGLE cnpg-system pod is ready. However, what it actually does is check that EVERY cnpg-system pod is in a ready state, and that seems to be a bug.
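One way to express that intent so finished Job pods are ignored would be to wait on the controller objects rather than on every pod, for example (a sketch only, assuming the operators in these namespaces are managed by Deployments):

# Sketch: wait on the Deployments' Available condition instead of on pods,
# so a Job pod in the Completed state has no effect on the check.
k3s kubectl wait deployment --all --namespace=cnpg-system --for=condition=available --timeout=600s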
To Reproduce
Expected Behavior
N/A
Screenshots
N/A
Additional Context
N/A
I've read and agree with the following