logicmonitor / k8s-argus

Automated Kubernetes monitoring.
https://logicmonitor.github.io/k8s-argus/
Mozilla Public License 2.0
36 stars 15 forks source link

TRANSIENT_FAILURE errors on Argus start #64

Open ddeasey opened 6 years ago

ddeasey commented 6 years ago

Argus pod in an 'Error' state after helm deploy. 'TRANSIENT_FAILURE' message in the logs.

$ kubectl get pod --all-namespaces
NAMESPACE     NAME                                       READY     STATUS             RESTARTS   AGE
kube-system   argus-4248942165-9vgdx                     0/1       CrashLoopBackOff   7          15m
kube-system   collectorset-controller-4255353268-kwbgh   1/1       Running            0          15m
kube-system   kube-dns-2834558388-3h7m6                  3/3       Running            0          43d
kube-system   tiller-deploy-3341511835-rsksq             1/1       Running            0          5d

argus pod logs:

$ kubectl logs argus-4248942165-9vgdx -n kube-system
time="2017-11-15T21:05:00Z" level=info msg="Waiting for gRPC"
time="2017-11-15T21:05:01Z" level=info msg="Waiting for gRPC"
time="2017-11-15T21:05:01Z" level=info msg="Waiting for gRPC"
time="2017-11-15T21:05:02Z" level=info msg="Waiting for gRPC"
time="2017-11-15T21:05:02Z" level=info msg="Waiting for gRPC"
time="2017-11-15T21:05:05Z" level=info msg="Waiting for gRPC"
time="2017-11-15T21:05:05Z" level=info msg="Waiting for gRPC"
time="2017-11-15T21:05:08Z" level=info msg="Waiting for gRPC"
time="2017-11-15T21:05:08Z" level=info msg="Waiting for gRPC"
time="2017-11-15T21:05:10Z" level=fatal msg="Failed waiting for gRPC to ready, state is \"TRANSIENT_FAILURE\""

no container logs visible

$ kubectl logs collectorset-controller-4255353268-kwbgh -n kube-system
$

Helm deploy command:

helm upgrade --debug --install --version 0.2.0 --namespace kube-system \
                            --set collectorset-controller.imageTag=0.1.0-alpha.0 \
                            --set global.accessID=‘REDACTED’ \
                            --set global.accessKey='REDACTED' \
                            --set global.account='REDACTED' \
                            --set enableRBAC=false  \
                            --set collectorset-controller.enableRBAC=false \
                            --set clusterName='REDACTED' \
                            --set imageTag=0.2.0-alpha.0 \
                            argus logicmonitor/argus
andrewrynhard commented 6 years ago

@ddeasey Looks like a network error caused the gRPC connection to fail. However, I do think a better error message could be implemented. Going to leave open for now.