Open hervelemeur opened 3 years ago
No missing external secret:
$ kubectl get externalsecrets -n jx
NAME                             LAST SYNC   STATUS    AGE
jenkins-maven-settings           16s         SUCCESS   60m
jenkins-release-gpg              2s          SUCCESS   60m
jenkins-x-chartmuseum            14s         SUCCESS   60m
jx-basic-auth-htpasswd           13s         SUCCESS   60m
jx-basic-auth-user-password      1s          SUCCESS   60m
lighthouse-hmac-token            11s         SUCCESS   60m
lighthouse-oauth-token           6s          SUCCESS   60m
nexus                            7s          SUCCESS   60m
tekton-container-registry-auth   10s         SUCCESS   60m
tekton-git                       4s          SUCCESS   60m
I then manually deleted the failing pod, and all the following ones "completed" without any problem.
Other health checks are crashing as well.
deployment:
time="2021-01-13T21:10:14Z" level=info msg="Found instance namespace: kuberhealthy"
time="2021-01-13T21:10:14Z" level=info msg="Kuberhealthy is located in the kuberhealthy namespace."
time="2021-01-13T21:10:14Z" level=info msg="Found pod namespace: kuberhealthy"
time="2021-01-13T21:10:14Z" level=info msg="Performing check in kuberhealthy namespace."
time="2021-01-13T21:10:14Z" level=info msg="Parsed CHECK_DEPLOYMENT_REPLICAS: 4"
time="2021-01-13T21:10:14Z" level=info msg="Parsed CHECK_SERVICE_ACCOUNT: default"
time="2021-01-13T21:10:14Z" level=info msg="Check time limit set to: 14m45.119966409s"
time="2021-01-13T21:10:14Z" level=info msg="Parsed CHECK_DEPLOYMENT_ROLLING_UPDATE: true"
time="2021-01-13T21:10:14Z" level=info msg="Check deployment image will be rolled from [nginxinc/nginx-unprivileged:1.17.8] to [nginxinc/nginx-unprivileged:1.17.9]"
time="2021-01-13T21:10:14Z" level=info msg="Kubernetes client created."
time="2021-01-13T21:10:14Z" level=info msg="Waiting for node to become ready before starting check."
time="2021-01-13T21:10:15Z" level=error msg="Failed to check node age: nodes \"aks-default-15766151-vmss000002\" is forbidden: User \"system:serviceaccount:kuberhealthy:deployment-sa\" cannot get resource \"nodes\" in API group \"\" at the cluster scope"
time="2021-01-13T21:10:15Z" level=info msg="Starting check."
time="2021-01-13T21:10:15Z" level=info msg="Wiping all found orphaned resources belonging to this check."
time="2021-01-13T21:10:15Z" level=info msg="Attempting to find previously created service(s) belonging to this check."
time="2021-01-13T21:10:15Z" level=info msg="Did not find any old service(s) belonging to this check."
time="2021-01-13T21:10:15Z" level=info msg="Attempting to find previously created deployment(s) belonging to this check."
time="2021-01-13T21:10:15Z" level=info msg="Did not find any old deployment(s) belonging to this check."
time="2021-01-13T21:10:15Z" level=info msg="Successfully cleaned up prior check resources."
time="2021-01-13T21:10:15Z" level=info msg="Creating deployment resource with 4 replica(s) in kuberhealthy namespace using image [nginxinc/nginx-unprivileged:1.17.8] with environment variables: map[]"
time="2021-01-13T21:10:15Z" level=info msg="Creating container using image [nginxinc/nginx-unprivileged:1.17.8] with environment variables: map[]"
time="2021-01-13T21:10:15Z" level=info msg="Created deployment resource."
time="2021-01-13T21:10:15Z" level=info msg="Creating deployment in cluster with name: deployment-deployment"
time="2021-01-13T21:10:16Z" level=info msg="Watching for deployment to exist."
time="2021-01-13T21:10:31Z" level=info msg="Deployment is reporting Available with True."
time="2021-01-13T21:10:31Z" level=info msg="Created deployment in kuberhealthy namespace: deployment-deployment"
time="2021-01-13T21:10:31Z" level=info msg="Creating service resource for kuberhealthy namespace."
time="2021-01-13T21:10:31Z" level=info msg="Created service resource."
time="2021-01-13T21:10:31Z" level=info msg="Creating service in cluster with name: deployment-svc"
time="2021-01-13T21:10:31Z" level=info msg="Watching for service to exist."
time="2021-01-13T21:10:31Z" level=info msg="Cluster IP found: 10.0.44.239"
time="2021-01-13T21:10:31Z" level=info msg="Created service in kuberhealthy namespace: deployment-svc"
time="2021-01-13T21:10:31Z" level=info msg="Found service cluster IP address: 10.0.44.239"
time="2021-01-13T21:10:31Z" level=info msg="Looking for a response from the endpoint."
time="2021-01-13T21:10:31Z" level=info msg="Beginning backoff loop for HTTP GET request."
time="2021-01-13T21:11:01Z" level=info msg="Retrying in 5 seconds."
time="2021-01-13T21:11:06Z" level=info msg="Successfully made an HTTP request on attempt: 2"
time="2021-01-13T21:11:06Z" level=info msg="Got a 200 with a GET to http://10.0.44.239"
time="2021-01-13T21:11:06Z" level=info msg="Got a result from GET request backoff: 200 OK"
time="2021-01-13T21:11:06Z" level=info msg="Successfully hit service endpoint."
time="2021-01-13T21:11:06Z" level=info msg="Rolling update option is enabled. Performing roll."
time="2021-01-13T21:11:06Z" level=info msg="Creating deployment resource with 4 replica(s) in kuberhealthy namespace using image [nginxinc/nginx-unprivileged:1.17.9] with environment variables: map[]"
time="2021-01-13T21:11:06Z" level=info msg="Creating container using image [nginxinc/nginx-unprivileged:1.17.9] with environment variables: map[]"
time="2021-01-13T21:11:06Z" level=info msg="Created rolling-update deployment resource."
time="2021-01-13T21:11:06Z" level=info msg="Performing rolling-update on deployment deployment-deployment to [nginxinc/nginx-unprivileged:1.17.9]"
time="2021-01-13T21:11:06Z" level=info msg="Rolled deployment in kuberhealthy namespace: deployment-deployment"
time="2021-01-13T21:11:06Z" level=info msg="Looking for a response from the endpoint."
time="2021-01-13T21:11:06Z" level=info msg="Beginning backoff loop for HTTP GET request."
time="2021-01-13T21:11:06Z" level=info msg="Successfully made an HTTP request on attempt: 1"
time="2021-01-13T21:11:06Z" level=info msg="Got a 200 with a GET to http://10.0.44.239"
time="2021-01-13T21:11:06Z" level=info msg="Got a result from GET request backoff: 200 OK"
time="2021-01-13T21:11:06Z" level=info msg="Successfully hit service endpoint after rolling-update."
time="2021-01-13T21:11:06Z" level=info msg="Cleaning up deployment and service."
time="2021-01-13T21:11:06Z" level=info msg="Attempting to delete service deployment-svc in kuberhealthy namespace."
time="2021-01-13T21:11:11Z" level=info msg="Attempting to delete deployment in kuberhealthy namespace."
time="2021-01-13T21:11:16Z" level=info msg="Attempting to delete deployment in kuberhealthy namespace."
time="2021-01-13T21:11:21Z" level=info msg="Finished clean up process."
time="2021-01-13T21:11:21Z" level=info msg="Reporting success to Kuberhealthy."
time="2021-01-13T21:12:40Z" level=info msg="Recovered panic: runtime error: invalid memory address or nil pointer dereference"
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
panic: interface conversion: interface {} is runtime.errorString, not string
[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x11ea333]
goroutine 1 [running]:
main.main.func1(0xc0001f9f00)
/build/cmd/deployment-check/main.go:189 +0x175
panic(0x130e480, 0x2000ab0)
/usr/local/go/src/runtime/panic.go:679 +0x1b2
github.com/Comcast/kuberhealthy/v2/pkg/checks/external/checkclient.sendReport(0x2039aa8, 0x0, 0x0, 0x1, 0x0, 0xc0001f9638)
/build/pkg/checks/external/checkclient/main.go:99 +0x4e3
github.com/Comcast/kuberhealthy/v2/pkg/checks/external/checkclient.ReportSuccess(0xc0001f9658, 0x46983c)
/build/pkg/checks/external/checkclient/main.go:44 +0x7e
main.reportToKuberhealthy(0xc000113101, 0x2039aa8, 0x0, 0x0)
/build/cmd/deployment-check/main.go:260 +0x33
main.reportOKToKuberhealthy()
/build/cmd/deployment-check/main.go:253 +0x92
main.runDeploymentCheck(0x166ed60, 0xc00031d4a0)
/build/cmd/deployment-check/run_check.go:243 +0x149d
main.main()
/build/cmd/deployment-check/main.go:194 +0x36e
stream closed
dns-status-internal:
time="2021-01-13T21:10:11Z" level=info msg="Found instance namespace: kuberhealthy"
time="2021-01-13T21:10:11Z" level=info msg="Kuberhealthy is located in the kuberhealthy namespace."
time="2021-01-13T21:10:11Z" level=info msg="Check time limit set to: 14m47.427936725s"
time="2021-01-13T21:10:11Z" level=info msg="Check pod is running on node: aks-default-15766151-vmss000000"
time="2021-01-13T21:10:11Z" level=debug msg="Getting pod: dns-status-internal-1610572204 in order to get its node information"
time="2021-01-13T21:10:11Z" level=error msg="Error waiting for node to reach minimum age: pods \"dns-status-internal-1610572204\" is forbidden: User \"system:serviceaccount:kuberhealthy:default\" cannot get resource \"pods\" in API group \"\" in the namespace \"kuberhealthy\""
time="2021-01-13T21:10:11Z" level=debug msg="Checking if the kuberhealthy endpoint: http://kuberhealthy.kuberhealthy.svc.cluster.local/externalCheckStatus is ready."
time="2021-01-13T21:10:11Z" level=debug msg="http://kuberhealthy.kuberhealthy.svc.cluster.local/externalCheckStatus is ready."
time="2021-01-13T21:10:11Z" level=debug msg="Kuberhealthy endpoint: http://kuberhealthy.kuberhealthy.svc.cluster.local/externalCheckStatus is ready. Proceeding to run check."
time="2021-01-13T21:10:11Z" level=debug msg="Getting pod: dns-status-internal-1610572204 in order to get its node information"
time="2021-01-13T21:10:11Z" level=error msg="Error waiting for kube proxy to be ready: error getting kuberhealthy pod: pods \"dns-status-internal-1610572204\" is forbidden: User \"system:serviceaccount:kuberhealthy:default\" cannot get resource \"pods\" in API group \"\" in the namespace \"kuberhealthy\""
time="2021-01-13T21:10:11Z" level=info msg="Running DNS status checker"
time="2021-01-13T21:10:11Z" level=info msg="DNS Status check testing hostname: kubernetes.default"
time="2021-01-13T21:10:11Z" level=info msg="DNS Status check determined that kubernetes.default was OK."
2021/01/13 21:10:11 checkClient: DEBUG: Reporting SUCCESS
2021/01/13 21:10:11 checkClient: DEBUG: Sending report with error length of:0
2021/01/13 21:10:11 checkClient: DEBUG: Sending report with ok state of:true
2021/01/13 21:10:11 checkClient: INFO: Using kuberhealthy reporting URL:http://kuberhealthy.kuberhealthy.svc.cluster.local/externalCheckStatus
2021/01/13 21:10:11 checkClient: DEBUG: Making POST request to kuberhealthy:
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x11f1473]
goroutine 1 [running]:
github.com/Comcast/kuberhealthy/v2/pkg/checks/external/checkclient.sendReport(0x202a688, 0x0, 0x0, 0x1, 0xc0003265e8, 0xc0002edd74)
/build/pkg/checks/external/checkclient/main.go:99 +0x4e3
github.com/Comcast/kuberhealthy/v2/pkg/checks/external/checkclient.ReportSuccess(0xc0002eddf0, 0xc0003265a0)
/build/pkg/checks/external/checkclient/main.go:44 +0x7e
main.reportKHSuccess(0xc0002eddc8, 0xc0002edd70)
/build/cmd/dns-resolution-check/main.go:182 +0x2d
main.(*Checker).Run(0xc0002f5ee0, 0xc00026fe40, 0xc0002eded0, 0x2)
/build/cmd/dns-resolution-check/main.go:161 +0x204
main.main()
/build/cmd/dns-resolution-check/main.go:119 +0x3ce
stream closed
On a new AKS cluster, I've got this error:
(commit ref for the cluster: jx3-gitops-repositories/jx3-terraform-azure@8688dcb)