Adding startup probe for kube-mgmt container

saranyareddy24 commented 1 year ago

Fixes #210

Fix provided:

Before the kube-mgmt container starts, a startup probe checks the health of the OPA container via <HTTP-SCHEME>://127.0.0.1:<OPA-PORT>/health

kubectl describe pod:

 Startup:        http-get https://:8181/health delay=20s timeout=1s period=10s #success=1 #failure=5

The pod came up successfully with the change:

kc get pods
NAME                                 READY   STATUS    RESTARTS   AGE
opa-opa-kube-mgmt-66ff988f55-5wdjv   2/2     Running   0          32s
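
As a rough reading of the probe line in the description above, the settings bound how long the probed endpoint has to report healthy before the kubelet gives up. The sketch below parses that `kubectl describe` line and estimates the window as the initial delay plus period times failure threshold. The helper names are hypothetical and the result is an approximation, not exact kubelet timing.

```python
import re


def parse_probe_line(line: str) -> dict:
    """Extract timing fields from a `kubectl describe pod` probe line."""
    fields = {}
    # Durations such as delay=20s, timeout=1s, period=10s
    for key, value in re.findall(r"(delay|timeout|period)=(\d+)s", line):
        fields[key] = int(value)
    # Thresholds such as #success=1, #failure=5
    for key, value in re.findall(r"#(success|failure)=(\d+)", line):
        fields[key] = int(value)
    return fields


def startup_budget_seconds(probe: dict) -> int:
    """Approximate window: initial delay + period * failure threshold."""
    return probe["delay"] + probe["period"] * probe["failure"]


line = ("http-get https://:8181/health delay=20s timeout=1s "
        "period=10s #success=1 #failure=5")
probe = parse_probe_line(line)
print(startup_budget_seconds(probe))  # 20 + 10 * 5 = 70
```

With the values shown in the description, OPA gets roughly 70 seconds to become healthy.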

eshepelyuk commented 1 year ago

@saranyareddy24

Please use these conventions to link a PR to an issue:

https://docs.github.com/en/issues/tracking-your-work-with-issues/linking-a-pull-request-to-an-issue#linking-a-pull-request-to-an-issue-using-a-keyword

saranyareddy24 commented 1 year ago

I am not able to link the PR to the issue now; I suspect I don't have permission to do so.

eshepelyuk commented 1 year ago

I am not able to link the PR to the issue now; I suspect I don't have permission to do so.

Just read the link I've provided carefully.

eshepelyuk commented 1 year ago

Linter test for helm

./test/linter/test.sh
==> Linting charts/opa-kube-mgmt
1 chart(s) linted, 0 chart(s) failed
==> Linting charts/opa-kube-mgmt
1 chart(s) linted, 0 chart(s) failed
==> Linting charts/opa-kube-mgmt
1 chart(s) linted, 0 chart(s) failed
engine.go:189: [INFO] Fail: cert-manager CRD does not appear to be installed
engine.go:189: [INFO] Fail: cert-manager CRD does not appear to be installed
engine.go:189: [INFO] Fail: cert-manager CRD does not appear to be installed
engine.go:189: [INFO] Fail: cert-manager CRD does not appear to be installed
engine.go:189: [INFO] Fail: cert-manager CRD does not appear to be installed
engine.go:189: [INFO] Fail: cert-manager CRD does not appear to be installed
engine.go:189: [INFO] Fail: cert-manager CRD does not appear to be installed
engine.go:189: [INFO] Fail: cert-manager CRD does not appear to be installed
==> Linting charts/opa-kube-mgmt
1 chart(s) linted, 0 chart(s) failed
==> Linting charts/opa-kube-mgmt
1 chart(s) linted, 0 chart(s) failed
==> Linting charts/opa-kube-mgmt
1 chart(s) linted, 0 chart(s) failed
==> Linting charts/opa-kube-mgmt
1 chart(s) linted, 0 chart(s) failed
==> Linting charts/opa-kube-mgmt
1 chart(s) linted, 0 chart(s) failed
==> Linting charts/opa-kube-mgmt
1 chart(s) linted, 0 chart(s) failed
==> Linting charts/opa-kube-mgmt
1 chart(s) linted, 0 chart(s) failed
==> Linting charts/opa-kube-mgmt
1 chart(s) linted, 0 chart(s) failed
==================================================================================
LINT PASSED

Why did you post it here?

saranyareddy24 commented 1 year ago

For configuration flexibility - this startup probe should be completely picked up from values, i.e. be configurable.

linter tests should be added since new value(s) are added.

new e2e test should be provided, to check kube mgmt is not starting in case opa is down.

The existing Helm linter tests will cover the new values as well. For the E2E test case, the approach I have in mind is to patch the command in the OPA deployment so that OPA does not start, and then grep for the startup probe error in the pod description.

@eshepelyuk Please let me know if there is a better approach.
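
The E2E check described above could be sketched roughly as follows: fetch the pod description and look for startup probe failures. This is a minimal sketch, not the repository's test harness. The function names are hypothetical, and `sample` is hand-written illustrative text rather than captured output.

```python
import subprocess
from typing import List

PROBE_ERROR = "Startup probe failed"


def describe_pod(pod: str, namespace: str = "default") -> str:
    """Return `kubectl describe pod` output (needs kubectl and cluster access)."""
    result = subprocess.run(
        ["kubectl", "describe", "pod", pod, "--namespace", namespace],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def startup_probe_failures(description: str) -> List[str]:
    """Return the lines of a pod description that report a startup probe failure."""
    return [ln.strip() for ln in description.splitlines() if PROBE_ERROR in ln]


# Hand-written sample for illustration only; a real test would call describe_pod().
sample = """Events:
  Warning  Unhealthy  Startup probe failed: connection refused
  Normal   Pulled     Container image already present on machine
"""

failures = startup_probe_failures(sample)
print(len(failures))
```

A test built this way would assert that the list is non-empty while OPA is deliberately kept from starting.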

saranyareddy24 commented 1 year ago

@eshepelyuk Is there a way to get more logs on the build failure to identify why the build is failing?

eshepelyuk commented 1 year ago

@eshepelyuk Is there a way to get more logs on the build failure to identify why the build is failing?

I don't know, TBH. But is it passing OK locally for you?

anderseknert commented 1 year ago

@saranyareddy24 logs are over here: https://github.com/open-policy-agent/kube-mgmt/actions/runs/5253679992/jobs/9491313262

This looks relevant:

Starting deploy...
Helm release kube-mgmt not installed. Installing...
Error: INSTALLATION FAILED: context deadline exceeded
deploying "kube-mgmt": install: exit status 1
error: Recipe `up` failed on line 63 with exit code 1

saranyareddy24 commented 1 year ago

@saranyareddy24 logs are over here: https://github.com/open-policy-agent/kube-mgmt/actions/runs/5253679992/jobs/9491313262

This looks relevant:
Starting deploy...
Helm release kube-mgmt not installed. Installing...
Error: INSTALLATION FAILED: context deadline exceeded
deploying "kube-mgmt": install: exit status 1
error: Recipe `up` failed on line 63 with exit code 1

I went through this. I wanted to run kubectl describe pod to see why it is crashing, but I noticed that those lines are commented out in the build file.

saranyareddy24 commented 1 year ago

@eshepelyuk Is there a way to get more logs on the build failure to identify why the build is failing?

I don't know, TBH. But is it passing OK locally for you?

I ran it locally and found that it is failing at the startup probe:

Startup probe failed: Get "https://10.244.0.36:8181/health": http: server gave HTTP response to HTTPS client.

Will fix it.
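
The error above indicates that the probe spoke HTTPS to an endpoint serving plain HTTP. One possible way to avoid that mismatch, sketched here under assumptions, is to derive the probe scheme from whether OPA is configured with TLS and to let the timings be overridden from values. The function name, its parameters, and the defaults (which mirror the probe line in the PR description) are hypothetical and are not the chart's actual implementation. Only the probe field names and the `HTTP`/`HTTPS` scheme values follow the Kubernetes probe spec.

```python
from typing import Optional


def build_startup_probe(port: int, tls_enabled: bool,
                        overrides: Optional[dict] = None) -> dict:
    """Build a startupProbe mapping whose scheme matches the server's TLS setting."""
    probe = {
        "httpGet": {
            "path": "/health",
            "port": port,
            # Plain HTTP unless OPA is actually serving TLS
            "scheme": "HTTPS" if tls_enabled else "HTTP",
        },
        "initialDelaySeconds": 20,
        "timeoutSeconds": 1,
        "periodSeconds": 10,
        "failureThreshold": 5,
    }
    # User-supplied values (hypothetical) take precedence over the defaults
    probe.update(overrides or {})
    return probe


probe = build_startup_probe(8181, tls_enabled=False, overrides={"periodSeconds": 5})
print(probe["httpGet"]["scheme"], probe["periodSeconds"])
```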

open-policy-agent / kube-mgmt

Adding startup probe for kube-mgmt container #215
