open-policy-agent / kube-mgmt

Sidecar for managing OPA instances in Kubernetes.
Apache License 2.0

Add liveness probe to kube-mgmt container #211

Open eshepelyuk opened 1 year ago

eshepelyuk commented 1 year ago
  1. On startup, kube-mgmt should add a sample policy to the OPA container using the OPA REST API. The policy is a marker that communication between the containers is established and that kube-mgmt has started reconciliation.
  2. The sample policy should be implemented as a Custom Health Check.
  3. A liveness probe should then be added to the kube-mgmt container that periodically checks that policy against the OPA container. If the policy is missing, the OPA container was most probably restarted, so the kube-mgmt container can be killed, and on restart the policies will be synchronized.
  4. Thresholds and periods should be set to values that enforce a kube-mgmt container restart as soon as possible.
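
The steps above can be sketched roughly as follows. This is only an illustration in Python, not kube-mgmt's actual implementation: the policy id `kube-mgmt-marker`, the rule name `live`, and the helper names are assumptions. It relies on OPA's policy endpoint (`PUT /v1/policies/<id>`) and on custom health checks being rules in the `system.health` package that are served under `/health/<rule>`.

```python
import urllib.error
import urllib.request

# Hypothetical marker policy. The package follows OPA's custom health check
# convention (rules under data.system.health); the rule name "live" is an assumption.
MARKER_POLICY = "package system.health\n\nlive := true\n"


def fetch_status(url, data=None, method="GET"):
    """Return the HTTP status code of a request, or None if OPA is unreachable."""
    req = urllib.request.Request(
        url, data=data, method=method, headers={"Content-Type": "text/plain"}
    )
    try:
        with urllib.request.urlopen(req, timeout=5) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses still carry a status code.
        return err.code
    except OSError:
        # Connection refused, timeout, DNS failure, and similar.
        return None


def put_marker(base_url, policy_id="kube-mgmt-marker"):
    """Step 1: register the marker policy once reconciliation has started."""
    url = f"{base_url}/v1/policies/{policy_id}"
    return fetch_status(url, data=MARKER_POLICY.encode(), method="PUT") == 200


def exit_code_for(status):
    """Step 3: the probe succeeds (exit code 0) only on HTTP 200."""
    return 0 if status == 200 else 1


def probe(base_url, rule="live"):
    """Query the custom health check; a missing marker makes the probe fail."""
    return exit_code_for(fetch_status(f"{base_url}/health/{rule}"))
```

If OPA restarts and loses the marker, the health rule becomes undefined, `probe` returns a non-zero exit code, and the liveness probe would restart kube-mgmt so that it reloads the policies.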

https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes

Relates #189 Relates #206

saranyareddy24 commented 1 year ago

We faced this issue: the OPA container restarts and the kube-mgmt container is not aware of it, so it doesn't load the policies.

Solution that worked for us:

When the policies were not properly loaded into OPA, an HTTP POST request sent to the OPA pod returned the response below.

Request URL: https://127.0.0.1:8443
Method: POST
Response code: 404
Response:

{
  "code": "undefined_document",
  "message": "document missing: data.system.main"
}

But when the OPA policies were loaded properly, the same POST request succeeded with the response below.

Request URL: https://127.0.0.1:8443
Method: POST
Response code: 200
Response:

{"apiVersion":"admission.k8s.io/v1beta1","kind":"AdmissionReview","response":{"allowed":true}}

The liveness probe configuration below works fine: it keeps checking whether the response code for the HTTPS request is 200, and if it is not, it restarts the container, thereby loading the policies again.

livenessProbe:
    exec:
        command:
            - sh
            - -c
            # Extract the HTTP status code from the wget output and fail unless it is 200.
            - |
              rc=$(wget --server-response --post-data '{}' --no-check-certificate https://127.0.0.1:8443 2>&1 | awk '/^  HTTP/{print $2}')
              [ "$rc" -eq 200 ]
    failureThreshold: 1
    initialDelaySeconds: 60
    periodSeconds: 5
    successThreshold: 1
    timeoutSeconds: 30
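
To make the pass/fail logic of the probe command explicit, here is a small Python sketch of the same status extraction. It assumes the two-space indentation of header lines that the awk pattern above relies on; the function names are hypothetical. Unlike the awk one-liner, which prints every status line, this version keeps only the last one, so a redirect followed by a 200 would still pass.

```python
import re


def last_status(wget_output):
    """Return the last HTTP status code in `wget --server-response` output, or None."""
    codes = re.findall(r"^  HTTP/\S+ (\d{3})", wget_output, flags=re.MULTILINE)
    return int(codes[-1]) if codes else None


def probe_passes(wget_output):
    """Mirror the `[ "$rc" -eq 200 ]` test: pass only when the status is 200."""
    return last_status(wget_output) == 200
```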

Let me know if this is a good approach. If the solution is fine, I can contribute and check in this change.

eshepelyuk commented 1 year ago

Hello @saranyareddy24, the approach is described at the top of the issue. Your approach is a special case that depends on your current Helm chart setup; it does not cover all possible setup options.

saranyareddy24 commented 1 year ago

Let me know if this is fine.

ConfigMap that creates start.rego:

apiVersion: v1
kind: ConfigMap
metadata:
  name: policy-start
  labels:
    openpolicyagent.org/policy: rego
data:
  start.rego: |
    # If kube-mgmt is not able to access this policy it will consider
    # that OPA has restarted and it will try to reload the policies by restarting.
    package test
    description := "Policy that loads on start of OPA"

Liveness check for fetching start.rego

  livenessProbe:
    failureThreshold: 5
    httpGet:
      path: /v1/policies/default/policy-start/start.rego
      port: 8181
      scheme: HTTPS
    initialDelaySeconds: 60
    periodSeconds: 5
    successThreshold: 1
    timeoutSeconds: 10

I tested this locally and the configuration works.

eshepelyuk commented 1 year ago

Hello @saranyareddy24

I do not understand the purpose of the presented ConfigMap. Please describe how it is going to work.

eshepelyuk commented 1 year ago

Hello @saranyareddy24

I've also updated the issue description. Hopefully the intention is clearer now.