open-policy-agent / kube-mgmt

Sidecar for managing OPA instances in Kubernetes.
Apache License 2.0

Upgrading Issues #242

Closed sba30 closed 8 months ago

sba30 commented 8 months ago

We are doing some upgrades and have run into a problem.

The following combination does not work:

OPA version: opa:0.60.0. kube-mgmt version: kube-mgmt:8.5.4, 8.5.3, 8.5.2, or 8.5.1.

When we try a rollout restart of a deployment or daemonset, we get this error:

error: failed to patch: Internal error occurred: failed calling webhook "validating-webhook.openpolicyagent.org": failed to call webhook: the server could not find the requested resource

However, everything works with this combination: OPA version opa:0.60.0, kube-mgmt version kube-mgmt:8.5.0.

eshepelyuk commented 8 months ago
  1. Please provide the Helm chart values needed to reproduce this.
  2. What exactly does the following mean, and what exactly are you doing? Please give as much detail as possible.

    When trying to do a rollout restart of a deployment or daemonset

sba30 commented 8 months ago

Hi, we don't use Helm to deploy this.

With kube-mgmt 8.5.1 and above, our initial test was a rollout restart of a deployment or daemonset. It fails with this generic error message:

error: failed to patch: Internal error occurred: failed calling webhook "validating-webhook.openpolicyagent.org": failed to call webhook: the server could not find the requested resource

Reviewing the logs of the opa container, we see this error:

error":{"code":"undefined_document","message":"document missing: data.system.main"}

In the kube-mgmt logs, everything looks OK:

○ → kl opa-xxxxxxx kube-mgmt
time="2024-01-11T10:58:42Z" level=info msg="Policy/data ConfigMap processor connected to K8s: namespaces=[]"
time="2024-01-11T10:58:42Z" level=info msg="Initial informer sync for v1/namespaces completed, took 120.267923ms"
time="2024-01-11T10:58:42Z" level=info msg="Initial informer sync for traefik.containo.us/v1alpha1/ingressroutes completed, took 120.317211ms"
time="2024-01-11T10:58:42Z" level=info msg="Syncing traefik.containo.us/v1alpha1/ingressroutes."
time="2024-01-11T10:58:42Z" level=info msg="Initial informer sync for networking.k8s.io/v1/networkpolicies completed, took 120.47374ms"
time="2024-01-11T10:58:42Z" level=info msg="Initial informer sync for networking.k8s.io/v1/ingresses completed, took 120.236117ms"
time="2024-01-11T10:58:42Z" level=info msg="Syncing networking.k8s.io/v1/ingresses."
time="2024-01-11T10:58:42Z" level=info msg="Syncing networking.k8s.io/v1/networkpolicies."
time="2024-01-11T10:58:42Z" level=info msg="Syncing v1/namespaces."
time="2024-01-11T10:58:42Z" level=info msg="Loaded 0 resources of kind networking.k8s.io/v1/ingresses into OPA. Took 9.081604ms"
time="2024-01-11T10:58:42Z" level=info msg="Loaded 16 resources of kind traefik.containo.us/v1alpha1/ingressroutes into OPA. Took 9.228135ms"
time="2024-01-11T10:58:42Z" level=info msg="Loaded 39 resources of kind v1/namespaces into OPA. Took 12.220325ms"
time="2024-01-11T10:58:42Z" level=info msg="Loaded 225 resources of kind networking.k8s.io/v1/networkpolicies into OPA. Took 26.521374ms"

If we switch to the kube-mgmt:8.5.0 image, everything works as expected.
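
The `undefined_document` error quoted above suggests that no loaded policy defines `data.system.main`, which is OPA's default decision document. As far as I understand OPA's REST API, a query for an undefined default decision is answered with HTTP 404, which would match the webhook's "could not find the requested resource" message. The sketch below only classifies such a response; the function name is hypothetical, and the body shape is assumed to be the one shown in the log excerpt.

```python
import json


def is_missing_default_decision(status_code: int, body: str) -> bool:
    """Return True if an OPA response says data.system.main is undefined.

    Hypothetical helper for illustration; not part of OPA or kube-mgmt.
    """
    if status_code != 404:
        return False
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False
    if not isinstance(payload, dict):
        return False
    # Log entries nest the error under an "error" key; assume the raw
    # HTTP body may carry the same fields at the top level.
    error = payload.get("error", payload)
    if not isinstance(error, dict):
        return False
    return (
        error.get("code") == "undefined_document"
        and "data.system.main" in error.get("message", "")
    )


sample = '{"code": "undefined_document", "message": "document missing: data.system.main"}'
print(is_missing_default_decision(404, sample))  # True
print(is_missing_default_decision(200, "{}"))    # False
```

If this check fires, the problem is likely that the policies never reached OPA, even though kube-mgmt's resource replication logs look healthy.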

eshepelyuk commented 8 months ago

Hello

Unfortunately, I would only be able to help if you used the kube-mgmt Helm chart, so that the issue is easy to reproduce. In the OPA logs I see no errors or anything suspicious.

Also, I still don't understand what you're doing with the rollout restart of deployments/daemonsets and how it's related to OPA :( So maybe I'm missing something.

sba30 commented 8 months ago

The rollout restart is to ensure that the ConfigMaps we use for our OPA rules get loaded into kube-mgmt.

eshepelyuk commented 8 months ago

    The rollout restart is to ensure that the ConfigMaps we use for our OPA rules get loaded into kube-mgmt.

How exactly would a rollout restart help load ConfigMaps into kube-mgmt? kube-mgmt itself watches ConfigMaps and loads their content into the OPA container whenever a ConfigMap changes.

sba30 commented 8 months ago

I'm wondering whether this change may be causing the issue:

https://github.com/open-policy-agent/kube-mgmt/commit/90e640b3185ec35e0d956f5b314c10871008e60b

Our ConfigMaps are loaded into the opa namespace, and this change replaces the default namespace of opa with an empty string.
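
To illustrate why a changed default could break policy loading, here is a minimal sketch of namespace scoping. It is not kube-mgmt's actual code: the function name is hypothetical, and it assumes that an empty namespace list matches no namespace, which is consistent with the `namespaces=[]` log line and the symptoms reported here but is not confirmed in this thread.

```python
from typing import Iterable


def configmap_in_scope(configmap_namespace: str,
                       watched_namespaces: Iterable[str]) -> bool:
    """Return True if a ConfigMap's namespace is among the watched ones.

    Hypothetical illustration; assumes an empty list selects nothing.
    """
    return configmap_namespace in set(watched_namespaces)


old_default = ["opa"]  # assumed default before the change
new_default = []       # matches "namespaces=[]" in the kube-mgmt log

print(configmap_in_scope("opa", old_default))  # True
print(configmap_in_scope("opa", new_default))  # False
```

If that assumption holds, passing the namespace explicitly through kube-mgmt's `--namespaces` option, rather than relying on the default, should restore the earlier behavior.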

sba30 commented 8 months ago

Yup, this difference was the reason for the failure, and we have now been able to resolve it.

Would you suggest we switch to Helm for support reasons?

eshepelyuk commented 8 months ago

    Yup, this difference was the reason for the failure, and we have now been able to resolve it.

    Would you suggest we switch to Helm for support reasons?

This project is community driven and has no official support, as far as I know. I am currently the only active maintainer, so I personally will only be able to help when the kube-mgmt chart is used. I just don't have the capacity to investigate custom installations.

That's it :)