Closed porwalameet closed 1 year ago
Hey @porwalameet, would you also be able to share the logs of the OPA container? Is it crashing on start up or showing any errors which might explain why it hasn't started?
The OPA pod is up and running and kube-mgmt reports policy loaded for after initial attempt which was rejected because of "connection refused" on host: localhost:8181
I am attaching the poa logs, i have removed admission request for sake of brevity. opa-logs-github.txt the decision logs keep repeating.
This does not look like an issue with OPA itself. @porwalameet you should look into Bundles for loading policies into OPA. It's a more robust and production-tested approach. You can refer to this tutorial for an example. Also with the config map approach, you need to make sure the openpolicyagent.org/policy-status
annotation is set to {"status": "ok"}
before making any requests. Anyways please use the bundle approach for loading policies into OPA. You can even set a readinessProbe
to ensure that policies are loaded before OPA handles any requests. Also the bundle approach is much easier to monitor using the Status API.
Thanks @ashutosh-narkar . Why I suspect OPA issue here is - in the kubernetes environment, the 3 replicas of OPA deployment only one is in error state. The config map which holds the rego policy reports status as{"status": "ok"}
since the other 2 pod where issue is not seen have loaded.
infact on problematic mode, kube-mgmt reports policies are put successfuly if you see the kube-mgmt logs. The problem is when OPA pod gets admission request.
We dont want to use bundle approach at we want all resources confined in kubernetes only and no interaction with external server.
@porwalameet as seen in your logs, some policies are not loaded correctly. The PUT
calls that kube-mgmt makes to OPA seem to be failing in some instances. I don't think kube-mgmt has any retry mechanism. So it's possible that not all policies are loaded correctly which results in the error you see. Hence I mentioned that the issue seems to be more related to the deployment/environment than OPA itself.
Encountered this exact issue, the node where the OPA pod lives gets bounced , then the pod is restarted and in a running state but lots of error":{"code":"undefined_document","message":"document missing: data.system.main"}, errors.
Having to delete the pod in order to resolve the issue.
kube-mgmt-logs-failed.txt
Short description
OPA pod complains about "undefined document" for a policy when the node where OPA pod is running is rebooted.
When API hitting the problematic pod:
When API hitting the working pod:
Environment: OPA version: 0.50.2 kube-mgmt version: 8.0.2 Kubernetes: 1.24.8 OPA replicas: 3 kube-mgmt is running as side-car to OPA container
Steps to Reproduce:
the other 2 pods, there was no such connection refused error seen.
Even after policy being successfully loaded, the command
curl -k -X post https://<pod-ip>
returns undefined document, where as other 2 pods gives 200 OK.Expected Output: the policy should be loaded and admission request should be evaluated accordingly.[](url))