open-policy-agent / gatekeeper

🐊 Gatekeeper - Policy Controller for Kubernetes
https://open-policy-agent.github.io/gatekeeper/
Apache License 2.0

No violations found with gatekeeper 3.8.0 although there are violations and they were found with 3.7.2. #2026

Closed mbrowatzki closed 2 years ago

mbrowatzki commented 2 years ago

What steps did you take and what happened: With gatekeeper 3.7.2 there are Total Violations: 4 reported for my ns-must-have-label constraint. It's similar to the all_ns_must_have_gatekeeper.yaml example under the demo examples. After updating to 3.8.0 there are Total Violations: 0.
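For reference, a rough sketch of what such a namespace-label constraint typically looks like, following the shape of the all-must-have-owner constraint shown later in this thread; the parameters and label key here are illustrative:

apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequiredLabels
metadata:
  name: ns-must-have-label
spec:
  match:
    kinds:
    - apiGroups: [""]
      kinds: ["Namespace"]
  parameters:
    message: All namespaces must carry the required label
    labels:
    - key: gatekeeper   # illustrative label key; the agilebank demo uses "owner" plus an allowedRegex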

I played around a little bit. A fresh install with 3.8.0 and no Config (no excluded namespaces) shows Total Violations: 8. After deploying the following config there are Total Violations: 0 again.

apiVersion: config.gatekeeper.sh/v1alpha1
kind: Config
metadata:
  name: config
  namespace: '{{ .Release.Namespace }}'
spec:
  match:
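The Config above is truncated in the original report; based on the Config resources shown later in this thread, a complete exclusion Config of this kind looks roughly like the following (the excluded namespace names are illustrative):

apiVersion: config.gatekeeper.sh/v1alpha1
kind: Config
metadata:
  name: config
  namespace: gatekeeper-system   # '{{ .Release.Namespace }}' in the Helm-templated original
spec:
  match:
  - excludedNamespaces:
    - kube-system          # illustrative list of exempted namespaces
    - gatekeeper-system
    processes:
    - '*'                  # '*' exempts these namespaces from all Gatekeeper processes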

What did you expect to happen: The same number of violations in 3.7.2 and 3.8.0.

Anything else you would like to add: Same behavior with 3.9.0-beta.0.

Environment:

mrjoelkamp commented 2 years ago

I am also experiencing the same issue after upgrading the Helm chart from 3.7.2 to 3.8.0. Gatekeeper no longer warns on policy failures. I also have the same kind of namespace exemption Config.

Before upgrade:

kubectl apply -f pod.yaml -n test
Warning: [psp-readonlyrootfilesystem] only read-only root filesystem container is allowed: pause
Warning: [psp-pods-allowed-user-ranges] Container pause is attempting to run without a required securityContext/runAsNonRoot or securityContext/runAsUser != 0
Warning: [psp-pods-allowed-user-ranges] Container pause is attempting to run without a required securityContext/runAsGroup. Allowed runAsGroup: {"ranges": [{"max": 65535, "min": 1}], "rule": "MustRunAs"}
pod/gatekeeper-test-pod created

After upgrade:

kubectl apply -f pod.yaml -n test
pod/gatekeeper-test-pod created

I also tried changing the policy to deny; same behavior after the upgrade. The pod gets created as if the constraint doesn't exist.
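For reference, the enforcement action is set per constraint via spec.enforcementAction; a minimal sketch of the warn-to-deny change described above, using one of the constraint kinds from the log output (the match shown here is illustrative):

apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sPSPReadOnlyRootFilesystem
metadata:
  name: psp-readonlyrootfilesystem
spec:
  enforcementAction: deny   # was "warn"; "deny" should reject the admission request outright
  match:
    kinds:
    - apiGroups: [""]
      kinds: ["Pod"]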

After removing the Config, Gatekeeper runs as expected.

Same behavior in v3.8.0-rc1 and v3.8.0-rc2. Did some digging and it seems that the constraints aren't running for the request.

maxsmythe commented 2 years ago

I created a similar config resource in a kind cluster and it appears to function properly:

$ kubectl get config -n gatekeeper-system -o yaml
apiVersion: v1
items:
- apiVersion: config.gatekeeper.sh/v1alpha1
  kind: Config
  metadata:
    annotations:
      kubectl.kubernetes.io/last-applied-configuration: |
        {"apiVersion":"config.gatekeeper.sh/v1alpha1","kind":"Config","metadata":{"annotations":{},"name":"config","namespace":"gatekeeper-system"},"spec":{"sync":{"syncOnly":[{"group":"","kind":"Service","version":"v1"},{"group":"","kind":"Pod","version":"v1"},{"group":"extensions","kind":"Ingress","version":"v1beta1"},{"group":"networking.k8s.io","kind":"Ingress","version":"v1"},{"group":"","kind":"Namespace","version":"v1"}]}}}
    creationTimestamp: "2022-04-30T04:11:40Z"
    generation: 2
    name: config
    namespace: gatekeeper-system
    resourceVersion: "595193"
    uid: e13b6cd8-5f91-4415-a8ad-9e9866dadb28
  spec:
    match:
    - excludedNamespaces:
      - kube-*
      - gatekeeper-system
      processes:
      - '*'
    sync:
      syncOnly:
      - group: ""
        kind: Service
        version: v1
      - group: ""
        kind: Pod
        version: v1
      - group: extensions
        kind: Ingress
        version: v1beta1
      - group: networking.k8s.io
        kind: Ingress
        version: v1
      - group: ""
        kind: Namespace
        version: v1
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

gatekeeper/demo/agilebank$ kubectl apply -f bad_resources/
service/gatekeeper-test-service unchanged
Warning: resource namespaces/production is missing the kubectl.kubernetes.io/last-applied-configuration annotation which is required by kubectl apply. kubectl apply should only be used on resources created declaratively by either kubectl create --save-config or kubectl apply. The missing annotation will be patched automatically.
Error from server (Forbidden): error when applying patch:
{"metadata":{"annotations":{"kubectl.kubernetes.io/last-applied-configuration":"{\"apiVersion\":\"v1\",\"kind\":\"Namespace\",\"metadata\":{\"annotations\":{},\"labels\":{\"owner\":\"me\"},\"name\":\"production\"}}\n"},"labels":{"owner":"me"}}}
to:
Resource: "/v1, Resource=namespaces", GroupVersionKind: "/v1, Kind=Namespace"
Name: "production", Namespace: ""
for: "bad_resources/namespace.yaml": admission webhook "validation.gatekeeper.sh" denied the request: [all-must-have-owner] All namespaces must have an `owner` label that points to your company username
Error from server (Forbidden): error when creating "bad_resources/opa_limits_too_high.yaml": admission webhook "validation.gatekeeper.sh" denied the request: [must-have-probes] Container <opa> in your <Pod> <opa> has no <livenessProbe>
[must-have-probes] Container <opa> in your <Pod> <opa> has no <readinessProbe>
[container-must-have-limits] container <opa> cpu limit <300m> is higher than the maximum allowed of <200m>
[container-must-have-limits] container <opa> memory limit <4000Mi> is higher than the maximum allowed of <1Gi>
Error from server (Forbidden): error when creating "bad_resources/opa_no_limits.yaml": admission webhook "validation.gatekeeper.sh" denied the request: [must-have-probes] Container <opa> in your <Pod> <opa> has no <livenessProbe>
[must-have-probes] Container <opa> in your <Pod> <opa> has no <readinessProbe>
[container-must-have-limits] container <opa> has no resource limits
Error from server (Forbidden): error when creating "bad_resources/opa_wrong_repo.yaml": admission webhook "validation.gatekeeper.sh" denied the request: [must-have-probes] Container <opa> in your <Pod> <opa> has no <livenessProbe>
[must-have-probes] Container <opa> in your <Pod> <opa> has no <readinessProbe>
[prod-repo-is-openpolicyagent] container <opa> has an invalid image repo <gcr.io/smythe-kpc/testbuilds/opa:0.9.2>, allowed repos are ["openpolicyagent"]

I wonder if this is a case of the webhook failing open?
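For context: the webhook fails open when the failurePolicy on Gatekeeper's ValidatingWebhookConfiguration is Ignore, meaning requests are admitted if the webhook errors out or times out. A sketch of the relevant fields, trimmed to the parts that matter here (values are illustrative):

apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: gatekeeper-validating-webhook-configuration
webhooks:
- name: validation.gatekeeper.sh
  failurePolicy: Ignore   # Ignore = fail open (admit on webhook error/timeout); Fail = fail closed
  timeoutSeconds: 3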

@mrjoelkamp When you say "Did some digging and it seems that the constraints aren't running for the request." what did you notice?

What are:

- The contents of the status field for a constraint you would expect to be enforced?
- The status of the pods inside the gatekeeper-system namespace?
- Any weird log entries? Particularly ones that don't go away?

If you look at the logs for your K8s api server, does it show it reaching out to the Gatekeeper webhook? Is there a response?

If there is a resource that has been created successfully, are you seeing any audit results on a constraint that should have denied it?

@mbrowatzki It looks like you are observing a lack of expected audit results.

What are the contents of your gatekeeper config if you run kubectl get -n gatekeeper-system config -oyaml? (please preserve whitespace as it might be important)

Do you see any crashing on your pods in gatekeeper-system?

maxsmythe commented 2 years ago

Also @mbrowatzki is it just a namespace label constraint, or do other constraints break?

maxsmythe commented 2 years ago

Audit also appears to be working for me:

$ kubectl get k8srequiredlabels.constraints.gatekeeper.sh all-must-have-owner -oyaml
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequiredLabels
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"constraints.gatekeeper.sh/v1beta1","kind":"K8sRequiredLabels","metadata":{"annotations":{},"name":"all-must-have-owner"},"spec":{"match":{"kinds":[{"apiGroups":[""],"kinds":["Namespace"]}]},"parameters":{"labels":[{"allowedRegex":"^[a-zA-Z]+.agilebank.demo$","key":"owner"}],"message":"All namespaces must have an `owner` label that points to your company username"}}}
  creationTimestamp: "2022-04-30T04:16:07Z"
  generation: 1
  name: all-must-have-owner
  resourceVersion: "597207"
  uid: 1da4cb30-92c0-4ff0-8356-ef6f92e9034f
spec:
  match:
    kinds:
    - apiGroups:
      - ""
      kinds:
      - Namespace
  parameters:
    labels:
    - allowedRegex: ^[a-zA-Z]+.agilebank.demo$
      key: owner
    message: All namespaces must have an `owner` label that points to your company
      username
status:
  auditTimestamp: "2022-04-30T04:28:29Z"
  byPod:
  - constraintUID: 1da4cb30-92c0-4ff0-8356-ef6f92e9034f
    enforced: true
    id: gatekeeper-audit-5664d4768b-2mrjm
    observedGeneration: 1
    operations:
    - audit
    - mutation-status
    - status
  - constraintUID: 1da4cb30-92c0-4ff0-8356-ef6f92e9034f
    enforced: true
    id: gatekeeper-controller-manager-86c55bf59d-bjccw
    observedGeneration: 1
    operations:
    - mutation-webhook
    - webhook
  - constraintUID: 1da4cb30-92c0-4ff0-8356-ef6f92e9034f
    enforced: true
    id: gatekeeper-controller-manager-86c55bf59d-sf9pj
    observedGeneration: 1
    operations:
    - mutation-webhook
    - webhook
  - constraintUID: 1da4cb30-92c0-4ff0-8356-ef6f92e9034f
    enforced: true
    id: gatekeeper-controller-manager-86c55bf59d-v5ctf
    observedGeneration: 1
    operations:
    - mutation-webhook
    - webhook
  totalViolations: 3
  violations:
  - enforcementAction: deny
    kind: Namespace
    message: All namespaces must have an `owner` label that points to your company
      username
    name: production
  - enforcementAction: deny
    kind: Namespace
    message: All namespaces must have an `owner` label that points to your company
      username
    name: local-path-storage
  - enforcementAction: deny
    kind: Namespace
    message: All namespaces must have an `owner` label that points to your company
      username
    name: default

mbrowatzki commented 2 years ago

@mbrowatzki It looks like you are observing a lack of expected audit results

What are the contents of your gatekeeper config if you run kubectl get -n gatekeeper-system config -oyaml? (please preserve whitespace as it might be important)

% kubectl get -n gatekeeper-system config -oyaml
apiVersion: v1
items:

Do you see any crashing on your pods in gatekeeper-system? no.

mrjoelkamp commented 2 years ago

@maxsmythe thanks for looking into this

@mrjoelkamp When you say "Did some digging and it seems that the constraints aren't running for the request." what did you notice?

I checked the gatekeeper-controller-manager pod logs. I expected to see something related to admission like the following:

{"level":"info","ts":1651591872.707681,"logger":"webhook","msg":"denied admission","process":"admission","event_type":"violation","constraint_name":"psp-readonlyrootfilesystem","constraint_group":"constraints.gatekeeper.sh","constraint_api_version":"v1beta1","constraint_kind":"K8sPSPReadOnlyRootFilesystem","constraint_action":"warn","resource_group":"","resource_api_version":"v1","resource_kind":"Pod","resource_namespace":"test","resource_name":"gatekeeper-test-pod","request_username":"admin"}

What are: The contents of the status field for a constraint you would expect to be enforced?

Here is the status of one of the many psp constraints we have active:

kubectl describe k8spspallowedusers psp-pods-allowed-user-ranges
...
Status:
  Audit Timestamp:  2022-05-03T13:14:47Z
  By Pod:
    Constraint UID:       fd5f54e1-0478-493b-b95a-f149753a009c
    Enforced:             true
    Id:                   gatekeeper-audit-c87644fc-4ztz8
    Observed Generation:  1
    Operations:
      audit
      mutation-status
      status
    Constraint UID:       fd5f54e1-0478-493b-b95a-f149753a009c
    Enforced:             true
    Id:                   gatekeeper-controller-manager-67bcbc469d-pkhrc
    Observed Generation:  1
    Operations:
      mutation-webhook
      webhook
    Constraint UID:       fd5f54e1-0478-493b-b95a-f149753a009c
    Enforced:             true
    Id:                   gatekeeper-controller-manager-67bcbc469d-qgbp8
    Observed Generation:  1
    Operations:
      mutation-webhook
      webhook

The status of the pods inside the gatekeeper-system namespace

Pod status (I am using a custom namespace, `security`, instead of gatekeeper-system):

kubectl get po -n security
NAME                                             READY   STATUS    RESTARTS   AGE
gatekeeper-audit-c87644fc-4ztz8                  1/1     Running   0          11m
gatekeeper-controller-manager-67bcbc469d-pkhrc   1/1     Running   0          10m
gatekeeper-controller-manager-67bcbc469d-qgbp8   1/1     Running   0          10m

Any weird log entries? Particularly ones that don't go away?

No abnormal log entries

If you look at the logs for your K8s api server, does it show it reaching out to the Gatekeeper webhook? Is there a response?

I am using AWS EKS. I enabled control plane API server logs and there are only a few gatekeeper-related log entries. I am not seeing any webhook-related entries.

If there is a resource that has been created successfully, are you seeing any audit results on a constraint that should have denied it?

No audit violations for resources that were admitted with the excludedNamespaces Config in place. I get tons of audit violations for existing resources once I remove the excludedNamespaces Config and restart the gatekeeper pods. It seems like they don't actually recover until removing the Config and restarting the pods.

I also tested setting the gatekeeper webhook configurations to failurePolicy: Fail, to see if I would be denied if the call to the webhook were to time out or fail for whatever reason. I was still able to create resources with the excludedNamespaces Config in place and the webhooks set to Fail.

Here is the output for the Config:

kubectl get -n security config -oyaml
apiVersion: v1
items:
- apiVersion: config.gatekeeper.sh/v1alpha1
  kind: Config
  metadata:
    annotations:
      kubectl.kubernetes.io/last-applied-configuration: |
        {"apiVersion":"config.gatekeeper.sh/v1alpha1","kind":"Config","metadata":{"annotations":{},"name":"config","namespace":"security"},"spec":{"match":[{"excludedNamespaces":["kube-system","flux-system"],"processes":["*"]}]}}
    creationTimestamp: "2022-05-03T14:30:14Z"
    generation: 1
    name: config
    namespace: security
    resourceVersion: "206616811"
    uid: ace1a96f-1406-4fca-977f-f63165d0f076
  spec:
    match:
    - excludedNamespaces:
      - kube-system
      - flux-system
      processes:
      - '*'
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

mrjoelkamp commented 2 years ago

It might be worth pointing out that I started experiencing this bug in the releases after https://github.com/open-policy-agent/gatekeeper/pull/1796 was merged, which reworked some of the regex used to parse excludedNamespaces.

See also https://github.com/open-policy-agent/gatekeeper/issues/2002, which was opened showing config unit test failures.

maxsmythe commented 2 years ago

Thanks for the data!

This is not good; users who have Configs should probably avoid this release until we sort this out.

@open-policy-agent/gatekeeper-maintainers @ritazh @sozercan @willbeason

I was able to replicate the bug with this config (reliably, no flaking):

apiVersion: config.gatekeeper.sh/v1alpha1
kind: Config
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"config.gatekeeper.sh/v1alpha1","kind":"Config","metadata":{"annotations":{},"creationTimestamp":"2022-05-03T14:30:14Z","generation":1,"name":"config","namespace":"gatekeeper-system"},"spec":{"match":[{"excludedNamespaces":["kube-system","flux-system"],"processes":["*"]}]}}
  creationTimestamp: "2022-05-04T07:22:10Z"
  generation: 1
  name: config
  namespace: gatekeeper-system
  resourceVersion: "1233649"
  uid: 7fe62f5d-d366-4a9c-bd20-b33b6817ae53
spec:
  match:
  - excludedNamespaces:
    - kube-system
    - flux-system
    processes:
    - '*'

Poking around, it looks like OPA's data cache is somehow getting wiped of constraints even though the constraint framework knows about them. Here is some debug output from a custom build showing the OPA cache being empty while other caches are full:

DEBUG Reviewing sfg 
DEBUG in review
DEBUG attempting to match against container-must-have-limits
DEBUG matching sfg 
DEBUG attempting to match against all-must-have-owner
DEBUG matching sfg 
DEBUG match successful sfg 
DEBUG match result {0xc00011b358  <nil>}
DEBUG attempting to match against unique-service-selector
DEBUG match result {0xc00011a738  <nil>}
DEBUG attempting to match against prod-repo-is-openpolicyagent
DEBUG matching sfg 
DEBUG attempting to match against must-have-probes
DEBUG matching sfg 
DEBUG calling opa
DEBUG opa constraints K8sUniqueServiceSelector
DEBUG module name hooks.hooks_builtin
DEBUG module contents: package hooks

violation[__local2__] { __local0__ = input.constraints[_]; __local26__ = __local0__.kind; __local27__ = __local0__.name; __local28__ = data.constraints[__local26__][__local27__]; __local29__ = input.review; __local1__ = {"parameters": __local28__, "review": __local29__}; __local25__ = data.external; data.template.violation[r] with input as __local1__ with data.inventory as __local25__; object.get(r, "details", {}, __local16__); __local30__ = r.msg; __local2__ = {"details": __local16__, "key": __local0__, "msg": __local30__} }
DEBUG module name template0
DEBUG module contents: package template

make_apiversion(__local3__) = apiVersion { __local4__ = __local3__.group; __local5__ = __local3__.version; neq(__local4__, ""); sprintf("%v/%v", [__local4__, __local5__], __local17__); apiVersion = __local17__ }
make_apiversion(__local6__) = apiVersion { __local6__.group = ""; apiVersion = __local6__.version }
identical(__local7__, __local8__) = true { __local7__.metadata.namespace = __local8__.namespace; __local7__.metadata.name = __local8__.name; __local7__.kind = __local8__.kind.kind; __local31__ = __local8__.kind; data.template.make_apiversion(__local31__, __local18__); __local7__.apiVersion = __local18__ }
flatten_selector(__local9__) = __local11__ { __local10__ = [s | val = __local9__.spec.selector[key]; concat(":", [key, val], __local19__); s = __local19__]; sort(__local10__, __local20__); concat(",", __local20__, __local21__); __local11__ = __local21__ }
violation[{"msg": __local15__}] { input.review.kind.kind = "Service"; input.review.kind.version = "v1"; input.review.kind.group = ""; __local32__ = input.review.object; data.template.flatten_selector(__local32__, __local22__); __local12__ = __local22__; __local13__ = data.inventory.namespace[namespace][_][_][name]; __local33__ = input.review; not data.template.identical(__local13__, __local33__); data.template.flatten_selector(__local13__, __local23__); __local14__ = __local23__; __local12__ = __local14__; sprintf("same selector as service <%v> in namespace <%v>", [name, namespace], __local24__); __local15__ = __local24__ }
DEBUG dump {
   "data": {
      "admission.k8s.gatekeeper.sh": {
         "Bell": null,
         "K8sAllowedRepos": null,
         "K8sContainerLimits": null,
         "K8sRequiredLabels": null,
         "K8sRequiredProbes": null,
         "K8sUniqueServiceSelector": null,
         "TestImport": null
      }
   }
}
DEBUG opa constraints K8sRequiredLabels
DEBUG module name hooks.hooks_builtin
DEBUG module contents: package hooks

violation[__local2__] { __local0__ = input.constraints[_]; __local27__ = __local0__.kind; __local28__ = __local0__.name; __local29__ = data.constraints[__local27__][__local28__]; __local30__ = input.review; __local1__ = {"parameters": __local29__, "review": __local30__}; __local26__ = data.external; data.template.violation[r] with input as __local1__ with data.inventory as __local26__; object.get(r, "details", {}, __local19__); __local31__ = r.msg; __local2__ = {"details": __local19__, "key": __local0__, "msg": __local31__} }
DEBUG module name template0
DEBUG module contents: package template

get_message(__local3__, __local4__) = __local5__ { not __local3__.message; __local5__ = __local4__ }
get_message(__local6__, __local7__) = __local8__ { __local8__ = __local6__.message }
violation[{"details": {"missing_labels": __local12__}, "msg": __local14__}] { __local9__ = {label | input.review.object.metadata.labels[label]}; __local11__ = {__local10__ | __local10__ = input.parameters.labels[_].key}; minus(__local11__, __local9__, __local20__); __local12__ = __local20__; count(__local12__, __local21__); gt(__local21__, 0); sprintf("you must provide labels: %v", [__local12__], __local22__); __local13__ = __local22__; __local32__ = input.parameters; data.template.get_message(__local32__, __local13__, __local23__); __local14__ = __local23__ }
violation[{"msg": __local18__}] { __local15__ = input.review.object.metadata.labels[key]; __local16__ = input.parameters.labels[_]; __local16__.key = key; __local33__ = __local16__.allowedRegex; neq(__local33__, ""); __local34__ = __local16__.allowedRegex; not re_match(__local34__, __local15__); __local35__ = __local16__.allowedRegex; sprintf("Label <%v: %v> does not satisfy allowed regex: %v", [key, __local15__, __local35__], __local24__); __local17__ = __local24__; __local36__ = input.parameters; data.template.get_message(__local36__, __local17__, __local25__); __local18__ = __local25__ }
DEBUG dump {
   "data": {
      "admission.k8s.gatekeeper.sh": {
         "Bell": null,
         "K8sAllowedRepos": null,
         "K8sContainerLimits": null,
         "K8sRequiredLabels": null,
         "K8sRequiredProbes": null,
         "K8sUniqueServiceSelector": null,
         "TestImport": null
      }
   }
}
DEBUG opa results []
DEBUG trace Target: admission.k8s.gatekeeper.sh
Trace:
Enter data.hooks.violation[result] = _; _
| Eval data.hooks.violation[result] = _
| Index data.hooks.violation (matched 0 rules)
| Fail data.hooks.violation[result] = _
Enter data.hooks.violation[result] = _; _
| Eval data.hooks.violation[result] = _
| Index data.hooks.violation (matched 0 rules)
| Fail data.hooks.violation[result] = _

The logs show the constraint framework matching against the agilebank demo's all-must-have-owner constraint (which means it's in the Golang cache). They also show that the template source code is loaded. However, the dump line should show all constraints, but it is instead showing nothing.

Removing these lines of code from the config controller appears to fix the behavior (though we can't actually remove those lines, since they're needed to avoid stale cached data):

DEBUG Reviewing sfg 
DEBUG in review
DEBUG attempting to match against container-must-have-limits
DEBUG matching sfg 
DEBUG attempting to match against all-must-have-owner
DEBUG matching sfg 
DEBUG match successful sfg 
DEBUG match result {0xc00011b358  <nil>}
DEBUG attempting to match against unique-service-selector
DEBUG match result {0xc00011a738  <nil>}
DEBUG attempting to match against prod-repo-is-openpolicyagent
DEBUG matching sfg 
DEBUG attempting to match against must-have-probes
DEBUG matching sfg 
DEBUG calling opa
DEBUG opa constraints K8sUniqueServiceSelector
DEBUG module name hooks.hooks_builtin
DEBUG module contents: package hooks

violation[__local2__] { __local0__ = input.constraints[_]; __local26__ = __local0__.kind; __local27__ = __local0__.name; __local28__ = data.constraints[__local26__][__local27__]; __local29__ = input.review; __local1__ = {"parameters": __local28__, "review": __local29__}; __local25__ = data.external; data.template.violation[r] with input as __local1__ with data.inventory as __local25__; object.get(r, "details", {}, __local16__); __local30__ = r.msg; __local2__ = {"details": __local16__, "key": __local0__, "msg": __local30__} }
DEBUG module name template0
DEBUG module contents: package template

make_apiversion(__local3__) = apiVersion { __local4__ = __local3__.group; __local5__ = __local3__.version; neq(__local4__, ""); sprintf("%v/%v", [__local4__, __local5__], __local17__); apiVersion = __local17__ }
make_apiversion(__local6__) = apiVersion { __local6__.group = ""; apiVersion = __local6__.version }
identical(__local7__, __local8__) = true { __local7__.metadata.namespace = __local8__.namespace; __local7__.metadata.name = __local8__.name; __local7__.kind = __local8__.kind.kind; __local31__ = __local8__.kind; data.template.make_apiversion(__local31__, __local18__); __local7__.apiVersion = __local18__ }
flatten_selector(__local9__) = __local11__ { __local10__ = [s | val = __local9__.spec.selector[key]; concat(":", [key, val], __local19__); s = __local19__]; sort(__local10__, __local20__); concat(",", __local20__, __local21__); __local11__ = __local21__ }
violation[{"msg": __local15__}] { input.review.kind.kind = "Service"; input.review.kind.version = "v1"; input.review.kind.group = ""; __local32__ = input.review.object; data.template.flatten_selector(__local32__, __local22__); __local12__ = __local22__; __local13__ = data.inventory.namespace[namespace][_][_][name]; __local33__ = input.review; not data.template.identical(__local13__, __local33__); data.template.flatten_selector(__local13__, __local23__); __local14__ = __local23__; __local12__ = __local14__; sprintf("same selector as service <%v> in namespace <%v>", [name, namespace], __local24__); __local15__ = __local24__ }
DEBUG dump {
   "data": {
      "admission.k8s.gatekeeper.sh": {
         "Bell": null,
         "K8sAllowedRepos": null,
         "K8sContainerLimits": null,
         "K8sRequiredLabels": null,
         "K8sRequiredProbes": null,
         "K8sUniqueServiceSelector": null,
         "TestImport": null
      }
   }
}
DEBUG opa constraints K8sRequiredLabels
DEBUG module name hooks.hooks_builtin
DEBUG module contents: package hooks

violation[__local2__] { __local0__ = input.constraints[_]; __local27__ = __local0__.kind; __local28__ = __local0__.name; __local29__ = data.constraints[__local27__][__local28__]; __local30__ = input.review; __local1__ = {"parameters": __local29__, "review": __local30__}; __local26__ = data.external; data.template.violation[r] with input as __local1__ with data.inventory as __local26__; object.get(r, "details", {}, __local19__); __local31__ = r.msg; __local2__ = {"details": __local19__, "key": __local0__, "msg": __local31__} }
DEBUG module name template0
DEBUG module contents: package template

get_message(__local3__, __local4__) = __local5__ { not __local3__.message; __local5__ = __local4__ }
get_message(__local6__, __local7__) = __local8__ { __local8__ = __local6__.message }
violation[{"details": {"missing_labels": __local12__}, "msg": __local14__}] { __local9__ = {label | input.review.object.metadata.labels[label]}; __local11__ = {__local10__ | __local10__ = input.parameters.labels[_].key}; minus(__local11__, __local9__, __local20__); __local12__ = __local20__; count(__local12__, __local21__); gt(__local21__, 0); sprintf("you must provide labels: %v", [__local12__], __local22__); __local13__ = __local22__; __local32__ = input.parameters; data.template.get_message(__local32__, __local13__, __local23__); __local14__ = __local23__ }
violation[{"msg": __local18__}] { __local15__ = input.review.object.metadata.labels[key]; __local16__ = input.parameters.labels[_]; __local16__.key = key; __local33__ = __local16__.allowedRegex; neq(__local33__, ""); __local34__ = __local16__.allowedRegex; not re_match(__local34__, __local15__); __local35__ = __local16__.allowedRegex; sprintf("Label <%v: %v> does not satisfy allowed regex: %v", [key, __local15__, __local35__], __local24__); __local17__ = __local24__; __local36__ = input.parameters; data.template.get_message(__local36__, __local17__, __local25__); __local18__ = __local25__ }
DEBUG dump {
   "data": {
      "admission.k8s.gatekeeper.sh": {
         "Bell": null,
         "K8sAllowedRepos": null,
         "K8sContainerLimits": null,
         "K8sRequiredLabels": null,
         "K8sRequiredProbes": null,
         "K8sUniqueServiceSelector": null,
         "TestImport": null
      }
   }
}
DEBUG opa results []
DEBUG trace Target: admission.k8s.gatekeeper.sh
Trace:
Enter data.hooks.violation[result] = _; _
| Eval data.hooks.violation[result] = _
| Index data.hooks.violation (matched 0 rules)
| Fail data.hooks.violation[result] = _
Enter data.hooks.violation[result] = _; _
| Eval data.hooks.violation[result] = _
| Index data.hooks.violation (matched 0 rules)
| Fail data.hooks.violation[result] = _

It looks like somehow calling r.opa.RemoveData(ctx, target.WipeData()) is causing the templates to also no longer exist, even though they live at different roots:

constraint storage code:

https://github.com/open-policy-agent/frameworks/blob/496a6ae48e9a00c6651c2b1a1ce1e4ef166a3fcd/constraint/pkg/client/drivers/local/driver.go#L103-L132

code defining constraint storage root:

https://github.com/open-policy-agent/frameworks/blob/496a6ae48e9a00c6651c2b1a1ce1e4ef166a3fcd/constraint/pkg/client/drivers/interface.go#L64-L69

data removal code:

https://github.com/open-policy-agent/frameworks/blob/496a6ae48e9a00c6651c2b1a1ce1e4ef166a3fcd/constraint/pkg/client/drivers/local/driver.go#L152-L156

data removal root:

https://github.com/open-policy-agent/frameworks/blob/496a6ae48e9a00c6651c2b1a1ce1e4ef166a3fcd/constraint/pkg/client/drivers/local/storages.go#L105-L107

So I'm not sure how one is clobbering the other.

That's the current state of what I've found out.

maxsmythe commented 2 years ago

Actually, here is the debug run where the "remove data" code is disabled:

DEBUG Reviewing sfg 
DEBUG in review
DEBUG attempting to match against container-must-have-limits
DEBUG matching sfg 
DEBUG attempting to match against all-must-have-owner
DEBUG matching sfg 
DEBUG match successful sfg 
DEBUG match result {0xc00011a698  <nil>}
DEBUG attempting to match against unique-service-selector
DEBUG match result {0xc000a66848  <nil>}
DEBUG attempting to match against must-have-probes
DEBUG matching sfg 
DEBUG attempting to match against prod-repo-is-openpolicyagent
DEBUG matching sfg 
DEBUG calling opa
DEBUG opa constraints K8sRequiredLabels
DEBUG module name hooks.hooks_builtin
DEBUG module contents: package hooks

violation[__local2__] { __local0__ = input.constraints[_]; __local27__ = __local0__.kind; __local28__ = __local0__.name; __local29__ = data.constraints[__local27__][__local28__]; __local30__ = input.review; __local1__ = {"parameters": __local29__, "review": __local30__}; __local26__ = data.external; data.template.violation[r] with input as __local1__ with data.inventory as __local26__; object.get(r, "details", {}, __local19__); __local31__ = r.msg; __local2__ = {"details": __local19__, "key": __local0__, "msg": __local31__} }
DEBUG module name template0
DEBUG module contents: package template

get_message(__local3__, __local4__) = __local5__ { not __local3__.message; __local5__ = __local4__ }
get_message(__local6__, __local7__) = __local8__ { __local8__ = __local6__.message }
violation[{"details": {"missing_labels": __local12__}, "msg": __local14__}] { __local9__ = {label | input.review.object.metadata.labels[label]}; __local11__ = {__local10__ | __local10__ = input.parameters.labels[_].key}; minus(__local11__, __local9__, __local20__); __local12__ = __local20__; count(__local12__, __local21__); gt(__local21__, 0); sprintf("you must provide labels: %v", [__local12__], __local22__); __local13__ = __local22__; __local32__ = input.parameters; data.template.get_message(__local32__, __local13__, __local23__); __local14__ = __local23__ }
violation[{"msg": __local18__}] { __local15__ = input.review.object.metadata.labels[key]; __local16__ = input.parameters.labels[_]; __local16__.key = key; __local33__ = __local16__.allowedRegex; neq(__local33__, ""); __local34__ = __local16__.allowedRegex; not re_match(__local34__, __local15__); __local35__ = __local16__.allowedRegex; sprintf("Label <%v: %v> does not satisfy allowed regex: %v", [key, __local15__, __local35__], __local24__); __local17__ = __local24__; __local36__ = input.parameters; data.template.get_message(__local36__, __local17__, __local25__); __local18__ = __local25__ }
DEBUG dump {
   "data": {
      "admission.k8s.gatekeeper.sh": {
         "Bell": null,
         "K8sAllowedRepos": null,
         "K8sContainerLimits": null,
         "K8sRequiredLabels": null,
         "K8sRequiredProbes": null,
         "K8sUniqueServiceSelector": null,
         "TestImport": null
      }
   }
}
DEBUG opa constraints K8sUniqueServiceSelector
DEBUG module name hooks.hooks_builtin
DEBUG module contents: package hooks

violation[__local2__] { __local0__ = input.constraints[_]; __local26__ = __local0__.kind; __local27__ = __local0__.name; __local28__ = data.constraints[__local26__][__local27__]; __local29__ = input.review; __local1__ = {"parameters": __local28__, "review": __local29__}; __local25__ = data.external; data.template.violation[r] with input as __local1__ with data.inventory as __local25__; object.get(r, "details", {}, __local16__); __local30__ = r.msg; __local2__ = {"details": __local16__, "key": __local0__, "msg": __local30__} }
DEBUG module name template0
DEBUG module contents: package template

make_apiversion(__local3__) = apiVersion { __local4__ = __local3__.group; __local5__ = __local3__.version; neq(__local4__, ""); sprintf("%v/%v", [__local4__, __local5__], __local17__); apiVersion = __local17__ }
make_apiversion(__local6__) = apiVersion { __local6__.group = ""; apiVersion = __local6__.version }
identical(__local7__, __local8__) = true { __local7__.metadata.namespace = __local8__.namespace; __local7__.metadata.name = __local8__.name; __local7__.kind = __local8__.kind.kind; __local31__ = __local8__.kind; data.template.make_apiversion(__local31__, __local18__); __local7__.apiVersion = __local18__ }
flatten_selector(__local9__) = __local11__ { __local10__ = [s | val = __local9__.spec.selector[key]; concat(":", [key, val], __local19__); s = __local19__]; sort(__local10__, __local20__); concat(",", __local20__, __local21__); __local11__ = __local21__ }
violation[{"msg": __local15__}] { input.review.kind.kind = "Service"; input.review.kind.version = "v1"; input.review.kind.group = ""; __local32__ = input.review.object; data.template.flatten_selector(__local32__, __local22__); __local12__ = __local22__; __local13__ = data.inventory.namespace[namespace][_][_][name]; __local33__ = input.review; not data.template.identical(__local13__, __local33__); data.template.flatten_selector(__local13__, __local23__); __local14__ = __local23__; __local12__ = __local14__; sprintf("same selector as service <%v> in namespace <%v>", [name, namespace], __local24__); __local15__ = __local24__ }
DEBUG dump {
   "data": {
      "admission.k8s.gatekeeper.sh": {
         "Bell": null,
         "K8sAllowedRepos": null,
         "K8sContainerLimits": null,
         "K8sRequiredLabels": null,
         "K8sRequiredProbes": null,
         "K8sUniqueServiceSelector": null,
         "TestImport": null
      }
   }
}
DEBUG opa results [0xc001a9adc0]
DEBUG trace Target: admission.k8s.gatekeeper.sh
Trace:
Enter data.hooks.violation[result] = _; _
| Eval data.hooks.violation[result] = _
| Index data.hooks.violation (matched 1 rule)
{"level":"info","ts":1651682533.832534,"logger":"controller","msg":"All namespaces must have an `owner` label that points to your company username","process":"audit","audit_id":"2022-05-04T16:42:10Z","details":{"missing_labels":["owner"]},"event_type":"violation_audited","constraint_group":"constraints.gatekeeper.sh","constraint_api_version":"v1beta1","constraint_kind":"K8sRequiredLabels","constraint_name":"all-must-have-owner","constraint_namespace":"","constraint_action":"deny","resource_group":"","resource_api_version":"v1","resource_kind":"Namespace","resource_namespace":"","resource_name":"sfg"}
| Enter data.hooks.violation
| | Eval key = input.constraints[_]
| | Eval __local27__ = key.kind
| | Eval __local28__ = key.name
| | Eval __local29__ = data.constraints[__local27__][__local28__]
| | Eval __local30__ = input.review
| | Eval inp = {"parameters": __local29__, "review": __local30__}
| | Eval __local26__ = data.external
| | Eval data.template.violation[r] with input as inp with data.inventory as __local26__
| | Index data.template.violation (matched 2 rules)
| | Enter data.template.violation
| | | Eval provided = {label | input.review.object.metadata.labels[label]}
| | | Enter input.review.object.metadata.labels[label]
| | | | Eval input.review.object.metadata.labels[label]
| | | | Exit input.review.object.metadata.labels[label]
| | | Redo input.review.object.metadata.labels[label]
| | | | Redo input.review.object.metadata.labels[label]
| | | Eval required = {label | label = input.parameters.labels[_].key}
| | | Enter label = input.parameters.labels[_].key
| | | | Eval label = input.parameters.labels[_].key
| | | | Exit label = input.parameters.labels[_].key
| | | Redo label = input.parameters.labels[_].key
| | | | Redo label = input.parameters.labels[_].key
| | | Eval minus(required, provided, __local20__)
| | | Eval missing = __local20__
| | | Eval count(missing, __local21__)
| | | Eval gt(__local21__, 0)
| | | Eval sprintf("you must provide labels: %v", [missing], __local22__)
| | | Eval def_msg = __local22__
| | | Eval __local32__ = input.parameters
| | | Eval data.template.get_message(__local32__, def_msg, __local23__)
| | | Index data.template.get_message (matched 2 rules)
| | | Enter data.template.get_message
| | | | Eval msg = parameters.message
| | | | Exit data.template.get_message
| | | Eval msg = __local23__
| | | Exit data.template.violation
| | Redo data.template.violation
| | | Redo msg = __local23__
| | | Redo data.template.get_message(__local32__, def_msg, __local23__)
| | | Redo data.template.get_message
| | | | Redo msg = parameters.message
| | | Enter data.template.get_message
| | | | Eval not parameters.message
| | | | Enter parameters.message
| | | | | Eval parameters.message
| | | | | Exit parameters.message
| | | | Redo parameters.message
| | | | | Redo parameters.message
| | | | Fail not parameters.message
| | | Redo __local32__ = input.parameters
| | | Redo def_msg = __local22__
| | | Redo sprintf("you must provide labels: %v", [missing], __local22__)
| | | Redo gt(__local21__, 0)
| | | Redo count(missing, __local21__)
| | | Redo missing = __local20__
| | | Redo minus(required, provided, __local20__)
| | | Redo required = {label | label = input.parameters.labels[_].key}
| | | Redo provided = {label | input.review.object.metadata.labels[label]}
| | Enter data.template.violation
| | | Eval value = input.review.object.metadata.labels[key]
| | | Eval expected = input.parameters.labels[_]
| | | Eval expected.key = key
| | | Fail expected.key = key
| | | Redo expected = input.parameters.labels[_]
| | | Redo value = input.review.object.metadata.labels[key]
| | Eval object.get(r, "details", {}, __local19__)
| | Eval __local31__ = r.msg
| | Eval response = {"details": __local19__, "key": key, "msg": __local31__}
| | Exit data.hooks.violation
| Redo data.hooks.violation
| | Redo response = {"details": __local19__, "key": key, "msg": __local31__}
| | Redo __local31__ = r.msg
| | Redo object.get(r, "details", {}, __local19__)
| | Redo data.template.violation[r] with input as inp with data.inventory as __local26__
| | Redo __local26__ = data.external
| | Redo inp = {"parameters": __local29__, "review": __local30__}
| | Redo __local30__ = input.review

Still no dumping of data, so the driver's Dump() may not be dumping the constraints, though note that the tracing is much more active.

maxsmythe commented 2 years ago

Ah, there is a bug in Dump(): it's querying data.data instead of data, because data is automatically prefixed as part of the query:

https://github.com/open-policy-agent/frameworks/blob/496a6ae48e9a00c6651c2b1a1ce1e4ef166a3fcd/constraint/pkg/client/drivers/local/driver.go#L292

https://github.com/open-policy-agent/frameworks/blob/496a6ae48e9a00c6651c2b1a1ce1e4ef166a3fcd/constraint/pkg/client/drivers/local/driver.go#L169-L174

Fixing that to see what actually lives in data.

maxsmythe commented 2 years ago

Now Dump() is behaving as expected. It looks like cache wiping is not the issue.

Here is the output for the "bad" build (i.e. without WipeData{} removed):

DEBUG Reviewing sfg.16ebdfaaeb50c0b3 gatekeeper-system
DEBUG in review
DEBUG attempting to match against container-must-have-limits
DEBUG matching sfg.16ebdfaaeb50c0b3 gatekeeper-system
DEBUG attempting to match against all-must-have-owner
DEBUG matching sfg.16ebdfaaeb50c0b3 gatekeeper-system
DEBUG attempting to match against unique-service-selector
DEBUG match result {0xc0004d8f40  <nil>}
DEBUG attempting to match against must-have-probes
DEBUG matching sfg.16ebdfaaeb50c0b3 gatekeeper-system
DEBUG attempting to match against prod-repo-is-openpolicyagent
DEBUG matching sfg.16ebdfaaeb50c0b3 gatekeeper-system
DEBUG calling opa
DEBUG opa constraints K8sUniqueServiceSelector
DEBUG module name hooks.hooks_builtin
DEBUG module contents: package hooks

violation[__local2__] { __local0__ = input.constraints[_]; __local26__ = __local0__.kind; __local27__ = __local0__.name; __local28__ = data.constraints[__local26__][__local27__]; __local29__ = input.review; __local1__ = {"parameters": __local28__, "review": __local29__}; __local25__ = data.external; data.template.violation[r] with input as __local1__ with data.inventory as __local25__; object.get(r, "details", {}, __local16__); __local30__ = r.msg; __local2__ = {"details": __local16__, "key": __local0__, "msg": __local30__} }
DEBUG module name template0
DEBUG module contents: package template

make_apiversion(__local3__) = apiVersion { __local4__ = __local3__.group; __local5__ = __local3__.version; neq(__local4__, ""); sprintf("%v/%v", [__local4__, __local5__], __local17__); apiVersion = __local17__ }
make_apiversion(__local6__) = apiVersion { __local6__.group = ""; apiVersion = __local6__.version }
identical(__local7__, __local8__) = true { __local7__.metadata.namespace = __local8__.namespace; __local7__.metadata.name = __local8__.name; __local7__.kind = __local8__.kind.kind; __local31__ = __local8__.kind; data.template.make_apiversion(__local31__, __local18__); __local7__.apiVersion = __local18__ }
flatten_selector(__local9__) = __local11__ { __local10__ = [s | val = __local9__.spec.selector[key]; concat(":", [key, val], __local19__); s = __local19__]; sort(__local10__, __local20__); concat(",", __local20__, __local21__); __local11__ = __local21__ }
violation[{"msg": __local15__}] { input.review.kind.kind = "Service"; input.review.kind.version = "v1"; input.review.kind.group = ""; __local32__ = input.review.object; data.template.flatten_selector(__local32__, __local22__); __local12__ = __local22__; __local13__ = data.inventory.namespace[namespace][_][_][name]; __local33__ = input.review; not data.template.identical(__local13__, __local33__); data.template.flatten_selector(__local13__, __local23__); __local14__ = __local23__; __local12__ = __local14__; sprintf("same selector as service <%v> in namespace <%v>", [name, namespace], __local24__); __local15__ = __local24__ }
DEBUG dump {
   "data": {
      "admission.k8s.gatekeeper.sh": {
         "Bell": [
            {
               "expressions": [
                  {
                     "value": {
                        "constraints": {
                           "K8sAllowedRepos": {
                              "prod-repo-is-openpolicyagent": {
                                 "repos": [
                                    "openpolicyagent"
                                 ]
                              }
                           },
                           "K8sContainerLimits": {
                              "container-must-have-limits": {
                                 "cpu": "200m",
                                 "memory": "1Gi"
                              }
                           },
                           "K8sRequiredLabels": {
                              "all-must-have-owner": {
                                 "labels": [
                                    {
                                       "allowedRegex": "^[a-zA-Z]+.agilebank.demo$",
                                       "key": "owner"
                                    }
                                 ],
                                 "message": "All namespaces must have an `owner` label that points to your company username"
                              }
                           },
                           "K8sRequiredProbes": {
                              "must-have-probes": {
                                 "probeTypes": [
                                    "tcpSocket",
                                    "httpGet",
                                    "exec"
                                 ],
                                 "probes": [
                                    "readinessProbe",
                                    "livenessProbe"
                                 ]
                              }
                           },
                           "K8sUniqueServiceSelector": {
                              "unique-service-selector": {}
                           }
                        },
                        "hooks": {
                           "violation": []
                        },
                        "template": {
                           "violation": [
                              {
                                 "msg": "msg"
                              }
                           ]
                        }
                     },
                     "text": "data",
                     "location": {
                        "row": 1,
                        "col": 1
                     }
                  }
               ]
            }
         ],
         "K8sAllowedRepos": [
            {
               "expressions": [
                  {
                     "value": {
                        "constraints": {
                           "K8sAllowedRepos": {
                              "prod-repo-is-openpolicyagent": {
                                 "repos": [
                                    "openpolicyagent"
                                 ]
                              }
                           },
                           "K8sContainerLimits": {
                              "container-must-have-limits": {
                                 "cpu": "200m",
                                 "memory": "1Gi"
                              }
                           },
                           "K8sRequiredLabels": {
                              "all-must-have-owner": {
                                 "labels": [
                                    {
                                       "allowedRegex": "^[a-zA-Z]+.agilebank.demo$",
                                       "key": "owner"
                                    }
                                 ],
                                 "message": "All namespaces must have an `owner` label that points to your company username"
                              }
                           },
                           "K8sRequiredProbes": {
                              "must-have-probes": {
                                 "probeTypes": [
                                    "tcpSocket",
                                    "httpGet",
                                    "exec"
                                 ],
                                 "probes": [
                                    "readinessProbe",
                                    "livenessProbe"
                                 ]
                              }
                           },
                           "K8sUniqueServiceSelector": {
                              "unique-service-selector": {}
                           }
                        },
                        "hooks": {
                           "violation": []
                        },
                        "template": {
                           "violation": []
                        }
                     },
                     "text": "data",

sozercan commented 2 years ago

@mbrowatzki @mrjoelkamp thanks for reporting this! We fixed this in v3.8.1; let us know if that resolves your issue.

mbrowatzki commented 2 years ago

Hello, v3.8.1 works for me :). Thanks a lot for the fast fix.

mrjoelkamp commented 2 years ago

@sozercan @maxsmythe Thanks for the update! It is working as expected again. I appreciate the help on this!

maxsmythe commented 2 years ago

Thank you for reporting the issue!