kyverno / kyverno

Cloud Native Policy Management
https://kyverno.io
Apache License 2.0

[Bug] events generation won't stop on failures #10020

Closed realshuting closed 6 months ago

realshuting commented 6 months ago

Kyverno Version

1.12.0

Kubernetes Version

1.26.x

Kubernetes Platform

KinD

Kyverno Rule Type

Validate

Description

When running this load test to create 1k iterations across 100 virtual users, Kyverno kept creating events even while the namespace was being terminated, and the logs were flooded with the following message:

2024-04-09T08:02:05Z    ERROR   EventGenerator  event/controller.go:124 failed to create event  {"key": "&Event{ObjectMeta:{test.17c48dd596c2b2c6  load-tests    0 0001-01-01 00:00:00 +0000 UTC <nil> <nil> map[] map[] [] [] []},EventTime:2024-04-09 08:01:46.804523416 +0000 UTC m=+94.441305606,Series:nil,ReportingController:kyverno-admission,ReportingInstance:kyverno-admission-kyverno-admission-controller-d7f5d6677-4xntf,Action:Resource Passed,Reason:PolicyViolation,Regarding:{Pod load-tests test b906e192-984c-4c28-85e8-bbd6056cfe9f v1  },Related:nil,Note:policy disallow-host-path/host-path fail: validation error: HostPath volumes are forbidden. The field spec.volumes[*].hostPath must be unset. rule host-path failed at path /spec/,Type:Warning,DeprecatedSource:{ },DeprecatedFirstTimestamp:0001-01-01 00:00:00 +0000 UTC,DeprecatedLastTimestamp:0001-01-01 00:00:00 +0000 UTC,DeprecatedCount:0,}", "error": "events.events.k8s.io \"test.17c48dd596c2b2c6\" is forbidden: unable to create new content in namespace load-tests because it is being terminated"}

When I decreased the test to 100 iterations across 10 virtual users, event creation stopped at some point, but events were still dropped because the namespace had been deleted:

2024-04-09T08:02:51Z    ERROR   EventGenerator  event/controller.go:129 dropping event  {"key": "&Event{ObjectMeta:{test.17c48dd59143f0bc  load-tests    0 0001-01-01 00:00:00 +0000 UTC <nil> <nil> map[] map[] [] [] []},EventTime:2024-04-09 08:01:46.712330253 +0000 UTC m=+94.349112434,Series:nil,ReportingController:kyverno-admission,ReportingInstance:kyverno-admission-kyverno-admission-controller-d7f5d6677-4xntf,Action:Resource Passed,Reason:PolicyViolation,Regarding:{Pod load-tests test 3f16e0e8-981e-4c0a-a59e-f98852f2b0b9 v1  },Related:nil,Note:policy require-run-as-nonroot/run-as-non-root fail: validation error: Running as root is not allowed. Either the field spec.securityContext.runAsNonRoot must be set to `true`, or the fields spec.containers[*].securityContext.runAsNonRoot, spec.initContainers[*].securityContext.runAsNonRoot, and spec.ephemeralContainers[*].securityContext.runAsNonRoot must be set to `true`. rule run-as-non-root[0] failed at path /spec/ rule run-as-non-root[1] failed at path /spec/,Type:Warning,DeprecatedSource:{ },DeprecatedFirstTimestamp:0001-01-01 00:00:00 +0000 UTC,DeprecatedLastTimestamp:0001-01-01 00:00:00 +0000 UTC,DeprecatedCount:0,}", "error": "namespaces \"load-tests\" not found"}

There are a few questions that need to be answered:

  1. Is there any retry when creating events?
  2. Can we optimize event generation when the target (the namespace in this case) is being terminated/deleted?
  3. Is there a max threshold to limit the total event size?

Steps to reproduce

  1. Deploy Kyverno PSS policies
    kustomize build https://github.com/kyverno/policies/pod-security | kubectl apply -f -
  2. Run the load test (under the kyverno/load-testing repo)
    ./start.sh tests/kyverno-pods-dry-run.js 100 1000
  3. Check the logs for event generation

Expected behavior

Event generation should not occupy the main process, especially when there is a flood of admission requests.

Screenshots

No response

Kyverno logs

No response

Slack discussion

No response

Troubleshooting

MariamFahmy98 commented 6 months ago
  1. When creating an event fails, the event is re-queued up to 3 times. If it exceeds that limit, the event is dropped. https://github.com/kyverno/kyverno/blob/2503e000f360013ac17fc7d79fae3dd01df19648/pkg/event/controller.go#L122-L130

  2. The issue is that we are trying to create an event in a deleted namespace. We should check that the namespace exists before creating the events.

  3. AFAIK, there is no such limit.