thesuperzapper opened 5 months ago
@realshuting @aslafy-z @eddycharly @chipzoller @JimBugwadia sorry to tag you all again, but I think this is a serious issue for Kyverno that needs your input.
I think it will need a short-term fix for 1.12 (to ensure that the generate rules remain usable), and a longer-term fix (possibly removing UpdateRequests) to make Kyverno more scalable and establish it as a "generic controller".
After looking into this, the issue seems to be UpdateRequest magnification. Why it's happening isn't clear. If you isolate the policy to include just a single rule, only a single UR is created per destination Namespace (so two total: one for target-namespace-1, one for target-namespace-2). When you add the second rule, this multiplies to 4 per ConfigMap per Namespace, and so on. It clearly appears to be a bug. Creating multiple generate rules in the same policy with effectively the exact same scope is fairly uncommon. I was able to confirm that isolating one rule per policy solves the issue, and only one UR per ConfigMap per Namespace is created as expected.
This is clearly something that needs to be investigated; it just may have to be a patch release.
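To illustrate the per-rule split described above, here is a minimal sketch (hypothetical policy and resource names, not the actual policy from this issue); one such single-rule ClusterPolicy would exist per generated ConfigMap:

# Hypothetical example only: one single-rule ClusterPolicy per generated ConfigMap.
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: generate-configmap-1
spec:
  generateExistingOnPolicyUpdate: true
  rules:
    - name: generate-configmap-1
      match:
        any:
          - resources:
              kinds:
                - Namespace
              names:
                - target-namespace-1
                - target-namespace-2
      generate:
        apiVersion: v1
        kind: ConfigMap
        name: generated-configmap-1
        namespace: "{{request.object.metadata.name}}"
        synchronize: true
        data:
          data:
            example-key: example-value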
We ran into this issue when using a policy to sync the image pull secret to all new namespaces. In one of our clusters it created thousands of UpdateRequests until it started crashing the API server. I had to stop all Kyverno pods, and then it took me several hours running:
kg updaterequests -A | awk '{print $2 " -n " $1}' | xargs -L 1 -r -P25 kubectl delete updaterequests
to recover the cluster.
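For anyone else hitting this, a simpler bulk cleanup should also work on recent kubectl versions (assuming it is acceptable to drop every pending UpdateRequest, and that the Kyverno controllers are scaled down first so new URs are not created while it runs):

# Delete every UpdateRequest in all namespaces (scale Kyverno down first).
kubectl delete updaterequests --all --all-namespaces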
The policy looks something like this:
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: sync-pull-secret
  annotations:
    policies.kyverno.io/title: Sync Pull Secret
    policies.kyverno.io/category: Generate
    policies.kyverno.io/subject: Secret
    policies.kyverno.io/minversion: 1.6.0
    policies.kyverno.io/description: >-
      Syncs pull-secret from platform namespace to all labeled namespaces and istio namespaces
spec:
  background: true
  generateExistingOnPolicyUpdate: true
  rules:
    - name: sync-pull-secret
      match:
        any:
          - resources:
              kinds:
                - Namespace
              selector:
                matchLabels:
                  my-label: my-value
          - resources:
              kinds:
                - Namespace
              names:
                - istio-system
                - istio-ingress
      generate:
        apiVersion: v1
        kind: Secret
        name: pull-secret
        namespace: "{{request.object.metadata.name}}"
        synchronize: true
        clone:
          namespace: platform
          name: pull-secret
      exclude:
        resources:
          kinds:
            - Namespace
          names:
            - platform
@chipzoller @realshuting do we know if this will be fixed in Kyverno 1.12.0? And do we know the root cause? Because it's a pretty serious issue that will keep me on 1.10.0 (which still has a less intense version of the problem).
> If you isolate the policy to include just a single rule, only a single UR is created per destination Namespace (so two total: one for target-namespace-1, one for target-namespace-2). When you add the second rule, this multiplies to 4 per ConfigMap per Namespace, and so on.

With 1.11+, a UR is created per rule per matching resource. If your policy has a single rule and two matching resources (target-namespace-1 and target-namespace-2), 2 URs are created. When you add the second rule, another two URs are created, one for each namespace.

> I was able to confirm that isolating one rule per policy solves the issue and only one UR per ConfigMap per Namespace is created as expected.

The total number of URs is fixed and determined by the number of rules and matching resources (for example, 2 rules × 2 matching namespaces = 4 URs), no matter how you build the policy: one policy with two rules, or two policies each with a single rule, yields the same number of URs. So why does splitting solve the issue?
Hi @thesuperzapper - we have released v1.12.4-rc.2 to reduce the UR creations, specifically when generateExisting=true (this also applies to BACKGROUND_SCAN_INTERVAL).
Could you help us verify if this reduces your overall updaterequests count?
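For reference, one quick way to compare the overall count before and after upgrading (assuming kubectl access to the cluster) is:

# Count UpdateRequests across all namespaces.
kubectl get updaterequests --all-namespaces --no-headers | wc -l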
Kyverno Version
1.11.4
Description
Introduction
There has been a significant decrease in the performance of generate type policies since Kyverno 1.10.0. For example, a ClusterPolicy that generates 18 ConfigMaps (across 2 namespaces) used to take ~1min and generate 60 UpdateRequests in 1.10.0, but takes ~3min and generates over 150 UpdateRequests in 1.11.4.
But even aside from this regression, I think there is a larger issue with how Kyverno is implementing generate rules, which is making it not scale properly (see conclusion for more info).
Testing
In an attempt to understand when this issue happened, I have tested many versions, but have found that all perform significantly worse than 1.10.0.
Here is some information about the testing:
- Cluster image: rancher/k3s:v1.26.12-k3s1
Test Results:
- main @ dd46f9eaf0342a0b617f1bf2b328b21ff14e14e9:
- 1.11.4:
- 1.10.6:
- 1.10.1:
- 1.10.0:
Test Notes:
- Even in 1.10.0, something seems wrong. How is it possible that so many UpdateRequests are required to create 18 ConfigMaps?
- The UpdateRequests are created on every BACKGROUND_SCAN_INTERVAL sync, in addition to on first apply with generateExisting.
Conclusion
.Conclusion
There must be something fundamentally wrong with our current generate policy implementation (especially in newer versions, but also in 1.10.0). It doesn't make a lot of sense that hundreds of UpdateRequests and multiple minutes are required to create such a small number of resources. Other controllers regularly create/update hundreds of resources in less time and with minimal cluster load.
I have two main requests for the Kyverno project:
- In the short term, investigate why generate rules create so many UpdateRequests, and reduce this to at most one per actual create/update (to allow users to update past 1.10.0).
- In the longer term, consider a larger fix (possibly removing UpdateRequests) to make Kyverno more scalable.
Personal Notes
I think there is a lot of demand for a "generic controller" that can create arbitrary resources based on user-defined conditions. Kyverno is well-positioned to become the standard here IF we resolve these performance issues.
As a possible solution, I suggest we look at how other controllers generate resources. In my experience, most use "WATCH" requests to efficiently detect changes, so they only have to do a "full reconcile" each time they boot up, and none use anything like our UpdateRequest (except for slow external dependencies, like cert-manager's CertificateRequest).
Appendix
Target Namespaces
Generate Policy 1