[RFE] Cluster maximums OCP workload

rsevilla87 commented 1 year ago

Is your feature request related to a problem? Please describe.

A workload that reproduces the documented cluster maximums would be very useful for detecting regressions in components that the current workloads do not exercise heavily.

e.g.:

Max number of CRDs: a high number of CRDs has been shown to degrade API performance, both in API responsiveness and in resource usage. We're not tracking this scenario at the moment.
Max number of endpoints per service: our current workloads create a high number of services, but we don't add many endpoints to them. Upstream currently tracks this scenario with kube-proxy-implemented services, but we don't track it with OVNKubernetes.
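
A minimal sketch of how the CRD example could be covered with the same multi-job pattern as the snippet further down. Everything here is hypothetical: the job names `max-crds`/`remove-max-crds`, the `{{.CRDS}}` variable, the `crd.yml` template and the `cluster-maximums.example.com` API group are illustrative placeholders. It assumes kube-burner can create cluster-scoped objects from an object template, labels them with `kube-burner-job` as it does the namespaces the existing delete jobs select, and accepts `apiVersion` in delete-job objects. Each iteration creates one CRD whose names embed `{{.Iteration}}`, because CRD names must be unique cluster-wide and `metadata.name` must equal `<plural>.<group>`.

```yaml
# Hypothetical jobs to append to the jobs list
- name: max-crds
  jobIterations: {{.CRDS}}        # hypothetical variable: number of CRDs to create
  qps: {{.QPS}}
  burst: {{.BURST}}
  namespacedIterations: false     # CRDs are cluster-scoped
  jobPause: 2m
  objects:
    - objectTemplate: crd.yml     # hypothetical template, shown below
      replicas: 1                 # one CRD per iteration

- name: remove-max-crds
  jobType: delete
  objects:
    - kind: CustomResourceDefinition
      apiVersion: apiextensions.k8s.io/v1
      labelSelector: {kube-burner-job: max-crds}
---
# crd.yml (hypothetical template): one CRD per iteration
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  # must be <plural>.<group>
  name: maxcrd{{.Iteration}}s.cluster-maximums.example.com
spec:
  group: cluster-maximums.example.com
  scope: Namespaced
  names:
    kind: MaxCrd{{.Iteration}}
    listKind: MaxCrd{{.Iteration}}List
    plural: maxcrd{{.Iteration}}s
    singular: maxcrd{{.Iteration}}
  versions:
    - name: v1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          x-kubernetes-preserve-unknown-fields: true
```

Keeping the schema minimal (`x-kubernetes-preserve-unknown-fields`) keeps the focus on the number of registered CRDs rather than on validation cost.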

There are more examples like the above. This new workload shouldn't be used as a rule of thumb to demonstrate the limits of a cluster, but as a new helper to detect and verify scenarios we're not currently tracking.

Describe the solution you'd like

The cluster-maximums workload should be self-contained and based on a multi-job benchmark. This approach will make it easier to maintain and update.

I've started coding this workload; the following snippet shows an initial approach to how it would look:

```yaml
jobs:
  # Would test 10k namespaces, 10k routes, 10k services, 20k pods and 30k network policies
  - name: max-namespaces
    namespace: max-namespaces
    jobIterations: {{.NAMESPACES}}
    qps: {{.QPS}}
    burst: {{.BURST}}
    namespacedIterations: true
    waitWhenFinished: true
    preLoadImages: false  # We don't need to preload since this job is reusing images previously used
    jobPause: 2m
    namespaceLabels:
      security.openshift.io/scc.podSecurityLabelSync: false
      pod-security.kubernetes.io/enforce: privileged
      pod-security.kubernetes.io/audit: privileged
      pod-security.kubernetes.io/warn: privileged
    objects:
      - objectTemplate: deployment-server.yml
        replicas: 1
        inputVars:
          podReplicas: 1
      - objectTemplate: deployment-client.yml
        replicas: 1
        inputVars:
          podReplicas: 1
          ingressDomain: {{.INGRESS_DOMAIN}}
      - objectTemplate: service.yml
        replicas: 1
      - objectTemplate: route.yml
        replicas: 1
      - objectTemplate: np-deny-all.yml
        replicas: 1
      - objectTemplate: np-allow-from-clients.yml
        replicas: 1
      - objectTemplate: np-allow-from-ingress.yml
        replicas: 1

  - name: remove-max-namespaces
    qps: 5
    burst: 5
    jobType: delete
    jobPause: 2m
    objects:
      - kind: Namespace
        labelSelector: {kube-burner-job: max-namespaces}

  # 5k backends per service, five times: each iteration creates 5k server pods + 1 client pod + 1 service + 1 route + 3 network policies
  - name: max-backends
    namespace: max-backends
    jobIterations: 5
    qps: {{.QPS}}
    burst: {{.BURST}}
    namespacedIterations: true
    waitWhenFinished: true
    preLoadImages: false  # We don't need to preload since this job is reusing images previously used
    jobPause: 2m
    namespaceLabels:
      security.openshift.io/scc.podSecurityLabelSync: false
      pod-security.kubernetes.io/enforce: privileged
      pod-security.kubernetes.io/audit: privileged
      pod-security.kubernetes.io/warn: privileged
    objects:
      - objectTemplate: deployment-server.yml
        replicas: 1
        inputVars:
          podReplicas: {{.BACKENDS}}
      - objectTemplate: deployment-client.yml
        replicas: 1
        inputVars:
          podReplicas: 1
          ingressDomain: {{.INGRESS_DOMAIN}}
      - objectTemplate: service.yml
        replicas: 1
      - objectTemplate: route.yml
        replicas: 1
      - objectTemplate: np-deny-all.yml
        replicas: 1
      - objectTemplate: np-allow-from-clients.yml
        replicas: 1
      - objectTemplate: np-allow-from-ingress.yml
        replicas: 1

  - name: remove-max-backends
    jobType: delete
    objects:
      - kind: Namespace
        labelSelector: {kube-burner-job: max-backends}
```

github-actions[bot] commented 11 months ago

This issue has become stale and will be closed automatically within 7 days.

qiliRedHat commented 10 months ago

Bug https://issues.redhat.com/browse/MON-3394 was discovered in ROSA on a cluster with a large number of namespaces and a large number of secrets ("there are a lot of secrets on the cluster: 24464"). So I suggest we add at least 3 secrets per namespace to cover this (3 secrets x 10k namespaces = 30k secrets > 24464). In the old max-namespaces workload, there are 10 secrets in each namespace: https://github.com/cloud-bulldozer/e2e-benchmarking/blob/master/workloads/kube-burner/workloads/max-namespaces/max-namespaces.yml#L95C12-L95C12
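
A minimal sketch of that suggestion, assuming a hypothetical `secret.yml` template that is not part of the original snippet. Adding it to the `objects` list of the `max-namespaces` job with `replicas: 3` would create 3 secrets per namespace, i.e. 30k secrets at 10k namespaces. Secret names combine the `{{.JobName}}` and `{{.Replica}}` template variables; because `namespacedIterations: true` gives every iteration its own namespace, names only need to be unique per replica.

```yaml
# Additional entry for the objects list of the max-namespaces job
- objectTemplate: secret.yml   # hypothetical template, shown below
  replicas: 3                  # 3 secrets x 10k namespaces = 30k secrets
---
# secret.yml (hypothetical template): one small Opaque secret per replica
apiVersion: v1
kind: Secret
metadata:
  name: {{.JobName}}-{{.Replica}}
type: Opaque
stringData:
  key: value-{{.Iteration}}   # stringData avoids hand-encoding base64
```

Raising `replicas` to 10 would match the old max-namespaces workload instead of the 3-secret minimum.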

github-actions[bot] commented 5 months ago

This issue has become stale and will be closed automatically within 7 days.

kube-burner/kube-burner-ocp #4