splunk / splunk-operator

Splunk Operator for Kubernetes

Splunk Operator: add scale subresource #1272

Closed: yaroslav-nakonechnikov closed this issue 8 months ago

yaroslav-nakonechnikov commented 10 months ago

Please select the type of request

Feature Request

Tell us more

Describe the request: To start using KEDA (https://keda.sh/docs/2.11/concepts/scaling-deployments/#scaling-of-custom-resources), which would help a lot for our testing/development stack, the operator CRDs need to support the scale subresource: https://kubernetes.io/docs/tasks/extend-kubernetes/custom-resources/custom-resource-definitions/#scale-subresource

Expected behavior: The scale subresource is added, so KEDA can be used natively.

akondur commented 9 months ago

Hey @yaroslav-nakonechnikov, we are able to scale custom resources using replicas, as mentioned here. Can you try the same?

yaroslav-nakonechnikov commented 9 months ago

@akondur, that is a different thing. If you open https://kubernetes.io/docs/tasks/extend-kubernetes/custom-resources/custom-resource-definitions/#scale-subresource, you will see an explanation of the more advanced usage of replicas.

Plain replicas can be used, but only with workarounds and custom scripts.

akondur commented 8 months ago

Hey @yaroslav-nakonechnikov, the operator CRDs already have the scale subresource embedded. A couple of code references:

Standalone, SHC

With operator version 2.5.1 and v4 CRDs deployed on an EKS cluster:

bash% k get crds/standalones.enterprise.splunk.com -o yaml | grep -i scale: -A 3
      scale:
        labelSelectorPath: .status.selector
        specReplicasPath: .spec.replicas
        statusReplicasPath: .status.replicas
--
      scale:
        labelSelectorPath: .status.selector
        specReplicasPath: .spec.replicas
        statusReplicasPath: .status.replicas
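The `scale:` stanza above tells the API server how to project a custom resource onto an `autoscaling/v1` Scale object, which is what KEDA and the HPA read and write. A minimal Python sketch of that projection, assuming only simple dotted paths (the API server evaluates real JSON paths) and using a hypothetical Standalone object whose selector value is made up for illustration:

```python
from typing import Any, Optional


def get_path(obj: dict, path: str) -> Optional[Any]:
    """Resolve a simple dotted path such as '.spec.replicas'.

    Only plain dotted field paths are handled; the API server itself
    accepts full JSON paths, which this sketch does not implement.
    """
    node: Any = obj
    for key in path.lstrip(".").split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def to_scale(cr: dict, spec_path: str, status_path: str, selector_path: str) -> dict:
    """Project a custom resource onto an autoscaling/v1 Scale-shaped dict."""
    spec_replicas = get_path(cr, spec_path)
    if spec_replicas is None:
        # A GET on /scale fails when specReplicasPath has no value.
        raise ValueError(f"no value at {spec_path}")
    status_replicas = get_path(cr, status_path)
    selector = get_path(cr, selector_path)
    return {
        "apiVersion": "autoscaling/v1",
        "kind": "Scale",
        "metadata": {"name": cr["metadata"]["name"],
                     "namespace": cr["metadata"].get("namespace")},
        "spec": {"replicas": spec_replicas},
        "status": {
            # Missing status replicas default to 0, a missing selector to "".
            "replicas": 0 if status_replicas is None else status_replicas,
            "selector": "" if selector is None else selector,
        },
    }


# Hypothetical Standalone object, trimmed to the fields the paths touch.
standalone = {
    "apiVersion": "enterprise.splunk.com/v4",
    "kind": "Standalone",
    "metadata": {"name": "example", "namespace": "splunk-operator"},
    "spec": {"replicas": 1},
    "status": {"replicas": 1,
               "selector": "app.kubernetes.io/instance=splunk-example-standalone"},
}

scale = to_scale(standalone, ".spec.replicas", ".status.replicas", ".status.selector")
print(scale["spec"]["replicas"], scale["status"]["replicas"])  # 1 1
```

The HPA only ever talks to this Scale view, so the operator still has to act on a changed `spec.replicas` for pods to actually come or go.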

I tried autoscaling with KEDA using the following steps:

  1. Install KEDA using the instructions from here.
  2. Use the ScaledObject spec below to target a Standalone resource:
bash% cat ~/keda_scaler.yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: keda-sa
  namespace: splunk-operator
spec:
  scaleTargetRef:
    apiVersion: enterprise.splunk.com/v4
    kind: Standalone
    name: example
  pollingInterval:  5                                      # Optional. Default: 30 seconds
  cooldownPeriod:   10                                     # Optional. Default: 300 seconds
  idleReplicaCount: 0                                       # Optional. Default: ignored, must be less than minReplicaCount
  minReplicaCount:  1                                       # Optional. Default: 0
  maxReplicaCount:  100                                     # Optional. Default: 100
  advanced:                                                 # Optional. Section to specify advanced options
  triggers:
  - type: cpu
    #metricType: Utilization # Allowed types are 'Utilization' or 'AverageValue'
    metadata:
      type: Utilization # Deprecated in favor of trigger.metricType; allowed types are 'Utilization' or 'AverageValue'
      value: "5"
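The comments in the spec above encode a few constraints between the replica counts: `idleReplicaCount` must be below `minReplicaCount`, and, as with any autoscaler, the minimum should not exceed the maximum. A small pre-flight check, sketched in Python under those assumptions (the `ReplicaBounds` and `check_bounds` names are hypothetical, and this is not KEDA's own validation code):

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ReplicaBounds:
    """Replica settings from a ScaledObject spec (defaults as in the comments above)."""
    min_replica_count: int = 0
    max_replica_count: int = 100
    idle_replica_count: Optional[int] = None  # None means "not set"


def check_bounds(b: ReplicaBounds) -> List[str]:
    """Return human-readable problems; an empty list means the bounds look consistent."""
    problems: List[str] = []
    if b.min_replica_count > b.max_replica_count:
        problems.append("minReplicaCount must not exceed maxReplicaCount")
    if b.idle_replica_count is not None and b.idle_replica_count >= b.min_replica_count:
        problems.append("idleReplicaCount must be less than minReplicaCount")
    return problems


# The values used in keda_scaler.yaml above: no problems reported.
print(check_bounds(ReplicaBounds(min_replica_count=1, max_replica_count=100, idle_replica_count=0)))
# Leaving minReplicaCount at its default of 0 would break the idle rule.
print(check_bounds(ReplicaBounds(idle_replica_count=0)))
```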

Once deployed:

bash% kubectl describe scaledobject
  Creation Timestamp:  2024-02-27T01:17:32Z
  Finalizers:
    finalizer.keda.sh
  Generation:        1
  Resource Version:  13949
  UID:               63fae67f-e2df-4379-8e1a-7661ec5b0179
Spec:
  Cooldown Period:     10
  Idle Replica Count:  0
  Max Replica Count:   100
  Min Replica Count:   1
  Polling Interval:    5
  Scale Target Ref:
    API Version:  enterprise.splunk.com/v4
    Kind:         Standalone
    Name:         example
  Triggers:
    Metadata:
      Type:   Utilization
      Value:  5
    Type:     cpu
Status:
  Conditions:
    Message:               ScaledObject is defined correctly and is ready for scaling
    Reason:                ScaledObjectReady
    Status:                True
    Type:                  Ready
    Message:               Scaling is performed because triggers are active
    Reason:                ScalerActive
    Status:                True
    Type:                  Active
    Status:                Unknown
    Type:                  Fallback
    Status:                Unknown
    Type:                  Paused
  Hpa Name:                keda-hpa-keda-sa
  Last Active Time:        2024-02-27T01:43:57Z
  Original Replica Count:  1
  Resource Metric Names:
    cpu
  Scale Target GVKR:
    Group:            enterprise.splunk.com
    Kind:             Standalone
    Resource:         standalones
    Version:          v4
  Scale Target Kind:  enterprise.splunk.com/v4.Standalone
Events:
  Type    Reason              Age   From           Message
  ----    ------              ----  ----           -------
  Normal  KEDAScalersStarted  26m   keda-operator  Scaler cpu is built.
  Normal  KEDAScalersStarted  26m   keda-operator  Started scalers watch
  Normal  ScaledObjectReady   26m   keda-operator  ScaledObject is ready for scaling
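The Scale Target GVKR in the status above is what the HPA controller uses to locate the scale subresource. A sketch of how those fields map onto the REST path of `/scale` for a namespaced object, assuming the standard Kubernetes URL layout (`scale_path` and `split_api_version` are hypothetical helpers):

```python
from typing import Tuple


def split_api_version(api_version: str) -> Tuple[str, str]:
    """Split 'group/version' into (group, version); core types like 'v1' have no group."""
    if "/" in api_version:
        group, version = api_version.split("/", 1)
        return group, version
    return "", api_version


def scale_path(api_version: str, resource: str, namespace: str, name: str) -> str:
    """REST path of the /scale subresource for a namespaced object."""
    group, version = split_api_version(api_version)
    prefix = f"/api/{version}" if group == "" else f"/apis/{group}/{version}"
    return f"{prefix}/namespaces/{namespace}/{resource}/{name}/scale"


# Values taken from the Scale Target GVKR and the ScaledObject namespace above.
print(scale_path("enterprise.splunk.com/v4", "standalones", "splunk-operator", "example"))
```

With a working kubeconfig, the resulting path can be passed to `kubectl get --raw` to check directly that the API server serves a Scale object for the Standalone.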

The corresponding HPA shows that it is able to find the scale subresource and hook onto it:

bash % k describe hpa
Name:                                                  keda-hpa-keda-sa
Namespace:                                             splunk-operator
Labels:                                                app.kubernetes.io/managed-by=keda-operator
                                                       app.kubernetes.io/name=keda-hpa-keda-sa
                                                       app.kubernetes.io/part-of=keda-sa
                                                       app.kubernetes.io/version=2.13.0
                                                       scaledobject.keda.sh/name=keda-sa
Annotations:                                           <none>
CreationTimestamp:                                     Mon, 26 Feb 2024 19:17:32 -0600
Reference:                                             Standalone/example
Metrics:                                               ( current / target )
  resource cpu on pods  (as a percentage of request):  <unknown> / 5%
Min replicas:                                          1
Max replicas:                                          100
Standalone pods:                                       1 current / 0 desired
Conditions:
  Type           Status  Reason                   Message
  ----           ------  ------                   -------
  AbleToScale    True    SucceededGetScale        the HPA controller was able to get the target's current scale

From the results above, it looks like the scale subresource is working. Are you seeing an error when deploying with KEDA? Does deploying an HPA like the one below work for you?

apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: sa-hpa
  namespace: splunk-operator
spec:
  scaleTargetRef:
    apiVersion: enterprise.splunk.com/v4
    kind: Standalone
    name: example
  minReplicas: 1
  maxReplicas: 10
  targetCPUUtilizationPercentage: 1

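
The replica count the HPA writes into the scale subresource follows the documented Kubernetes rule `desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)`, skipped when the ratio is within a tolerance of 1.0 (0.1 by default) and clamped to the min/max bounds. A minimal sketch of just that rule; it deliberately ignores scale-down stabilization, scaling policies, and pods with missing metrics, all of which the real controller applies:

```python
import math


def desired_replicas(current_replicas: int, current_utilization: float,
                     target_utilization: float, min_replicas: int,
                     max_replicas: int, tolerance: float = 0.1) -> int:
    """Core HPA rule: ceil(current * current_metric / target), clamped to [min, max]."""
    ratio = current_utilization / target_utilization
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: keep the current size.
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


# With minReplicas 1 and maxReplicas 10 as in the HPA above:
print(desired_replicas(1, 4, 1, 1, 10))    # 4  (4% used vs a 1% target)
print(desired_replicas(2, 0.5, 1, 1, 10))  # 1  (ratio 0.5)
print(desired_replicas(3, 50, 1, 1, 10))   # 10 (capped at maxReplicas)
```

The HPA only updates `spec.replicas` through `/scale`; whether pods actually change depends on the operator reconciling that value.
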
yaroslav-nakonechnikov commented 8 months ago

Super, thanks. This is very helpful!

yaroslav-nakonechnikov commented 8 months ago

@akondur I finally rechecked KEDA with the scale subresource, but it doesn't work as expected. For example:

[yn@ip-10-216-35-48 ~]$ k describe hpa -n splunk-operator
Name:                     keda-hpa-keda-sa
Namespace:                splunk-operator
Labels:                   app.kubernetes.io/managed-by=keda-operator
                          app.kubernetes.io/name=keda-hpa-keda-sa
                          app.kubernetes.io/part-of=keda-sa
                          app.kubernetes.io/version=2.13.1
                          scaledobject.keda.sh/name=keda-sa
Annotations:              autoscaling.alpha.kubernetes.io/conditions:
                            [{"type":"AbleToScale","status":"True","lastTransitionTime":"2024-03-15T16:33:41Z","reason":"ReadyForNewScale","message":"recommended size...
                          autoscaling.alpha.kubernetes.io/current-metrics:
                            [{"type":"Resource","resource":{"name":"cpu","currentAverageUtilization":100,"currentAverageValue":"2002m"}}]
CreationTimestamp:        Fri, 15 Mar 2024 16:33:26 +0000
Reference:                IndexerCluster/site6-32002
Target CPU utilization:   500%
Current CPU utilization:  100%
Min replicas:             1
Max replicas:             3
IndexerCluster pods:      2 current / 2 desired
Events:
  Type    Reason             Age                   From                       Message
  ----    ------             ----                  ----                       -------
  Normal  SuccessfulRescale  37m (x43 over 2d11h)  horizontal-pod-autoscaler  New size: 1; reason: All metrics below target
  Normal  SuccessfulRescale  37m (x44 over 2d11h)  horizontal-pod-autoscaler  New size: 2; reason: cpu resource utilization (percentage of request) above target
[yn@ip-10-216-35-48 ~]$ kubectl get pods -n splunk-operator
NAME                                                  READY   STATUS    RESTARTS         AGE
splunk-32002-cluster-manager-0                        1/1     Running   74 (3d22h ago)   4d4h
splunk-32002-license-manager-0                        1/1     Running   1 (2d5h ago)     2d5h
splunk-32002-monitoring-console-0                     1/1     Running   0                3d22h
splunk-c-32002-standalone-0                           1/1     Running   0                3d9h
splunk-e-32002-deployer-0                             1/1     Running   1 (3d4h ago)     3d5h
splunk-e-32002-search-head-0                          1/1     Running   0                2d4h
splunk-e-32002-search-head-1                          1/1     Running   0                3d5h
splunk-e-32002-search-head-2                          1/1     Running   0                3d5h
splunk-operator-controller-manager-667fff5754-vjxd5   2/2     Running   1 (3d22h ago)    4d5h
splunk-site6-32002-indexer-0                          1/1     Running   5 (3d3h ago)     3d4h
splunk-site6-32002-indexer-1                          1/1     Running   0                38h
splunk-site6-32002-indexer-2                          1/1     Running   0                2d21h
splunk-site6-32002-indexer-3                          1/1     Running   0                2d21h
splunk-site6-32002-indexer-4                          1/1     Running   0                41h
splunk-site6-32002-indexer-5                          0/1     Running   249 (44s ago)    40h

So I expected to see only 2 indexers.

I deleted 4 of them:

[yn@ip-10-216-35-48 ~]$ kubectl delete pods -n splunk-operator splunk-site6-32002-indexer-2 splunk-site6-32002-indexer-3 splunk-site6-32002-indexer-4 splunk-site6-32002-indexer-5
pod "splunk-site6-32002-indexer-2" deleted
pod "splunk-site6-32002-indexer-3" deleted
pod "splunk-site6-32002-indexer-4" deleted
pod "splunk-site6-32002-indexer-5" deleted

[yn@ip-10-216-35-48 ~]$ kubectl get pods -n splunk-operator
NAME                                                  READY   STATUS    RESTARTS         AGE
splunk-32002-cluster-manager-0                        1/1     Running   74 (3d22h ago)   4d4h
splunk-32002-license-manager-0                        1/1     Running   1 (2d5h ago)     2d5h
splunk-32002-monitoring-console-0                     1/1     Running   0                3d22h
splunk-c-32002-standalone-0                           1/1     Running   0                3d9h
splunk-e-32002-deployer-0                             1/1     Running   1 (3d4h ago)     3d5h
splunk-e-32002-search-head-0                          1/1     Running   0                2d4h
splunk-e-32002-search-head-1                          1/1     Running   0                3d5h
splunk-e-32002-search-head-2                          1/1     Running   0                3d5h
splunk-operator-controller-manager-667fff5754-vjxd5   2/2     Running   1 (3d22h ago)    4d5h
splunk-site6-32002-indexer-0                          1/1     Running   5 (3d3h ago)     3d4h
splunk-site6-32002-indexer-1                          1/1     Running   0                38h
splunk-site6-32002-indexer-2                          0/1     Running   0                3s
splunk-site6-32002-indexer-3                          0/1     Running   0                3s
splunk-site6-32002-indexer-4                          0/1     Running   0                3s
splunk-site6-32002-indexer-5                          0/1     Running   0                3s

And these 4 were recreated again.

Why, if the HPA says that only 2 are needed?

yaroslav-nakonechnikov commented 8 months ago

OK, it looks like this is related to https://github.com/splunk/splunk-operator/issues/1293, since I see that splunk-operator can't create more (or fewer) replicas even when I edit the replica count manually in the CRD.
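
The recreated pods are consistent with StatefulSet behavior: the controller keeps one pod for every ordinal from 0 to `spec.replicas - 1`, so deleting pods by hand does not lower the count; only a lower `replicas` on the StatefulSet does. A simplified model, assuming the indexers belong to a StatefulSet named `splunk-site6-32002-indexer` (inferred from the pod names, not shown in the thread) whose `replicas` was still 6 because the operator never propagated the HPA's value, as issue #1293 suggests:

```python
from typing import List, Set, Tuple


def ordinal(pod_name: str) -> int:
    """StatefulSet pods are named <statefulset>-<ordinal>."""
    return int(pod_name.rsplit("-", 1)[1])


def reconcile(sts_name: str, replicas: int, existing: Set[str]) -> Tuple[List[str], List[str]]:
    """Return (pods to create, pods to delete) for a StatefulSet-like controller."""
    wanted = {f"{sts_name}-{i}" for i in range(replicas)}
    to_create = sorted(wanted - existing, key=ordinal)
    # Scale-down removes the highest ordinals first.
    to_delete = sorted(existing - wanted, key=ordinal, reverse=True)
    return to_create, to_delete


sts = "splunk-site6-32002-indexer"  # assumed StatefulSet name, inferred from pod names
after_delete = {f"{sts}-0", f"{sts}-1"}

# replicas still 6: the four deleted pods are simply recreated.
print(reconcile(sts, 6, after_delete))
# replicas actually lowered to 2: nothing to create or delete.
print(reconcile(sts, 2, after_delete))
```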