[loki-distributed] Add configurable scaling behaviour and KEDA autoscaler

grafana / helm-charts

Apache License 2.0


```yaml
querier:
  autoscaling:
    scaler: native # native or keda
    behavior: {}
    # Configure KEDA Prometheus trigger.
    # See also: https://keda.sh/docs/latest/scalers/prometheus/
    targetMetricsConfigure:
      query: sum(max_over_time(cortex_query_scheduler_inflight_requests{namespace="loki-cluster", quantile="0.75"}[2m]))
      serverAddress: http://prometheus.default:9090/prometheus
      threshold: 4
```
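As a sketch, a values override that switches the querier to KEDA and slows down scale-in could look like the following. The numbers are illustrative assumptions. targetMetricsConfigure is not repeated because Helm deep-merges user values into the chart defaults, so the Prometheus trigger above is kept.

```yaml
querier:
  autoscaling:
    enabled: true
    scaler: keda
    # Rendered under KEDA's advanced.horizontalPodAutoscalerConfig.behavior;
    # with scaler: native it becomes the HPA's spec.behavior (autoscaling/v2 only).
    behavior:
      scaleDown:
        stabilizationWindowSeconds: 600
    # Adds a KEDA cpu trigger next to the Prometheus trigger.
    targetCPUUtilizationPercentage: 60
```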

```yaml
{{/* hpa.yaml */}}
{{- if .Values.querier.autoscaling.enabled }}
{{- if eq .Values.querier.autoscaling.scaler "native" }}
{{- $apiVersion := include "loki.hpa.apiVersion" . -}}
apiVersion: {{ $apiVersion }}
kind: HorizontalPodAutoscaler
# ...
spec:
  # ...
  {{- if (eq $apiVersion "autoscaling/v2") }}
  {{- with .Values.querier.autoscaling.behavior }}
  behavior:
    {{- toYaml . | nindent 4 }}
  {{- end }}
  {{- end }}
{{- else if eq .Values.querier.autoscaling.scaler "keda" }}
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
# ...
spec:
  # ...
  {{- with .Values.querier.autoscaling.behavior }}
  advanced:
    horizontalPodAutoscalerConfig:
      behavior:
        {{- toYaml . | nindent 8 }}
  {{- end }}
  triggers:
  {{- with .Values.querier.autoscaling.targetCPUUtilizationPercentage }}
    - type: cpu
      metricType: Utilization
      metadata:
        # KEDA trigger metadata values must be strings
        value: {{ . | quote }}
  {{- end }}
  # ...
  {{- with .Values.querier.autoscaling.targetMetricsConfigure }}
    - type: prometheus
      metadata:
        metricName: querier_autoscaling_metric
        query: {{ .query | quote }}
        serverAddress: {{ .serverAddress }}
        threshold: {{ .threshold | quote }}
  {{- end }}
{{- end }}
{{- end }}
```
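To illustrate how the behavior value is nested: assuming scaler: keda and behavior set to scaleDown.stabilizationWindowSeconds: 600, toYaml plus nindent 8 places the map under spec.advanced.horizontalPodAutoscalerConfig.behavior, so the KEDA branch should render roughly this fragment (elided parts kept as "# ..."):

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
# ...
spec:
  # ...
  advanced:
    horizontalPodAutoscalerConfig:
      behavior:
        scaleDown:
          stabilizationWindowSeconds: 600
  triggers:
    # ...
```

With the default behavior: {}, the empty map counts as false in a Go template with block, so the whole advanced section is omitted and the autoscaler's default behavior applies.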

We had exactly the same issue.

We wanted Loki to scale up and down more gradually by tuning both the behavior.scaleUp and behavior.scaleDown policies, but the chart's HPA templates don't expose those fields, so we deployed our own HPA manifests alongside the chart.
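A minimal sketch of such a standalone manifest, assuming the loki-distributor Deployment in the loki namespace; the replica bounds, CPU target, and policy values are illustrative, not taken from any real setup:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: loki-distributor
  namespace: loki
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: loki-distributor
  minReplicas: 3
  maxReplicas: 12
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
  behavior:
    scaleUp:
      # Add at most 2 pods per minute.
      stabilizationWindowSeconds: 60
      policies:
        - type: Pods
          value: 2
          periodSeconds: 60
    scaleDown:
      # Use the highest recommendation of the last 5 minutes,
      # then remove at most 1 pod every 2 minutes.
      stabilizationWindowSeconds: 300
      policies:
        - type: Pods
          value: 1
          periodSeconds: 120
```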

One problem is that unless we set autoscaling.enabled: true, which we don't want to do since we ship our own HPA manifests, the chart always sets replicas on each component:

```yaml
spec:
{{- if not .Values.distributor.autoscaling.enabled }}
  replicas: {{ .Values.distributor.replicas }}
{{- end }}
```
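One possible shape for the requested option, sketched with a hypothetical value distributor.autoscaling.external that does not exist in the chart today: when it is true, the template leaves replicas unset so an externally managed HPA or KEDA object owns the replica count.

```yaml
spec:
{{- /* Hypothetical: "external" is not an existing chart value. When true, the
       replica count is left to an HPA or KEDA object managed outside the chart. */}}
{{- if not (or .Values.distributor.autoscaling.enabled .Values.distributor.autoscaling.external) }}
  replicas: {{ .Values.distributor.replicas }}
{{- end }}
```

Users would then set distributor.autoscaling.external: true in their values while keeping enabled: false.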

That's a problem with a GitOps operator like Argo CD: as soon as the HPA changes the replica count, Argo CD reconciles the Deployment back to the value of the replicas option, undoing any scale-up.

We worked around it by having Argo CD ignore that field (Application spec excerpt below), but it would be nice to be able to use custom HPA configurations or KEDA objects and still keep the chart from setting replicas in the templates.

    ignoreDifferences:
      - group: apps
        kind: Deployment
        name: loki-distributor
        namespace: loki
        jsonPointers:
          - /spec/replicas
      - group: apps
        kind: StatefulSet
        name: loki-ingester
        namespace: loki
        jsonPointers:
          - /spec/replicas
      - group: apps
        kind: Deployment
        name: loki-querier
        namespace: loki
        jsonPointers:
          - /spec/replicas
      - group: apps
        kind: Deployment
        name: loki-query-frontend
        namespace: loki
        jsonPointers:
          - /spec/replicas
    syncPolicy:
      syncOptions:
        - RespectIgnoreDifferences=true
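A shorter variant, assuming every Deployment and StatefulSet in the Application may have its replica count managed outside Git: leaving out name and namespace makes each entry match all resources of that group and kind. The trade-off is that replica drift is also ignored for components that have no autoscaler.

```yaml
ignoreDifferences:
  # No name/namespace: matches every apps/Deployment and
  # apps/StatefulSet in this Application.
  - group: apps
    kind: Deployment
    jsonPointers:
      - /spec/replicas
  - group: apps
    kind: StatefulSet
    jsonPointers:
      - /spec/replicas
syncPolicy:
  syncOptions:
    - RespectIgnoreDifferences=true
```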


[loki-distributed] Add configurable scaling behaviour and KEDA autoscaler #2126