can you share the generated HPA resource?
@jaronoff97 this is the HPA configuration in my deployment yaml:
```yaml
replicas: {{ .Values.minReplicaCount }}
autoscaler:
  minReplicas: {{ .Values.minReplicaCount }}
  maxReplicas: {{ .Values.maxReplicaCount }}
  targetCPUUtilization: 80
  targetMemoryUtilization: 65
  behavior:
    scaleDown:
      policies:
        - periodSeconds: 600
          type: Pods
          value: 1
      selectPolicy: Min
      stabilizationWindowSeconds: 900
    scaleUp:
      policies:
        - periodSeconds: 60
          type: Pods
          value: 2
        - periodSeconds: 60
          type: Percent
          value: 100
      selectPolicy: Max
      stabilizationWindowSeconds: 60
securityContext:
  allowPrivilegeEscalation: false
  privileged: false
  readOnlyRootFilesystem: true
resources:
  limits:
    cpu: 1000m
    memory: 1024Mi
  requests:
    cpu: 50m
    memory: 64Mi
```
Below is the generated HPA YAML, from `kubectl get hpa otel-gateway-collector -n monitoringapps -o yaml`:
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  annotations:
    meta.helm.sh/release-name: otel-gateway-deployment
    meta.helm.sh/release-namespace: monitoringapps
  creationTimestamp: "2024-09-14T06:55:59Z"
  labels:
    app.kubernetes.io/component: opentelemetry-collector
    app.kubernetes.io/instance: monitoringapps.otel-gateway
    app.kubernetes.io/managed-by: opentelemetry-operator
    app.kubernetes.io/name: otel-gateway-collector
    app.kubernetes.io/part-of: opentelemetry
    app.kubernetes.io/version: latest
  name: otel-gateway-collector
  namespace: monitoringapps
  ownerReferences:
  - apiVersion: opentelemetry.io/v1beta1
    blockOwnerDeletion: true
    controller: true
    kind: OpenTelemetryCollector
    name: otel-gateway
    uid: 8766a8asfadadadad
  resourceVersion: "549018"
  uid: f06sfaffffffff
spec:
  behavior:
    scaleDown:
      policies:
      - periodSeconds: 600
        type: Pods
        value: 1
      selectPolicy: Min
      stabilizationWindowSeconds: 900
    scaleUp:
      policies:
      - periodSeconds: 60
        type: Pods
        value: 2
      - periodSeconds: 60
        type: Percent
        value: 100
      selectPolicy: Max
      stabilizationWindowSeconds: 60
  maxReplicas: 6
  metrics:
  - resource:
      name: memory
      target:
        averageUtilization: 65
        type: Utilization
    type: Resource
  - resource:
      name: cpu
      target:
        averageUtilization: 80
        type: Utilization
    type: Resource
  minReplicas: 3
  scaleTargetRef:
    apiVersion: opentelemetry.io/v1beta1
    kind: OpenTelemetryCollector
    name: otel-gateway
status:
  conditions:
  - lastTransitionTime: "2024-09-14T06:56:14Z"
    message: recommended size matches current size
    reason: ReadyForNewScale
    status: "True"
    type: AbleToScale
  - lastTransitionTime: "2024-09-14T09:35:19Z"
    message: the HPA was able to successfully calculate a replica count from memory
      resource utilization (percentage of request)
    reason: ValidMetricFound
    status: "True"
    type: ScalingActive
  - lastTransitionTime: "2024-09-15T06:39:18Z"
    message: the desired replica count is more than the maximum replica count
    reason: TooManyReplicas
    status: "True"
    type: ScalingLimited
  currentMetrics:
  - resource:
      current:
        averageUtilization: 108
        averageValue: 72924501333m
      name: memory
    type: Resource
  - resource:
      current:
        averageUtilization: 3
        averageValue: 1m
      name: cpu
    type: Resource
  currentReplicas: 6
  desiredReplicas: 6
  lastScaleTime: "2024-09-15T06:39:18Z"
```
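For reference, a quick sanity check on these figures, assuming (as the ScalingActive message above states) that memory utilization is computed as a percentage of the container's request rather than its limit: the reported averageValue of 72924501333m is roughly 72.9 MB, i.e. about 69.5 Mi, and 69.5 Mi / 64 Mi ≈ 1.09, which matches the reported averageUtilization of 108; measured against the 1024 Mi limit, the same usage would be only about 7%.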
@jaronoff97 Could you add a test case for memory-based autoscaling here, so that you can reproduce it?
https://github.com/open-telemetry/opentelemetry-operator/tree/main/tests/e2e-autoscale/autoscale
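For illustration, a minimal sketch of the kind of CR such a test could apply, assuming the same spec.autoscaler fields quoted earlier in this thread; the resource name, requests/limits, and collector pipeline below are illustrative, and the actual e2e test layout in that directory may differ:

```yaml
apiVersion: opentelemetry.io/v1beta1
kind: OpenTelemetryCollector
metadata:
  name: memory-autoscale          # illustrative name
spec:
  mode: deployment
  autoscaler:
    minReplicas: 1
    maxReplicas: 2
    targetMemoryUtilization: 65   # memory-based scaling only
  resources:
    requests:
      cpu: 50m
      memory: 64Mi
    limits:
      cpu: 250m
      memory: 128Mi
  config:                         # minimal illustrative pipeline
    receivers:
      otlp:
        protocols:
          grpc: {}
    exporters:
      debug: {}
    service:
      pipelines:
        traces:
          receivers: [otlp]
          exporters: [debug]
```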
@shine17 we already have a test case for CPU; I copied it for memory and was unable to reproduce your bug. Is this an issue with the operator? You mentioned deployment.yaml; where is that coming from for you? Is it possible your helm chart is misconfigured?
https://github.com/open-telemetry/opentelemetry-operator/pull/3293
If you are able to reproduce this locally, can you please provide a full working example?
Component(s)
collector
What happened?
Description
An OTel collector created using the OTel operator does not set the HPA memory utilization config correctly.
Steps to Reproduce
Deploy the OTel operator, then create an OTel collector deployment object with a minimum of 3 replicas and a maximum of 6 replicas; a sketch of such a CR is shown below.
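As a concrete illustration, the collector CR for these steps might look roughly like the following, assembled from the values quoted earlier in the thread; the mode and the collector pipeline config are assumptions, while the replica counts, utilization targets, and resources mirror the reported settings:

```yaml
apiVersion: opentelemetry.io/v1beta1
kind: OpenTelemetryCollector
metadata:
  name: otel-gateway
  namespace: monitoringapps
spec:
  mode: deployment                # deployment object, per the step above
  replicas: 3
  autoscaler:
    minReplicas: 3
    maxReplicas: 6
    targetCPUUtilization: 80
    targetMemoryUtilization: 65
  resources:
    requests:
      cpu: 50m
      memory: 64Mi
    limits:
      cpu: 1000m
      memory: 1024Mi
  config:                         # minimal illustrative pipeline; the real config is not shown in the issue
    receivers:
      otlp:
        protocols:
          grpc: {}
    exporters:
      debug: {}
    service:
      pipelines:
        traces:
          receivers: [otlp]
          exporters: [debug]
```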
The `targetMemoryUtilization` setting is not honored and the HPA always scales the collector pods up, even though memory utilization is below 30 percent of the limit for every collector pod.

Pod memory data:

```
NAME                                      CPU(cores)   MEMORY(bytes)
otel-gateway-collector-7898f79fdd-27l9j   1m           55Mi
```

HPA data:

```
NAME                     REFERENCE                             TARGETS            MINPODS   MAXPODS   REPLICAS   AGE
otel-gateway-collector   OpenTelemetryCollector/otel-gateway   112%/65%, 4%/80%   3         6         6          106m
```
Expected Result
Scaling should happen only based on the `targetMemoryUtilization` percentage.
Actual Result
Scaling happens because `targetMemoryUtilization` is calculated incorrectly.
Also, please provide test cases for `targetMemoryUtilization` in the repo. I could not find any test cases for `targetMemoryUtilization` in https://github.com/open-telemetry/opentelemetry-operator/tree/main/tests/e2e-autoscale/autoscale

Kubernetes Version
1.29.7
Operator version
0.108.0
Collector version
0.109.0
Environment information
Environment
OS: (e.g., "Ubuntu 20.04")
Compiler (if manually compiled): (e.g., "go 14.2")
Log output
No response
Additional context
No response