k3s-io / k3s

Lightweight Kubernetes
https://k3s.io
Apache License 2.0

Horizontal pod autoscaler can't work with metrics server #8287

Closed dominch closed 1 year ago

dominch commented 1 year ago

Environmental Info: K3s Version:

k3s -v
k3s version v1.27.4+k3s1 (36645e73)
go version go1.20.6

Node(s) CPU architecture, OS, and Version: Mostly Linux kv92rs 5.10.0-24-amd64 #1 SMP Debian 5.10.179-5 (2023-08-08) x86_64 GNU/Linux; some nodes also run bookworm

Cluster Configuration: 6 nodes, 3 masters

Describe the bug: I added ownCloud via its Helm chart, which added the following HPA to the cluster:

kubectl get hpa -A
NAMESPACE   NAME                REFERENCE                      TARGETS         MINPODS   MAXPODS   REPLICAS   AGE
owncloud    my-cloud-owncloud   Deployment/my-cloud-owncloud   <unknown>/80%   1         3         1          8d

The targets report unknown CPU usage, so the autoscaler is not working at all. The following events were recorded for it:

kubectl describe hpa my-cloud-owncloud -n owncloud
Name:                                                  my-cloud-owncloud
Namespace:                                             owncloud
Labels:                                                app.kubernetes.io/instance=my-cloud
                                                       app.kubernetes.io/managed-by=Helm
                                                       app.kubernetes.io/name=owncloud
                                                       app.kubernetes.io/version=10.12.2
                                                       helm.sh/chart=owncloud-0.5.3
Annotations:                                           meta.helm.sh/release-name: my-cloud
                                                       meta.helm.sh/release-namespace: owncloud
CreationTimestamp:                                     Wed, 23 Aug 2023 14:45:11 +0200
Reference:                                             Deployment/my-cloud-owncloud
Metrics:                                               ( current / target )
  resource cpu on pods  (as a percentage of request):  <unknown> / 80%
Min replicas:                                          1
Max replicas:                                          3
Deployment pods:                                       1 current / 0 desired
Conditions:
  Type           Status  Reason                   Message
  ----           ------  ------                   -------
  AbleToScale    True    SucceededGetScale        the HPA controller was able to get the target's current scale
  ScalingActive  False   FailedGetResourceMetric  the HPA was unable to compute the replica count: failed to get cpu utilization: missing request for cpu in container owncloud of Pod my-cloud-owncloud-5cdb678c48-8bgbm
Events:
  Type     Reason                   Age                       From                       Message
  ----     ------                   ----                      ----                       -------
  Warning  FailedGetResourceMetric  9m42s (x2 over 10m)       horizontal-pod-autoscaler  failed to get cpu utilization: unable to get metrics for resource cpu: unable to fetch metrics from resource metrics API: the server is currently unable to handle the request (get pods.metrics.k8s.io)
  Warning  FailedGetResourceMetric  4m40s (x19469 over 3d9h)  horizontal-pod-autoscaler  failed to get cpu utilization: missing request for cpu in container owncloud of Pod my-cloud-owncloud-5cdb678c48-8bgbm

The metrics server seems to work correctly and everything is OK with it, as is the whole Prometheus stack. The problem exists only in the HPA, which cannot get the information it needs. I also tried adding the --insecure-tls flag to check whether that was the issue, but it is not.

Steps To Reproduce: This can probably be reproduced easily with any HPA; the one I have is from this Helm repo:

helm repo add owncloud https://owncloud-docker.github.io/helm-charts
helm install my-cloud owncloud/owncloud --namespace owncloud  -f values.yml

and values.yml (probably only the last section is important here):

owncloud:
  adminUsername: "admin"
  adminPassword: "qwerty"

  debug: false
  defaultLanguage: "en"
  domain: "cloud."
  trustedDomains:
    - "localhost"
    - "cloud.example.com"

persistence:
  enabled: true
  owncloud:
    size: 20Gi
  accessMode: ReadWriteMany

autoscaling:
  enabled: true
  minReplicas: 1
  maxReplicas: 3

Expected behavior: The HPA should work and be able to retrieve CPU usage from the metrics server.

Actual behavior: The HPA cannot get CPU usage from the metrics server.

Additional context / logs: Usually this problem is caused by a missing metrics server. Here it is configured and works perfectly, so something must be preventing the HPA from communicating with it.

brandond commented 1 year ago

Do you see cpu metrics in kubectl top pod -A ? What do the metrics-server pod logs say?

dominch commented 1 year ago

Yes, the metrics server seems to work correctly for all pods (-A) as well as for owncloud:

kubectl top pod -n owncloud
NAME                                 CPU(cores)   MEMORY(bytes)
my-cloud-owncloud-5cdb678c48-8bgbm   7m           193Mi

The whole Prometheus stack also works with no problem.

dominch commented 1 year ago

Here is definition of this hpa:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  annotations:
    meta.helm.sh/release-name: my-cloud
    meta.helm.sh/release-namespace: owncloud
  creationTimestamp: "2023-08-23T12:45:11Z"
  labels:
    app.kubernetes.io/instance: my-cloud
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: owncloud
    app.kubernetes.io/version: 10.12.2
    helm.sh/chart: owncloud-0.5.3
  name: my-cloud-owncloud
  namespace: owncloud
  resourceVersion: "308826302"
  uid: 703af1af-6927-413f-bcab-9ca4eab12a1d
spec:
  maxReplicas: 3
  metrics:
  - resource:
      name: cpu
      target:
        averageUtilization: 80
        type: Utilization
    type: Resource
  minReplicas: 1
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-cloud-owncloud
status:
  conditions:
  - lastTransitionTime: "2023-08-23T12:45:26Z"
    message: the HPA controller was able to get the target's current scale
    reason: SucceededGetScale
    status: "True"
    type: AbleToScale
  - lastTransitionTime: "2023-08-23T12:45:26Z"
    message: 'the HPA was unable to compute the replica count: failed to get cpu utilization:
      missing request for cpu in container owncloud of Pod my-cloud-owncloud-5cdb678c48-8bgbm'
    reason: FailedGetResourceMetric
    status: "False"
    type: ScalingActive
  currentMetrics: null
  currentReplicas: 1
  desiredReplicas: 0

brandond commented 1 year ago

The status would appear to suggest that you haven't set a CPU request on the pod in question? You have to actually set resource requests and limits on your pods for the HPA to work...

message: 'the HPA was unable to compute the replica count: failed to get cpu utilization: missing request for cpu in container owncloud of Pod my-cloud-owncloud-5cdb678c48-8bgbm'
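
To illustrate why a missing request stops the HPA, here is a minimal Python sketch of the utilization arithmetic. It is a simplification, not the controller's actual code: the real HPA aggregates usage and requests across all pods and applies a tolerance before scaling. The function names and the 100m request are hypothetical; the 7m usage is the value reported by kubectl top above, and 80%, 1, and 3 come from the HPA spec.

```python
import math
from typing import Optional


def cpu_utilization_percent(usage_millicores: int,
                            request_millicores: Optional[int]) -> float:
    """Return CPU usage as a percentage of the container's CPU request."""
    if not request_millicores:
        # Without a request there is no denominator, which corresponds to the
        # "missing request for cpu" condition shown in the HPA status.
        raise ValueError("missing request for cpu")
    return 100.0 * usage_millicores / request_millicores


def desired_replicas(current_replicas: int, current_utilization: float,
                     target_utilization: float, min_replicas: int,
                     max_replicas: int) -> int:
    """Apply ceil(current * currentMetric / targetMetric), then clamp to bounds."""
    raw = math.ceil(current_replicas * current_utilization / target_utilization)
    return max(min_replicas, min(max_replicas, raw))


# 7m usage is from the thread; the 100m request is an assumed example value.
utilization = cpu_utilization_percent(7, 100)
print(utilization)
print(desired_replicas(1, utilization, 80, 1, 3))

try:
    cpu_utilization_percent(7, None)  # the situation described in this issue
except ValueError as err:
    print(f"cannot compute utilization: {err}")
```

Since utilization is measured against the request, a working metrics server alone is not enough: kubectl top can report usage while the HPA still shows an unknown target.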

dominch commented 1 year ago

Thanks for that tip. I briefly checked that before, and the Helm chart suggested that some limits would be set in the default configuration, but that was only an example. I added them to the deployment and the HPA successfully got its value. Thanks again for pointing that out.
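
For anyone hitting the same message, a quick way to confirm the cause is to look for containers without a CPU request in the Deployment manifest. The following is a minimal sketch with a hypothetical helper name; it assumes the manifest has been loaded into a Python dict, for example from the JSON printed by kubectl get deployment my-cloud-owncloud -n owncloud -o json. The sample manifest is made up for illustration.

```python
from typing import List


def containers_missing_cpu_request(deployment: dict) -> List[str]:
    """Return the names of containers in a Deployment that have no CPU request."""
    pod_spec = deployment.get("spec", {}).get("template", {}).get("spec", {})
    missing = []
    for container in pod_spec.get("containers", []):
        # "resources" may be absent, empty, or null in the manifest.
        requests = (container.get("resources") or {}).get("requests") or {}
        if "cpu" not in requests:
            missing.append(container.get("name", "<unnamed>"))
    return missing


# Illustrative manifest only: one container without requests, one with.
sample = {
    "spec": {
        "template": {
            "spec": {
                "containers": [
                    {"name": "owncloud", "resources": {}},
                    {"name": "sidecar",
                     "resources": {"requests": {"cpu": "100m"}}},
                ]
            }
        }
    }
}

print(containers_missing_cpu_request(sample))
```

An empty list means every container declares a CPU request, so a utilization-based HPA has the denominator it needs. Note that when only a limit is set, Kubernetes uses it as the request, so a manifest read back from the cluster will show a request in that case as well.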