kubernetes-retired / kube-aws

[EOL] A command-line tool to declaratively manage Kubernetes clusters on AWS
Apache License 2.0

Horizontal Pod Autoscaler (HPA): Current CPU: <unknown> #549

Closed PuKoren closed 4 years ago

PuKoren commented 7 years ago

Hello,

I'm trying to set up an HPA (Horizontal Pod Autoscaler) on one of my deployments using the latest version of kube-aws (0.9.6-rc.2).

However, it seems to fail to monitor the CPU usage and prints the following:

kubectl get hpa
NAME          REFERENCE                TARGETS           MINPODS   MAXPODS   REPLICAS   AGE
drone-agent   Deployment/drone-agent   <unknown> / 40%   1         6         1          25m

Here is my deployment setup:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: drone-agent
  labels:
    app: drone-agent
  namespace: devtools
spec:
  template:
    metadata:
      labels:
        app: drone-agent
    spec:
      terminationGracePeriodSeconds: 30
      containers:
      - name: drone-agent
        image: drone/drone:0.5
        args: ["agent"]
# Removed non-relevant configuration here
---
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: drone-agent
  namespace: devtools
spec:
  scaleTargetRef:
    apiVersion: extensions/v1beta1
    kind: Deployment
    name: drone-agent
  minReplicas: 1
  maxReplicas: 6
  targetCPUUtilizationPercentage: 40
status:
  currentReplicas: 1
  desiredReplicas: 1

In the Dashboard, the HPA page shows: Current CPU Utilization:%

In all the monitoring pages, it seems that CPU is reported as a number of cores rather than a percentage. Could this be related?

danielfm commented 7 years ago

You need to specify resource requests in the Pod spec section of your deployment for the HPA to be able to display that information (and take any action whatsoever based on it).

Taken from the documentation:

Please note that if some of the pod’s containers do not have the relevant resource request set, CPU utilization for the pod will not be defined and the autoscaler will not take any action for that metric.
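
For example, a minimal sketch of how the drone-agent container spec could look with a CPU request added (the 100m value is illustrative, not taken from the original manifest):

```
      containers:
      - name: drone-agent
        image: drone/drone:0.5
        args: ["agent"]
        resources:
          requests:
            cpu: 100m   # the HPA computes utilization as actual usage / this request
```

With a request in place, the HPA can express current usage as a percentage of it.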

PuKoren commented 7 years ago

Indeed, it works once the resources are specified. Sorry about this, and thanks a lot.

danielfm commented 7 years ago

@PuKoren No problem!

jsilvagluky commented 7 years ago

I'm having the same problem but in a deployment with resources specified:

$ kubectl describe deployment central-test 
Name:           central-test
Namespace:      default
CreationTimestamp:  Mon, 24 Jul 2017 11:03:19 -0500
Labels:         app=central-test
Annotations:        deployment.kubernetes.io/revision=13
            kubectl.kubernetes.io/last-applied-configuration={"apiVersion":"extensions/v1beta1","kind":"Deployment",...
Selector:       app=central-test
Replicas:       1 desired | 1 updated | 1 total | 1 available | 0 unavailable
StrategyType:       RollingUpdate
MinReadySeconds:    0
RollingUpdateStrategy:  1 max unavailable, 1 max surge
Pod Template:
  Labels:   app=central-test
  Containers:
   delivery-test:
    Image:  gcr.io/central-161417/central-delivery:v1.12.0-beta3
    Port:   80/TCP
    Limits:
      cpu:  500m
      memory:   1000Mi
    Requests:
      cpu:  50m
      memory:   256Mi
    Liveness:   http-get http://:80/install.php delay=10s timeout=30s period=30s #success=1 #failure=2
    Readiness:  http-get http://:80/install.php delay=10s timeout=30s period=90s #success=1 #failure=2
    Environment:
...
   delivery-test-cloudsql-proxy:
    Image:  gcr.io/cloudsql-docker/gce-proxy:1.09
    Port:   <none>
    Command:
      /cloud_sql_proxy
      --dir=/cloudsql
      -instances=central-161417:us-east1:centralmysql=tcp:3306
      -credential_file=/secrets/cloudsql/credentials.json
    Limits:
      cpu:  150m
      memory:   150Mi
    Requests:
      cpu:      42m
      memory:       50Mi
    Environment:    <none>
   ...

$ kubectl describe hpa central-test 
Name:                           central-test
Namespace:                      default
Labels:                         <none>
Annotations:                        kubectl.kubernetes.io/last-applied-configuration={"apiVersion":"autoscaling/v1","kind":"HorizontalPodAutoscaler","metadata":{"annotations":{},"name":"central-test","namespace":"default"},"spec":{"maxR...
CreationTimestamp:                  Thu, 14 Sep 2017 17:38:30 -0500
Reference:                      Deployment/central-test
Metrics:                        ( current / target )
  resource cpu on pods  (as a percentage of request):   <unknown> / 90%
Min replicas:                       1
Max replicas:                       5
Events:                         <none>

$ cat k8s/central-test-autoscaler.yml 
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: central-test
spec:
  scaleTargetRef:
    apiVersion: apps/v1beta1
    kind: Deployment
    name: central-test
  minReplicas: 1
  maxReplicas: 5
  targetCPUUtilizationPercentage: 90

I'm using Google's Kubernetes v1.7.5

redbaron commented 7 years ago

Do you see the same for HPA created for kube-dns?

jsilvagluky commented 7 years ago

I can't find any other HPA, only the ones I've created. Those were working on a previous version (1.6.?) of Kubernetes, the problem started when I upgraded the cluster to 1.7.5

$ kubectl --namespace=kube-system get hpa
No resources found.
$ kubectl --namespace=kube-public get hpa
No resources found.
$ kubectl --namespace=default get hpa    
NAME               REFERENCE                     TARGETS           MINPODS   MAXPODS   REPLICAS   AGE
central-delivery   Deployment/central-delivery   <unknown> / 75%   2         10        0          23h
central-test       Deployment/central-test       <unknown> / 90%   1         5         0          14h

redbaron commented 7 years ago

Do you see anything in the controller-manager logs? (You need to find the leader first.)

If nothing shows up, then maybe adding --v=3 would show more.
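
A rough sketch of how one might locate the leader and read its logs (unit names and log paths are assumptions that vary by install):

```
# Find which master currently holds the controller-manager leader lease
kubectl -n kube-system get endpoints kube-controller-manager -o yaml | grep holderIdentity

# On that node, tail the logs; the unit/path depends on how the control plane runs
journalctl -u kube-controller-manager -f
# or, if it runs as a static pod:
kubectl -n kube-system logs kube-controller-manager-<node-name>
```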

jsilvagluky commented 7 years ago

The cluster has 3 nodes. I've accessed them via ssh (gcloud compute ssh ...) and couldn't find /var/log/kube-controller-manager.log as explained in the manual; I only found /var/log/kube-proxy.log, which, I guess, means they're all worker nodes (unless I should access the logs through journalctl, but I don't know how).

However, searching the Cloud Console logs, I found the following errors periodically reported by the autoscaler:

{
 insertId:  "iloy87f7rxa6x"  
 jsonPayload: {
  coresPerReplica:  256   
  min:  1   
  nodesPerReplica:  16   
 }
 labels: {
  compute.googleapis.com/resource_name:  "fluentd-gcp-v2.0-ghxmz"   
  container.googleapis.com/namespace_name:  "kube-system"   
  container.googleapis.com/pod_name:  "kube-dns-autoscaler-244676396-6lss4"   
  container.googleapis.com/stream:  "stderr"   
 }
 logName:  "projects/.../logs/autoscaler"  
 receiveTimestamp:  "2017-09-13T17:09:03.800320242Z"  
 resource: {
  labels: {
   cluster_name:  "central-gke-cluster"    
   container_name:  "autoscaler"    
  ...
  }
  type:  "container"   
 }
 severity:  "ERROR"  
 timestamp:  "2017-09-13T17:08:57Z"  
}
{
 insertId:  "1bg9etde42ahd"  
 labels: {
  compute.googleapis.com/resource_name:  "gke-central-gke-cluster-default-pool-4df14db2-x048"   
  container.googleapis.com/namespace_name:  "kube-system"   
  container.googleapis.com/pod_name:  "kube-dns-autoscaler-244676396-6lss4"   
  container.googleapis.com/stream:  "stderr"   
 }
 logName:  "projects/.../logs/autoscaler"  
 receiveTimestamp:  "2017-09-13T17:10:22.807859379Z"  
 resource: {
  labels: {
   cluster_name:  "central-gke-cluster"    
   container_name:  "autoscaler"    
   ...
  }
  type:  "container"   
 }
 severity:  "ERROR"  
 textPayload:  "Error while getting cluster status: Get https://10.3.240.1:443/api/v1/nodes: dial tcp 10.3.240.1:443: getsockopt: connection refused"  
 timestamp:  "2017-09-13T17:09:37Z"  
}

jsilvagluky commented 7 years ago

Well, those events are a couple of days old. After removing the "autoscaler" filter and viewing all logs, I see these warnings repeating:

{
 insertId:  "i1iualf10u1v6"  
 labels: {
  compute.googleapis.com/resource_name:  "gke-central-gke-cluster-default-pool-4df14db2-2b65"   
  container.googleapis.com/namespace_name:  "kube-system"   
  container.googleapis.com/pod_name:  "event-exporter-1421584133-bw9h1"   
  container.googleapis.com/stream:  "stderr"   
 }
 logName:  "projects/..../logs/prometheus-to-sd-exporter"  
 receiveTimestamp:  "2017-09-15T15:16:30.034728767Z"  
 resource: {
  labels: {
   cluster_name:  "central-gke-cluster"    
   container_name:  "prometheus-to-sd-exporter"    
   namespace_id:  "kube-system"    
...
  }
  type:  "container"   
 }
 severity:  "WARNING"  
 textPayload:  "Metric stackdriver_sink_request_count was not found in the cache."  
 timestamp:  "2017-09-15T15:16:27Z"  
}
{
 insertId:  "rre1srfa1tn7t"  
 labels: {
  compute.googleapis.com/resource_name:  "gke-central-gke-cluster-default-pool-4df14db2-2b65"   
  container.googleapis.com/namespace_name:  "kube-system"   
  container.googleapis.com/pod_name:  "event-exporter-1421584133-bw9h1"   
  container.googleapis.com/stream:  "stderr"   
 }
 logName:  "projects/.../logs/prometheus-to-sd-exporter"  
 receiveTimestamp:  "2017-09-15T15:17:30.055465475Z"  
 resource: {
  labels: {
   cluster_name:  "central-gke-cluster"    
   container_name:  "prometheus-to-sd-exporter"    
   namespace_id:  "kube-system"    
   ...
  }
  type:  "container"   
 }
 severity:  "WARNING"  
 textPayload:  "Metric stackdriver_sink_successfully_sent_entry_count was not found in the cache."  
 timestamp:  "2017-09-15T15:17:27Z"  
}
{
 insertId:  "rre1srfa1tn7u"  
 labels: {
  compute.googleapis.com/resource_name:  "gke-central-gke-cluster-default-pool-4df14db2-2b65"   
  container.googleapis.com/namespace_name:  "kube-system"   
  container.googleapis.com/pod_name:  "event-exporter-1421584133-bw9h1"   
  container.googleapis.com/stream:  "stderr"   
 }
 logName:  "projects/.../logs/prometheus-to-sd-exporter"  
 receiveTimestamp:  "2017-09-15T15:17:30.055465475Z"  
 resource: {
  labels: {
   cluster_name:  "central-gke-cluster"    
   container_name:  "prometheus-to-sd-exporter"    
   namespace_id:  "kube-system"    
   ...
  }
  type:  "container"   
 }
 severity:  "WARNING"  
 textPayload:  "Metric stackdriver_sink_received_entry_count was not found in the cache."  
 timestamp:  "2017-09-15T15:17:27Z"  
}

jsilvagluky commented 7 years ago

The last reported error is from 2017-09-13T20:42:44Z; there have been no more errors since then.

jsilvagluky commented 7 years ago

I went to the pool config in the Cloud Console and edited it: disabled pool autoscaling, then enabled it again, and now the HPA reports usage again:

$ kubectl get hpa
NAME               REFERENCE                     TARGETS     MINPODS   MAXPODS   REPLICAS   AGE
central-delivery   Deployment/central-delivery   21% / 75%   2         10        2          1d
central-test       Deployment/central-test       25% / 90%   1         5         1          2m

kishorekumark commented 6 years ago

I'm having the same issue; the HPA reports <unknown>. I'm on AWS in a federated cluster environment and tried on both the federated cluster and an individual cluster.

$ uskube describe hpa/adminconsole
Name:               adminconsole
Namespace:          default
Labels:
Annotations:
CreationTimestamp:  Sun, 05 Nov 2017 12:01:17 -0600
Reference:          Deployment/adminconsole
Metrics:            ( current / target )
  resource cpu on pods  (as a percentage of request):  <unknown> / 50%
Min replicas:       1
Max replicas:       5
Events:

$ fedkube get hpa
NAME           REFERENCE                 TARGETS           MINPODS   MAXPODS   REPLICAS   AGE
adminconsole   Deployment/adminconsole   <unknown> / 20%   4         10        0          2d
callmanager    Deployment/callmanager    <unknown> / 80%   1         5         0          2d
confmanager    Deployment/confmanager    <unknown> / 80%   1         5         0          2d

caarlos0 commented 6 years ago

FWIW, in my case, the master node was running on degraded hardware, and eventually some things were not working (including hpa).

Launched a new master and it is working now.

Only thing I was able to find in logs were in the weave logs, which showed failed connections to master. Weave pods were also restarting a lot.

narendrasolanke21 commented 6 years ago

Target shows <unknown>, PFB details: (screenshot attached)

kkorada commented 6 years ago

Any update on the above issue?

I am also facing the same issue.

mumoshu commented 6 years ago

@kkorada Hi! Unfortunately, I'm not sure what an "update" to this issue would be. From what I can see in the discussion in this thread, none of these issues has anything to do with kube-aws itself, right? So I'm unable to figure out how I could update this issue or fix it at the kube-aws level.

WDYT? Thanks.

mumoshu commented 6 years ago

Or perhaps I could help you debug your HPA issue if you could share more details of your setup with us.

kkorada commented 6 years ago

Please let me know what details you need.

kkorada commented 6 years ago

I have installed OpenShift Origin on AWS. The inventory props related to metrics:

openshift_metrics_install_metrics=true
openshift_metrics_storage_kind=dynamic
openshift_metrics_image_prefix=docker.io/openshift/origin-
openshift_metrics_image_version=v3.9

OpenShift version: 3.9

When I describe the HPA, I am getting the current utilization as unknown.

mumoshu commented 6 years ago

Which version of kube-aws and Kubernetes do you use?

kkorada commented 6 years ago

Client Version: version.Info{Major:"", Minor:"", GitVersion:"v1.9.1+a0ce1bc657", GitCommit:"a0ce1bc", GitTreeState:"clean", BuildDate:"2018-04-11T20:47:54Z", GoVersion:"go1.9.2", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"", Minor:"", GitVersion:"v1.9.1+a0ce1bc657", GitCommit:"a0ce1bc", GitTreeState:"clean", BuildDate:"2018-04-11T20:47:54Z", GoVersion:"go1.9.2", Compiler:"gc", Platform:"linux/amd64"}

mumoshu commented 6 years ago

Thanks. Which kube-aws version?

kkorada commented 6 years ago

I think I am not using kube-aws; I have installed OpenShift using Ansible playbooks.

mumoshu commented 6 years ago

@kkorada Do you have Heapster installed on your cluster? Also, would you mind sharing the output of kubectl describe deployment for your deployment, and kubectl get hpa?

miry commented 6 years ago

You need to add CPU and memory limits.

kkorada commented 6 years ago

Hey, sorry for the late response. Yes, I have added CPU and memory limits to the DC, and I have resolved the issue with the HPA.

The HPA config was using the wrong apiVersion (I had generated it with the console in OpenShift).

elloboblanco commented 6 years ago

Hey, I'm having this issue trying to use kube-aws with AWS EKS. I have memory and CPU limits specified for my container. Any HPA I try to start up gives me <unknown> CPU, even though the dashboard seems to be able to read the metrics.

My HPA config


Namespace:                                             default
Labels:                                                app=graphql
Annotations:                                           kubectl.kubernetes.io/last-applied-configuration={"apiVersion":"autoscaling/v1","kind":"HorizontalPodAutoscaler","metadata":{"annotations":{},"labels":{"app":"graphql"},"name":"graphql-autoscaler","na...
CreationTimestamp:                                     Thu, 14 Jun 2018 10:56:48 -0600
Reference:                                             Deployment/graphql-deployment
Metrics:                                               ( current / target )
  resource cpu on pods  (as a percentage of request):  <unknown> / 10%
Min replicas:                                          3
Max replicas:                                          10
Conditions:
  Type           Status  Reason                   Message
  ----           ------  ------                   -------
  AbleToScale    True    SucceededGetScale        the HPA controller was able to get the target's current scale
  ScalingActive  False   FailedGetResourceMetric  the HPA was unable to compute the replica count: unable to get metrics for resource cpu: unable to fetch metrics from resource metrics API: the server could not find the requested resource (get pods.metrics.k8s.io)
Events:
  Type     Reason                   Age                From                       Message
  ----     ------                   ----               ----                       -------
  Warning  FailedGetResourceMetric  2m (x411 over 3h)  horizontal-pod-autoscaler  unable to get metrics for resource cpu: unable to fetch metrics from resource metrics API: the server could not find the requested resource (get pods.metrics.k8s.io)
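
That FailedGetResourceMetric condition suggests the resource metrics API itself is not available. A quick way to confirm (a sketch; it assumes metrics-server is what should be providing metrics.k8s.io on the cluster):

```
# Is the resource metrics API registered with the aggregator?
kubectl get apiservices | grep metrics.k8s.io

# Does it actually return data?
kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes
```
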
sprutner commented 6 years ago

I also have this issue with EKS. It's also been outlined by someone else here: https://serverfault.com/questions/917831/horizontalpodautoscaling-on-amazon-eks#

ivelichkovich commented 6 years ago

EKS HPA workaround https://medium.com/eks-hpa-workaround/k8s-hpa-controller-6ac2dfb4c028
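
At the time, EKS did not ship metrics-server, so the usual fix was to deploy it yourself; a rough sketch (the manifest URL and release are assumptions, check the metrics-server releases for the current one):

```
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
kubectl -n kube-system get pods -l k8s-app=metrics-server
```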

medhedifour commented 6 years ago

In my case I have a different problem: I don't specify the resource request in the deployment, but the autoscaler adds a 100m CPU request automatically. This also seems to be a problem case.

AdelBachene commented 5 years ago

@danielfm Thanks a lot, you saved my day

augustgerro commented 5 years ago

I got a similar situation:

kubectl get hpa --all-namespaces
NAMESPACE   NAME       REFERENCE             TARGETS                          MINPODS   MAXPODS   REPLICAS   AGE
app         demo-app   Deployment/demo-app   <unknown>/200Mi, <unknown>/80%   2         10        2          65m
default     podinfo    Deployment/podinfo    <unknown>/200Mi, 100%/80%        2         10        10         77m

Description:

Events:
  Type     Reason                   Age                 From                       Message
  ----     ------                   ----                ----                       -------
  Warning  FailedGetObjectMetric    58m (x41 over 1h)   horizontal-pod-autoscaler  unable to get metric http_requests: Service on app demo-add/unable to fetch metrics from custom metrics API: no custom metrics API (custom.metrics.k8s.io) registered
  Warning  FailedGetResourceMetric  3m (x201 over 53m)  horizontal-pod-autoscaler  missing request for cpu

API:


  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   495  100   495    0     0   9533      0 --:--:-- --:--:-- --:--:--  9705
{
  "kind": "PodMetrics",
  "apiVersion": "metrics.k8s.io/v1beta1",
  "metadata": {
    "name": "demo-app-65cff97f4f-jfkg2",
    "namespace": "app",
    "selfLink": "/apis/metrics.k8s.io/v1beta1/namespaces/app/pods/demo-app-65cff97f4f-jfkg2",
    "creationTimestamp": "2018-12-11T19:22:34Z"
  },
  "timestamp": "2018-12-11T19:22:02Z",
  "window": "30s",
  "containers": [
    {
      "name": "demo-app",
      "usage": {
        "cpu": "9514204n",
        "memory": "58516Ki"
      }
    }
  ]
}

shaojielinux commented 5 years ago

Has your problem been solved yet?

tobsch commented 5 years ago

I've got the same problem. I also tried turning autoscaling off and on; all containers have limits.

tobsch commented 5 years ago

In my case, it had to do with GCE permission settings at the node pool/cluster level. I think it was metrics.write.
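
A sketch of how one might check that from the command line (pool, cluster, and zone names are placeholders, and the exact scope required is an assumption):

```
# List the OAuth scopes granted to the node pool's instances
gcloud container node-pools describe <pool-name> \
  --cluster <cluster-name> --zone <zone> \
  --format='value(config.oauthScopes)'
# Look for https://www.googleapis.com/auth/monitoring (or monitoring.write) in the output
```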

levilugato commented 5 years ago

I've got the same problem on AKS and Istio.

pawel-furmaniak commented 5 years ago

If your deployment's selector matches the labels of pods from other deployments in your namespace, then you also need to add CPU requests/limits to those containers for the autoscaler to work.
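
A quick way to spot that situation (a sketch; app=myapp is a placeholder for whatever selector your target deployment uses):

```
# List every pod the selector matches and each container's CPU request;
# any container showing <none> will leave the HPA's utilization undefined.
kubectl get pods -l app=myapp \
  -o custom-columns='POD:.metadata.name,CONTAINERS:.spec.containers[*].name,CPU_REQUEST:.spec.containers[*].resources.requests.cpu'
```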

fejta-bot commented 5 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle stale

AnkurJain10 commented 5 years ago

I ran into the same issue and fixed it by adding the apiVersion to my HPA:


spec:
  scaleTargetRef:
    apiVersion: extensions/v1beta1
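
In context, a minimal complete manifest would look roughly like this (names and numbers are illustrative, not from any setup in this thread):

```
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: my-app
spec:
  scaleTargetRef:
    apiVersion: extensions/v1beta1   # the line that was missing
    kind: Deployment
    name: my-app
  minReplicas: 1
  maxReplicas: 5
  targetCPUUtilizationPercentage: 50
```
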
fejta-bot commented 5 years ago

Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle rotten

fejta-bot commented 4 years ago

Rotten issues close after 30d of inactivity. Reopen the issue with /reopen. Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /close

k8s-ci-robot commented 4 years ago

@fejta-bot: Closing this issue.

In response to [this](https://github.com/kubernetes-incubator/kube-aws/issues/549#issuecomment-550098545):

> Rotten issues close after 30d of inactivity.
> Reopen the issue with `/reopen`.
> Mark the issue as fresh with `/remove-lifecycle rotten`.
>
> Send feedback to sig-testing, kubernetes/test-infra and/or [fejta](https://github.com/fejta).
> /close

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

ysde commented 4 years ago

Hi, I got the same problem. First I followed this AWS link to install the HPA: https://aws.amazon.com/premiumsupport/knowledge-center/eks-metrics-server-pod-autoscaler/

Then the target <unknown> problem showed up.

  1. Tried updating the cluster version; it did not solve my problem.
  2. Patched the HPA apiVersion (https://github.com/kubernetes-incubator/kube-aws/issues/549#issuecomment-529049987); it did not solve my problem.
  3. The links below solved my problem.

It turned out the metrics server was not working properly; you can check its logs with kubectl logs -f -n kube-system -l k8s-app=metrics-server. You may find a lot of "unable to fetch pod metrics for pod" messages there.

You can also try kubectl top nodes or kubectl top pods; both will show a similar message saying that metrics cannot be fetched.

Patching the metrics-server deployment with the args below solved my problem.

kubectl patch deployment -n kube-system metrics-server -p='{"spec":{"template":{"spec":{"containers":[{"name":"metrics-server","args":["--cert-dir=/tmp", "--secure-port=4443", "--kubelet-insecure-tls", "--kubelet-preferred-address-types=InternalIP"]}]}}}}'
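
After the patch, the metrics-server container spec should end up roughly like this (a sketch of the resulting fragment, matching the flags applied above):

```
      containers:
      - name: metrics-server
        args:
        - --cert-dir=/tmp
        - --secure-port=4443
        - --kubelet-insecure-tls
        - --kubelet-preferred-address-types=InternalIP
```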

(screenshot attached)

Hope it helped, many thanks

References:
https://github.com/kubernetes-sigs/metrics-server/issues/300#issuecomment-568857398
https://github.com/kubernetes-sigs/metrics-server/issues/129
https://github.com/kubernetes-sigs/metrics-server/issues/237