Closed. PuKoren closed this issue 4 years ago.
You need to specify resource requests in the Pod spec section of your deployment for the HPA to be able to display that information (and take any action whatsoever based on it).
Taken from the documentation:
Please note that if some of the pod’s containers do not have the relevant resource request set, CPU utilization for the pod will not be defined and the autoscaler will not take any action for that metric.
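For reference, a minimal sketch of a Deployment with requests set so the HPA can compute utilization (all names and values here are illustrative, not taken from the reporter's cluster):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app        # hypothetical name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: example-app
  template:
    metadata:
      labels:
        app: example-app
    spec:
      containers:
      - name: example-app
        image: nginx:1.21
        resources:
          requests:          # the HPA computes CPU utilization as a percentage of this
            cpu: 100m
            memory: 128Mi
          limits:
            cpu: 500m
            memory: 256Mi
```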
Indeed, it works once the resource requests are set. Sorry about this, thanks a lot!
@PuKoren No problem!
I'm having the same problem but in a deployment with resources specified:
```
$ kubectl describe deployment central-test
Name:                   central-test
Namespace:              default
CreationTimestamp:      Mon, 24 Jul 2017 11:03:19 -0500
Labels:                 app=central-test
Annotations:            deployment.kubernetes.io/revision=13
                        kubectl.kubernetes.io/last-applied-configuration={"apiVersion":"extensions/v1beta1","kind":"Deployment",...
Selector:               app=central-test
Replicas:               1 desired | 1 updated | 1 total | 1 available | 0 unavailable
StrategyType:           RollingUpdate
MinReadySeconds:        0
RollingUpdateStrategy:  1 max unavailable, 1 max surge
Pod Template:
  Labels:  app=central-test
  Containers:
   delivery-test:
    Image:  gcr.io/central-161417/central-delivery:v1.12.0-beta3
    Port:   80/TCP
    Limits:
      cpu:     500m
      memory:  1000Mi
    Requests:
      cpu:     50m
      memory:  256Mi
    Liveness:   http-get http://:80/install.php delay=10s timeout=30s period=30s #success=1 #failure=2
    Readiness:  http-get http://:80/install.php delay=10s timeout=30s period=90s #success=1 #failure=2
    Environment:
      ...
   delivery-test-cloudsql-proxy:
    Image:  gcr.io/cloudsql-docker/gce-proxy:1.09
    Port:   <none>
    Command:
      /cloud_sql_proxy
      --dir=/cloudsql
      -instances=central-161417:us-east1:centralmysql=tcp:3306
      -credential_file=/secrets/cloudsql/credentials.json
    Limits:
      cpu:     150m
      memory:  150Mi
    Requests:
      cpu:     42m
      memory:  50Mi
    Environment:  <none>
    ...
```
```
$ kubectl describe hpa central-test
Name:               central-test
Namespace:          default
Labels:             <none>
Annotations:        kubectl.kubernetes.io/last-applied-configuration={"apiVersion":"autoscaling/v1","kind":"HorizontalPodAutoscaler","metadata":{"annotations":{},"name":"central-test","namespace":"default"},"spec":{"maxR...
CreationTimestamp:  Thu, 14 Sep 2017 17:38:30 -0500
Reference:          Deployment/central-test
Metrics:            ( current / target )
  resource cpu on pods (as a percentage of request):  <unknown> / 90%
Min replicas:       1
Max replicas:       5
Events:             <none>
```
```
$ cat k8s/central-test-autoscaler.yml
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: central-test
spec:
  scaleTargetRef:
    apiVersion: apps/v1beta1
    kind: Deployment
    name: central-test
  minReplicas: 1
  maxReplicas: 5
  targetCPUUtilizationPercentage: 90
```
I'm using Google's Kubernetes v1.7.5
Do you see the same for HPA created for kube-dns?
I can't find any other HPA, only the ones I've created. Those were working on a previous version (1.6.?) of Kubernetes; the problem started when I upgraded the cluster to 1.7.5.
```
$ kubectl --namespace=kube-system get hpa
No resources found.
$ kubectl --namespace=kube-public get hpa
No resources found.
$ kubectl --namespace=default get hpa
NAME               REFERENCE                     TARGETS           MINPODS   MAXPODS   REPLICAS   AGE
central-delivery   Deployment/central-delivery   <unknown> / 75%   2         10        0          23h
central-test       Deployment/central-test       <unknown> / 90%   1         5         0          14h
```
Do you see anything in the controller-manager logs? (You need to find the leader first.) If nothing shows up, then maybe adding `--v=3` would show more.
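As a sketch of how one might find the leader and tail its logs, assuming the lease holder is recorded in the kube-controller-manager endpoints annotation (as it was on clusters of this vintage; unit names and log paths vary by distribution):

```sh
# See which node currently holds the controller-manager leader lease
kubectl -n kube-system get endpoints kube-controller-manager -o yaml | grep holderIdentity

# Then, on that node (systemd-based control planes):
journalctl -u kube-controller-manager -f
```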
The cluster has 3 nodes. I've accessed them via SSH (`gcloud compute ssh ...`) and couldn't find /var/log/kube-controller-manager.log as explained in the manual; I only found /var/log/kube-proxy.log, which, I guess, means they're all worker nodes (unless I should access the logs through journalctl, but I don't know how).
However searching in cloud console logs I found the following errors periodically reported by autoscaler:
```
{
  insertId: "iloy87f7rxa6x"
  jsonPayload: {
    coresPerReplica: 256
    min: 1
    nodesPerReplica: 16
  }
  labels: {
    compute.googleapis.com/resource_name: "fluentd-gcp-v2.0-ghxmz"
    container.googleapis.com/namespace_name: "kube-system"
    container.googleapis.com/pod_name: "kube-dns-autoscaler-244676396-6lss4"
    container.googleapis.com/stream: "stderr"
  }
  logName: "projects/.../logs/autoscaler"
  receiveTimestamp: "2017-09-13T17:09:03.800320242Z"
  resource: {
    labels: {
      cluster_name: "central-gke-cluster"
      container_name: "autoscaler"
      ...
    }
    type: "container"
  }
  severity: "ERROR"
  timestamp: "2017-09-13T17:08:57Z"
}
{
  insertId: "1bg9etde42ahd"
  labels: {
    compute.googleapis.com/resource_name: "gke-central-gke-cluster-default-pool-4df14db2-x048"
    container.googleapis.com/namespace_name: "kube-system"
    container.googleapis.com/pod_name: "kube-dns-autoscaler-244676396-6lss4"
    container.googleapis.com/stream: "stderr"
  }
  logName: "projects/.../logs/autoscaler"
  receiveTimestamp: "2017-09-13T17:10:22.807859379Z"
  resource: {
    labels: {
      cluster_name: "central-gke-cluster"
      container_name: "autoscaler"
      ...
    }
    type: "container"
  }
  severity: "ERROR"
  textPayload: "Error while getting cluster status: Get https://10.3.240.1:443/api/v1/nodes: dial tcp 10.3.240.1:443: getsockopt: connection refused"
  timestamp: "2017-09-13T17:09:37Z"
}
```
Well, those events are a couple of days old. By removing the "autoscaler" filter and viewing all logs, I see these warnings repeating:
```
{
  insertId: "i1iualf10u1v6"
  labels: {
    compute.googleapis.com/resource_name: "gke-central-gke-cluster-default-pool-4df14db2-2b65"
    container.googleapis.com/namespace_name: "kube-system"
    container.googleapis.com/pod_name: "event-exporter-1421584133-bw9h1"
    container.googleapis.com/stream: "stderr"
  }
  logName: "projects/..../logs/prometheus-to-sd-exporter"
  receiveTimestamp: "2017-09-15T15:16:30.034728767Z"
  resource: {
    labels: {
      cluster_name: "central-gke-cluster"
      container_name: "prometheus-to-sd-exporter"
      namespace_id: "kube-system"
      ...
    }
    type: "container"
  }
  severity: "WARNING"
  textPayload: "Metric stackdriver_sink_request_count was not found in the cache."
  timestamp: "2017-09-15T15:16:27Z"
}
{
  insertId: "rre1srfa1tn7t"
  labels: {
    compute.googleapis.com/resource_name: "gke-central-gke-cluster-default-pool-4df14db2-2b65"
    container.googleapis.com/namespace_name: "kube-system"
    container.googleapis.com/pod_name: "event-exporter-1421584133-bw9h1"
    container.googleapis.com/stream: "stderr"
  }
  logName: "projects/.../logs/prometheus-to-sd-exporter"
  receiveTimestamp: "2017-09-15T15:17:30.055465475Z"
  resource: {
    labels: {
      cluster_name: "central-gke-cluster"
      container_name: "prometheus-to-sd-exporter"
      namespace_id: "kube-system"
      ...
    }
    type: "container"
  }
  severity: "WARNING"
  textPayload: "Metric stackdriver_sink_successfully_sent_entry_count was not found in the cache."
  timestamp: "2017-09-15T15:17:27Z"
}
{
  insertId: "rre1srfa1tn7u"
  labels: {
    compute.googleapis.com/resource_name: "gke-central-gke-cluster-default-pool-4df14db2-2b65"
    container.googleapis.com/namespace_name: "kube-system"
    container.googleapis.com/pod_name: "event-exporter-1421584133-bw9h1"
    container.googleapis.com/stream: "stderr"
  }
  logName: "projects/.../logs/prometheus-to-sd-exporter"
  receiveTimestamp: "2017-09-15T15:17:30.055465475Z"
  resource: {
    labels: {
      cluster_name: "central-gke-cluster"
      container_name: "prometheus-to-sd-exporter"
      namespace_id: "kube-system"
      ...
    }
    type: "container"
  }
  severity: "WARNING"
  textPayload: "Metric stackdriver_sink_received_entry_count was not found in the cache."
  timestamp: "2017-09-15T15:17:27Z"
}
```
Last reported error is from 2017-09-13T20:42:44Z, no more errors since then.
I went to the pool config in the cloud console and edited it: disabled pool autoscaling, then enabled it again, and now the HPA reports usage again:
```
$ kubectl get hpa
NAME               REFERENCE                     TARGETS     MINPODS   MAXPODS   REPLICAS   AGE
central-delivery   Deployment/central-delivery   21% / 75%   2         10        2          1d
central-test       Deployment/central-test       25% / 90%   1         5         1          2m
```
I'm having the same issue: HPA reports unknown. I'm on AWS in a federated cluster environment; I tried on both the federated cluster and an individual cluster.
```
$ uskube describe hpa/adminconsole
Name:       adminconsole
Namespace:  default
Labels:
$ fedkube get hpa
NAME           REFERENCE                 TARGETS         MINPODS   MAXPODS   REPLICAS   AGE
adminconsole   Deployment/adminconsole   unknown / 20%   4         10        0          2d
callmanager    Deployment/callmanager    unknown / 80%   1         5         0          2d
confmanager    Deployment/confmanager    unknown / 80%   1         5         0          2d
macbook-pro-3:Provisioning kkokkiligadda$
```
FWIW, in my case the master node was running on degraded hardware, and eventually some things stopped working (including HPA).
I launched a new master and it is working now.
The only thing I was able to find in the logs was in the Weave logs, which showed failed connections to the master. Weave pods were also restarting a lot.
Target shows `<unknown>`.
Any update on the above issue?
I am also facing the same issue.
@kkorada Hi! Unfortunately, I'm not sure what an "update" to this issue would be. From what I can see in all the discussion in this thread, none of the issues have anything to do with kube-aws itself, right? So I'm unable to figure out how I could update this issue or fix it at the kube-aws level.
WDYT? Thanks.
Or perhaps I could help you debug your HPA issue if you could share us more details of your setup.
Please let me know what details you need.
I have installed OpenShift Origin on AWS. Inventory props related to metrics:

```
openshift_metrics_install_metrics=true
openshift_metrics_storage_kind=dynamic
openshift_metrics_image_prefix=docker.io/openshift/origin-
openshift_metrics_image_version=v3.9
```

OpenShift version: 3.9
When I describe the HPA, I am getting the current utilization as unknown.
Which version of kube-aws and Kubernetes do you use?
```
Client Version: version.Info{Major:"", Minor:"", GitVersion:"v1.9.1+a0ce1bc657", GitCommit:"a0ce1bc", GitTreeState:"clean", BuildDate:"2018-04-11T20:47:54Z", GoVersion:"go1.9.2", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"", Minor:"", GitVersion:"v1.9.1+a0ce1bc657", GitCommit:"a0ce1bc", GitTreeState:"clean", BuildDate:"2018-04-11T20:47:54Z", GoVersion:"go1.9.2", Compiler:"gc", Platform:"linux/amd64"}
```
Thanks. Which kube-aws version?
I think I am not using kube-aws; I have installed OpenShift using Ansible playbooks.
@kkorada Do you have Heapster installed on your cluster? Also, mind sharing the output of `kubectl describe deployment` for your deployment, and `kubectl get hpa`?
You need to add CPU and memory limits.
Hey, sorry for the late response. Yes, I have added CPU and memory limits to the DC, and I have resolved the issue with the HPA: the HPA config was using the wrong apiVersion (I had generated it with the console in OpenShift).
Hey, I'm having this issue trying to use kube-aws on AWS EKS. I have memory and CPU limits specified for my container. Any HPA I try to start up gives me `<unknown>` CPU, even though the dashboard seems to be able to read the metrics.
My HPA config:

```
Namespace:          default
Labels:             app=graphql
Annotations:        kubectl.kubernetes.io/last-applied-configuration={"apiVersion":"autoscaling/v1","kind":"HorizontalPodAutoscaler","metadata":{"annotations":{},"labels":{"app":"graphql"},"name":"graphql-autoscaler","na...
CreationTimestamp:  Thu, 14 Jun 2018 10:56:48 -0600
Reference:          Deployment/graphql-deployment
Metrics:            ( current / target )
  resource cpu on pods (as a percentage of request):  <unknown> / 10%
Min replicas:       3
Max replicas:       10
Conditions:
  Type           Status  Reason                   Message
  ----           ------  ------                   -------
  AbleToScale    True    SucceededGetScale        the HPA controller was able to get the target's current scale
  ScalingActive  False   FailedGetResourceMetric  the HPA was unable to compute the replica count: unable to get metrics for resource cpu: unable to fetch metrics from resource metrics API: the server could not find the requested resource (get pods.metrics.k8s.io)
Events:
  Type     Reason                   Age                From                       Message
  ----     ------                   ---                ----                       -------
  Warning  FailedGetResourceMetric  2m (x411 over 3h)  horizontal-pod-autoscaler  unable to get metrics for resource cpu: unable to fetch metrics from resource metrics API: the server could not find the requested resource (get pods.metrics.k8s.io)
```
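The `(get pods.metrics.k8s.io)` error means the resource metrics API itself is not being served. Two standard checks (plain kubectl, nothing specific to this setup) that confirm whether a metrics API is registered and answering:

```sh
# Is the metrics API group registered with the API aggregator?
kubectl get apiservices | grep metrics.k8s.io

# Does it actually answer queries?
kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes
```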
I also have this issue with EKS. It's also been outlined by someone else here: https://serverfault.com/questions/917831/horizontalpodautoscaling-on-amazon-eks#
EKS HPA workaround https://medium.com/eks-hpa-workaround/k8s-hpa-controller-6ac2dfb4c028
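For anyone hitting this on EKS: the underlying cause is usually that no metrics server is installed, so the resource metrics API is missing. A hedged sketch of the usual fix; the manifest URL below is the upstream metrics-server release artifact, not something taken from this thread:

```sh
# Install metrics-server from the upstream release manifest
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

# Confirm it comes up
kubectl -n kube-system get pods -l k8s-app=metrics-server
```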
In my case I have a different problem: I don't specify a resource request in the deployment, but a 100m CPU request gets added automatically. This also seems to be a problem case.
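That automatic 100m usually comes from a LimitRange in the namespace (GKE, for example, creates one with a 100m default CPU request). A quick way to check, as a sketch:

```sh
# Look for LimitRanges that inject default requests into new pods
kubectl get limitrange --all-namespaces
kubectl describe limitrange --namespace <your-namespace>
```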
@danielfm Thanks a lot, you saved my day
I got a similar situation:
```
kubectl get hpa --all-namespaces
NAMESPACE   NAME       REFERENCE             TARGETS                          MINPODS   MAXPODS   REPLICAS   AGE
app         demo-app   Deployment/demo-app   <unknown>/200Mi, <unknown>/80%   2         10        2          65m
default     podinfo    Deployment/podinfo    <unknown>/200Mi, 100%/80%        2         10        10         77m
```

Description:

```
Events:
  Type     Reason                   Age                 From                       Message
  ----     ------                   ---                 ----                       -------
  Warning  FailedGetObjectMetric    58m (x41 over 1h)   horizontal-pod-autoscaler  unable to get metric http_requests: Service on app demo-add/unable to fetch metrics from custom metrics API: no custom metrics API (custom.metrics.k8s.io) registered
  Warning  FailedGetResourceMetric  3m (x201 over 53m)  horizontal-pod-autoscaler  missing request for cpu
```

API:

```
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   495  100   495    0     0   9533      0 --:--:-- --:--:-- --:--:--  9705
{
  "kind": "PodMetrics",
  "apiVersion": "metrics.k8s.io/v1beta1",
  "metadata": {
    "name": "demo-app-65cff97f4f-jfkg2",
    "namespace": "app",
    "selfLink": "/apis/metrics.k8s.io/v1beta1/namespaces/app/pods/demo-app-65cff97f4f-jfkg2",
    "creationTimestamp": "2018-12-11T19:22:34Z"
  },
  "timestamp": "2018-12-11T19:22:02Z",
  "window": "30s",
  "containers": [
    {
      "name": "demo-app",
      "usage": {
        "cpu": "9514204n",
        "memory": "58516Ki"
      }
    }
  ]
}
```
Has your problem been solved yet?
I've got the same problem. I also tried turning autoscaling off and on; all containers have limits.
In my case, it had to do with GCE permission settings at the node-pool/cluster level. I think it was metrics.write.
I've got the same problem on AKS with Istio.
If your deployment `selector` is matching the `labels` of containers from other deployments in your namespace, then you also need to add CPU limits to those containers for the autoscaler to work.
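A sketch of that failure mode, with all names hypothetical: the HPA resolves pods through the scale target's selector, so if a second Deployment's pods carry the same label and lack a CPU request, the averaged utilization becomes `<unknown>`.

```yaml
# Deployment A: the HPA target; its selector (app: web) is overly broad
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-a              # hypothetical
spec:
  selector:
    matchLabels:
      app: web             # also matches web-b's pods below
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: web
        image: nginx:1.21
        resources:
          requests:
            cpu: 100m
---
# Deployment B: its pods share the app=web label but set no CPU request,
# so the HPA cannot compute utilization for the pods it matches
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-b              # hypothetical
spec:
  selector:
    matchLabels:
      app: web
      tier: b
  template:
    metadata:
      labels:
        app: web
        tier: b
    spec:
      containers:
      - name: web
        image: nginx:1.21
```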
Issues go stale after 90d of inactivity.
Mark the issue as fresh with `/remove-lifecycle stale`.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with `/close`.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
I ran into the same issue and fixed it by adding the apiVersion to my HPA:

```yaml
spec:
  scaleTargetRef:
    apiVersion: extensions/v1beta1
```
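For context, a complete autoscaling/v1 manifest with that field in place might look like the following; the names are placeholders rather than values from the comment above:

```yaml
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: my-app                        # hypothetical
spec:
  scaleTargetRef:
    apiVersion: extensions/v1beta1    # must match the apiVersion the Deployment is served under
    kind: Deployment
    name: my-app
  minReplicas: 1
  maxReplicas: 5
  targetCPUUtilizationPercentage: 80
```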
Stale issues rot after 30d of inactivity.
Mark the issue as fresh with `/remove-lifecycle rotten`.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with `/close`.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
Rotten issues close after 30d of inactivity.
Reopen the issue with `/reopen`.
Mark the issue as fresh with `/remove-lifecycle rotten`.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close
@fejta-bot: Closing this issue.
Hi, I got the same problem. First I followed this AWS link => https://aws.amazon.com/premiumsupport/knowledge-center/eks-metrics-server-pod-autoscaler/ to install the HPA.
Then the target unknown problem showed up.
It turned out the metrics server was not working properly; you can check the metrics server logs:

```
kubectl logs -f -n kube-system -l k8s-app=metrics-server
```

You may find a lot of `unable to fetch pod metrics for pod` messages there.
And you can try `kubectl top nodes` or `kubectl top pods`; both will show a similar message saying that metrics cannot be retrieved.
Patching the metrics-server deployment with the args below solved my problem:

```
kubectl patch deployment -n kube-system metrics-server -p='{"spec":{"template":{"spec":{"containers":[{"name":"metrics-server","args":["--cert-dir=/tmp","--secure-port=4443","--kubelet-insecure-tls","--kubelet-preferred-address-types=InternalIP"]}]}}}}'
```

Hope it helps, many thanks.
reference: https://github.com/kubernetes-sigs/metrics-server/issues/300#issuecomment-568857398 https://github.com/kubernetes-sigs/metrics-server/issues/129 https://github.com/kubernetes-sigs/metrics-server/issues/237
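After a patch like that, a quick smoke test (standard kubectl, not from the comment above):

```sh
kubectl top nodes
kubectl top pods --all-namespaces
kubectl get hpa    # TARGETS should now show a percentage instead of <unknown>
```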
Hello,
I'm trying to set up an HPA (Horizontal Pod Autoscaler) on one of my deployments, using the latest version of kube-aws (0.9.6-rc.2). However, it seems to fail to monitor the CPU usage, and prints the following:
Here is my deployment setup:
In the Dashboard, the HPA page shows:
Current CPU Utilization:%
In all the monitoring pages, it seems that CPU is monitored based on the number of cores and not in %. Could this be linked?