zalando-incubator / kube-metrics-adapter

General purpose metrics adapter for Kubernetes HPA metrics

Metrics server seems to fail to pull metrics on EKS #178

Open prageethw opened 4 years ago

prageethw commented 4 years ago

Expected Behavior

Metrics should be pulled.

Actual Behavior

  "istio-requests-error-rate" on Pod/go-demo-7-app (target value):        <unknown>/ 100m
  "istio-requests-max-resp-time" on Pod/go-demo-7-app (target value):      <unknown> / 500m
  "istio-requests-average-resp-time" on Pod/go-demo-7-app (target value):  <unknown> / 250m
  "istio-requests-per-replica" on Pod/go-demo-7-app (target value):        <unknown> / 5

Steps to Reproduce the Problem

 annotations:
    metric-config.object.istio-requests-error-rate.prometheus/query: |
      (sum(rate(istio_requests_total{destination_workload=~"go-demo-7-app.*",
               destination_workload_namespace="go-demo-7", reporter="destination",response_code=~"5.*"}[5m])) 
      / 
      sum(rate(istio_requests_total{destination_workload=~"go-demo-7-app.*", 
               destination_workload_namespace="go-demo-7",reporter="destination"}[5m]))) > 0 or on() vector(0)
    metric-config.object.istio-requests-per-replica.prometheus/query: |
      sum(rate(istio_requests_total{destination_workload=~"go-demo-7-app.*",destination_workload_namespace="go-demo-7",
                reporter="destination"}[5m])) 
      /
      count(count(container_memory_usage_bytes{namespace="go-demo-7",pod=~"go-demo-7-app.*"}) by (pod))
    metric-config.object.istio-requests-average-resp-time.prometheus/query: | 
      (sum(rate(istio_request_duration_milliseconds_sum{destination_workload=~"go-demo-7-app.*", reporter="destination"}[5m])) 
      / 
      sum(rate(istio_request_duration_milliseconds_count{destination_workload=~"go-demo-7-app.*", reporter="destination"}[5m])))/1000 > 0 or on() vector(0)
    metric-config.object.istio-requests-max-resp-time.prometheus/query: |
      histogram_quantile(0.95, 
                  sum(irate(istio_request_duration_milliseconds_bucket{destination_workload=~"go-demo-7-app.*"}[1m])) by (le))/1000 > 0  or on() vector(0)
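
For completeness, each of those annotation-backed queries would be referenced from the HPA roughly as sketched below. The metric name, described object, and target value are taken from the describe output above; everything else (including apiVersion) is an assumption:

  metrics:
    - type: Object
      object:
        metric:
          name: istio-requests-error-rate   # must match the annotation suffix
        describedObject:
          apiVersion: v1                    # assumption, based on "Pod/go-demo-7-app" above
          kind: Pod
          name: go-demo-7-app
        target:
          type: Value
          value: 100m                       # matches the "/ 100m" target shown above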

Specifications

The logs show:

1 reflector.go:307] pkg/mod/k8s.io/client-go@v0.17.3/tools/cache/reflector.go:105: Failed to watch *v1.ConfigMap: unknown (get configmaps)
kube-metrics-adapter-7b79498f9-7b8rt kube-metrics-adapter E0717 03:49:16.970700 1 reflector.go:307] pkg/mod/k8s.io/client-go@v0.17.3/tools/cache/reflector.go:105: Failed to watch *v1.ConfigMap: unknown (get configmaps)
kube-metrics-adapter-7b79498f9-7b8rt kube-metrics-adapter E0717 03:49:17.972675 1 reflector.go:307] pkg/mod/k8s.io/client-go@v0.17.3/tools/cache/reflector.go:105: Failed to watch *v1.ConfigMap: unknown (get configmaps)
kube-metrics-adapter-7b79498f9-7b8rt kube-metrics-adapter E0717 03:49:17.973213

It works fine with:

--set image.repository=registry.opensource.zalan.do/teapot/kube-metrics-adapter \
--set image.tag=v0.1.0
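
For context, those flags would typically be passed to a Helm install/upgrade of the chart, along the lines of the command below; the release and chart repository names are assumptions, not taken from this thread:

  # hypothetical invocation; release and chart names are assumptions
  helm upgrade --install kube-metrics-adapter banzaicloud-stable/kube-metrics-adapter \
    --set image.repository=registry.opensource.zalan.do/teapot/kube-metrics-adapter \
    --set image.tag=v0.1.0
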
szuecs commented 4 years ago

@prageethw this looks to me like a Kubernetes vs client-go version issue. Can you check whether a cluster with version >= 1.17 works?

prageethw commented 4 years ago

@szuecs tried with k8s 1.17.x, it still fails:

known (get configmaps)
kube-metrics-adapter-7b79498f9-g42j8 kube-metrics-adapter E0717 09:33:22.393669       1 reflector.go:307] pkg/mod/k8s.io/client-go@v0.17.3/tools/cache/reflector.go:105: Failed to watch *v1.ConfigMap: unknown (get configmaps)
kube-metrics-adapter-7b79498f9-g42j8 kube-metrics-adapter E0717 09:33:22.394841       1 reflector.go:307] pkg/mod/k8s.io/client-go@v0.17.3/tools/cache/reflector.go:105: Failed to watch *v1.ConfigMap: unknown (get configmaps)
kube-metrics-adapter-7b79498f9-7bwgs kube-metrics-adapter E0717 09:33:23.017651       1 reflector.go:307] pkg/mod/k8s.io/client-go@v0.17.3/tools/cache/reflector.go:105: Failed to watch *v1.ConfigMap: unknown (get configmaps)
kube-metrics-adapter-7b79498f9-7bwgs kube-metrics-adapter E0717 09:33:23.020493       1 reflector.go:307] pkg/mod/k8s.io/client-go@v0.17.3/tools/cache/reflector.go:105: Failed to watch *v1.ConfigMap: unknown (get configmaps)
kub
Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.6", GitCommit:"dff82dc0de47299ab66c83c626e08b245ab19037", GitTreeState:"clean", BuildDate:"2020-07-16T00:04:31Z", GoVersion:"go1.14.4", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"17+", GitVersion:"v1.17.6-eks-4e7f64", GitCommit:"4e7f642f9f4cbb3c39a4fc6ee84fe341a8ade94c", GitTreeState:"clean", BuildDate:"2020-06-11T13:55:35Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}
mikkeloscar commented 4 years ago

@prageethw I think it could be related to https://github.com/zalando-incubator/kube-metrics-adapter/issues/142. It seems a required RBAC rule is no longer present by default in some setups. Try the suggested steps in that issue.

prageethw commented 4 years ago

@mikkeloscar I had a look at the Helm chart (Banzai) and the rule seems to already exist there, but I still see the error in the logs. The metrics are pulled successfully though, so it just seems to be an annoying defect :) https://github.com/banzaicloud/kube-metrics-adapter/blob/master/deploy/charts/kube-metrics-adapter/templates/rbac.yaml#L42

prageethw commented 4 years ago

@mikkeloscar yeah, you are right, it was not in the collector ClusterRole though. I just added it and sent a pull request to the Helm chart.
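
For anyone hitting the same error, the missing permission corresponds roughly to the rule sketched below; the role name here is an assumption, and the actual change lives in the chart's rbac.yaml:

  apiVersion: rbac.authorization.k8s.io/v1
  kind: ClusterRole
  metadata:
    name: kube-metrics-adapter-collector   # name is an assumption
  rules:
    - apiGroups: [""]                      # core API group
      resources: ["configmaps"]
      verbs: ["get", "list", "watch"]      # covers the failing "get configmaps" watch calls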

prageethw commented 4 years ago

Fix https://github.com/zalando-incubator/kube-metrics-adapter/pull/181 will resolve this issue once it is merged.

pedrojimenez commented 4 years ago

Also tested the fix with image v0.1.5 and it worked perfectly:

time="2020-08-04T11:47:36Z" level=info msg="Found 1 new/updated HPA(s)" provider=hpa
time="2020-08-04T11:47:36Z" level=info msg="Collected 1 new metric(s)" provider=hpa

Thanks for the fix ;)