kubernetes / autoscaler

Autoscaling components for Kubernetes
Apache License 2.0

Timing issue when loading history from AWS AMP #6050

Open Evedel opened 10 months ago

Evedel commented 10 months ago

Which component are you using?: Vertical Pod Autoscaler (Recommender only)

Is your feature request designed to solve a problem? If so, describe the problem this feature should solve.: In my case, the VPA runs with AWS AMP as the history provider. The pods also use IAM role-based permissions. That means the VPA recommender Deployment consists of two containers (sketched below):

  1. aws-sigv4-proxy (repo, aws docs) with:
      - --host
      - aps-workspaces.${REGION}.amazonaws.com
      - --port
      - :8005
  2. VPA recommender with:
       - --storage=prometheus
       - --prometheus-address=http://localhost:8005/workspaces/${WORKSPACE_ID}

This works when the proxy starts faster than the recommender: there are no errors in the logs at any verbosity level, and memory consumption is ~2Gi.
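For context, here is a minimal sketch of such a Deployment. Container names, image references, and the ServiceAccount name are illustrative assumptions, not the exact manifest used here, and only the proxy flags mentioned above are shown:

```yaml
# Sketch of a VPA recommender Deployment with an aws-sigv4-proxy sidecar.
# Names and image tags are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vpa-recommender
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: vpa-recommender
  template:
    metadata:
      labels:
        app: vpa-recommender
    spec:
      serviceAccountName: vpa-recommender   # IAM-role-annotated ServiceAccount (assumed name)
      containers:
        - name: aws-sigv4-proxy
          image: public.ecr.aws/aws-observability/aws-sigv4-proxy:latest   # placeholder tag
          args:                              # only the flags mentioned above; other signing flags omitted
            - --host
            - aps-workspaces.${REGION}.amazonaws.com
            - --port
            - ":8005"
        - name: recommender
          image: registry.k8s.io/autoscaling/vpa-recommender:1.0.0         # placeholder tag
          args:
            - --storage=prometheus
            - --prometheus-address=http://localhost:8005/workspaces/${WORKSPACE_ID}
```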

However, there is a timing issue. Roughly half of the time the recommender starts before the proxy, tries to load history, fails on the first query with a connection error, and gives up on loading history entirely. Memory consumption is then ~50Mi. Recommendations are still produced within approximately the same ranges.

The logs show the following error:

Cannot get cluster history: cannot get usage history: cannot get timeseries for cpu: Post "http://localhost:8005/workspaces/${WORKSPACE_ID}/api/v1/query_range": dial tcp localhost:80: connect: connection refused

The error comes from this call chain: https://github.com/kubernetes/autoscaler/blob/e1b03fac9958791790bfc18eeba9fab5cac0ccc1/vertical-pod-autoscaler/pkg/recommender/main.go#L188

https://github.com/kubernetes/autoscaler/blob/e1b03fac9958791790bfc18eeba9fab5cac0ccc1/vertical-pod-autoscaler/pkg/recommender/input/cluster_feeder.go#L199

https://github.com/kubernetes/autoscaler/blob/e1b03fac9958791790bfc18eeba9fab5cac0ccc1/vertical-pod-autoscaler/pkg/recommender/input/history/history_provider.go#L216
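As a workaround for the race itself, container startup order can be enforced on the deployment side, for example with a native sidecar. A minimal sketch, assuming Kubernetes 1.29+ with the SidecarContainers feature (names and image tags are the same placeholders as above):

```yaml
# Sketch: run the proxy as a restartable init container so the kubelet
# waits for its startup probe to succeed before starting the recommender.
spec:
  template:
    spec:
      initContainers:
        - name: aws-sigv4-proxy
          image: public.ecr.aws/aws-observability/aws-sigv4-proxy:latest   # placeholder tag
          restartPolicy: Always              # marks this init container as a sidecar
          args:
            - --host
            - aps-workspaces.${REGION}.amazonaws.com
            - --port
            - ":8005"
          startupProbe:
            tcpSocket:
              port: 8005                     # succeeds once the proxy is listening
            periodSeconds: 1
            failureThreshold: 30
      containers:
        - name: recommender
          image: registry.k8s.io/autoscaling/vpa-recommender:1.0.0         # placeholder tag
          args:
            - --storage=prometheus
            - --prometheus-address=http://localhost:8005/workspaces/${WORKSPACE_ID}
```

With this ordering the recommender's first history query should no longer hit a refused connection, although an explicit flag would still make the behaviour deterministic.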

Describe the solution you'd like.: As the recommender works fine without historical data, would it be possible to add an argument to explicitly skip history initialisation?

An alternative (and opposite) solution might be to strictly require Prometheus history initialisation to succeed, so that a failure like the one above is not silently swallowed.

Please let me know what you think. Also, I would be keen to implement/contribute the solution.
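On the manifest side, the proposal could look roughly like this; the --history-load flag and its values are purely hypothetical and do not exist in the recommender today:

```yaml
# Hypothetical recommender args illustrating the proposal; --history-load
# is not a real flag and is shown only to make the two options concrete.
args:
  - --storage=prometheus
  - --prometheus-address=http://localhost:8005/workspaces/${WORKSPACE_ID}
  - --history-load=skip        # proposed: skip history initialisation explicitly
  # - --history-load=require   # proposed alternative: fail hard if history cannot be loaded
```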

k8s-triage-robot commented 5 months ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot commented 4 months ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

Shubham82 commented 4 months ago

/remove-lifecycle rotten
/triage accepted