Closed chris-vest closed 2 years ago
We do not have up-to-date documentation of the parameters (I suppose it would get out of date very quickly), but you can run the binary with the --help option to get the flag descriptions.
docker run -it k8s.gcr.io/autoscaling/vpa-recommender:0.9.0 ./vpa-recommender --help
Please could we get more complete documentation on setting up Prometheus as a history provider for the VPA recommender component?
For example how to customize it and verify that it is indeed working? Also including whether CPU and memory queries are customizable or not?
It would be nice if the documentation could include which jobs get queried e.g. is it the "kubernetes-nodes-cadvisor" and "kubernetes-pods", just the one, or are there more?
Running the previously recommended command:
docker run -it k8s.gcr.io/autoscaling/vpa-recommender:0.9.2 ./vpa-recommender --help
The descriptions of these options are too similar i.e. "Label name to look for container names"... are they all looking for container names (or is one looking for pod names)?... are they used in conjunction or either/or? :
--container-name-label string Label name to look for container names (default "name")
--container-namespace-label string Label name to look for container names (default "namespace")
--container-pod-name-label string Label name to look for container names (default "pod_name")
--pod-name-label string Label name to look for container names (default "kubernetes_pod_name")
--pod-namespace-label string Label name to look for container names (default "kubernetes_namespace")
I'm having trouble wrapping my head around when above and below options should be used:
--metric-for-pod-labels string Which metric to look for pod labels in metrics (default "up{job=\"kubernetes-pods\"}")
--pod-label-prefix string Which prefix to look for pod labels in metrics (default "pod_label_")
Would it be possible to provide examples and/or elaborate on all of the above?
Here are some instances where such docs might have helped:
I am / we are absurdly grateful for this project and the value it provides, having used it extensively over the last few years, but after fighting the Prometheus integration setup for the first time ever for a while last night, I agree with this. It is rather obtuse trying to figure out what is going on with these options, and it requires a detailed analysis of the underlying codebase. Even after doing so, I wasn't successful.
It also isn't super clear what happens to the existing checkpoints when migrating storage backends and what behavior one can expect to occur in this process, which is a bit scary given that we've already littered our production environment with VPA.
My fragile understanding thus far:
- The --container-*-label flags map to the labels in those metrics.
- These are joined with the --metric-for-pod-labels metric / query, i.e. container-namespace-label == pod-namespace-label && container-pod-name-label == pod-name-label.
- The metric-for-pod-labels series are parsed into memory (anything prefixed by pod-label-prefix, i.e. with --pod-label-prefix=label_, label_foo="bar" => foo: bar).
Is that all accurate? I tried using the kube-state-metrics kube_pod_labels for --metric-for-pod-labels, since our Prometheus config is only scraping pods with the scrape annotation and the default up{job="kubernetes-pods"} is thus filtered, but something is still amiss and I was seeing lots of these before I gave up for the evening:
Error adding metric sample for container {{velero velero-6778d944c5-t5xqj} velero}: sample discarded (invalid or out of order)
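To make the understanding above concrete, here is a minimal Python sketch of the two steps as described in this comment: stripping the pod-label-prefix from a series' labels, and matching a container series to a pod series by namespace and pod name. It only illustrates the mapping as understood here and is not the recommender's actual code. The helper names and the sample series (including the "name" and "pod_name" label values) are hypothetical.

```python
# Illustrative only: models the understanding described above,
# not the VPA recommender's real implementation.

def strip_label_prefix(series_labels, prefix):
    """Keep only labels that start with prefix, with the prefix removed."""
    return {
        name[len(prefix):]: value
        for name, value in series_labels.items()
        if name.startswith(prefix)
    }


def pod_key(series_labels, namespace_label, pod_name_label):
    """Build a (namespace, pod name) key from a series' labels."""
    return (series_labels[namespace_label], series_labels[pod_name_label])


# Hypothetical series returned by the --metric-for-pod-labels query.
pod_series = {
    "kubernetes_namespace": "velero",
    "kubernetes_pod_name": "velero-6778d944c5-t5xqj",
    "label_foo": "bar",
}

# Hypothetical container usage series, using the default container label names.
container_series = {
    "namespace": "velero",
    "pod_name": "velero-6778d944c5-t5xqj",
    "name": "velero",
}

# With --pod-label-prefix=label_, label_foo="bar" becomes foo: bar.
pod_labels = strip_label_prefix(pod_series, "label_")

# container-namespace-label == pod-namespace-label
# && container-pod-name-label == pod-name-label
same_pod = pod_key(container_series, "namespace", "pod_name") == pod_key(
    pod_series, "kubernetes_namespace", "kubernetes_pod_name"
)

print(pod_labels, same_pod)
```

If the label names passed to the flags do not exist on the series your Prometheus returns, a lookup like pod_key would fail in this sketch, which is one way to check a label mapping by hand against real query results.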
Usage of /recommender:
--add-dir-header If true, adds the file directory to the header
--address string The address to expose Prometheus metrics. (default ":8942")
--alsologtostderr log to standard error as well as files
--checkpoints-gc-interval duration How often orphaned checkpoints should be garbage collected (default 10m0s)
--checkpoints-timeout duration Timeout for writing checkpoints since the start of the recommender's main loop (default 1m0s)
--container-name-label string Label name to look for container names (default "name")
--container-namespace-label string Label name to look for container names (default "namespace")
--container-pod-name-label string Label name to look for container names (default "pod_name")
--cpu-histogram-decay-half-life duration The amount of time it takes a historical CPU usage sample to lose half of its weight. (default 24h0m0s)
--history-length string How much time back prometheus have to be queried to get historical metrics (default "8d")
--history-resolution string Resolution at which Prometheus is queried for historical metrics (default "1h")
--kube-api-burst float QPS burst limit when making requests to Kubernetes apiserver (default 10)
--kube-api-qps float QPS limit when making requests to Kubernetes apiserver (default 5)
--log-backtrace-at traceLocation when logging hits line file:N, emit a stack trace (default :0)
--log-dir string If non-empty, write log files in this directory
--log-file string If non-empty, use this log file
--log-file-max-size uint Defines the maximum size a log file can grow to. Unit is megabytes. If the value is 0, the maximum file size is unlimited. (default 1800)
--logtostderr log to standard error instead of files (default true)
--memory-aggregation-interval duration The length of a single interval, for which the peak memory usage is computed. Memory usage peaks are aggregated in multiples of this interval. In other words there is one memory usage sample per interval (the maximum usage over that interval) (default 24h0m0s)
--memory-aggregation-interval-count int The number of consecutive memory-aggregation-intervals which make up the MemoryAggregationWindowLength which in turn is the period for memory usage aggregation by VPA. In other words, MemoryAggregationWindowLength = memory-aggregation-interval * memory-aggregation-interval-count. (default 8)
--memory-histogram-decay-half-life duration The amount of time it takes a historical memory usage sample to lose half of its weight. In other words, a fresh usage sample is twice as 'important' as one with age equal to the half life period. (default 24h0m0s)
--memory-saver If true, only track pods which have an associated VPA
--metric-for-pod-labels string Which metric to look for pod labels in metrics (default "up{job=\"kubernetes-pods\"}")
--min-checkpoints int Minimum number of checkpoints to write per recommender's main loop (default 10)
--pod-label-prefix string Which prefix to look for pod labels in metrics (default "pod_label_")
--pod-name-label string Label name to look for container names (default "kubernetes_pod_name")
--pod-namespace-label string Label name to look for container names (default "kubernetes_namespace")
--pod-recommendation-min-cpu-millicores float Minimum CPU recommendation for a pod (default 25)
--pod-recommendation-min-memory-mb float Minimum memory recommendation for a pod (default 250)
--prometheus-address string Where to reach for Prometheus metrics
--prometheus-cadvisor-job-name string Name of the prometheus job name which scrapes the cAdvisor metrics (default "kubernetes-cadvisor")
--prometheus-query-timeout string How long to wait before killing long queries (default "5m")
--recommendation-margin-fraction float Fraction of usage added as the safety margin to the recommended request (default 0.15)
--recommender-interval duration How often metrics should be fetched (default 1m0s)
--skip-headers If true, avoid header prefixes in the log messages
--skip-log-headers If true, avoid headers when opening log files
--stderrthreshold severity logs at or above this threshold go to stderr (default 2)
--storage string Specifies storage mode. Supported values: prometheus, checkpoint (default)
-v, --v Level number for the log level verbosity
--vmodule moduleSpec comma-separated list of pattern=N settings for file-filtered logging
--vpa-object-namespace string Namespace to search for VPA objects and pod stats. Empty means all namespaces will be used.
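Two of the descriptions in the help output above boil down to simple arithmetic, which a short Python sketch can make concrete. The window formula is taken directly from the --memory-aggregation-interval-count description. The decay function assumes plain exponential decay, which is one reading of "lose half of its weight" and not something the help text confirms. The function names are hypothetical.

```python
from datetime import timedelta


def memory_aggregation_window(interval, interval_count):
    """MemoryAggregationWindowLength = interval * interval_count, per the help text."""
    return interval * interval_count


def sample_weight(age, half_life):
    """Relative weight of a sample of the given age.

    Assumes plain exponential decay: the weight halves once per half-life.
    """
    return 0.5 ** (age / half_life)


# Defaults from the help output: 24h interval, 8 intervals, 24h half-life.
window = memory_aggregation_window(timedelta(hours=24), 8)
fresh = sample_weight(timedelta(0), timedelta(hours=24))
one_half_life_old = sample_weight(timedelta(hours=24), timedelta(hours=24))

print(window, fresh, one_half_life_old)
```

With the defaults, the window comes out to 8 days, which matches the default --history-length of "8d", and a fresh sample weighs twice as much as one that is 24 hours old.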
I think this source code can explain
Which component are you using?:
VPA recommender.
Is your feature request designed to solve a problem? If so describe the problem this feature should solve.:
Better user experience and easier debugging.
Describe the solution you'd like.:
Documentation for these VPA recommender configuration options:
Granted, some of these are pretty self-explanatory, but some are not obvious. For example, the pod-label-prefix configuration option: how is that used, and do I need to configure it? I know other people might wonder the same, because I certainly did. Users shouldn't have to dig through the code in order to understand what these options do.
Describe any alternative solutions you've considered.:
Little to no documentation, as it stands now - I feel like that's not an ideal scenario.