Closed: lallydd closed this issue 1 year ago.
Can you come to the SIG meeting to talk about this?
Hi! From the SIG meeting, this is my recollection (please correct me if I've remembered anything incorrectly) of the discussion:
I propose using metrics.v1beta1.PodMetricsesGetter and its included PodMetricsInterface as the interface:
```go
// PodMetricsesGetter has a method to return a PodMetricsInterface.
// A group's client should implement this interface.
type PodMetricsesGetter interface {
	PodMetricses(namespace string) PodMetricsInterface
}

// PodMetricsInterface has methods to work with PodMetrics resources.
type PodMetricsInterface interface {
	Get(ctx context.Context, name string, opts v1.GetOptions) (*v1beta1.PodMetrics, error)
	List(ctx context.Context, opts v1.ListOptions) (*v1beta1.PodMetricsList, error)
	Watch(ctx context.Context, opts v1.ListOptions) (watch.Interface, error)
	PodMetricsExpansion
}
```
It's installed as the resourceClient.PodMetricsGetter arg in the call to input_metrics.NewMetricsClient() in recommender/main.go.
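To make the shape of that concrete, here is a minimal sketch of how a Datadog-backed implementation could satisfy those interfaces. Only PodMetricsesGetter, PodMetricsInterface, and PodMetricsExpansion are the real client-go metrics types quoted above; DatadogQuerier, QueryPodUsage, and NewDatadogPodMetricsesGetter are hypothetical names invented for illustration, not existing code in the recommender or in any Datadog client.

```go
package datadogmetrics

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/watch"
	"k8s.io/metrics/pkg/apis/metrics/v1beta1"
	resourceclient "k8s.io/metrics/pkg/client/clientset/versioned/typed/metrics/v1beta1"
)

// DatadogQuerier is a hypothetical wrapper around the Datadog metrics API.
type DatadogQuerier interface {
	// QueryPodUsage would fetch current CPU/memory usage for pods in a namespace.
	QueryPodUsage(ctx context.Context, namespace string) (*v1beta1.PodMetricsList, error)
}

// datadogPodMetricsesGetter adapts DatadogQuerier to resourceclient.PodMetricsesGetter.
type datadogPodMetricsesGetter struct {
	dd DatadogQuerier
}

func NewDatadogPodMetricsesGetter(dd DatadogQuerier) resourceclient.PodMetricsesGetter {
	return &datadogPodMetricsesGetter{dd: dd}
}

func (g *datadogPodMetricsesGetter) PodMetricses(namespace string) resourceclient.PodMetricsInterface {
	return &datadogPodMetrics{dd: g.dd, namespace: namespace}
}

// datadogPodMetrics implements resourceclient.PodMetricsInterface for one namespace.
type datadogPodMetrics struct {
	dd        DatadogQuerier
	namespace string
	// Embedded so this type also satisfies any expansion methods the generated client declares.
	resourceclient.PodMetricsExpansion
}

func (m *datadogPodMetrics) List(ctx context.Context, opts metav1.ListOptions) (*v1beta1.PodMetricsList, error) {
	return m.dd.QueryPodUsage(ctx, m.namespace)
}

func (m *datadogPodMetrics) Get(ctx context.Context, name string, opts metav1.GetOptions) (*v1beta1.PodMetrics, error) {
	list, err := m.dd.QueryPodUsage(ctx, m.namespace)
	if err != nil {
		return nil, err
	}
	for i := range list.Items {
		if list.Items[i].Name == name {
			return &list.Items[i], nil
		}
	}
	return nil, fmt.Errorf("pod metrics %q not found", name)
}

func (m *datadogPodMetrics) Watch(ctx context.Context, opts metav1.ListOptions) (watch.Interface, error) {
	// Datadog is poll-based; watching is not supported in this sketch.
	return nil, fmt.Errorf("watch is not supported")
}
```

If the recommender only lists metrics on its periodic cycle, a stubbed Watch like this may be acceptable; that is an assumption worth verifying against how NewMetricsClient actually uses the getter.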
RE: the automated tests, I've used a recorded result from the Datadog API as an offline test.
Hi, I took a look; I think I need to look some more. I'm still thinking and asking around, and I'm not very confident in my conclusions yet, but I thought I should let you know my concerns about PodMetricsesGetter:
a. We don't control the interface. If we need to update the dependency and the interface changes, we'll need to update all metric providers.
b. It might be bigger than we need; MetricsClient might be all we need (I still need to check whether it is).
I talked about this with @x13n. Daniel pointed out that rather than setting up support for Datadog specifically, we could instead add support for external & custom metrics to VPA.
I like the idea:
What do you think about adding support for custom / external metrics to VPA as a way to support using Datadog metrics?
So, roughly, does this mean using this interface https://github.com/kubernetes/kubernetes/blob/master/pkg/controller/podautoscaler/metrics/interfaces.go#L40-L57 inside the recommender?
I see the appeal, and it seems reasonable. My only concern is the user configuration for the metric. Generally we only want the measurement of each resource that corresponds to its recommendation (e.g., RSS for RAM, top(1) CPU usage for CPU). Is there some way to avoid or strongly discourage misconfiguration?
Or would you configure the recommender to just use the External Metrics Provider, and there would be some automagic metric name (e.g., standardized names that each provider would map to their own internal names) that did the right thing?
Yes, the VPA recommender would use the MetricsClient interface to get metrics.
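For reference, a trimmed sketch of the part of the linked interfaces.go that matters for this thread. The signatures are paraphrased and vary between Kubernetes versions; in particular the return types and the PodMetricsInfo shape shown here are assumptions, so check the linked file for the authoritative definitions.

```go
// Paraphrased from pkg/controller/podautoscaler/metrics/interfaces.go; only the
// methods relevant to this discussion are shown, and details may differ by version.
package metrics

import (
	"context"
	"time"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/labels"
)

// PodMetricsInfo maps pod names to a single metric value per pod
// (placeholder shape for this sketch).
type PodMetricsInfo map[string]int64

type MetricsClient interface {
	// GetResourceMetric gets a resource metric (CPU or memory usage) for all
	// pods matching the selector in the given namespace.
	GetResourceMetric(ctx context.Context, resource corev1.ResourceName, namespace string, selector labels.Selector, container string) (PodMetricsInfo, time.Time, error)

	// GetExternalMetric gets all values of an external metric that match the
	// selector; this is the call discussed below.
	GetExternalMetric(metricName string, namespace string, selector labels.Selector) ([]int64, time.Time, error)
}
```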
I'm not sure I understand your question about misconfiguration.
Are you considering which of two approaches?
For configuration, different measurement systems (e.g., metrics-server, Datadog, others) will likely have different names for container CPU / memory usage, so we'll have to configure that somehow.
As for the two approaches, I think (2) is likely necessary: we shouldn't require each VerticalPodAutoscaler to have more configuration when it's all the same for every instance in a cluster. As for (1), I think that's a separate and longer discussion.
Clarifying: when calling GetExternalMetric(metricName string, namespace string, selector labels.Selector), the metricName arg needs a value, and we have to call it twice: once for CPU metrics, once for RAM.
So, if the recommender took three more command-line args:
- --use-external-metrics
- --external-metrics-cpu-metric=kubernetes.cpu.usage.total
- --external-metrics-memory-metric=kubernetes.memory.usage
This could configure the recommender to both call the external metrics provider and use the right values for the metricName parameter of GetExternalMetric. (I used the appropriate Datadog metric names here as an example.)
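A rough sketch of what that wiring could look like, assuming the three flags proposed above and the GetExternalMetric signature quoted earlier. The flag names are the proposal, not existing recommender options, and the externalMetricsClient interface, the fetchUsage helper, and the []int64 return shape are assumptions for illustration only.

```go
package main

import (
	"flag"
	"fmt"
	"time"

	"k8s.io/apimachinery/pkg/labels"
)

// Flags as proposed above; hypothetical, not existing recommender options.
var (
	useExternalMetrics = flag.Bool("use-external-metrics", false, "Read usage from an External Metrics Provider instead of metrics-server.")
	externalCPUMetric  = flag.String("external-metrics-cpu-metric", "", "External metric name to use for container CPU usage.")
	externalMemMetric  = flag.String("external-metrics-memory-metric", "", "External metric name to use for container memory usage.")
)

// externalMetricsClient stands in for the MetricsClient discussed above;
// only the method used here is declared.
type externalMetricsClient interface {
	GetExternalMetric(metricName string, namespace string, selector labels.Selector) ([]int64, time.Time, error)
}

// fetchUsage is a hypothetical helper: it calls the provider twice,
// once with the configured CPU metric name and once with the memory one.
func fetchUsage(c externalMetricsClient, namespace string, selector labels.Selector) (cpu, mem []int64, err error) {
	if !*useExternalMetrics {
		return nil, nil, fmt.Errorf("external metrics not enabled")
	}
	cpu, _, err = c.GetExternalMetric(*externalCPUMetric, namespace, selector)
	if err != nil {
		return nil, nil, fmt.Errorf("fetching CPU metric %q: %w", *externalCPUMetric, err)
	}
	mem, _, err = c.GetExternalMetric(*externalMemMetric, namespace, selector)
	if err != nil {
		return nil, nil, fmt.Errorf("fetching memory metric %q: %w", *externalMemMetric, err)
	}
	return cpu, mem, nil
}

func main() {
	flag.Parse()
	// Wiring up a real external metrics client is out of scope for this sketch;
	// fetchUsage shows how the two configured metric names would be used.
}
```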
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle stale
- Mark this issue or PR as rotten with /lifecycle rotten
- Close this issue or PR with /close
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle rotten
- Close this issue with /close
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Reopen this issue with /reopen
- Mark this issue as fresh with /remove-lifecycle rotten
Please send feedback to sig-contributor-experience at kubernetes/community.
/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
/remove-lifecycle rotten
Which component are you using?: vertical-pod-autoscaler/pkg/recommender
Is your feature request designed to solve a problem? If so, describe the problem this feature should solve.: We collect our own metrics separately; metrics-server is redundant.
Describe the solution you'd like.: Enable some way to have the recommender use a metrics source other than metrics-server.
Describe any alternative solutions you've considered.: Nothing good comes to mind. We have a custom fork that's almost all redundant code with the stock recommender.
Additional context.: In our case, Datadog already collects metrics for all of our Kubernetes clusters, and has a simple API to get the metric values. Feeding metrics from the API to the recommender is substantially easier than maintaining a deployment of metrics-server.