kubernetes / autoscaler

Autoscaling components for Kubernetes
Apache License 2.0

feat(metrics): Add node_group labels to cluster_autoscaler metrics to get more detailed information #5850

Open dev-slatto opened 1 year ago

dev-slatto commented 1 year ago

Which component are you using?: Autoscaler and Metrics

Is your feature request designed to solve a problem? If so, describe the problem this feature should solve.: As of today, the metrics exposed by the Autoscaler only allow one to see whether the entire cluster is at max capacity compared to how many nodes it has available in the node group / autoscaler.

Example: sum(cluster_autoscaler_nodes_count{}) by (app_kubernetes_io_name) / avg(cluster_autoscaler_max_nodes_count{}) by (app_kubernetes_io_name)

Say that I have a node group that can scale up to 8 nodes, with a minimum instance count of 2 nodes. I then want to get an alert from my metrics system (e.g. Prometheus) when this node group is at 8 nodes, meaning that I'm not able to scale this specific group any further. This would allow me to take proactive action on dedicated node groups.

Describe the solution you'd like.: It would be nice if there were labels available on, say, the cluster_autoscaler_nodes_count metric that could be used to identify a node group, e.g. a node_group label. Then you could filter and group by this label, resolving the feature described above.
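With such a label the alert could then be expressed per node group, assuming the label is called node_group and that the existing cluster_autoscaler_node_group_max_count metric carries the same label.

Example: sum(cluster_autoscaler_nodes_count{}) by (node_group) / avg(cluster_autoscaler_node_group_max_count{}) by (node_group) >= 1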

Describe any alternative solutions you've considered.: I can use some of the metrics that the cloud provider has, but this requires me to have all nodes registered with the cloud provider and to keep alerts for metrics in two different systems.

Additional context.: Slack thread: https://kubernetes.slack.com/archives/C09R1LV8S/p1686659016161629

elmiko commented 1 year ago

imo, this feature will need to be gated behind a command line flag. perhaps we can reuse --emit-per-nodegroup-metrics?

the rationale here is that large clusters can have high numbers of node groups, which means the cardinality of the emitted metrics can grow very quickly (for a rough sense of scale: a gauge with 5 label values emits 5 series today, but becomes 1000 series on a cluster with 200 node groups, and that multiplication applies to every metric we label). we need to be careful about causing such a large growth in emitted metrics without giving users a way to disable the data.

PavelGloba commented 1 year ago

I thought that the --emit-per-nodegroup-metrics flag should do exactly what it says: emit per-nodegroup metrics, not just add the two additional metrics cluster_autoscaler_node_group_max_count and cluster_autoscaler_node_group_min_count. Currently it's pretty much useless and imo doesn't work as intended.

elmiko commented 1 year ago

@PavelGloba i tend to agree about that flag doing more, i would love for us to have the ability to get the type of metric labels we are talking about. imo, we should do it =)

vadasambar commented 1 year ago

@PavelGloba if you have any suggestions, maybe we can have a separate issue.

TrungBui59 commented 1 year ago

Hi, I am a new contributor. @vadasambar showed me this issue, and I am interested in taking it over. Is this a good place to start, given that I don't have a lot of experience with open source projects or Kubernetes? Thank you very much.

Bryce-Soghigian commented 1 year ago

I do feel these metrics were designed this way on purpose because of node group cardinality. Node groups can potentially become high cardinality for clusters at the GKE 15k node limit or the AKS 5k node limit. Let's think carefully about the memory and cardinality implications before adding these metrics, and write a proper design if we do go this route. Node groups aren't always high cardinality, but when scaling timeseries data I do prefer a leaner approach.

elmiko commented 1 year ago

@Bryce-Soghigian agreed, imo we need to add this behind the flag we already have available so that users will need to explicitly enable it. in this manner, nothing will change for people who aren't using the flag now and there will be no additional metrics data. but people who want the extra data can enable the flag.

@TrungBui59 this may be a good issue to work on but it's going to require some knowledge of the autoscaler and prometheus, and also the ability to set it up and run the autoscaler to confirm the results.

here are some of my thoughts on one possible solution.

for the metrics that we want to add node group labels, we should do the following:

  1. add metrics configurations for the timeseries we want to have node group ids (these will need to shadow the original metrics, we won't be able to use both at the same time)
  2. modify the metrics collection functions to be aware of the configuration option for the --emit-per-nodegroup-metrics flag
  3. modify the metrics collection functions to accept the node group id as a parameter
  4. update all calls to the collection functions to pass the new parameter
  5. modify the metrics collection functions to use the new configurations when emitting per node group metrics and otherwise use the original configurations

this might be a little challenging as we currently do not pass the configuration option around, nor are those metrics collection functions aware of anything outside the data they are collecting.

to add a further complication, we currently use the --emit-per-nodegroup-metrics flag as a binary switch to add more collection functions. we will most likely need to follow this pattern, with the addition that we will need to switch between the metrics configurations depending on whether the flag is set.

so, imo, we need to update the collection functions so that they can take the node group id and then choose which metrics configuration to update depending on the state of the emit flag (see the sketch at the end of this comment). this might require a refactoring of the metrics module to make the configuration option easier to pass around, i'm not sure.

i think another thing we should do is determine which metrics should have the node group id as a label; i don't think it's appropriate for all the metrics.
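
to make the steps above a bit more concrete, here is a very rough sketch of what steps 3-5 could look like. this is not the actual metrics module code; it uses plain client_golang and hypothetical names (recordNodeCount, nodesCountPerNodeGroup, emitPerNodeGroupMetrics), just to show the shape of the change:

```go
// rough sketch only, not the actual cluster-autoscaler metrics code.
// recordNodeCount, nodesCountPerNodeGroup and emitPerNodeGroupMetrics
// are hypothetical names used for illustration.
package metrics

import "github.com/prometheus/client_golang/prometheus"

var (
	// existing aggregate series, keyed by node state only
	nodesCount = prometheus.NewGaugeVec(
		prometheus.GaugeOpts{
			Namespace: "cluster_autoscaler",
			Name:      "nodes_count",
			Help:      "Number of nodes in cluster.",
		}, []string{"state"},
	)
	// shadow configuration that adds a node_group label (step 1); in the real
	// change it would only be registered when the flag is set
	nodesCountPerNodeGroup = prometheus.NewGaugeVec(
		prometheus.GaugeOpts{
			Namespace: "cluster_autoscaler",
			Name:      "nodes_count_per_node_group",
			Help:      "Number of nodes in cluster, split by node group.",
		}, []string{"state", "node_group"},
	)

	// set once from --emit-per-nodegroup-metrics at startup (step 2)
	emitPerNodeGroupMetrics bool
)

// recordNodeCount takes the node group id as an extra parameter (steps 3/4)
// and updates only one of the two configurations depending on the flag (step 5).
func recordNodeCount(state, nodeGroup string, count int) {
	if emitPerNodeGroupMetrics {
		nodesCountPerNodeGroup.WithLabelValues(state, nodeGroup).Set(float64(count))
		return
	}
	nodesCount.WithLabelValues(state).Set(float64(count))
}
```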

TrungBui59 commented 1 year ago

As this is my first time contributing to Kubernetes, do you think I should start with this issue? Thank you

elmiko commented 1 year ago

> As this is my first time contributing to Kubernetes, do you think I should start with this issue? Thank you

i think this is a challenging issue to take on as a first issue by yourself. it might be nice to pair with someone else who can help set the foundational patterns. once we have the basic building block for how this will be done, it should be relatively straightforward to make the changes to all the metric collection functions. i think the initial design work might be difficult to take on if you are not familiar with this code base.

if you feel comfortable setting up the cluster-autoscaler and investigating the metrics output, then it might be a good issue to take on. it depends on how familiar with kubernetes you are, and how you feel about doing this type of setup and testing.

TrungBui59 commented 1 year ago

@vadasambar can you help me with setting up my machine? I really want to learn, and with this issue I think I can gain a great understanding of Kubernetes.

MaciekPytel commented 1 year ago

+1 to @Bryce-Soghigian, this is exactly why we haven't done it in the past. It also means we probably shouldn't change the default behavior, as it could easily break people running either very large clusters or a large number of clusters (the use-case here is a single monitoring panel for, let's say, 100 clusters).

I see no reason why we couldn't add per-nodegroup metrics as an opt-in behind a flag, but I'd consider adding those as new metrics with different names rather than conditionally adding labels to existing metrics (so: add "nodes_count_per_nodegroup" instead of adding a "nodegroup" label to "nodes_count"). The reason is simply to minimize confusion and avoid problems when someone tries to join metrics from two different clusters and doesn't realize they're using a different set of labels.
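
For illustration only, a rough sketch of that option (plain client_golang, not the actual metrics module; the metric name "nodes_count_per_nodegroup" follows the suggestion above and the helper registerPerNodegroupMetrics is just a placeholder):

```go
// sketch of the opt-in, separately named metric; not actual cluster-autoscaler code
package metrics

import "github.com/prometheus/client_golang/prometheus"

// nodesCountPerNodegroup gets its own metric name, so the existing
// nodes_count series keeps its labels and cross-cluster joins keep working.
var nodesCountPerNodegroup = prometheus.NewGaugeVec(
	prometheus.GaugeOpts{
		Namespace: "cluster_autoscaler",
		Name:      "nodes_count_per_nodegroup",
		Help:      "Number of nodes in the cluster, per node group.",
	}, []string{"node_group"},
)

// registerPerNodegroupMetrics would only be called when
// --emit-per-nodegroup-metrics is set, so the extra cardinality is opt-in.
func registerPerNodegroupMetrics(registry prometheus.Registerer) {
	registry.MustRegister(nodesCountPerNodegroup)
}
```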

elmiko commented 1 year ago

> but I'd consider adding those as new metrics with different names rather than conditionally adding labels to existing metrics (so: add "nodes_count_per_nodegroup" instead of adding a "nodegroup" label to "nodes_count").

i think that makes some sense. it will make the implementation much easier, but it will make collection slightly more complex, as users will need to set up another set of watches for the new metrics.

k8s-triage-robot commented 9 months ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

elmiko commented 9 months ago

i think this is a nice feature to have, it just needs someone to propose the changes. i'm happy to help someone understand this issue if they'd like to work on it.

/remove-lifecycle stale

k8s-triage-robot commented 3 months ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

elmiko commented 3 months ago

/remove-lifecycle stale

k8s-triage-robot commented 3 weeks ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

elmiko commented 3 weeks ago

i am still interested in this feature, happy to work with anyone who wants to implement it.

/remove-lifecycle stale