kubernetes / kube-state-metrics

Add-on agent to generate and expose cluster-level metrics.
https://kubernetes.io/docs/concepts/cluster-administration/kube-state-metrics/
Apache License 2.0

fix(pod autosharding): transition from labelselector to fieldselector #2347

Closed pkoutsovasilis closed 3 months ago

pkoutsovasilis commented 6 months ago

What this PR does / why we need it:

This PR is a minor change to pod autosharding mode that replaces the LabelSelector with a FieldSelector. Currently, in this mode, we detect the StatefulSet that owns the pod, extract its labels, and construct a LabelSelector from them, which is then used in NewFilteredListWatchFromClient. However, the labels of that StatefulSet can change for any reason, e.g. an arbitrary operator that manages it injects a hash of the whole StatefulSet into the labels to decide whether it has changed during a reconcile. Because the pods won't restart in that case (k8s design), the selector is never rebuilt, no more events arrive, and the shards are not updated properly. Instead of relying on a LabelSelector, this PR uses a FieldSelector that targets the owner StatefulSet by its name, so it will always receive updates. This is aligned with the existing if ss.Name != statefulSetName check in AddFunc and UpdateFunc.
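To illustrate the failure mode described above, here is a minimal, standard-library-only Go sketch. It is not the kube-state-metrics code and does not use client-go: the statefulSet type, the helper functions, and the label names (app, spec-hash) are all hypothetical stand-ins. It only models the idea that a selector captured from labels at startup can stop matching once a label is rewritten, while a selector on metadata.name keeps matching.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// statefulSet is a hypothetical, stripped-down stand-in for the real API object.
type statefulSet struct {
	Name   string
	Labels map[string]string
}

// labelSelectorFor renders labels as a "k=v,k=v" equality selector.
// Keys are sorted so the result is deterministic.
func labelSelectorFor(labels map[string]string) string {
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	parts := make([]string, 0, len(keys))
	for _, k := range keys {
		parts = append(parts, k+"="+labels[k])
	}
	return strings.Join(parts, ",")
}

// fieldSelectorFor renders an equality selector on the object's name.
func fieldSelectorFor(name string) string {
	return "metadata.name=" + name
}

// matchesLabels reports whether every k=v term of selector is present on ss.
func matchesLabels(selector string, ss statefulSet) bool {
	if selector == "" {
		return true
	}
	for _, term := range strings.Split(selector, ",") {
		k, want, ok := strings.Cut(term, "=")
		got, present := ss.Labels[k]
		if !ok || !present || got != want {
			return false
		}
	}
	return true
}

// matchesName reports whether the name-based selector still targets ss.
func matchesName(selector string, ss statefulSet) bool {
	return selector == fieldSelectorFor(ss.Name)
}

func main() {
	ss := statefulSet{
		Name:   "kube-state-metrics",
		Labels: map[string]string{"app": "ksm", "spec-hash": "abc123"},
	}

	// Both selectors are captured once, as they would be at process start.
	ls := labelSelectorFor(ss.Labels)
	fs := fieldSelectorFor(ss.Name)

	// Assumption: an operator rewrites a label without restarting the pods.
	ss.Labels["spec-hash"] = "def456"

	fmt.Println("label selector still matches:", matchesLabels(ls, ss))
	fmt.Println("field selector still matches:", matchesName(fs, ss))
}
```

In this toy model the label-based selector goes stale as soon as spec-hash changes, whereas the name-based one is unaffected because the StatefulSet name does not change.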

How does this change affect the cardinality of KSM: (increases, decreases or does not change cardinality)

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged): Fixes https://github.com/kubernetes/kube-state-metrics/issues/2355

linux-foundation-easycla[bot] commented 6 months ago

CLA Signed


The committers listed above are authorized under a signed CLA.

k8s-ci-robot commented 6 months ago

This issue is currently awaiting triage.

If kube-state-metrics contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

k8s-ci-robot commented 6 months ago

Welcome @pkoutsovasilis!

It looks like this is your first PR to kubernetes/kube-state-metrics 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes/kube-state-metrics has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. :smiley:

pkoutsovasilis commented 6 months ago

@dgrisonnet could you please triage this PR?

CatherineF-dev commented 5 months ago

/lgtm

  1. metadata.name is more stable than labels, though I am not sure why labels were used in the first place.

  2. Most k/k code uses metadata.name instead of labels. https://github.com/search?q=repo%3Akubernetes%2Fkubernetes+OneTermEqualSelector%28%22metadata.name%22&type=code&p=1

  3. Tests were run to verify that it works.

pkoutsovasilis commented 5 months ago

@CatherineF-dev @dgrisonnet just checking, do we need anything more for this PR? Could this make it into the next version of kube-state-metrics?

CatherineF-dev commented 4 months ago

cc @dgrisonnet to approve

diranged commented 4 months ago

I'm not sure what the status is here, but this is a really huge pain point for us right now. Any time we ship a release that updates labels on any deployments/statefulsets/etc., our kube-state-metrics pods get into a bad state and start sending invalid data, which invariably leads to our ops teams getting paged with incorrect alerts about pods being in bad states.

CatherineF-dev commented 3 months ago

I'm asking for approval of this PR. I only have LGTM permission so far.

rexagod commented 3 months ago

/approve

@CatherineF-dev feel free to send a PR to add yourself to approvers.

Sorry, I misspoke. We can still do this, but this needs to go through other approvers internally first, and really not something for me to individually pass a judgement on.

Nonetheless, thank you for all the reviews!

k8s-ci-robot commented 3 months ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: CatherineF-dev, pkoutsovasilis, rexagod

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

- ~~[OWNERS](https://github.com/kubernetes/kube-state-metrics/blob/main/OWNERS)~~ [rexagod]

Approvers can indicate their approval by writing `/approve` in a comment
Approvers can cancel approval by writing `/approve cancel` in a comment