kubernetes / kube-state-metrics

Add-on agent to generate and expose cluster-level metrics.
https://kubernetes.io/docs/concepts/cluster-administration/kube-state-metrics/
Apache License 2.0
5.2k stars 1.92k forks source link

fix: use --track-unscheduled-pods to select unscheduled pods in Daemonset sharding #2388

Open CatherineF-dev opened 1 month ago

CatherineF-dev commented 1 month ago

What this PR does / why we need it:

How does this change affect the cardinality of KSM: (increases, decreases or does not change cardinality)

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged): Fixes https://github.com/kubernetes/kube-state-metrics/issues/2353

Tested using minikube

mrueg commented 1 month ago

Will this supersede https://github.com/kubernetes/kube-state-metrics/pull/2373 ?

CatherineF-dev commented 1 month ago

Will this supersede https://github.com/kubernetes/kube-state-metrics/pull/2373 ?

Yes.

dgrisonnet commented 1 month ago

/assign /triage accepted

mrueg commented 1 month ago

/hold for others to review. /lgtm

k8s-ci-robot commented 1 month ago

New changes are detected. LGTM label has been removed.

CatherineF-dev commented 1 month ago

/lgtm to trigger tide

diranged commented 1 month ago

@logicalhan Can we just skip that for now? We've been waiting for 6 weeks to https://github.com/kubernetes/kube-state-metrics/issues/2374 resolved, and we're having daily pain points with KSM right now in our environment. While I appreciate adding tests, we're literally removing code and making it simpler here.. can we bypass this and just get a fix in ASAP?

logicalhan commented 1 month ago

@logicalhan Can we just skip that for now? We've been waiting for 6 weeks to https://github.com/kubernetes/kube-state-metrics/issues/2374 resolved, and we're having daily pain points with KSM right now in our environment. While I appreciate adding tests, we're literally removing code and making it simpler here.. can we bypass this and just get a fix in ASAP?

How do you know the fix works without tests?

diranged commented 1 month ago

@logicalhan Can we just skip that for now? We've been waiting for 6 weeks to #2374 resolved, and we're having daily pain points with KSM right now in our environment. While I appreciate adding tests, we're literally removing code and making it simpler here.. can we bypass this and just get a fix in ASAP?

How do you know the fix works without tests?

I think this is one of those vanity test situations... I don't see it worth blocking on this just to get a code coverage line. If you have a test proposal, would you put it up? Otherwise, can we get this feature fixed since it's been broken?

logicalhan commented 1 month ago

I think this is one of those vanity test situations... I don't see it worth blocking on this just to get a code coverage line. If you have a test proposal, would you put it up? Otherwise, can we get this feature fixed since it's been broken?

This bug exists because we didn't have test coverage what the heck are you talking about. The answer is no.

logicalhan commented 1 month ago

If you want it so badly @diranged then you should write some tests. We accept pulls.

diranged commented 1 month ago

If you want it so badly @diranged then you should write some tests. We accept pulls.

Given that I'm not deeply familiar with the code - that seems like a poor idea. I wish you would reconsider here how much you want to stand on the 'needs tests' statement for such a simple PR.

mrueg commented 3 weeks ago

@CatherineF-dev do you have plans to add tests to this PR to get it into a mergable state?

CatherineF-dev commented 3 weeks ago

Okay! Work in Progress

LaikaN57 commented 2 weeks ago

@CatherineF-dev Is there anything I can do to help here? We had good momentum last week and I am looking to help continue that!

How do you feel about my comments for fixing the unit tests? (ie. adding back || len(*n) == 0 check)

I naively feel that fixing that will help out with e2e tests but I'm sure you have way more context working on this and other related code / architecture.

I am also available for video call if you need someone to use as a sounding board.

CatherineF-dev commented 2 weeks ago

I was looking at the E2E test failure https://github.com/kubernetes/kube-state-metrics/actions/runs/9417050145/job/25941445490?pr=2388

Seems something are not cleaned up after Daemonset sharding test.

CatherineF-dev commented 2 weeks ago

@LaikaN57 fixed all tests now.

LaikaN57 commented 2 weeks ago

Thanks @CatherineF-dev! I am OOO until Monday, but I did a quick review of the new code. Just 2 small nits but nothing blocking. I can review more fully on Monday if others do not approve by then. 🤞 I relly appreciate all the work you put into this!

k8s-ci-robot commented 1 week ago

PR needs rebase.

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes-sigs/prow](https://github.com/kubernetes-sigs/prow/issues/new?title=Prow%20issue:) repository.
CatherineF-dev commented 1 week ago

@mrueg need a review again. Rebased to HEAD.

CatherineF-dev commented 1 week ago

/unhold

CatherineF-dev commented 1 week ago

/lgtm

k8s-ci-robot commented 1 week ago

@CatherineF-dev: you cannot LGTM your own PR.

In response to [this](https://github.com/kubernetes/kube-state-metrics/pull/2388#issuecomment-2180751991): >/lgtm Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes-sigs/prow](https://github.com/kubernetes-sigs/prow/issues/new?title=Prow%20issue:) repository.
k8s-ci-robot commented 1 week ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: CatherineF-dev, mrueg

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files: - ~~[OWNERS](https://github.com/kubernetes/kube-state-metrics/blob/main/OWNERS)~~ [mrueg] Approvers can indicate their approval by writing `/approve` in a comment Approvers can cancel approval by writing `/approve cancel` in a comment
LaikaN57 commented 1 week ago

tests?

@logicalhan Tests now implemented thanks to all of @CatherineF-dev's hard work 🙏 Looks like we are looking for review again now. Thanks!