kubernetes / enhancements

Enhancements tracking repo for Kubernetes
Apache License 2.0

Per-plugin callback functions for accurate requeueing in kube-scheduler #4247

Open sanposhiho opened 1 year ago

sanposhiho commented 1 year ago

Enhancement Description

Please keep this description up to date. This will help the Enhancement Team to track the evolution of the enhancement efficiently.

sanposhiho commented 1 year ago

/sig scheduling

It was suggested that we have a small KEP for QueueingHint. It's a special case, though: we can treat DRA as the parent KEP, with this KEP stemming from it. I set the alpha version to v1.26, the same as the DRA KEP (or maybe we can just leave it as n/a), and the beta version to v1.28, which is when we actually implemented it and enabled it via the beta feature flag (enabled by default).

You are essentially splitting one KEP into two, so there is no grade-skipping as such, because the grades were part of the original KEP. We do know when the feature was alpha and when it went from alpha into beta. So please go ahead with a smaller KEP for SchedulerQueueingHints and use the dates from before. https://kubernetes.slack.com/archives/C5P3FE08M/p1695639140018139?thread_ts=1694167948.846139&cid=C5P3FE08M

@kubernetes/sig-scheduling-leads Can anyone give this PR lead-opted-in?

alculquicondor commented 1 year ago

Do you have a PR for this already?

sanposhiho commented 1 year ago

Not yet. It will probably be ready this weekend.

sanposhiho commented 1 year ago

Here it is: https://github.com/kubernetes/enhancements/pull/4256

Huang-Wei commented 1 year ago

/label lead-opted-in

rayandas commented 1 year ago

Hello @sanposhiho 👋, v1.29 Enhancements team here.

Just checking in as we approach enhancements freeze at 01:00 UTC, Friday, 6th October, 2023.

This enhancement is targeting stage beta for v1.29 (correct me if otherwise).

Here's where this enhancement currently stands:

For this KEP, we would just need to update the following:

The status of this enhancement is marked as at risk for enhancement freeze. Please keep the issue description up-to-date with appropriate stages as well. Thank you!

npolshakova commented 1 year ago

Hello 👋, 1.29 Enhancements Lead here. Unfortunately, this enhancement did not meet requirements for v1.29 enhancements freeze. Feel free to file an exception to add this back to the release tracking process. Thanks!

/milestone clear

npolshakova commented 1 year ago

Hey again 👋 As https://github.com/kubernetes/enhancements/pull/4256 was merged within the additional time approved in the exception request, I am adding this back to the v1.29 milestone and changing the status of this enhancement to tracked for enhancement freeze 🚀

/milestone v1.29

katcosgrove commented 1 year ago

Hey there @sanposhiho 👋, v1.29 Docs Lead here. Does the enhancement work planned for v1.29 require any new docs or modifications to existing docs? If so, please follow the steps here to open a PR against the dev-1.29 branch in the k/website repo. This PR can be just a placeholder at this time and must be created before Thursday, 19 October 2023. Also, take a look at Documenting for a release to familiarize yourself with the docs requirements for the release. Thank you!

sanposhiho commented 1 year ago

@katcosgrove

So, I submitted the PR for modifying the doc in: https://github.com/kubernetes/website/pull/43427 (Given this KEP is in a unique situation, where it's targeting beta in v1.28 (not 1.29), we can merge the PR now, not for dev-1.29.)

@alculquicondor Do you think we need a dedicated page for QueueingHint, or should we modify some existing pages? (Or is it OK not to have a doc for QueueingHint, since it's internal?)

sanposhiho commented 11 months ago

I'd like to have a blog post for this enhancement. It would be an interesting one, since there haven't been many official docs or blog posts about the internals of the scheduling queue and the requeueing mechanism.

alculquicondor commented 11 months ago

You could also consider a blogpost under https://www.kubernetes.dev/blog

sanposhiho commented 11 months ago

Is there any difference between https://kubernetes.io/blog/ and https://www.kubernetes.dev/blog, by the way? Probably the former is for users, so the posts are supposed to be understandable for those who don't know much about Kubernetes internals, and the latter is for contributors, so the posts can dive into the details of the implementation. Is this understanding correct?

alculquicondor commented 11 months ago

That is correct :)

rayandas commented 11 months ago

Hey again @sanposhiho 👋, 1.29 Enhancements team here,

Just checking in as we approach code freeze at 01:00 UTC Wednesday 1st November 2023.

Here's where this enhancement currently stands:

For this enhancement, it looks like the following PRs have merged and are updated in the issue description:

The status of this KEP is tracked for code freeze. 🚀

Also, please let me know if there are other PRs in k/k we should be tracking for this KEP. As always, we are here to help if any questions come up. Thanks!

kcmartin commented 11 months ago

Hi @sanposhiho! 👋 from the v1.29 Release Team-Communications! We would like to check if you have any plans to publish a blog for this KEP regarding new features, removals, and deprecations for this release. It seems from the comment above that this may be the case; please confirm.

If so, you need to open a placeholder PR in the website repository. The deadline will be Tuesday 14th November 2023 (after the Docs "PR ready for review" deadline).

Here's the 1.29 Calendar

sanposhiho commented 11 months ago

@kcmartin

Hi, yes, here's the placeholder PR. (empty for now) https://github.com/kubernetes/website/pull/43686

sanposhiho commented 10 months ago

(just noticed I forgot to assign it to me)

/assign

alculquicondor commented 10 months ago

@sanposhiho since the feature was disabled, please update the KEP with notes on which criteria need to be fulfilled to re-enable the feature. I think the criteria should be roughly:

As a side note, here is the perf dashboard for memory usage https://perf-dash.k8s.io/#/?jobname=gce-5000Nodes&metriccategoryname=E2E&metricname=LoadResources&PodName=kube-scheduler-gce-scale-cluster-master%2Fkube-scheduler&Resource=memory. The test hasn't run since the feature was disabled, so I'm not sure if we will see a memory drop. If we do, then not seeing an increase when the feature is re-enabled would be a good signal. Otherwise, we might need to improve coverage in the load test to incorporate cases with retries.

alculquicondor commented 10 months ago

It doesn't look like there's an effect on memory usage, according to the dashboard.

salehsedghpour commented 9 months ago

/remove-label lead-opted-in

alculquicondor commented 9 months ago

@sanposhiho I believe you want to target this release?

sanposhiho commented 9 months ago

@alculquicondor Yes, let's aim at making it in this release. All blockers are tracked in: https://github.com/kubernetes/kubernetes/issues/122597

alculquicondor commented 8 months ago

@Huang-Wei could you add the lead-opted-in label?

@sanposhiho don't forget to send an update to the KEP with the target version.

salehsedghpour commented 8 months ago

As I'm closing the previous milestone, shall we add milestone v1.30?

salehsedghpour commented 8 months ago

/milestone clear

alculquicondor commented 8 months ago

yes please, add it

sanposhiho commented 8 months ago

@Huang-Wei Can you add required labels to this one too, please?

alculquicondor commented 8 months ago

https://github.com/kubernetes/enhancements/pull/4451#issuecomment-1918078118

k8s-triage-robot commented 5 months ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

alculquicondor commented 5 months ago

/remove-lifecycle stale

k8s-triage-robot commented 2 months ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

sanposhiho commented 2 months ago

/remove-lifecycle stale

alculquicondor commented 3 weeks ago

/label lead-opted-in

We are going for another beta iteration, setting the feature gate back to enabled by default.

tjons commented 1 week ago

Hello @macsko @alculquicondor @sanposhiho 👋, Enhancements team here.

Just checking in as we approach enhancements freeze at 02:00 UTC Friday 11th October 2024 / 19:00 PDT Thursday 10th October 2024.

This enhancement is targeting stage beta for v1.32 (correct me if otherwise).

Here's where this enhancement currently stands:

For this KEP, we would just need to update the following:

The status of this enhancement is marked as at risk for enhancement freeze. Please keep the issue description up-to-date with appropriate stages as well.

If you anticipate missing enhancements freeze, you can file an exception request in advance. Thank you!

alculquicondor commented 1 week ago

KEP status is marked as implementable for latest-milestone: v1.32

4877 already satisfies this criterion

tjons commented 1 week ago

With all the requirements met, this enhancement is now tracked for enhancements freeze! 🚀