[Coscheduling] make podGroup a queueing unit

kerthcet commented 1 year ago

Area

[X] Scheduler
[ ] Controller
[ ] Helm Chart
[ ] Documents

Other components

No response

What happened?

Currently, we use queuedPodInfo's InitialAttemptTimestamp in ordering, it's a static value which will lead to starvation, earlier submitted podGroups will always block the queue if unschedulable(backoff queue can somehow mitigate this but not solve this problem).

    creationTime1 := cs.pgMgr.GetCreationTimestamp(podInfo1.Pod, *podInfo1.InitialAttemptTimestamp)
    creationTime2 := cs.pgMgr.GetCreationTimestamp(podInfo2.Pod, *podInfo2.InitialAttemptTimestamp)

What I want to do is make podGroup's queueing timestamp as a criteria, it will be refreshed together with a new scheduling cycle, the general idea is we'll maintain a podGroups cache in coscheduling. I'll write an updated KEP to detail the design.

What did you expect to happen?

pods of podGroup will be ordered based on the podGroup's queueing timestamp
podGroup's queueing timestamp will be refreshed in a new scheduling cycle
pods of the same podGroup will be popped out sequentially as today

How can we reproduce it (as minimally and precisely as possible)?

No response

Anything else we need to know?

Kubernetes version

None

Scheduler Plugins version

None

kerthcet commented 1 year ago

/remove-kind bug /kind feature

kerthcet commented 1 year ago

cc @Huang-Wei

k8s-triage-robot commented 10 months ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue as fresh with /remove-lifecycle stale
Close this issue with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot commented 9 months ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue as fresh with /remove-lifecycle rotten
Close this issue with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot commented 8 months ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Reopen this issue with /reopen
Mark this issue as fresh with /remove-lifecycle rotten
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-ci-robot commented 8 months ago

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to [this](https://github.com/kubernetes-sigs/scheduler-plugins/issues/658#issuecomment-2038760995): >The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. > >This bot triages issues according to the following rules: >- After 90d of inactivity, `lifecycle/stale` is applied >- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied >- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed > >You can: >- Reopen this issue with `/reopen` >- Mark this issue as fresh with `/remove-lifecycle rotten` >- Offer to help out with [Issue Triage][1] > >Please send feedback to sig-contributor-experience at [kubernetes/community](https://github.com/kubernetes/community). > >/close not-planned > >[1]: https://www.kubernetes.dev/docs/guide/issue-triage/ Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

tenzen-y commented 8 months ago

@kerthcet @Huang-Wei Do we have no plan for this improvement?

Huang-Wei commented 8 months ago

I think it's a valid feature. I may need some volunteer to pick it up.

tenzen-y commented 8 months ago

/remove-lifecycle rotten

tenzen-y commented 8 months ago

I think it's a valid feature. I may need some volunteer to pick it up.

I think that @kerthcet already took this since he submitted the proposal here: https://github.com/kubernetes-sigs/scheduler-plugins/pull/661

Huang-Wei commented 8 months ago

I think that @kerthcet already took this since he submitted the proposal here: #661

ah i missed that (for so long). Will review by this weekend.

k8s-triage-robot commented 5 months ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue as fresh with /remove-lifecycle stale
Close this issue with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot commented 4 months ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue as fresh with /remove-lifecycle rotten
Close this issue with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

kerthcet commented 3 months ago

/lifecycle frozen

kubernetes-sigs / scheduler-plugins