kubernetes / enhancements

Enhancements tracking repo for Kubernetes
Apache License 2.0

Support memory qos with cgroups v2 #2570

Open xiaoxubeii opened 3 years ago

xiaoxubeii commented 3 years ago

Enhancement Description

pacoxu commented 1 year ago

@xiaoxubeii the PR in the description looks like an old Docs PR from 1.22: kubernetes/website#28566. Is there a newer PR for 1.27? Yesterday was the deadline for PRs to be ready for review. If there's no PR for 1.27, please create and populate one as soon as possible.

https://github.com/kubernetes/website/pull/39853 is a placeholder PR. @ndixita, do you have time to update it?

mickeyboxell commented 1 year ago

@pacoxu @ndixita Is there a Docs PR as well? The PR that was opened appears to be for a release blog. cc: @Rishit-dagli

pacoxu commented 1 year ago

@mickeyboxell I don't think there is any place that requires an update for this feature except the blog.

sftim commented 1 year ago

I'd expect https://kubernetes.io/docs/concepts/workloads/pods/pod-qos/ (rather than https://kubernetes.io/docs/tasks/configure-pod-container/quality-service-pod/) to explain this form of QoS, to help with disambiguation.

pacoxu commented 1 year ago

I opened https://github.com/kubernetes/website/pull/40513 and I will update it ASAP.

SergeyKanzhelev commented 1 year ago

As per the SIG Node meeting on 5/2/2023, this will be worked on in 1.28.

/milestone v1.28

SergeyKanzhelev commented 1 year ago

/stage beta

SergeyKanzhelev commented 1 year ago

/label lead-opted-in

npolshakova commented 1 year ago

Hi @xiaoxubeii 👋, Enhancements team here!

Just checking in as we approach enhancements freeze on 01:00 UTC Friday, 16th June 2023.

This enhancement is targeting stage beta for 1.28 (correct me if otherwise).

Here's where this enhancement currently stands:

For this KEP, we would just need to update the following:

The status of this enhancement is marked as at risk. Please keep the issue description up-to-date with appropriate stages as well. Thank you!

ndixita commented 1 year ago

Thanks @npolshakova for bringing this up. This KEP is targeting beta in 1.28. I will go ahead and take care of the action items here on Monday.

cc: @xiaoxubeii

johnbelamaric commented 1 year ago

Please be sure to update the PRR questions for beta.

npolshakova commented 1 year ago

Hi @xiaoxubeii 👋, just checking in before the enhancements freeze on 01:00 UTC Friday, 16th June 2023. The status for this enhancement is at risk.

For this KEP, we would just need to update the following:

Let me know if I missed anything. Thanks!

SergeyKanzhelev commented 1 year ago

@npolshakova this should be good for 1.28 now

AdminTurnedDevOps commented 1 year ago

Hey @xiaoxubeii

1.28 Docs Shadow here.

Does this enhancement work planned for 1.28 require any new docs or modification to existing docs?

If so, please follow the steps here to open a PR against the dev-1.28 branch in the k/website repo. This PR can be just a placeholder at this time and must be created before Thursday 20th July 2023.

Also, take a look at Documenting for a release to familiarize yourself with the docs requirements for the release.

Thank you!

npolshakova commented 1 year ago

Hey again @xiaoxubeii :wave:

Just checking in as we approach Code freeze at 01:00 UTC Wednesday, 19th July 2023.

Here’s the enhancement’s state for the upcoming code freeze:

For this enhancement, it looks like the following code-related PRs are open and need to be merged or be in a merge-ready state before code freeze begins:

Also please let me know if there are other PRs in k/k we should be tracking for this KEP. As always, we are here to help if any questions come up. Thanks!

Rishit-dagli commented 1 year ago

Hey @xiaoxubeii, could you please create a docs PR against the dev-1.28 branch in the k/website repo, even if it is a draft PR with no content yet? The deadline to create this draft PR is Thursday 20th July 2023.

pacoxu commented 1 year ago

Thanks to @ndixita for the detailed test: https://docs.google.com/document/d/1mY0MTT34P-Eyv5G1t_Pqs4OWyIH-cg9caRKWmqYlSbI/edit?usp=sharing.

🚨🚨 Sometimes the application pod gets stuck because its memory is being throttled. This is worse behavior than an OOM kill, so we decided to postpone promoting this feature until we can handle this issue gracefully.

See https://github.com/kubernetes/kubernetes/pull/118699#issuecomment-1635143442 as well.

So the KEP needs an update.
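To make the throttling concrete: with Memory QoS, the kubelet sets cgroup v2 `memory.min` from the container's memory request and `memory.high` somewhere between the request and the limit. Once usage crosses `memory.high`, the kernel throttles the cgroup's processes and puts them under heavy reclaim pressure, which is how a pod can end up stuck rather than OOM-killed. The Go sketch below assumes the `memory.high` formula from the KEP's 1.27 update and a throttling factor of 0.9 (assumed to match the kubelet's `memoryThrottlingFactor` default); `memoryHigh` is a hypothetical name and this is not the kubelet's actual code.

```go
package main

import (
	"fmt"
	"math"
)

// memoryHigh is a hypothetical helper that estimates the cgroup v2 memory.high value
// for one container, assuming the KEP's 1.27 formula:
//
//	memory.high = floor[(request + factor*(limit - request)) / pageSize] * pageSize
//
// where limit is the container's memory limit (or node allocatable memory when no
// limit is set) and factor is the kubelet's memory throttling factor.
// It returns ok=false when memory.high would be left unset (assumed): when
// request == limit, or when the computed value does not exceed the request.
func memoryHigh(request, limit int64, factor float64, pageSize int64) (int64, bool) {
	if request == limit {
		return 0, false
	}
	raw := float64(request) + factor*float64(limit-request)
	high := int64(math.Floor(raw/float64(pageSize))) * pageSize
	if high <= request {
		return 0, false
	}
	return high, true
}

func main() {
	const mi = int64(1 << 20)
	// 100Mi request, 200Mi limit, factor 0.9, 4Ki pages.
	high, ok := memoryHigh(100*mi, 200*mi, 0.9, 4096)
	fmt.Println(high/mi, ok) // prints "190 true"
}
```

With a 100Mi request and a 200Mi limit, the container would start being throttled at 190Mi, well before it could hit the 200Mi limit and be OOM-killed.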

Atharva-Shinde commented 1 year ago

Hello @ndixita @pacoxu, with reference to the above comment, I am removing this KEP from the current milestone.

/milestone clear
/remove label lead-opted-in

npolshakova commented 1 year ago

Hello @ndixita, 1.29 Enhancements team here! Is this enhancement targeting 1.29? If it is, can you follow the instructions here to opt in the enhancement and make sure the lead-opted-in label is set so it can get added to the tracking board? Thanks!

SergeyKanzhelev commented 1 year ago

/stage alpha

k8s-triage-robot commented 9 months ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot commented 8 months ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot commented 7 months ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

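This bot triages issues according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:
- Reopen this issue with `/reopen`
- Mark this issue as fresh with `/remove-lifecycle rotten`
- Offer to help out with [Issue Triage](https://www.kubernetes.dev/docs/guide/issue-triage/)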

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned
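The triage bot's schedule (90 days of inactivity to `lifecycle/stale`, 30 more days to `lifecycle/rotten`, then 30 more days to closing) accounts for the stale, rotten, and close comments in this thread. The Go sketch below models that schedule as a function of one continuous period of inactivity; `Lifecycle` and `lifecycleAfter` are hypothetical names, and it assumes the clock never resets and that each threshold applies from the day it is reached, which simplifies the real bot.

```go
package main

import "fmt"

// Lifecycle is a hypothetical state mirroring the triage bot's lifecycle labels.
type Lifecycle string

const (
	Active Lifecycle = "active"
	Stale  Lifecycle = "lifecycle/stale"
	Rotten Lifecycle = "lifecycle/rotten"
	Closed Lifecycle = "closed"
)

// lifecycleAfter returns the state an issue reaches after inactiveDays of continuous
// inactivity: stale at 90 days, rotten 30 days later (120), closed 30 days after that (150).
func lifecycleAfter(inactiveDays int) Lifecycle {
	switch {
	case inactiveDays >= 90+30+30:
		return Closed
	case inactiveDays >= 90+30:
		return Rotten
	case inactiveDays >= 90:
		return Stale
	default:
		return Active
	}
}

func main() {
	for _, d := range []int{30, 90, 120, 150} {
		fmt.Println(d, lifecycleAfter(d))
	}
}
```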

k8s-ci-robot commented 7 months ago

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

pacoxu commented 3 months ago

/remove-lifecycle stale
/reopen

k8s-ci-robot commented 3 months ago

@pacoxu: Reopened this issue.

pacoxu commented 3 months ago

@ndixita, https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/2570-memory-qos#latest-update-stalled mentions that we may use PSI (Pressure Stall Information) here.

https://github.com/kubernetes/enhancements/issues/4205 is the PSI KEP, so IIUC we may need to wait for that KEP to be implemented.
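For context on what a PSI signal looks like: the kernel reports memory pressure system-wide in `/proc/pressure/memory` and, under cgroup v2, per cgroup in `memory.pressure`. Each file has a `some` line (time when at least one task was stalled) and a `full` line (time when all non-idle tasks were stalled), each with `avg10`/`avg60`/`avg300` percentages and a cumulative `total` in microseconds. The Go sketch below only parses that format to show the data a PSI-based approach could read; `parsePSI` and `PSILine` are hypothetical names, the sample values are made up, and this is not the implementation proposed in KEP 4205.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// PSILine holds one "some" or "full" line of a PSI file.
type PSILine struct {
	Avg10, Avg60, Avg300 float64 // percent of time stalled over 10s/60s/300s windows
	Total                uint64  // cumulative stall time in microseconds
}

// parsePSI parses the contents of /proc/pressure/memory or a cgroup v2
// memory.pressure file into a map keyed by "some" and "full".
func parsePSI(data string) (map[string]PSILine, error) {
	out := make(map[string]PSILine)
	for _, line := range strings.Split(strings.TrimSpace(data), "\n") {
		fields := strings.Fields(line)
		if len(fields) != 5 {
			return nil, fmt.Errorf("unexpected PSI line: %q", line)
		}
		var p PSILine
		for _, kv := range fields[1:] {
			key, val, ok := strings.Cut(kv, "=")
			if !ok {
				return nil, fmt.Errorf("malformed field: %q", kv)
			}
			var err error
			switch key {
			case "avg10":
				p.Avg10, err = strconv.ParseFloat(val, 64)
			case "avg60":
				p.Avg60, err = strconv.ParseFloat(val, 64)
			case "avg300":
				p.Avg300, err = strconv.ParseFloat(val, 64)
			case "total":
				p.Total, err = strconv.ParseUint(val, 10, 64)
			default:
				err = fmt.Errorf("unknown key %q", key)
			}
			if err != nil {
				return nil, err
			}
		}
		out[fields[0]] = p
	}
	return out, nil
}

func main() {
	// Made-up sample in the kernel's PSI format.
	sample := "some avg10=1.50 avg60=0.80 avg300=0.20 total=123456\n" +
		"full avg10=0.40 avg60=0.10 avg300=0.00 total=45678\n"
	psi, err := parsePSI(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println(psi["full"].Avg10)
}
```

A sustained rise in a container cgroup's `full` averages is the kind of signal that could, in principle, flag a workload stuck under `memory.high` throttling; how KEP 4205 actually exposes PSI is not assumed here.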

k8s-triage-robot commented 2 months ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

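This bot triages issues according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:
- Reopen this issue with `/reopen`
- Mark this issue as fresh with `/remove-lifecycle rotten`
- Offer to help out with [Issue Triage](https://www.kubernetes.dev/docs/guide/issue-triage/)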

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-ci-robot commented 2 months ago

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

ndixita commented 2 months ago

/reopen

k8s-ci-robot commented 2 months ago

@ndixita: Reopened this issue.

pacoxu commented 2 months ago

/remove-lifecycle rotten

pacoxu commented 1 month ago

The blocker is a kernel behavior that may need a fix; alternatively, we need something like Alibaba Cloud's memory high watermark, which only triggers memory reclaim without throttling.

Some new thoughts on this.

The throttling behavior may not be suitable for all applications. Should we change the default so that throttling is disabled? Otherwise, I'm afraid this feature will stay alpha forever. 🤔
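As a rough illustration of the "reclaim without throttling" idea: a watermark scheme only asks the kernel to reclaim memory once usage crosses a threshold, instead of throttling the workload the way `memory.high` does. The Go sketch below is a hypothetical user-space approximation, not Alibaba Cloud's kernel feature: `reclaimTarget` and `requestReclaim` are made-up names, the watermark values are assumptions, and it relies on the cgroup v2 `memory.reclaim` file available in Linux 5.19 and later.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strconv"
)

// reclaimTarget is a hypothetical policy: once usage crosses highWmark, ask the kernel
// to reclaim enough memory to bring usage back down to lowWmark; otherwise do nothing.
// Nothing here throttles the workload; it only requests reclaim.
func reclaimTarget(current, highWmark, lowWmark uint64) uint64 {
	if current <= highWmark || current <= lowWmark {
		return 0
	}
	return current - lowWmark
}

// requestReclaim asks the kernel to reclaim the given number of bytes from a cgroup
// by writing to its cgroup v2 memory.reclaim file. The write can fail (for example
// with EAGAIN) if the kernel reclaims less than the requested amount.
func requestReclaim(cgroupDir string, bytes uint64) error {
	path := filepath.Join(cgroupDir, "memory.reclaim")
	return os.WriteFile(path, []byte(strconv.FormatUint(bytes, 10)), 0o644)
}

func main() {
	const mi = uint64(1 << 20)
	// Hypothetical numbers: 950Mi in use, watermarks at 900Mi (high) and 800Mi (low).
	fmt.Println(reclaimTarget(950*mi, 900*mi, 800*mi) / mi) // prints 150
}
```

The trade-off is that reclaim alone never slows the workload down, so a container that keeps allocating would still eventually hit its limit and be OOM-killed, which is the behavior the thread prefers over getting stuck.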