kubernetes / enhancements

Enhancements tracking repo for Kubernetes
Apache License 2.0

Quotas for Ephemeral Storage #1029

Open RobertKrawitz opened 5 years ago

RobertKrawitz commented 5 years ago

Enhancement Description

RobertKrawitz commented 5 years ago

/sig node

derekwaynecarr commented 5 years ago

per sig-node discussion, this is good.

the kep will merge here: https://github.com/kubernetes/enhancements/pull/646

/milestone v1.15

kacole2 commented 5 years ago

@RobertKrawitz This is OK to track for 1.15. However, please update the KEP with established graduation criteria for Beta and Alpha when possible.

/stage alpha

makoscafee commented 5 years ago

Hey, @RobertKrawitz @derekwaynecarr 👋 I'm the v1.15 docs Lead. Does this enhancement require any new docs (or modifications)?

Just a friendly reminder we're looking for a PR against k/website (branch dev-1.15) due by Thursday, May 30th. It would be great if it's the start of the full documentation, but even a placeholder PR is acceptable. Let me know if you have any questions!

RobertKrawitz commented 5 years ago

Yes, this will require new documentation.

I will open the website PR today.

kacole2 commented 5 years ago

Hi @RobertKrawitz @derekwaynecarr . Code Freeze is Thursday, May 30th 2019 @ EOD PST. All enhancements going into the release must be code-complete, including tests, and have docs PRs open.

Please list all current k/k PRs so they can be tracked going into freeze. If the PRs aren't merged by freeze, this feature will slip for the 1.15 release cycle. Only release-blocking issues and PRs will be allowed in the milestone.

If you know this will slip, please reply back and let us know. Thanks!

makoscafee commented 5 years ago

Hey, @RobertKrawitz @derekwaynecarr. Just a friendly reminder we're looking for at least a draft/placeholder PR against k/website (branch dev-1.15) due by Thursday, May 30th 2019 @ EOD PST.

RobertKrawitz commented 5 years ago

@makoscafee The k/website PR is https://github.com/kubernetes/website/pull/14268

kacole2 commented 5 years ago

Hi @RobertKrawitz @derekwaynecarr , today is code freeze. I do not see a reply for any k/k PRs to track for this merge. It's now being marked as At Risk in the 1.15 Enhancement Tracking Sheet. If there is no response, or you respond with PRs to track and they are not merged by EOD PST, this will be dropped from the 1.15 Milestone. After this point, only release-blocking issues and PRs will be allowed in the milestone with an exception.

RobertKrawitz commented 5 years ago

The PR for this is https://github.com/kubernetes/kubernetes/pull/66928

RobertKrawitz commented 5 years ago

/hold cancel

kacole2 commented 5 years ago

Hi @RobertKrawitz, I'm the 1.16 Enhancement Lead. Is this feature going to be graduating alpha/beta/stable stages in 1.16? Please let me know so it can be added to the 1.16 Tracking Spreadsheet. If it's not graduating, I will remove it from the milestone and change the tracked label.

Once coding begins or if it already has, please list all relevant k/k PRs in this issue so they can be tracked properly.

Milestone dates are Enhancement Freeze 7/30 and Code Freeze 8/29.

Thank you.

kcmartin commented 4 years ago

Hello @RobertKrawitz -- 1.17 Enhancement Shadow here! 🙂

I wanted to reach out to see if this enhancement will be graduating to alpha/beta/stable in 1.17?

Please let me know so that this enhancement can be added to the 1.17 tracking sheet.

Thank you!

🔔 Friendly Reminder

The current release schedule is

fejta-bot commented 4 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.

/lifecycle stale

palnabarun commented 4 years ago

/remove-lifecycle stale

palnabarun commented 4 years ago

Hey there @RobertKrawitz -- 1.18 Enhancements shadow here. I wanted to check in and see if you think this Enhancement will be graduating to alpha in 1.18 or having a major change in its current level?

The current release schedule is:

To be included in the release,

  1. The KEP PR must be merged
  2. The KEP must be in an implementable state
  3. The KEP must have test plans and graduation criteria.

If you would like to include this enhancement, once coding begins please list all relevant k/k PRs in this issue so they can be tracked properly. 👍

We'll be tracking enhancements here: http://bit.ly/k8s-1-18-enhancements

Thanks! :)

palnabarun commented 4 years ago

@RobertKrawitz Just a friendly reminder, we are just 7 days away from the Enhancement Freeze (Tuesday, January 28th).

palnabarun commented 4 years ago

@RobertKrawitz Just a friendly reminder, we are just 2 days away from the Enhancement Freeze (3 PM Pacific Time, Tuesday, January 28th).

palnabarun commented 4 years ago

Unfortunately, the deadline for the 1.18 Enhancement freeze has passed. For now, this is being removed from the milestone. If there is a need to get this in, please file an enhancement exception.

fejta-bot commented 4 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.

/lifecycle stale

palnabarun commented 4 years ago

/remove-lifecycle stale

harshanarayana commented 4 years ago

Hey there @RobertKrawitz , 1.19 Enhancements shadow here. I wanted to check in and see if you think this Enhancement will be graduating in 1.19?

In order to have this part of the release:

  1. The KEP PR must be merged in an implementable state
  2. The KEP must have test plans
  3. The KEP must have graduation criteria.

The current release schedule is:

If you do, I'll add it to the 1.19 tracking sheet (http://bit.ly/k8s-1-19-enhancements). Once coding begins please list all relevant k/k PRs in this issue so they can be tracked properly. 👍

Thanks!

RobertKrawitz commented 4 years ago

On 4/29/20 10:57 AM, Harsha Narayana wrote:

Hey there @RobertKrawitz , 1.19 Enhancements shadow here. I wanted to check in and see if you think this Enhancement will be graduating in 1.19?

In order to have this part of the release:

  1. The KEP PR must be merged in an implementable state
  2. The KEP must have test plans
  3. The KEP must have graduation criteria.

The current release schedule is:

  • Monday, April 13: Week 1 - Release cycle begins
  • Tuesday, May 19: Week 6 - Enhancements Freeze
  • Thursday, June 25: Week 11 - Code Freeze
  • Thursday, July 9: Week 14 - Docs must be completed and reviewed
  • Tuesday, August 4: Week 17 - Kubernetes v1.19.0 released

If you do, I'll add it to the 1.19 tracking sheet (http://bit.ly/k8s-1-19-enhancements). Once coding begins please list all relevant k/k PRs in this issue so they can be tracked properly. 👍

Hi Harsha,

I don't expect that this is going to graduate.

harshanarayana commented 4 years ago

Hi @RobertKrawitz, thanks for following up with an update on this. I have updated the tracking sheets accordingly. 👍

fejta-bot commented 4 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.

/lifecycle stale

frittentheke commented 4 years ago

/remove-lifecycle stale

kikisdeliveryservice commented 3 years ago

Hi @RobertKrawitz

Enhancements Lead here. Any plans for this in 1.20?

Thanks! Kirsten

kikisdeliveryservice commented 3 years ago

Hi @RobertKrawitz

Enhancements Lead here again. Enhancements Freeze is next week Tuesday October 6th.

Any plans for this in 1.20?

Also, this KEP is using the older format that is missing the Production Readiness Review Questionnaire, etc... so if you could please update that would be awesome (see for ref https://github.com/kubernetes/enhancements/tree/master/keps/NNNN-kep-template)

Thanks! Kirsten

sjenning commented 3 years ago

No plans for 1.20

fejta-bot commented 3 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.

/lifecycle stale

frittentheke commented 3 years ago

/remove-lifecycle stale

@RobertKrawitz @sjenning the link to the KEP is currently broken, likely due to https://github.com/kubernetes/enhancements/commit/7eef794bb549a50c6b08c457556ff0eac98a4c6b. Should be https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/1029-ephemeral-storage-quotas now then.

fejta-bot commented 3 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

frittentheke commented 3 years ago

/remove-lifecycle stale

pacoxu commented 3 years ago

While working on https://github.com/kubernetes/kubernetes/pull/99635, I found that this feature has remained alpha from v1.15 through v1.21.

/assign

I'd like to follow up on this feature. In the 1.22 cycle, I will benchmark it and then check with the SIG whether we should promote it to beta.

ping @RobertKrawitz

ehashman commented 3 years ago

/milestone v1.22
/stage beta

@pacoxu feel free to run with this

reylejano commented 3 years ago

Hi @pacoxu,

1.22 Enhancement shadow checking in. The Enhancement freeze is coming up at 23:59:59 PST on Thursday 13th May. In reviewing your KEP, several things need to be addressed:

If you have any questions, please reach out.

reylejano commented 3 years ago

Hi @pacoxu, please see my comment above on what needs to be done for this enhancement to meet the 1.22 Enhancement Freeze, which starts on Thursday, May 13 at 23:59:59 PST.

pacoxu commented 3 years ago

Thanks @reylejano. I updated the PR; it needs review.

pacoxu commented 3 years ago

@ehashman @reylejano after some comments and clarification in the PR #2697, I think this will not target beta in 1.22. I will work on some preparations to promote this feature to beta in this release cycle.

Action Items in 1.22~1.24

  • add some metrics or otherwise improve visibility
  • benchmark the feature
  • check e2e tests: e2e evolution

Action Items in 1.25-1.26

  • promote it to beta

deads2k commented 3 years ago

Updating the milestone to reflect the new target for stability level change.

/milestone v1.23

k8s-triage-robot commented 3 years ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot commented 2 years ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

pacoxu commented 2 years ago

/remove-lifecycle rotten

silenceper commented 2 years ago

@pacoxu Is there a plan to enforce restrictions through project quota? I think this is mainly used for resource monitoring. When the quota is exceeded, the pod will be expelled.

pacoxu commented 2 years ago

@pacoxu Is there a plan to enforce restrictions through project quota? I think this is mainly used for resource monitoring. When the quota is exceeded, the pod will be expelled.

After reading the KEP and some of the history in https://github.com/kubernetes/community/pull/2638, my understanding is that LocalStorageCapacityIsolationFSQuotaMonitoring is only for monitoring, and enforcement should be handled by a separate, new KEP.

pacoxu commented 2 years ago

I have done some testing on this feature in the previous and current release cycles.

I opened https://github.com/kubernetes/kubernetes/pull/107201 to add some metrics. Currently, there are no detailed metrics about quota monitoring speed or eviction.

For eviction testing, I compared the behavior and speed with LocalStorageCapacityIsolationFSQuotaMonitoring enabled and disabled, using the following pod:

apiVersion: v1
kind: Pod
metadata:
  labels:
    app: nginx
  name: pod-test
  namespace: default
spec:
  containers:
  - args:
    - "100000"
    command:
    - sleep
    image: nginx:1.14.2
    imagePullPolicy: IfNotPresent
    name: nginx
    volumeMounts:
    - name: data-test
      mountPath: /data/test/
  volumes:
  - name: data-test
    emptyDir:
      sizeLimit: "2Gi"

  1. Create a pod like the one above.
  2. Run seq -w 1 100000 | xargs -I {} sh -c 'mkdir -p z/{}/{}/{}/{} && echo xxx > z/{}/{}/{}/{}/{}' inside the pod under /data/test. The generated data is about 2.0G.
  3. Run du or repquota to measure how long the usage calculation takes.
  4. Run dd if=/dev/zero of=./bigfile bs=1M count=200 && date to generate a 200MB file that pushes the pod over the limit, and print the time.
  5. Check the kubelet log for eviction-related entries.

| Description | Enabled | Disabled |
|---|---|---|
| LocalStorageCapacityIsolationFSQuotaMonitoring | true | false |
| Usage calculation time | repquota -P /var/lib/kubelet -s -v: 0m0.004s | du -clsh: 0m6.938s |
| Pod eviction (housekeeping runs every 10s by default) | pod evicted in the next run of housekeeping: ~3s (1-10s) | pod evicted in the third run of housekeeping: ~23s (21~29s) |
| Metrics added by the PR | # TYPE kubelet_volume_stat_cal_duration_seconds histogram; kubelet_volume_stat_cal_duration_seconds_bucket{le="0.005"} 34348; kubelet_volume_stat_cal_duration_seconds_sum 1.5840709110000017; kubelet_volume_stat_cal_duration_seconds_count 34348 | kubelet_volume_stat_cal_duration_seconds_sum 118.40980060400001; kubelet_volume_stat_cal_duration_seconds_count 390 |
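
The du versus repquota timings differ because du has to visit every file and directory under the volume, whereas a project quota lookup reads a counter that the filesystem already maintains. Below is a minimal Python sketch of the du side only. It assumes a POSIX system where st_blocks is reported in 512-byte units. The names walk_usage and make_tree are hypothetical and are not taken from the kubelet code, and the tree is a scaled-down version of the one created in step 2.

```python
import os
import tempfile


def walk_usage(root):
    """Return (bytes_allocated, entries_visited) by walking root, roughly like du.

    Unlike du, hard links are not de-duplicated; this is only an illustration.
    """
    total = os.lstat(root).st_blocks * 512
    visited = 1
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            st = os.lstat(os.path.join(dirpath, name))
            total += st.st_blocks * 512  # st_blocks is counted in 512-byte units
            visited += 1
    return total, visited


def make_tree(root, count):
    """Create one nested directory chain and one tiny file per index (like step 2)."""
    width = len(str(count))  # mimic the zero padding of `seq -w`
    for i in range(1, count + 1):
        s = str(i).zfill(width)
        leaf = os.path.join(root, "z", s, s, s, s)
        os.makedirs(leaf, exist_ok=True)
        with open(os.path.join(leaf, s), "w") as f:
            f.write("xxx\n")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        make_tree(tmp, 100)
        used, visited = walk_usage(tmp)
        print(f"visited {visited} entries, {used} bytes allocated")
```

The number of lstat calls grows with the number of entries (five per index here, plus the two top-level directories), which is why a walk over 100000 nested paths takes seconds while a quota read stays roughly constant.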

https://github.com/kubernetes/kubernetes/pull/107201 adds metrics for volume stat calculation. With those metrics in place, I ran some test cases as a benchmark. The results are below.

Before enabling the feature, 16 of the 390 volume stat calculations took more than 2.5s.

# HELP kubelet_volume_stat_cal_duration_seconds [ALPHA] Duration in seconds to calculate volume stats
# TYPE kubelet_volume_stat_cal_duration_seconds histogram
kubelet_volume_stat_cal_duration_seconds_bucket{le="0.005"} 374
kubelet_volume_stat_cal_duration_seconds_bucket{le="0.01"} 374
kubelet_volume_stat_cal_duration_seconds_bucket{le="0.025"} 374
kubelet_volume_stat_cal_duration_seconds_bucket{le="0.05"} 374
kubelet_volume_stat_cal_duration_seconds_bucket{le="0.1"} 374
kubelet_volume_stat_cal_duration_seconds_bucket{le="0.25"} 374
kubelet_volume_stat_cal_duration_seconds_bucket{le="0.5"} 374
kubelet_volume_stat_cal_duration_seconds_bucket{le="1"} 374
kubelet_volume_stat_cal_duration_seconds_bucket{le="2.5"} 374
kubelet_volume_stat_cal_duration_seconds_bucket{le="5"} 380
kubelet_volume_stat_cal_duration_seconds_bucket{le="10"} 389
kubelet_volume_stat_cal_duration_seconds_bucket{le="+Inf"} 390
kubelet_volume_stat_cal_duration_seconds_sum 118.40980060400001
kubelet_volume_stat_cal_duration_seconds_count 390

After enabling the feature, all 34348 calculations finished within 5ms each and took about 1.58s in total.

# TYPE kubelet_volume_stat_cal_duration_seconds histogram
kubelet_volume_stat_cal_duration_seconds_bucket{le="0.005"} 34348
kubelet_volume_stat_cal_duration_seconds_bucket{le="0.01"} 34348
kubelet_volume_stat_cal_duration_seconds_bucket{le="0.025"} 34348
kubelet_volume_stat_cal_duration_seconds_bucket{le="0.05"} 34348
kubelet_volume_stat_cal_duration_seconds_bucket{le="0.1"} 34348
kubelet_volume_stat_cal_duration_seconds_bucket{le="0.25"} 34348
kubelet_volume_stat_cal_duration_seconds_bucket{le="0.5"} 34348
kubelet_volume_stat_cal_duration_seconds_bucket{le="1"} 34348
kubelet_volume_stat_cal_duration_seconds_bucket{le="2.5"} 34348
kubelet_volume_stat_cal_duration_seconds_bucket{le="5"} 34348
kubelet_volume_stat_cal_duration_seconds_bucket{le="10"} 34348
kubelet_volume_stat_cal_duration_seconds_bucket{le="+Inf"} 34348
kubelet_volume_stat_cal_duration_seconds_sum 1.5840709110000017
kubelet_volume_stat_cal_duration_seconds_count 34348
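
The figures quoted above can be derived directly from the two dumps. Here is a minimal Python sketch, assuming the Prometheus text exposition format shown above with integer bucket counts. The helper names parse_histogram and slower_than are hypothetical, and the embedded strings are abbreviated copies of the dumps (only the lines the calculation needs).

```python
import math


def parse_histogram(text, name):
    """Parse one Prometheus text-format histogram into (buckets, total_sum, count)."""
    buckets = {}
    total_sum = None
    count = None
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        key, value = line.rsplit(" ", 1)
        if key.startswith(name + "_bucket{"):
            le = key.split('le="', 1)[1].split('"', 1)[0]
            bound = math.inf if le == "+Inf" else float(le)
            buckets[bound] = int(value)  # cumulative count of observations <= bound
        elif key == name + "_sum":
            total_sum = float(value)
        elif key == name + "_count":
            count = int(value)
    return buckets, total_sum, count


def slower_than(buckets, count, threshold):
    """Observations above `threshold`, which must be an existing bucket bound."""
    return count - buckets[threshold]


NAME = "kubelet_volume_stat_cal_duration_seconds"

# Abbreviated copies of the two dumps above.
BEFORE = """
kubelet_volume_stat_cal_duration_seconds_bucket{le="2.5"} 374
kubelet_volume_stat_cal_duration_seconds_bucket{le="+Inf"} 390
kubelet_volume_stat_cal_duration_seconds_sum 118.40980060400001
kubelet_volume_stat_cal_duration_seconds_count 390
"""
AFTER = """
kubelet_volume_stat_cal_duration_seconds_bucket{le="0.005"} 34348
kubelet_volume_stat_cal_duration_seconds_bucket{le="+Inf"} 34348
kubelet_volume_stat_cal_duration_seconds_sum 1.5840709110000017
kubelet_volume_stat_cal_duration_seconds_count 34348
"""

if __name__ == "__main__":
    for label, dump in (("before", BEFORE), ("after", AFTER)):
        _, total, count = parse_histogram(dump, NAME)
        print(label, "mean seconds per calculation:", total / count)
    buckets, _, count = parse_histogram(BEFORE, NAME)
    print("before, slower than 2.5s:", slower_than(buckets, count, 2.5))
```

Because histogram buckets are cumulative, the count above a bound is the total count minus that bucket's value. Dividing sum by count gives a mean of roughly 0.30s per calculation before and well under 0.1ms after.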

pacoxu commented 2 years ago

Action Item Update:

BTW, the e2e test seems to be slow and serial, so it cannot be promoted to a conformance test when the feature is ready for GA one day. For a beta feature, I'm not sure what further e2e evolution is needed.

pacoxu commented 2 years ago

Action Items in 1.22~1.24

  • add some metrics or otherwise improve visibility
  • benchmark the feature
  • check e2e tests: e2e evolution

Action Items in 1.25-1.26

  • promote it to beta

I am not sure if there are some further things I can do to accelerate the promotion.

Priyankasaggu11929 commented 2 years ago

/milestone v1.25

Priyankasaggu11929 commented 2 years ago

Hello @RobertKrawitz @pacoxu 👋, 1.25 Enhancements team here.

Just checking in as we approach enhancements freeze at 18:00 PST on Thursday, June 23, 2022.

Note that this enhancement is targeting stage beta for 1.25 (please correct me if otherwise).

Here's where this enhancement currently stands:

Looks like for this one, we would need to update the following:

Note that the status of this enhancement is marked as at risk. Please keep the issue description up-to-date with appropriate stages as well. Thank you!