byako opened this issue 1 year ago
Thank you for the explanation! The scalability question does sound tricky: as a cluster administrator I can understand Wojciech's point that it's hard to stay abreast of new features and evaluate in advance whether they'll cause problems. I can also imagine a worst case where some highly-replicated pods are updated to new containers, but the registry is overloaded and, say, trickles 1 byte/second to every node, which could generate on the order of (# nodes * # pods) events every minute.
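For a rough sense of scale, here is a back-of-envelope sketch of that worst case, with made-up cluster sizes (the node and pod counts are purely illustrative):

```python
# Worst-case event volume if every node reports pull progress for every
# stalled pod once per minute. Cluster sizes below are hypothetical.
def events_per_minute(nodes: int, pods_per_node: int) -> int:
    return nodes * pods_per_node

print(events_per_minute(5000, 30))  # large cluster: 150000 events/min
print(events_per_minute(50, 5))     # small cluster: 250 events/min
```

Even modest per-node numbers multiply out quickly on large clusters, which is why the apiserver-load question seems worth settling up front.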
Did you look into whether there is any backpressure mechanism that could protect the apiserver in a case like this? Our clusters are small, so I'm not familiar with these techniques, but API Priority and Fairness seems like it could enforce a cluster-level limit on the rate of progress events, which might address the question about a protection mechanism. However, I don't know where this traffic would belong: the default priority levels seem focused on requests that are important for normal cluster operation, rather than informative log events like this, which are primarily of interest to humans observing the cluster. Maybe we need a new node-low priority level?
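To make the idea of a cluster-level rate limit concrete, here is a minimal token-bucket sketch, the general mechanism behind this kind of request throttling. This is a generic illustration, not the actual API Priority and Fairness implementation, and the rate and burst numbers are made up:

```python
import time

# Generic token-bucket rate limiter: requests consume tokens, tokens
# refill at a fixed rate, and requests beyond the budget are held back.
class TokenBucket:
    def __init__(self, rate: float, burst: float) -> None:
        self.rate = rate          # tokens refilled per second
        self.capacity = burst     # maximum stored tokens
        self.tokens = burst
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill tokens for the elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # over budget: the event would be queued or dropped

bucket = TokenBucket(rate=10, burst=20)
allowed = sum(bucket.allow() for _ in range(100))
print(allowed)  # roughly the burst size: most of the 100 immediate calls are rejected
```

Whatever mechanism is chosen, the key property is the same: a flood of progress events from many nodes gets smoothed to a bounded rate instead of reaching the apiserver all at once.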
Please ignore this question if it's an unhelpful tangent!
I have not checked any of the available protection measures yet, but I'll have a look at the Priority and Fairness docs, thank you. If I'm not mistaken, the current suggestion is to only publish events when something has subscribed to them. The details are still to be defined.
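The "only publish when subscribed" idea can be sketched roughly like this. The names (`ProgressPublisher`, `subscribe`, `publish`) are illustrative only, not from any actual CRI or kubelet API:

```python
from typing import Callable, List

# Minimal sketch of subscription-gated publishing: progress events are
# only delivered when at least one consumer has registered interest.
class ProgressPublisher:
    def __init__(self) -> None:
        self._subscribers: List[Callable[[str], None]] = []

    def subscribe(self, callback: Callable[[str], None]) -> None:
        self._subscribers.append(callback)

    def publish(self, event: str) -> None:
        # With no subscribers, the loop body never runs and the event
        # is silently dropped instead of loading the apiserver.
        for cb in self._subscribers:
            cb(event)

pub = ProgressPublisher()
pub.publish("pull 10%")             # dropped: nobody is listening
pub.subscribe(lambda e: print(e))
pub.publish("pull 50%")             # delivered to the subscriber
```

The appeal of this design for the scalability concern above is that the steady-state cost is zero: event volume only grows when someone is actively watching a pull.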
/remove-label lead-opted-in
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:
- Mark this issue as fresh with /remove-lifecycle rotten
- Close this issue with /close
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
+1
/remove-lifecycle rotten
I for one really feel blind not knowing the progress of an image pull. It would really be useful to have some insight into the pull speed (so I can also troubleshoot network-related problems), some form of percentage progress, and maybe an estimate of completion.
Enhancement Description
- (k/enhancements) update PR(s): KEP-3542: CRI image pulling with progress notification
- (k/k) update PR(s): https://github.com/kubernetes/kubernetes/pull/118326
- (k/website) update PR(s):

Please keep this description up to date. This will help the Enhancement Team to track the evolution of the enhancement efficiently.