Open mikebrow opened 2 years ago
@mikebrow: The label(s) `sig/sig-node` cannot be applied, because the repository doesn't have them.
/sig node
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
`lifecycle/stale` is applied
`lifecycle/stale` was applied, `lifecycle/rotten` is applied
`lifecycle/rotten` was applied, the issue is closed
You can:
/remove-lifecycle stale
/lifecycle rotten
/close
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
/lifecycle stale
/remove-lifecycle stale
/milestone v1.26 /label lead-opted-in (I'm doing this on behalf of @ruiwen-zhao / SIG-node)
/stage alpha /label tracked/yes
Hey @mikebrow 👋, 1.26 Enhancements team here!
Just checking in as we approach Enhancements Freeze at 18:00 PDT on Thursday 6th October 2022.
This enhancement is targeting stage `alpha` for 1.26.
Here's where this enhancement currently stands:
implementable
For this KEP, we would need to:
The status of this enhancement is marked as `at risk`. Please keep the issue description up-to-date with appropriate stages as well.
Thank you :)
Hello @mikebrow 👋, just a quick check-in again, as we approach the 1.26 Enhancements freeze.
Please plan to get the action items mentioned in my comment above done before Enhancements freeze at 18:00 PDT on Thursday 6th October 2022, i.e. tomorrow.
For note, the current status of the enhancement is marked `at-risk` :)
Hello 👋, 1.26 Enhancements Lead here.
Unfortunately, this enhancement did not meet requirements for enhancements freeze.
If you still wish to progress this enhancement in v1.26, please file an exception request. Thanks!
/milestone clear /label tracked/no /remove-label tracked/yes /remove-label lead-opted-in
/lifecycle stale
/remove-lifecycle stale
This KEP needs to answer how the limitation of node resources around sockets would be addressed. See https://github.com/kubernetes/kubernetes/pull/115143 for details.
Hello, I am very interested in this KEP. I also happen to want subsecond probes, and I was happy to stumble on this. I see there is even an implementation! :) I have looked around and found that, in the corresponding discussion, @aojea also pointed this out.
Would enabling `SO_REUSEADDR` in addition to `SO_LINGER(1)`, as you did, only on probe-related sockets (hence in your new `ProbeDialer`) be a good idea to address this? In case of ephemeral port exhaustion, even with the `TIME_WAIT` state reduced to 1s by your improvement, it could allow the client side (prober) to reuse an existing socket (but with the risk of misinterpreting an old reply hitting a newer probe on a "recycled" ephemeral port)?
On Linux, `net.ipv4.tcp_tw_reuse` might be used to achieve the same, but this is Linux only.
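For reference, the socket options discussed above could be set from a custom dialer roughly as follows. This is a minimal sketch, not the code from the linked PR: `newProbeDialer`, `probeTCP`, and `demo` are hypothetical names, the `syscall` calls are Unix-specific, and whether combining `SO_REUSEADDR` with `SO_LINGER(1)` is safe for probes is exactly the open question here.

```go
package main

import (
	"fmt"
	"net"
	"syscall"
	"time"
)

// newProbeDialer returns a dialer that sets SO_LINGER(1) and SO_REUSEADDR
// on every socket before connect. Hypothetical helper, Unix-only.
func newProbeDialer(timeout time.Duration) *net.Dialer {
	return &net.Dialer{
		Timeout: timeout,
		Control: func(network, address string, c syscall.RawConn) error {
			var sockErr error
			err := c.Control(func(fd uintptr) {
				// Linger for at most 1 second on close.
				sockErr = syscall.SetsockoptLinger(int(fd), syscall.SOL_SOCKET,
					syscall.SO_LINGER, &syscall.Linger{Onoff: 1, Linger: 1})
				if sockErr != nil {
					return
				}
				// Allow the local address to be reused (the idea raised above).
				sockErr = syscall.SetsockoptInt(int(fd), syscall.SOL_SOCKET,
					syscall.SO_REUSEADDR, 1)
			})
			if err != nil {
				return err
			}
			return sockErr
		},
	}
}

// probeTCP returns nil when a TCP connection to addr can be opened.
func probeTCP(d *net.Dialer, addr string) error {
	conn, err := d.Dial("tcp", addr)
	if err != nil {
		return err
	}
	return conn.Close()
}

// demo probes a throwaway local listener so the sketch needs no external service.
func demo() error {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return err
	}
	defer ln.Close()
	go func() {
		for {
			c, err := ln.Accept()
			if err != nil {
				return // listener closed
			}
			c.Close()
		}
	}()
	return probeTCP(newProbeDialer(500*time.Millisecond), ln.Addr().String())
}

func main() {
	fmt.Println("probe error:", demo())
}
```

Setting the options in `Control` keeps the change scoped to probe sockets only, which matches the suggestion of not touching any other kubelet connections.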
> I am very interested in this KEP.
Curious, do you need it for startup, readiness, or liveness probe? Or all of them? What interval are you thinking about?
Hello Sergey, I'd like to have subsecond delays/periods for all kinds of probes, in order to detect a failure as fast as possible. As explained in the KEP's README, the general idea would be to reduce latencies. I do not have a precise value in mind right now, but the current "second scale" is too coarse. Thank you.
Nod, being able to more precisely control the timing is a major part of the KEP and implementation. If you know it takes 1.2 seconds to start up a DB, it doesn't make sense to try at 1 sec then 2 sec, or to wait for the 2 sec mark. Instead, maybe it would be better to wait for 1.5 seconds? It totally depends on the model being used and whether they can switch to a ready-on-event push model instead of a state polling model.
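To make the 1.2 second example concrete, here is a small sketch of when the first successful probe lands under a fixed-period polling model. It assumes a simplified prober (no jitter, instant probes, success threshold of 1); `firstSuccess` is a hypothetical helper, not kubelet code.

```go
package main

import (
	"fmt"
	"time"
)

// firstSuccess returns the time of the first probe that fires at or after
// the moment the workload becomes ready. Probes fire at initialDelay,
// initialDelay+period, initialDelay+2*period, and so on.
func firstSuccess(ready, initialDelay, period time.Duration) time.Duration {
	if ready <= initialDelay {
		return initialDelay
	}
	// Number of whole periods to wait after the initial delay, rounded up.
	n := (ready - initialDelay + period - 1) / period
	return initialDelay + n*period
}

func main() {
	ready := 1200 * time.Millisecond // the DB from the example above

	// Whole-second granularity: probes at 1s and 2s.
	fmt.Println(firstSuccess(ready, time.Second, time.Second)) // 2s

	// A 1.5s initial delay, as suggested above.
	fmt.Println(firstSuccess(ready, 1500*time.Millisecond, time.Second)) // 1.5s

	// A 100ms period with no initial delay.
	fmt.Println(firstSuccess(ready, 0, 100*time.Millisecond)) // 1.2s
}
```

In this model the whole-second schedule wastes 0.8 s after the database is actually ready, which is the latency the finer-grained timing is meant to recover.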
It just needs SIG-NODE approval. The timing of this change versus all the other changes keeps pushing it back, but I think it's ready any time the SIGs are ready for it.
The reason I'm asking is that, for liveness probes and partially for readiness probes, using streaming instead of pings may work even better. For HTTP it may be some version of a long poll; for gRPC, a streaming health service. Streaming may eliminate many scalability concerns. The only catch is that it will not work well for startup, or for readiness flipping back to Ready. Retrying to establish a connection will be easier to do with the same coarseness of 1s+.
Hello, I agree that streaming (in the sense of keeping the same socket alive for each probe?) would be preferable. Unfortunately, it is not always possible to make the application compliant. One will probably want to use this feature with existing payloads or applications they have not developed themselves. This would also require its own change in the probing mechanism.
A workaround might be possible by using sidecars: implement stream probes (using the same socket forever) targeting a sidecar which would, at its level, perform sub-second probes into the desired container. The sidecar would handle the persistent connection with the k8s stream probe, and would locally perform sub-second checks. This would move the problem from the kubelet into sidecars, which could look like "dissolving" the network overload. However this may seem overly complicated for a questionable result, since each physical node will still have to deal with more resource consumption.
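The sidecar half of that workaround might look roughly like the sketch below: a loop that checks the target at a sub-second interval and caches the result, plus a handler that serves the cached state. Everything here is an assumption for illustration (`healthState`, `poll`, `status`, `errDown`, and the plain HTTP endpoint standing in for the not-yet-existing stream probe).

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net/http"
	"sync/atomic"
	"time"
)

// errDown is a hypothetical error a local check might return.
var errDown = errors.New("target down")

// healthState caches the most recent local check result.
type healthState struct{ healthy atomic.Bool }

// record stores the outcome of one local check.
func (s *healthState) record(err error) { s.healthy.Store(err == nil) }

// status maps the cached state to an HTTP status code.
func (s *healthState) status() int {
	if s.healthy.Load() {
		return http.StatusOK
	}
	return http.StatusServiceUnavailable
}

// ServeHTTP answers the prober from the cache, without touching the target.
func (s *healthState) ServeHTTP(w http.ResponseWriter, _ *http.Request) {
	w.WriteHeader(s.status())
}

// poll runs check immediately and then once per interval until ctx is done.
func (s *healthState) poll(ctx context.Context, interval time.Duration, check func() error) {
	t := time.NewTicker(interval)
	defer t.Stop()
	for {
		s.record(check())
		select {
		case <-ctx.Done():
			return
		case <-t.C:
		}
	}
}

func main() {
	s := &healthState{}
	ctx, cancel := context.WithTimeout(context.Background(), 350*time.Millisecond)
	defer cancel()
	// Stand-in check that always succeeds; a real sidecar would dial the
	// application container over localhost here.
	s.poll(ctx, 100*time.Millisecond, func() error { return nil })
	fmt.Println("status served to the prober:", s.status())
}
```

As noted above, this only relocates the sub-second traffic to the pod's loopback interface; the node still pays for the extra checks.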
> unfortunately it is not always possible to make the application compliant.
I understand this. I am worried about using Node network for subsecond probes. Maybe implementing the probes from the Pod's network or streaming can help with this.
@mikebrow can you update the PR to indicate that you want it for 1.28?
/label lead-opted-in
/milestone v1.28
@mikebrow mentioned at the SIG Node meeting that he wants to see if it can make it into 1.28. Marking it for the milestone so we don't lose it.
Hello @mikebrow 👋, Enhancements team here.
Just checking in as we approach enhancements freeze on 01:00 UTC Friday, 16th June 2023.
This enhancement is targeting stage `alpha` for 1.28 (correct me, if otherwise).
Here's where this enhancement currently stands:
`implementable` for `latest-milestone: 1.28`
For this KEP, we would just need to update the following:
`latest-milestone` in `kep.yaml` file to 1.28
`kep.yaml` file.
The status of this enhancement is marked as `at risk`. Please keep the issue description up-to-date with appropriate stages as well. Thank you!
Hi @mikebrow 👋, just checking in before the enhancements freeze on 01:00 UTC Friday, 16th June 2023.
The status for this enhancement is `at risk`.
For this KEP, we would just need to update the following:
`latest-milestone` in `kep.yaml` file to 1.28
`kep.yaml` file.
Let me know if I missed anything. Thanks!
@salehsedghpour

> Hi @mikebrow 👋, just checking in before the enhancements freeze on 01:00 UTC Friday, 16th June 2023.
> The status for this enhancement is `at risk`. For this KEP, we would just need to update the following:
> - The KEP requires to include the updated readme template.

done..

> - Address questions inside the Production Readiness Review Questionnaire.

done..

> - Update the `latest-milestone` in `kep.yaml` file to `1.28`

done..

> - Update the status to implementable in `kep.yaml` file.

done, needs approval..

> - Update the graduation criteria in the readme.

done, needs approval..

> - Ensure that the PRs are merged.

thx, wip..

> Let me know if I missed anything. Thanks!

Thank you, nothing noted, you were very thorough :-)
Hello @mikebrow 👋, 1.28 Enhancements Lead here. Unfortunately, this enhancement did not meet requirements for v1.28 enhancements freeze. Feel free to file an exception to add this back to the release tracking process. Thanks!
/milestone clear
/milestone v1.29
(as discussed at SIG Node meeting this week)
updated kep to reflect milestone v1.29
Hello @mikebrow 👋, Enhancements team here.
Just checking in as we approach enhancements freeze on Friday, 6th October 2023.
This enhancement is targeting stage `alpha` for 1.29 (correct me, if otherwise).
Here's where this enhancement currently stands:
`implementable` for `latest-milestone: 1.29`.
For this KEP, we would just need to update the following:
The status of this enhancement is marked as `at risk for enhancement freeze`. Please keep the issue description up-to-date with appropriate stages as well. Thank you!
Hello @salehsedghpour, thx. I believe all the readme template issues are addressed in the update PR https://github.com/kubernetes/enhancements/pull/3067
for your convenience note this commit: https://github.com/kubernetes/enhancements/pull/3067/commits/7924ba213739250bfc575f2013a125fe645d3c9b
@mikebrow , thanks for the response. I just checked the readme template, I saw that in the latest readme template, there is this question that does not exist in https://github.com/kubernetes/enhancements/commit/7924ba213739250bfc575f2013a125fe645d3c9b. Please correct me if I'm wrong!
Nod that is almost the same question as asked right above..
[Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components?](https://github.com/kubernetes/enhancements/issues/3066#will-enabling--using-this-feature-result-in-non-negligible-increase-of-resource-usage-cpu-ram-disk-io--in-any-components)
[Can enabling / using this feature result in resource exhaustion of some node resources (PIDs, sockets, inodes, etc.)?](https://github.com/kubernetes/enhancements/issues/3066#can-enabling--using-this-feature-result-in-resource-exhaustion-of-some-node-resources-pids-sockets-inodes-etc)
same response
Enabling / using this feature will result in changes to resource usage
(CPU, RAM, disk, IO, `PIDs, sockets, inodes`...) in kubelet and runtime components. This KEP provides for
mitigation of the changes.
Reducing the probe period to subsecond intervals will result in probes polling slightly more
frequently until success, as mitigated for exec probes and restricting to startup and readiness.
In a follow up KEP further mitigations and allowances may be considered based on resource
pressure, use cases for liveness probes, and if exec probe costs can be reduced via
architectural changes.
I can update it if you like, but it seems repetitive.
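A rough way to size the socket concern is Little's law: sockets held is approximately the connection rate multiplied by how long each closed socket lingers. The sketch below uses made-up inputs (100 TCP/HTTP probes on a node, an assumed 60 s `TIME_WAIT`, and the 1 s figure mentioned earlier in this thread); `socketsHeld` is a hypothetical helper and the numbers are illustrative, not measurements.

```go
package main

import (
	"fmt"
	"time"
)

// socketsHeld estimates the steady-state number of closed client sockets
// still held by the node: each of the probes opens one connection per
// period, and every closed socket is held for the hold duration.
func socketsHeld(probes int, period, hold time.Duration) int64 {
	return int64(probes) * int64(hold) / int64(period)
}

func main() {
	const probes = 100 // assumed number of TCP/HTTP probes on one node

	// 1s period, assumed 60s hold per closed socket.
	fmt.Println(socketsHeld(probes, time.Second, 60*time.Second))

	// 100ms period, same 60s hold: ten times the connection rate.
	fmt.Println(socketsHeld(probes, 100*time.Millisecond, 60*time.Second))

	// 100ms period with the hold reduced to 1s.
	fmt.Println(socketsHeld(probes, 100*time.Millisecond, time.Second))
}
```

The estimate scales linearly in both the probe rate and the hold time, which is why shortening the hold time and restricting which probes may run sub-second both act as mitigations.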
Yes, you are right. I'll ask for more information about this and get back to you.
With that being said, the only thing left is ensuring the PR is being merged into k/enhancements.
Hi @mikebrow, checking in once more as we approach the 1.29 enhancement freeze deadline on 01:00 UTC, Friday, 6th October, 2023. The status of this enhancement is marked as `at risk`. It looks like https://github.com/kubernetes/enhancements/pull/3067 will address all of the requirements.
About the questionnaire, I'll bring the discussion up about those questions. And you don't need to update it for `alpha` stage.
Let me know if I missed anything. Thanks!
Hello 👋, 1.29 Enhancements Lead here. Unfortunately, this enhancement did not meet requirements for v1.29 enhancements freeze. Feel free to file an exception to add this back to the release tracking process. Thanks!
/milestone clear
Is this still a priority?
Granular probe timings are essential for modern applications that demand precision and high reliability. The current "second scale" limits Kubernetes' responsiveness. Implementing this feature would significantly enhance Kubernetes' capabilities for a wide range of use cases.
Yes it is still a priority.
/remove-label lead-opted-in
/stage alpha /milestone v1.30
Hello @mikebrow , 1.30 Enhancements team here! Is this enhancement targeting 1.30? If it is, can you follow the instructions here to opt in the enhancement and make sure the lead-opted-in label is set so it can get added to the tracking board? Thanks!
/milestone clear
/lifecycle stale
/lifecycle rotten
Is this still being worked on?
> Is this still being worked on?
I hope so too, this KEP seemed to almost make it twice. Would like to help a little if possible and needed.
/remove-lifecycle rotten
Enhancement Description
(`k/enhancements`) update PR(s): https://github.com/kubernetes/enhancements/pull/3067
(`k/k`) update PR(s): https://github.com/kubernetes/kubernetes/pull/107958
(`k/website`) update PR(s):