kubernetes / enhancements

Enhancements tracking repo for Kubernetes
Apache License 2.0
3.45k stars 1.49k forks source link

Topology Aware Routing #2433

Open robscott opened 3 years ago

robscott commented 3 years ago

Enhancement Description

/sig network

Atharva-Shinde commented 2 years ago

Hey @robscott πŸ‘‹, just a quick check-in again before 1.26 code freeze at 17:00 PDT Tuesday 8th November 2022 i.e tomorrow. Looks like we would at least need to get the code PR: https://github.com/kubernetes/kubernetes/pull/113556 merged before the code-freeze :)

rhockenbury commented 2 years ago

Hello πŸ‘‹, 1.26 Enhancements Lead here.

Unfortunately, this enhancement did not meet requirements for code freeze.

If you still wish to progress this enhancement in v1.26, please file an exception request. Thanks!

/milestone clear /label tracked/no /remove-label tracked/yes /remove-label lead-opted-in

robscott commented 2 years ago

@rhockenbury sorry for the confusion here, the only PR we got into this release was https://github.com/kubernetes/kubernetes/pull/113544, which ended up being a superset of the PR I'd mentioned earlier. I'll work on a draft docs PR, but I don't think we need much since the change is pretty tiny in this cycle.

thockin commented 1 year ago

Rob, this is flagged for 1.27 - keep it or punt?

thockin commented 1 year ago

/label lead-opted-in

fsmunoz commented 1 year ago

Hello @robscott πŸ‘‹, v1.27 Enhancements team here.

Just checking in as we approach enhancements freeze on 18:00 PDT Thursday 9th February 2023.

This enhancement is targeting for stage beta for 1.27 (please correct me, if otherwise)

Here's where this enhancement currently stands:

For this KEP, we would need to update the following:

The status of this enhancement is marked as at risk. Please keep the issue description up-to-date with appropriate stages as well. Thank you!

fsmunoz commented 1 year ago

Hello @robscott any updates on this? Trying to move things into tracked status the best that I can.

robscott commented 1 year ago

Hey @fsmunoz, I think everything should be good to go for this KEP with https://github.com/kubernetes/enhancements/pull/3765 merged in.

fsmunoz commented 1 year ago

Hi @robscott, thank you:

With the following notes:

This enhancement is ready to be traced for graduation to stable in v1.27.

/label tracked/yes

Rishit-dagli commented 1 year ago

Hi @robscott :wave:, I’m reaching out from the 1.27 Release Docs team. This enhancement is marked as β€˜Needs Docs’ for the 1.27 release.

Please follow the steps detailed in the documentation to open a PR against dev-1.27 branch in the k/website repo. This PR can be just a placeholder at this time, and must be created by March 16. For more information, please take a look at Documenting for a release to familiarize yourself with the documentation requirements for the release. Please feel free to reach out with any questions. Thanks!

fsmunoz commented 1 year ago

Hi @robscott πŸ‘‹,

Checking in as we approach 1.27 code freeze at 17:00 PDT on Tuesday 14th March 2023.

Please ensure the following items are completed:

For this enhancement, it looks like the code PR is not linked in the issue description.

Please let me know what other PRs in k/k I should be tracking for this KEP.

As always, we are here to help should questions come up. Thanks!

salaxander commented 1 year ago

@marosset - looks like https://github.com/kubernetes/kubernetes/pull/116612 didn't merge before code freeze

marosset commented 1 year ago

Unfortunately the implementation PRs associated with this enhancement have not merged by code-freeze so this enhancement is getting removed from the release.

If you would like to file an exception please see https://github.com/kubernetes/sig-release/blob/master/releases/EXCEPTIONS.md

/milestone clear /remove-label tracked/yes /label tracked/no

aojea commented 1 year ago

Unfortunately the implementation PRs associated with this enhancement have not merged by code-freeze so this enhancement is getting removed from the release.

If you would like to file an exception please see https://github.com/kubernetes/sig-release/blob/master/releases/EXCEPTIONS.md

/milestone clear /remove-label tracked/yes /label tracked/no

@marosset we always defined the limit as the time the PR is approved, not when it is merged

robscott commented 1 year ago

Hey @salaxander and @marosset, just want to clarify on the proper procedure going forward. The PR was approved well before code freeze but failed to merge due to a tiny linter error. Do I need to file a formal exception just to get this in?

salaxander commented 1 year ago

Unfortunately the implementation PRs associated with this enhancement have not merged by code-freeze so this enhancement is getting removed from the release. If you would like to file an exception please see https://github.com/kubernetes/sig-release/blob/master/releases/EXCEPTIONS.md /milestone clear /remove-label tracked/yes /label tracked/no

@marosset we always defined the limit as the time the PR is approved, not when it is merged

In the past, what we've allowed was PRs that were actively in the merge queue at the time of freeze. A linter error in this case would have stopped it from being in the merge queue. Given the nature of the hold up though, I'm ok with this going through. CC @marosset

marosset commented 1 year ago

Unfortunately the implementation PRs associated with this enhancement have not merged by code-freeze so this enhancement is getting removed from the release. If you would like to file an exception please see https://github.com/kubernetes/sig-release/blob/master/releases/EXCEPTIONS.md /milestone clear /remove-label tracked/yes /label tracked/no

@marosset we always defined the limit as the time the PR is approved, not when it is merged

In the past, what we've allowed was PRs that were actively in the merge queue at the time of freeze. A linter error in this case would have stopped it from being in the merge queue. Given the nature of the hold up though, I'm ok with this going through. CC @marosset

Ack - @robscott or @aojea please get the pull-kubernetes-verify PR check on https://github.com/kubernetes/kubernetes/pull/116612 passing and then we can get this added back into the 1.27 release.

robscott commented 1 year ago

Thanks for the understanding @salaxander and @marosset! I've pushed an update to the PR that I believe will resolve the linter error, will comment here again once tests are passing and I've gotten another LGTM.

fsmunoz commented 1 year ago

I have updated the status based on the implied information from the discussion

For this enhancement, it looks like:

  1. The code PR is not in the description, but it is mentioned in the discussion
  2. That code PR is not merged, but consensus is that being on the merge queue is sufficient.

So, although it doesn't comply with the wording of either checklist items, it's enough (as per the discussion) to be ready for 1.27.

aojea commented 1 year ago

The main code for the implementation has merged https://github.com/kubernetes/kubernetes/pull/116522 before code freeze.

There was this second PR to add some changes based on the comments from the first one that got out of the window because of the mentioned linter problem

The PR that didn't merge was approved before code freeze, but failed the linter because of a duplicate import statement, there is no code change more than removing this duplicate import to fix it.

Appreciate your understanding @marosset @salaxander

robscott commented 1 year ago

Thanks everyone for the help with this! https://github.com/kubernetes/kubernetes/pull/116612 has merged, I'll work on getting a docs PR up soon.

liggitt commented 1 year ago

I'm confused about the state of this feature... the milestone says 1.28 and it's marked as "Removed from milestone" in the 1.27 tracking project, but code was merged in 1.27. The label says stage/stable, but the feature gate in code says beta since 1.23... is this actually beta and updated in 1.27?

thockin commented 1 year ago

Sorry I jumped the gun on updating the milestone - we've been using it to track "next touch expected" (because milestone shows up in GH projects) but obviously it is used for release. I undid most of them, missed this one.

thockin commented 1 year ago

It was minorly updated in 27 - new annotation name (both accepted) and validation changes.

thockin commented 1 year ago

Sig-net status: Next-touch is GA in 1.28

liggitt commented 1 year ago

thanks for clarifying

as the tracking projects trend more accurate, I'll be happy to just use those to understand what is going into a release, since those can persist info (included, beta-level) independent of the issue milestone/stage-label, etc

thockin commented 1 year ago

The project boards are pretty limited, so milestone, as a visible field, becomes useful, but we are using it for too many things now.

On Mon, Mar 20, 2023 at 11:57β€―AM Jordan Liggitt @.***> wrote:

thanks for clarifying

as the tracking projects trend more accurate, I'll be happy to just use those to understand what is going into a release, since those can persist info (included, beta-level) independent of the issue milestone/stage-label, etc

β€” Reply to this email directly, view it on GitHub https://github.com/kubernetes/enhancements/issues/2433#issuecomment-1476774798, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABKWAVCRSGXL7OO6BO4EMX3W5CSCZANCNFSM4XEDSTJQ . You are receiving this because you were mentioned.Message ID: @.***>

thockin commented 1 year ago

NEXT: Aiming for GA in 1.28

ialidzhikov commented 1 year ago

Shouldn't we tackle first https://github.com/kubernetes/kubernetes/issues/113731 before promoting to GA? We are still looking for the simplest strategy/heuristic named SameZone - if there are endpoints in the same zone, route to those Endpoints, if not - fallback to any available Endpoint. I believe we are not the only ones out there in the community requesting this feature.

thockin commented 1 year ago

I don't think this GA needs to block on the alternative heuristics. The parts that are covered by the KEP now are "done", I think, and we can either open a new KEP to cover additional heuristics (and then argue over the need to port them to different implementations) or just proceed to PRs. The real problem here is that nobody has stepped up to implement the alternative heuristic(s) yet, and in the meantime this KEP is valuable to some people.

On Fri, May 12, 2023 at 6:06β€―AM Ismail Alidzhikov @.***> wrote:

Shouldn't we tackle first kubernetes/kubernetes#113731 https://github.com/kubernetes/kubernetes/issues/113731 before promoting to GA? We are still looking for the simplest strategy/heuristic named SameZone - if there are endpoints in the same zone, route to those Endpoints, if not - fallback to any available Endpoint. I believe we are not the only ones out there in the community requesting this feature.

β€” Reply to this email directly, view it on GitHub https://github.com/kubernetes/enhancements/issues/2433#issuecomment-1545717071, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABKWAVFRHJP542FQXNTAVU3XFYYUFANCNFSM4XEDSTJQ . You are receiving this because you were mentioned.Message ID: @.***>

aojea commented 1 year ago

@thockin @ialidzhikov @robscott added a new heuristic that simplify existing one but still has some safeguards to avoid blackholing traffic https://github.com/kubernetes/enhancements/pull/4003

thockin commented 1 year ago

Do we want to block this GA on the 2nd heuristic, or do we GA this and start a new one?

thockin commented 1 year ago

I need a decision on this for 1.28

robscott commented 1 year ago

I think @aojea was working on an update to the KEP PR that would simplify the changes proposed for v1.28. I personally would prefer to wait for GA until we have one or more of the following:

1) An alternative heuristic that is simpler and more predictable (I believe that's what @aojea is updating the proposal to cover) 2) A clearer path to bringing your own hints approach (likely related to https://github.com/kubernetes/enhancements/issues/3685)

thockin commented 1 year ago

Moving to 1.29

AnaMMedina21 commented 1 year ago

Hola @robscott πŸ‘‹, v1.29 Enhancements team here.

I am just checking in as we approach enhancements freeze on 18:00 PDT on Thursday 6th October 2023..

This enhancement is targeting for stage stable for v1.29 (correct me, if otherwise)

Here's where this enhancement currently stands:

For this KEP, we would just need to update the following:

The status of this enhancement is marked as At Risk for Enhancements Freeze. Please keep the issue description up-to-date with appropriate stages as well. Thank you!

jeremyrickard commented 1 year ago

The PRR could use another pass for graduating to stable, and is missing a new section in the stability section:

Can enabling / using this feature result in resource exhaustion of some node resources (PIDs, sockets, inodes, etc.)?

I had a question around the SLO while reading the current version of the PRR template.

What are the reasonable SLOs (Service Level Objectives) for the above SLIs? As a starting point, it is likely reasonable for the EndpointSlice controller to experience up to a 10% sync failure rate. This is largely related to it trying to update stale EndpointSlices. When we are able to find a solution for that issue the expected sync failure rate should be significantly lower. This specific problem is most notable for large Services that have rapidly updating endpoints.

Have you been able to find a solution for the issue around updating stale EndpointSlices? A 10% sync failure rate seems somewhat high for a stable feature. Is that still a good target for understanding if this should be disabled?

AnaMMedina21 commented 1 year ago

Hey @robscott πŸ‘‹, v1.29 Enhancements team here, checking in once more as we approach the v1.29 enhancement freeze deadline on 01:00 UTC, Friday, 6th October 2023. The status of this enhancement is marked as At Risk for Enhancements Freeze. It looks like the PRR needs another pass before approval. Let me know if I missed anything. Thank you!

robscott commented 1 year ago

Hey @AnaMMedina21 and @jeremyrickard, sorry for losing track of this! To clarify, we're not actually trying to make any changes to the KEP this cycle. We ran out of time to complete the changes we wanted to in 1.28 so those have just carried over to 1.29. We're continuing to stay in "beta", which I think means our PRR and everything else should still be good to go:

https://github.com/kubernetes/enhancements/blob/0d7e98235e47662468b8a064ad951b5a96a3ea3a/keps/prod-readiness/sig-network/2433.yaml#L1-L5

Definitely let me know if I'm missing anything though.

npolshakova commented 1 year ago

Hello πŸ‘‹, 1.29 Enhancements Lead here. Unfortunately, this enhancement did not meet requirements for v1.29 enhancements freeze. Feel free to file an exception to add this back to the release tracking process. Thanks!

/milestone clear

robscott commented 1 year ago

@npolshakova can you clarify what we were missing for this milestone? We're not making any changes to the KEP in this cycle, just trying to get the implementation caught up to the KEP... I asked for clarification on this 2 days ago in the comment above and didn't get a response.

npolshakova commented 1 year ago

Hi @robscott, if you have updated the alpha criteria then you do need this enhancement tracked for 1.29. If you intend to just address bug fixes in this release, this does not need to be tracked by the enhancements team for 1.29.

For the enhancement team to track this KEP we need a couple minor updates:

Let me know if you have any further questions. Thanks!

robscott commented 1 year ago

Thanks for catching that @npolshakova! I've filed https://github.com/kubernetes/enhancements/pull/4284 to update the KEP metadata here. I don't think I can remove the stable milestone from this issue, but we did not intend to target stable in this release, sorry for the confusion.

robscott commented 1 year ago

@npolshakova are you able to update the labels on this issue now that #4284 has merged? Anything else I'm missing?

salehsedghpour commented 10 months ago

/remove-label lead-opted-in

jonathon2nd commented 7 months ago

Checked in to see if there has been any update to allow for previous behavior.

Based of what I see here and the various other PR's and issues, it looks like a no?

Adding image as https://kubernetes.io/docs/concepts/services-networking/service-topology/ has been deleted. image

aojea commented 7 months ago

Checked in to see if there has been any update to allow for previous behavior.

see https://github.com/kubernetes/enhancements/tree/master/keps/sig-network/4444-service-routing-preference

thockin commented 6 months ago

@robscott what's the GA pln for this?

robscott commented 6 months ago

@robscott what's the GA pln for this?

The outcome here is closely tied to #4444, as it's a natural evolution of part of this KEP.

This KEP (#2433) essentially bundled two things together - the EndpointSlice hints field and a Service-level annotation that could be used to automatically populate hints with a rather confusing heuristic.

4444 continues to use the hints field, but adds a new trafficDistribution field and a simpler heuristic instead of the annotation and confusing heuristic. Maybe this table is a better summary:

#2433 #4444
Hints Defined in this KEP Relied on by this KEP
Opt-In service.kubernetes.io/topology-mode Annotation trafficDistribution Field
Heuristic Complicated Simple

My recommendation would be to keep the hints field defined in #2433 and the trafficDistribution field and simpler heuristic from #4444.

If we're good with that plan, I think that when #4444 graduates to beta (likely v1.31) we should formally deprecate the annotation and corresponding heuristic defined by this KEP. Then we can work to graduate #2433 to GA as a hints-only KEP and #4444 to GA as a KEP covering the trafficDistribution field and simple heuristic.

/cc @gauravkghildiyal