kubernetes / enhancements

Enhancements tracking repo for Kubernetes
Apache License 2.0

KEP-4962: Standardizing the Representation of Cluster Switch Network Topology #4965

Open dmitsh opened 6 days ago

dmitsh commented 6 days ago
- One-line PR description: Standardizing Cluster Network Topology Representation
k8s-ci-robot commented 6 days ago

Welcome @dmitsh!

It looks like this is your first PR to kubernetes/enhancements 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes/enhancements has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. :smiley:

k8s-ci-robot commented 6 days ago

Hi @dmitsh. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes-sigs/prow](https://github.com/kubernetes-sigs/prow/issues/new?title=Prow%20issue:) repository.
dmitsh commented 6 days ago

/cc @aojea

dmitsh commented 6 days ago

/cc @brianhammons @mwielgus @tardieu @mickeyboxell @arsenetar

k8s-ci-robot commented 6 days ago

@dmitsh: GitHub didn't allow me to request PR reviews from the following users: tardieu, arsenetar, brianhammons.

Note that only kubernetes members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to [this](https://github.com/kubernetes/enhancements/pull/4965#issuecomment-2479905168): > /cc @brianhammons @mwielgus @tardieu @mickeyboxell @arsenetar
aojea commented 3 days ago

/assign @johnbelamaric

for sig-architecture

k8s-ci-robot commented 3 days ago

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: dmitsh. Once this PR has been reviewed and has the lgtm label, please ask for approval from johnbelamaric. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

- **[keps/sig-network/OWNERS](https://github.com/kubernetes/enhancements/blob/master/keps/sig-network/OWNERS)**

Approvers can indicate their approval by writing `/approve` in a comment. Approvers can cancel approval by writing `/approve cancel` in a comment.
aojea commented 2 days ago

I don't see the case for these annotations, labels, and conventions being part of Kubernetes and maintained as part of Kubernetes.

Let me write my thoughts on this:

I see value in a predefined set of labels that describe topology, similar to what we have with zone and region. We can have a number of levels that represent proximity for network workloads. The KEP introduces:

  1. accelerator: Network interconnect for direct accelerator communication (e.g., Multi-node NVLink interconnect between NVIDIA GPUs).
  2. block: Rack-level switches connecting hosts in one or more racks as a block.
  3. datacenter: Spine-level switches connecting multiple blocks inside a datacenter.
  4. zone: Zonal switches connecting multiple datacenters inside an availability zone.

The names could be different, but having a standard number of predefined levels across environments will benefit the projects that implement scheduling based on network topology. I think four levels should be enough for most environments.
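As a hedged illustration of what such a scheme could look like on a node (the additional label keys and all values below are hypothetical placeholders, not names the KEP has settled on):

```yaml
# Hypothetical sketch: one well-known label per proximity level,
# alongside the existing zone/region labels. Keys and values are
# illustrative only.
apiVersion: v1
kind: Node
metadata:
  name: gpu-node-17
  labels:
    topology.kubernetes.io/region: us-east1
    topology.kubernetes.io/zone: us-east1-b
    # Proposed additional levels, from closest to farthest (hypothetical keys):
    network.topology.kubernetes.io/accelerator: nvl-domain-4
    network.topology.kubernetes.io/block: block-2
    network.topology.kubernetes.io/datacenter: dc-a
    network.topology.kubernetes.io/zone: us-east1-b
```

A scheduler plugin could then compare two nodes level by level, treating a shared accelerator label as the closest proximity.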

And now the friction points:

  1. Custom types or an arbitrary number of levels. We all know it is hard to represent all topologies, but if we leave the freedom to add custom types, then this solution no longer works: every deployment will be different, and there can be collisions between environments that use the same types with different meanings. It would also allow people to use official Kubernetes labels for a completely different purpose. The proposal of custom types is not likely to be accepted; I personally would not be in favor.

  2. The annotation suggestion misses who should add the annotation. It also relies on subjective and inconsistently measured attributes; we need things that can be measured the same way everywhere. The NIC or interface of a node has physical attributes that influence its potential performance, but the actual latency, bandwidth, and speed depend on a combination of factors, including the network environment and the other devices involved. Determining latency, bandwidth, and speed in a multi-tier network architecture adds another layer of complexity, since it also depends on the hardware, protocols, and interconnections, and it most probably has dependencies or shared resources that vary over time.

sftim commented 2 days ago

OK, but responding to https://github.com/kubernetes/enhancements/pull/4965#issuecomment-2488337871: if the labels (and annotations?) were inside networking.foundation.example and not qos.kubernetes.io, topology.kubernetes.io etc: what do we lose?

If we can't articulate that, we should pause. Maybe some other project (not Kubernetes) should define the standard.

aojea commented 2 days ago

> OK, but responding to #4965 (comment): if the labels (and annotations?) were inside networking.foundation.example and not qos.kubernetes.io, topology.kubernetes.io etc.: what do we lose?

I do see value in topology.kubernetes.io/zone and topology.kubernetes.io/region. They made a difference in deploying highly available applications and in consolidating environments, tooling, and projects across the ecosystem.

So I think that aiming for something similar for the new types of workloads that require a more granular definition of network topology makes sense. I really think we should first work out what the new region and zone are for these types of workloads. Adding four new well-known labels makes sense to me; I think I can try to get consensus from the Google side on four labels (@wojtek-t WDYT), but we'll need people from other cloud providers to agree (@dims, @ritazh, ...). This only works if we all agree.

sftim commented 1 day ago

Following up https://github.com/kubernetes/enhancements/pull/4965#issuecomment-2488390161

I still have concerns; I'll explain more.

We can encourage cloud providers to define their own labels as well. For example, on AWS a node can be labelled with topology.kubernetes.io/zone and with topology.k8s.aws/zone-id; labelling like that is of course additive.
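The additive labelling described above might look like this on a node (values are illustrative; the provider-specific key is the one named in the comment):

```yaml
# Sketch of additive labelling: the common, portable label and a
# provider-specific one coexist on the same node (values illustrative).
apiVersion: v1
kind: Node
metadata:
  name: ip-10-0-1-23.ec2.internal
  labels:
    topology.kubernetes.io/zone: us-east-1a   # common, portable label
    topology.k8s.aws/zone-id: use1-az4        # provider-specific, additive
```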

I agree about the value of topology.kubernetes.io/zone and topology.kubernetes.io/region; if we didn't have them, they would be roughly 100% worth defining. With those existing labels, there is also much less risk of wanting to assert that a node is in two of something. A node might connect to two switches, but we very, very rarely put nodes in two regions. (If it's in two regions, why did you put your datacenter in Baarle-Hertog or wherever :wink:?)

So this KEP as written wouldn't persuade providers and/or hardware vendors to define their own parallel, provider-specific labels; but encouraging that kind of practice would bring something on top of the common denominator of labels that Kubernetes does define.

I remain worried about tacitly promoting one network paradigm (a switching hierarchy), especially for a project that might be around for another 10 years. This KEP should articulate its alternatives and make clear why we picked the approach we did.

aojea commented 1 day ago

> I remain worried about tacitly promoting one network paradigm (a switching hierarchy),

Agree, but I see this as being about network hierarchy; a switching hierarchy is just one implementation of a network hierarchy.

- Level 1: there are things that are closest; this can be a switch, a rack, or the PCI bus. The implementer decides this.
- Level 2: there are things that have to take hops to get to level 1; this connects multiple level 1s. The number of hops is abstracted; you just know you are not at the same level as the things in level 1. The implementer decides this.
- Level 3: same as level 2 is to level 1; the implementer decides. It connects level 2s.
- Level 4: ...
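One way workloads could consume such abstract levels is as a `topologyKey` in pod affinity, so that replicas land within the same closest proximity domain. A hedged sketch, assuming a hypothetical level-1 label name (no such well-known label exists yet):

```yaml
# Hypothetical: co-schedule pods within the same level-1 proximity domain.
# The topologyKey and the image are placeholders for illustration.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: trainer
spec:
  replicas: 8
  selector:
    matchLabels:
      app: trainer
  template:
    metadata:
      labels:
        app: trainer
    spec:
      affinity:
        podAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app: trainer
              topologyKey: network.topology.kubernetes.io/level-1  # hypothetical key
      containers:
        - name: trainer
          image: example.com/trainer:latest  # placeholder image
```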

sftim commented 1 day ago

I guess as an actual alternative, we could define an API for representing network topology.

For example, custom resources:

Maybe also:

Remember, alternatives in KEP feedback don't have to be the choice we select. They can instead be viable options we rule out but still list as alternatives we considered.
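To make that alternative concrete, a custom resource describing topology might look roughly like this. Everything here (the API group, kind, and fields) is invented purely to illustrate the shape such an API could take; it is not part of the KEP:

```yaml
# Purely hypothetical sketch of a network topology API as a custom resource.
# Group, kind, and fields are invented for illustration only.
apiVersion: networktopology.x-k8s.io/v1alpha1
kind: NetworkTopologyLayer
metadata:
  name: block-2
spec:
  level: 2            # 1 = closest proximity
  parent: dc-a        # name of the enclosing layer, if any
status:
  nodes:              # nodes attached at this layer
    - gpu-node-17
    - gpu-node-18
```

A dedicated API like this would avoid overloading node labels, at the cost of a new object for schedulers and autoscalers to watch.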

bowei commented 19 hours ago

/ok-to-test

k8s-ci-robot commented 19 hours ago

@dmitsh: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| pull-enhancements-test | bb95a3c311e13f5c8fa8c72bb1f9bc948d6c5b13 | link | true | `/test pull-enhancements-test` |
| pull-enhancements-verify | bb95a3c311e13f5c8fa8c72bb1f9bc948d6c5b13 | link | true | `/test pull-enhancements-verify` |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.
