kubernetes / kubernetes

Production-Grade Container Scheduling and Management
https://kubernetes.io
Apache License 2.0

Consider network resource management / guarantees / scheduling #40270

Open ApsOps opened 7 years ago

ApsOps commented 7 years ago

What keywords did you search in Kubernetes issues before filing this one? : Network quota, congestion


Is this a BUG REPORT or FEATURE REQUEST? (choose one): FEATURE REQUEST

There should be a way to schedule pods based on network resource availability - it could be along similar lines to CPU and memory requests/limits.

At the very least, kubelet or node-problem-detector should report network congestion or bottlenecks.

One specific use-case is when pods have low cpu/mem requirements, but high network bandwidth needs.

@Random-Liu @thockin thoughts?

thockin commented 7 years ago

We don't schedule network right now, in part because it is very bursty. When people specify high network needs, they tend to strand everything else.

That is not to say we shouldn't do anything here, just explaining where we are.

fejta-bot commented 6 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

Prevent issues from auto-closing with an /lifecycle frozen comment.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta. /lifecycle stale

ApsOps commented 6 years ago

@thockin Should this go in "feature requests" somewhere?

fejta-bot commented 6 years ago

Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta. /lifecycle rotten /remove-lifecycle stale

ApsOps commented 6 years ago

/remove-lifecycle rotten /remove-lifecycle stale /lifecycle frozen

cmluciano commented 6 years ago

@kubernetes/sig-network-feature-requests

aojea commented 4 years ago

I think this is not needed because TCP has its own congestion control mechanism.

However, if there is still interest, the problem I see here is defining network congestion: is it using 90% of the bandwidth in bytes per second for more than X minutes, or a limit based on packets per second? The latter has a worse impact on host performance than the former, and the bandwidth problem is already handled by TCP congestion control.

My personal experience with network congestion detection mechanisms is not good; they always have a high rate of false positives.

@ApsOps are you still interested in moving this forward, or can this be closed?

ApsOps commented 4 years ago

In my experience, TCP congestion control alone isn't sufficient. From my limited understanding, a saturated NIC would still lead to increased latencies. There's also significant UDP traffic in most k8s clusters, mainly for DNS and statsd/datadog metrics. I guess a bad NIC might also lead to network issues on a node.

So I'd still like at least one of these to be possible:

aojea commented 3 years ago

The network is statistically multiplexed: you don't really use X% of the networking on the node, you use the NIC to send your packets during Y seconds, so you have bits or packets per second to measure this. How can you implement scheduling if you can't really say how much you are going to use? Since my time working on ATM/SDH/SONET networks, I haven't seen an app that defines its bandwidth in bps and its latency L; that is precisely why IP/MPLS literally killed the former: statistical multiplexing is much more efficient.

However, I agree with the last comment that there is still a need for QoS in the network. In case of congestion you can prioritize traffic, which is also common practice in large networks, so that control-plane traffic has guaranteed bandwidth; this can be implemented in Linux with tc. I think it could be useful to mark control-plane traffic (apiserver, kubelet, ...) and DNS traffic as gold, for example, so it has higher priority than other traffic and always has a guaranteed "reserved" bandwidth.

So traffic-class.kubernetes.io: gold, silver, bronze to match pods, with the CNI implementing and defining gold, silver, and bronze? @tgraf @thockin what do you think?
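
As a rough sketch of what a CNI plugin might do with such a class: the annotation name above is a proposal, not an existing API, and the tc class layout, rates, and interface name below are all illustrative assumptions, not anything Kubernetes ships today.

```shell
# Hypothetical: shape a pod's host-side veth according to its traffic class.
# VETH and all rates are made-up values; a real CNI plugin would derive them
# from its own configuration and the node's NIC speed.
VETH=veth0abc123

# Root HTB qdisc; unclassified traffic falls into the bronze class (1:30).
tc qdisc add dev "$VETH" root handle 1: htb default 30

# gold: guaranteed 4Gbit/s, may borrow up to line rate, highest priority
tc class add dev "$VETH" parent 1: classid 1:10 htb rate 4gbit ceil 10gbit prio 0
# silver: guaranteed 1Gbit/s
tc class add dev "$VETH" parent 1: classid 1:20 htb rate 1gbit ceil 10gbit prio 1
# bronze: best effort with a small guaranteed floor
tc class add dev "$VETH" parent 1: classid 1:30 htb rate 100mbit ceil 10gbit prio 2
```

With HTB, the `rate` is the guaranteed floor and `ceil` is how far a class may borrow when siblings are idle, which matches the "reserved bandwidth plus borrowing" behaviour described above.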

ehashman commented 3 years ago

I'd be inclined to close this and/or push this to the CNI, rather than trying to add another scheduling resource for pods.

Right now, node taints and pod anti-affinities give cluster admins a way to control and segment network-heavy workloads.
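
For example, an admin could dedicate a set of nodes to network-heavy workloads with a taint, and spread such pods with an anti-affinity rule. All names and labels below are illustrative, assuming the nodes were tainted with something like `kubectl taint nodes <node> dedicated=network-heavy:NoSchedule`:

```yaml
# Illustrative sketch: a network-heavy pod that tolerates the dedicated-node
# taint and prefers not to co-locate with other network-heavy pods.
apiVersion: v1
kind: Pod
metadata:
  name: bandwidth-hungry
  labels:
    workload-class: network-heavy
spec:
  tolerations:
    - key: dedicated
      operator: Equal
      value: network-heavy
      effect: NoSchedule
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          podAffinityTerm:
            labelSelector:
              matchLabels:
                workload-class: network-heavy
            topologyKey: kubernetes.io/hostname
  containers:
    - name: app
      image: registry.k8s.io/pause:3.9
```

This segments the workloads but does not account for actual bandwidth, which is the gap this issue is about.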

/remove-sig node

thockin commented 1 year ago

There are actually a few issues intermingled here. I've heard at least the following requests from people:

1) Treat network bandwidth like CPUs - a specific request and limit, guarantee a min and enforce a max. Example: Given a node with a 10Gb NIC, my pod is guaranteed 1.5 Gb, but can burst to 5Gb.

2) Provide priority-bands for access to bandwidth without specific numbers being specified. Example: This pod is p0, so should be able to use the NIC whenever it needs, in preference to p1 pods.

3) Provide a way to express a minimum bandwidth need. Example: This pod needs to run on a machine with at least 4Gb network - less than that is not acceptable.

4) Provide a real-time signal of congestion and use that to relocate pods. Example: If my pod can't get 4Gb/s at p90 over 2 minutes, kill it and pick a different node.
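
Number 3 could arguably be approximated today with an extended resource, though nothing would enforce the bandwidth at runtime and the scheduler would treat it as an opaque counter. The resource name below is made up for illustration; it assumes the admin advertised it on the node, e.g. with `kubectl patch node <node> --subresource=status --type=merge -p '{"status":{"capacity":{"example.com/net-bandwidth-gbit":"10"}}}'`:

```yaml
# Sketch only: "example.com/net-bandwidth-gbit" is a hypothetical extended
# resource. Extended resources require request == limit and integer values.
apiVersion: v1
kind: Pod
metadata:
  name: needs-bandwidth
spec:
  containers:
    - name: app
      image: registry.k8s.io/pause:3.9
      resources:
        requests:
          example.com/net-bandwidth-gbit: "4"
        limits:
          example.com/net-bandwidth-gbit: "4"
```

The scheduler would then only place the pod on nodes with at least 4 units free, but it is pure bookkeeping: actual NIC usage is never measured or capped.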

I think all of these are interesting in their own ways, but they all have downsides, too. Number 2 is actually under discussion, kind of, in this KEP: https://github.com/kubernetes/enhancements/pull/3004

Before we implement the other ideas, we'd need to do some deep thinking about the tradeoffs.

I'm open to these discussions and even KEPs, but we need to anchor it in REAL users.

I'm going to re-title this issue to be more discoverable.

k8s-triage-robot commented 1 year ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot commented 1 year ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot commented 1 year ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-ci-robot commented 1 year ago

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to [this](https://github.com/kubernetes/kubernetes/issues/40270#issuecomment-1637171575):

> The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
>
> This bot triages issues according to the following rules:
> - After 90d of inactivity, `lifecycle/stale` is applied
> - After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
> - After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed
>
> You can:
> - Reopen this issue with `/reopen`
> - Mark this issue as fresh with `/remove-lifecycle rotten`
> - Offer to help out with [Issue Triage][1]
>
> Please send feedback to sig-contributor-experience at [kubernetes/community](https://github.com/kubernetes/community).
>
> /close not-planned
>
> [1]: https://www.kubernetes.dev/docs/guide/issue-triage/

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

munnerz commented 6 days ago

I think there's a few layers to this issue that we should be careful not to conflate.

In highly multi-tenant Kubernetes environments, having a way to impose limits on bandwidth, connections, and even packets per second is important for avoiding disruption caused by one tenant to another. There are many ways to implement this, and the feasibility, importance, and performance of each is highly dependent on the network being used. I don't think Kubernetes itself is well-placed to solve these universally.

Whilst acknowledging that a universal solution/implementation isn't really in-scope or desirable, there do exist today examples of implementing bandwidth control using a chained CNI plugin (https://kubernetes.io/docs/concepts/extend-kubernetes/compute-storage-net/network-plugins/#support-traffic-shaping).
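
Concretely, when the `bandwidth` plugin is chained in, that traffic-shaping API is exercised via pod annotations like the following (the rates are illustrative values):

```yaml
# Uses the annotation-based traffic-shaping support documented at the link
# above; requires a CNI configuration with the bandwidth plugin chained in.
apiVersion: v1
kind: Pod
metadata:
  name: shaped-pod
  annotations:
    kubernetes.io/ingress-bandwidth: 1M
    kubernetes.io/egress-bandwidth: 1M
spec:
  containers:
    - name: app
      image: registry.k8s.io/pause:3.9
```
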

This API is currently a simple annotation, and does not express any kind of burst capabilities, is not 'structured' or validated, and is not acknowledged by the scheduler (nor is the amount of available bandwidth per node recorded into the Node so the scheduler could account for it).

I think it'd be fantastic to see some guidance/standardised API language for expressing different 'shared network resources' in a manner that allows the scheduler, kubelet, and then CNI to access.

The user-facing semantics of this resource type, i.e. default policies (e.g. 1Gbit/s and 1000 connections/s per CPU, unlimited, etc.), should probably take account of the existing LimitRange semantics we have today too, if this were to be expressed as a resource request/limit. The recent addition of pod-level resources is also quite relevant, as the network is a resource shared amongst the whole Pod.

(note/disclaimer: I've been quite hand-wavy with how the scheduler actually takes account of this as a type of schedulable unit in relation to CPU and memory. I am no scheduling expert, and whatever default plugins we might consider shipping to take account of these, if any, may be far more complex than I have anticipated here - would be great to get feedback on the complexity of adding additional scheduling dimensions like this or if there are generic mechanisms already in place through recent DRA work/existing custom resource type support).

munnerz commented 6 days ago

Giving this issue one more chance in life with our own use-case/needs :)

/reopen /remove-lifecycle rotten

k8s-ci-robot commented 6 days ago

@munnerz: Reopened this issue.

In response to [this](https://github.com/kubernetes/kubernetes/issues/40270#issuecomment-2385474854):

> Giving this issue one more chance in life with our own use-case/needs :)
>
> /reopen
> /remove-lifecycle rotten

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes-sigs/prow](https://github.com/kubernetes-sigs/prow/issues/new?title=Prow%20issue:) repository.

k8s-ci-robot commented 6 days ago

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes-sigs/prow](https://github.com/kubernetes-sigs/prow/issues/new?title=Prow%20issue:) repository.