kubernetes / enhancements

Enhancements tracking repo for Kubernetes

Multi-Cluster Services API #1645

Open JeremyOT opened 4 years ago

JeremyOT commented 4 years ago

Enhancement Description

Please keep this description up to date. This will help the Enhancement Team track the evolution of the enhancement efficiently.

JeremyOT commented 4 years ago

/sig multicluster /cc @pmorie @thockin

johnbelamaric commented 4 years ago

@JeremyOT Hi Jeremy. I am serving on the enhancements team for 1.19, which means we are tracking what KEPs may be targeted at 1.19. Do you see this moving to alpha in 1.19? If so there is work to be done on the KEP before enhancements freeze in about two weeks.

JeremyOT commented 4 years ago

@johnbelamaric I think this will be tight to make alpha for 1.19 - there are some open questions that may take >2 weeks to resolve, but I'll work on getting the KEP to a complete state.

johnbelamaric commented 4 years ago

Thanks Jeremy. I'll target it for 1.20, let me know if things proceed faster than expected.

/milestone v1.20

fejta-bot commented 3 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle stale

JeremyOT commented 3 years ago

/remove-lifecycle stale

kikisdeliveryservice commented 3 years ago

Hi @JeremyOT !

Enhancements Lead here, do you still intend to go alpha in 1.20?

Thanks! Kirsten

JeremyOT commented 3 years ago

Hey @kikisdeliveryservice, we decided to go Alpha out-of-tree at sigs.k8s.io/mcs-api instead. We'll likely come back to the original in-tree plans for Beta, but we don't have a release target yet.
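
For context, consuming the out-of-tree API today means installing the CRDs from sigs.k8s.io/mcs-api and creating a ServiceExport alongside an existing Service. A minimal sketch (names are illustrative, and it assumes an MCS implementation is running in the clusterset):

```yaml
# Exports the existing Service "my-svc" in namespace "my-ns" to the clusterset.
# Assumes the multicluster.x-k8s.io/v1alpha1 CRDs from sigs.k8s.io/mcs-api are
# installed; "my-svc" and "my-ns" are placeholder names.
apiVersion: multicluster.x-k8s.io/v1alpha1
kind: ServiceExport
metadata:
  name: my-svc      # must match the name of the Service being exported
  namespace: my-ns  # must match that Service's namespace
```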

kikisdeliveryservice commented 3 years ago

Sounds good @JeremyOT just keep us posted! :)

JeremyOT commented 3 years ago

Will do!

fejta-bot commented 3 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle stale

BrendanThompson commented 3 years ago

/remove-lifecycle stale

fejta-bot commented 3 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale

JeremyOT commented 3 years ago

/remove-lifecycle stale

fejta-bot commented 2 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale

JeremyOT commented 2 years ago

/remove-lifecycle stale

k8s-triage-robot commented 2 years ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot commented 2 years ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

swiftslee commented 2 years ago

/remove-lifecycle rotten

k8s-triage-robot commented 2 years ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

JeremyOT commented 2 years ago

/remove-lifecycle stale

k8s-triage-robot commented 2 years ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

JeremyOT commented 2 years ago

/remove-lifecycle stale

k8s-triage-robot commented 1 year ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

steeling commented 1 year ago

Is this the right place to leave comments on the KEP?

A multi-cluster service will be imported only by clusters in which the service's namespace exists. All clusters containing the service's namespace will import the service. This means that all exporting clusters will also import the multi-cluster service. An implementation may or may not decide to create missing namespaces automatically, that behavior is out of scope of this spec.

Would it be better to have ServiceImports be non-namespaced? It seems like a shortcoming to require that the namespace exists, and it can clutter a user's cluster by needing to create these unnecessary namespaces.

Take, for example, a multi-tenant solution where each customer's workload resides in its own unique namespace. Then I have some app frontends that perform quota validation, routing logic, etc. I want these frontends to be globally available, while users can specify where they want their workloads to run. With this design, I now need a namespace for each user in each cluster. This would likely break existing assumptions of other controllers that expect workloads to exist in these namespaces.

I'm sure there's other examples as well.

steeling commented 1 year ago

@lauralorenz can you please advise on the correct way to raise concerns with regard to this KEP? The above comment stands, as well as the following:

For ClusterSetIP services, this rationale is tied to the intent of its underlying ClusterIP Service. In a single-cluster setup, the purpose of a ClusterIP service is to reduce the context needed by the application to target ready backends, especially if those backends disappear or change frequently, and leverages kube-proxy to do this independent of the limitations of DNS. (ref) Similarly, users of exported ClusterIP services should depend on the single ClusterSetIP (or the single A/AAAA record mapped to it), instead of targeting per-cluster backends. If a user has a need to target backends in a different way, they should use headless Services.

Why have this restriction? I can think of many use cases where a user would want to target a specific cluster without wanting a headless service, especially since a headless service will register <hostname>.<clusterid>.<service>.<namespace>.clusterset.local records without providing a load-balancing mechanism across the backends in the cluster. Some user scenarios:

  1. Cluster migrations: slowly shifting traffic from 1 cluster to another, particularly for traffic that is not proxied externally, or traffic that is sourced from within the cluster.
  2. Locality routing to reduce hairpinning. For example, let's say I run a managed cloud solution that runs in K8s across multiple clusters. I can ingress to any of these clusters and reach my application frontend (AFE) to perform some basic auth, quota management, etc., but inevitably a user's data is in a single cluster. I may need to do 10 individual requests between my application backend and my database. A common requirement would then be that my AFE can target a set of backends in that specific cluster, to reduce the round trips between clusters.
  3. Complex load balancing scenarios where load balancing is managed outside of kube-proxy

With those user requirements, can you provide more information on why we would decide not to support it?

JeremyOT commented 1 year ago

This is a fine place to raise concerns - though it might be easier to track them individually as issues against sigs.k8s.io/mcs-api

Would it be better to have ServiceImports be non-namespaced? It seems like a shortcoming to require that the namespace exists, and it can clutter a user's cluster by needing to create these unnecessary namespaces.

There are a few reasons why namespaces are beneficial. First, Services are already namespaced; moving multi-cluster services up to the cluster level changes that characteristic. MCS was designed to follow namespace sameness, which encourages same-named namespaces to have the same owner and use across clusters, so this makes ownership easy to follow across clusters vs. having a separate set of permissions for the multi-cluster version of a service. Further, if we don't follow the existing service ownership model, we'll need to figure out how to extend other service-related APIs to fit MCS vs. just following existing patterns (e.g. https://github.com/kubernetes/enhancements/tree/master/keps/sig-network/2091-admin-network-policy). Does that make sense?
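
To make the namespaced model concrete: the ServiceImport derived from an export lands in the same namespace as the exported Service in every importing cluster. Roughly (a sketch using the v1alpha1 types; values are illustrative):

```yaml
# Created by the MCS implementation, not by the user, in namespace "my-ns" of
# each importing cluster - the same namespace the Service was exported from.
apiVersion: multicluster.x-k8s.io/v1alpha1
kind: ServiceImport
metadata:
  name: my-svc
  namespace: my-ns      # namespace sameness: matches the exporting namespace
spec:
  type: ClusterSetIP
  ips:
  - 10.42.42.42         # placeholder ClusterSetIP assigned by the implementation
  ports:
  - name: http
    protocol: TCP
    port: 80
```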

Why have this restriction? I can think of many use cases where a user would want to target a specific cluster

MCS supports this, just not with a specific cluster identifier. Instead, we're taking the position that since same-named namespaces are meant to represent the same things across clusters, if a specific cluster's service instance needs to be accessed individually, it may not really be part of the same service. You can create cluster-1-service in cluster 1 and access it by name from any cluster. If both access patterns are needed, the current solution would be to create two services: say, a my-svc service in each cluster merged into one, and a my-svc-east in your east cluster for cluster-specific access. Admittedly this is a little more config than having the functionality built in, but the thinking was that this makes it specifically opt-in and easier to reason about (there are many cases where cluster-specific access is not desired as well).
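
As a sketch of that workaround applied to the east cluster (names and the selector are illustrative; it assumes the v1alpha1 MCS CRDs):

```yaml
# Exported from every cluster: consumers use the merged clusterset service at
# my-svc.my-ns.svc.clusterset.local.
apiVersion: multicluster.x-k8s.io/v1alpha1
kind: ServiceExport
metadata:
  name: my-svc
  namespace: my-ns
---
# Only in the east cluster: a second Service selecting the same pods, exported
# under its own name for explicit cluster-specific access.
apiVersion: v1
kind: Service
metadata:
  name: my-svc-east
  namespace: my-ns
spec:
  selector:
    app: my-app         # placeholder: reuse whatever my-svc already selects
  ports:
  - protocol: TCP
    port: 80
---
apiVersion: multicluster.x-k8s.io/v1alpha1
kind: ServiceExport
metadata:
  name: my-svc-east
  namespace: my-ns
```

Consumers that need the east cluster specifically resolve my-svc-east.my-ns.svc.clusterset.local, while everything else keeps using my-svc.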

Digging into your example scenarios:

Cluster migrations: slowly shifting traffic from 1 cluster to another, particularly for traffic that is not proxied externally, or traffic that is sourced from within the cluster.

Doesn't this work with the shared service? As new pods are brought up in a new cluster, traffic will shift proportionally to that new cluster. As for handling same-cluster source traffic, MCS doesn't make any statements about how traffic is routed behind the VIP, so that implementations have the flexibility to make more intelligent routing decisions. Extensions like https://github.com/kubernetes/enhancements/tree/master/keps/sig-network/2433-topology-aware-hints aim to make that easier to implement.

Locality routing to reduce hair pinning

In cases like this, it seems like you'd either want cluster-specific pods or even direct pod addressing. It seems like that would be a separate step after initial discovery, wouldn't it? I don't think you need to use the same service instance for that. If it's required that user requests go to specific clusters/pods, doesn't the requester either need to know that ahead of time (separate services) or need some shared global discovery service anyway?

Complex load balancing scenarios where load balancing is managed outside of kube-proxy

MCS makes no statements about how load is balanced at all - just that there's a VIP. Implementations absolutely should try to do better than kube-proxy's random spreading if they can, but we didn't want to encode more into the KEP than necessary and instead opted for things that are generally applicable across implementations.

steeling commented 1 year ago

Thanks for the thorough response, and thanks in advance for hearing out these arguments!

This is a fine place to raise concerns - though it might be easier to track them individually as issues against sigs.k8s.io/mcs-api

Would it be better to have ServiceImports be non-namespaced? It seems like a shortcoming to require that the namespace exists, and it can clutter a user's cluster by needing to create these unnecessary namespaces.

There are a few reasons why namespaces are beneficial. First, Services are already namespaced; moving multi-cluster services up to the cluster level changes that characteristic. MCS was designed to follow namespace sameness, which encourages same-named namespaces to have the same owner and use across clusters, so this makes ownership easy to follow across clusters vs. having a separate set of permissions for the multi-cluster version of a service. Further, if we don't follow the existing service ownership model, we'll need to figure out how to extend other service-related APIs to fit MCS vs. just following existing patterns (e.g. https://github.com/kubernetes/enhancements/tree/master/keps/sig-network/2091-admin-network-policy). Does that make sense?

ACK, it was a minor nit on my end, and I haven't fully thought through the implications of cluster-scoping the resource.

Why have this restriction? I can think of many use cases where a user would want to target a specific cluster

MCS supports this, just not with a specific cluster identifier. Instead, we're taking the position that since same-named namespaces are meant to represent the same things across clusters, if a specific cluster's service instance needs to be accessed individually, it may not really be part of the same service. You can create cluster-1-service in cluster 1 and access it by name from any cluster. If both access patterns are needed, the current solution would be to create two services: say, a my-svc service in each cluster merged into one, and a my-svc-east in your east cluster for cluster-specific access. Admittedly this is a little more config than having the functionality built in, but the thinking was that this makes it specifically opt-in and easier to reason about (there are many cases where cluster-specific access is not desired as well).

Fair enough, this workaround existing at least unblocks the use case.

Digging into your example scenarios:

Cluster migrations: slowly shifting traffic from 1 cluster to another, particularly for traffic that is not proxied externally, or traffic that is sourced from within the cluster.

Doesn't this work with the shared service? As new pods are brought up in a new cluster, traffic will shift proportionally to that new cluster. As for handling same-cluster source traffic, MCS doesn't make any statements about how traffic is routed behind the VIP, so that implementations have the flexibility to make more intelligent routing decisions. Extensions like https://github.com/kubernetes/enhancements/tree/master/keps/sig-network/2433-topology-aware-hints aim to make that easier to implement.

Yes, although I was also considering the use case of something like SMI's TrafficSplit resource (or future implementations of that which may be tucked inside the Gateway API?)

Locality routing to reduce hair pinning

In cases like this, it seems like you'd either want cluster-specific pods or even direct pod addressing. It seems like that would be a separate step after initial discovery, wouldn't it? I don't think you need to use the same service instance for that. If it's required that user requests go to specific clusters/pods, doesn't the requester either need to know that ahead of time (separate services) or need some shared global discovery service anyway?

Yup, I was thinking about a shared discovery service and leveraging an address specific to that cluster, which would work with the workaround mentioned above.

Complex load balancing scenarios where load balancing is managed outside of kube-proxy

MCS makes no statements about how load is balanced at all - just that there's a VIP. Implementations absolutely should try to do better than kube-proxy's random spreading if they can, but we didn't want to encode more into the KEP than necessary and instead opted for things that are generally applicable across implementations.

Fair enough :)

All of your points above do show that we don't need DNS resolution specific to each cluster's service export, but another question may be: why not include it?

the thinking was that this makes it specifically opt-in and easier to reason about (there are many cases where cluster specific access is not desired as well).

I don't necessarily think it makes it easier to reason about. Intuitively, I would think that since we have:

service.ns.svc.cluster.local
service.ns.svc.cluster
service.ns.svc
service.ns
service
hostname.service.cluster.local
...
hostname.service

are all resolvable via DNS, and adding

hostname.clusterid.service.ns.svc.cluster.local would also intuitively mean that clusterid.service.ns.svc.cluster.local is resolvable.

Having it exist is essentially already "opt-in", in that I don't need to use it if I don't want to. IMHO there should be a bit more justification for omitting it, although with the workaround you mentioned it's not a hill I would die on :)

JeremyOT commented 1 year ago

Having it exist is essentially already "opt-in", in that I don't need to use it if I don't want to. IMHO there should be a bit more justification for omitting it, although with the workaround you mentioned it's not a hill I would die on :)

I think the biggest issue here is that if it exists, we need to reserve another VIP per cluster, which eats up another constrained resource you may not be using and, if you have many clusters, may eat into it quite quickly. I've also been thinking that the opt-in is more about allowing use than use itself. Consumers may not care if an unused VIP sits around for a service, but as a producer I want to control whether or not my consumers have that option. E.g., what if a consumer decides for one reason or another to take an explicit dependency on cluster-a? If I didn't intend to allow cluster-specific access, I might decide to replace cluster-a with -b and/or -c, or move which cluster my service is deployed in. Explicit opt-in to per-cluster exposure lets me decide in advance how to handle that; it also gives me a sort of alias for the per-cluster service, and if I really needed to, I could move cluster-a-svc to cluster-b without impacting consumers. Definitely getting into messy territory here, but the main point I'm getting at is that opt-in, even with this workaround, seems to introduce less risk and fewer potential side effects.

If you want to discuss further, this might be a good topic for our bi-weekly meetups and a live convo. I definitely appreciate pushback and diving into the API here. We really want to make sure we aren't closing any doors unnecessarily

k8s-triage-robot commented 1 year ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

steeling commented 1 year ago

Having it exist is essentially already "opt-in", in that I don't need to use it if I don't want to. IMHO there should be a bit more justification for omitting it, although with the workaround you mentioned it's not a hill I would die on :)

I think the biggest issue here is that if it exists, we need to reserve another VIP per cluster, which eats up another constrained resource you may not be using and, if you have many clusters, may eat into it quite quickly. I've also been thinking that the opt-in is more about allowing use than use itself. Consumers may not care if an unused VIP sits around for a service, but as a producer I want to control whether or not my consumers have that option. E.g., what if a consumer decides for one reason or another to take an explicit dependency on cluster-a? If I didn't intend to allow cluster-specific access, I might decide to replace cluster-a with -b and/or -c, or move which cluster my service is deployed in. Explicit opt-in to per-cluster exposure lets me decide in advance how to handle that; it also gives me a sort of alias for the per-cluster service, and if I really needed to, I could move cluster-a-svc to cluster-b without impacting consumers. Definitely getting into messy territory here, but the main point I'm getting at is that opt-in, even with this workaround, seems to introduce less risk and fewer potential side effects.

If you want to discuss further, this might be a good topic for our bi-weekly meetups and a live convo. I definitely appreciate pushback and diving into the API here. We really want to make sure we aren't closing any doors unnecessarily

A super late reply here, but I just had a thought which could allow the best of both worlds: why not add a field, exposePerClusterServices, defaulted to false?

Otherwise it's pretty cumbersome to ask the user to create a ServiceImport/ServiceExport for each service in each cluster. I also plan on attending some of the upcoming meetings, so we can chat then. Thanks!
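
Purely as an illustration of the proposal (this field does not exist in the ServiceExport API today, so the sketch below is hypothetical):

```yaml
# Hypothetical: exposePerClusterServices is the field proposed in this comment,
# not part of the multicluster.x-k8s.io/v1alpha1 API.
apiVersion: multicluster.x-k8s.io/v1alpha1
kind: ServiceExport
metadata:
  name: my-svc
  namespace: my-ns
spec:
  exposePerClusterServices: false   # proposed default; true would also publish
                                    # per-cluster DNS names/VIPs for this export
```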

k8s-triage-robot commented 1 year ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-ci-robot commented 1 year ago

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to [this](https://github.com/kubernetes/enhancements/issues/1645#issuecomment-1279608579):

> The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
>
> This bot triages issues according to the following rules:
>
> - After 90d of inactivity, `lifecycle/stale` is applied
> - After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
> - After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed
>
> You can:
>
> - Reopen this issue with `/reopen`
> - Mark this issue as fresh with `/remove-lifecycle rotten`
> - Offer to help out with [Issue Triage][1]
>
> Please send feedback to sig-contributor-experience at [kubernetes/community](https://github.com/kubernetes/community).
>
> /close not-planned
>
> [1]: https://www.kubernetes.dev/docs/guide/issue-triage/

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

JeremyOT commented 1 year ago

/reopen /remove-lifecycle rotten

k8s-ci-robot commented 1 year ago

@JeremyOT: Reopened this issue.

In response to [this](https://github.com/kubernetes/enhancements/issues/1645#issuecomment-1279609199):

> /reopen
> /remove-lifecycle rotten

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

sftim commented 1 year ago

https://github.com/kubernetes/website/pull/37418 highlighted that MCS is not documented (or, if it is, those docs are too hard to find).

We should aim to document this API. We document features and APIs once they reach alpha, since end users could be opting in to use them.

sftim commented 1 year ago

(code that is in-project but out of tree still needs docs)

k8s-triage-robot commented 1 year ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

lauralorenz commented 1 year ago

/remove-lifecycle stale

k8s-triage-robot commented 1 year ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot commented 5 months ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot commented 4 months ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot commented 1 month ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot commented 4 days ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

lauralorenz commented 3 days ago

/remove-lifecycle rotten