irfanurrehman commented 6 years ago

Issue by nikhiljindal Wednesday Sep 28, 2016 at 00:55 GMT Originally opened as https://github.com/kubernetes/kubernetes/issues/33622

Right now, federation-apiserver only returns federated resources (resources created in federation control plane). For ex: doing a GET /api/v1/services returns federated services. To get the corresponding services from underlying kubernetes clusters, clients need to talk directly to those clusters. We want to enable clients to be able to get all resources (including the ones from underlying clusters) from federation-apiserver.

cc @kubernetes/sig-cluster-federation

irfanurrehman commented 6 years ago

Comment by nikhiljindal Wednesday Sep 28, 2016 at 01:10 GMT

Some options:

Add a new group version in federation-apiserver that clients will use to fetch kubernetes resources
Add options (ListOptions, GetOptions) and use them to determine which resources to return. By default GET /api/v1/services will return federated services. GET /api/v1/services with ListOptions.ClusterSelector for all kubernetes clusters will return services from those underlying kubernetes clusters.

irfanurrehman commented 6 years ago

Comment by nikhiljindal Wednesday Sep 28, 2016 at 01:10 GMT

cc @kubernetes/sig-api-machinery

irfanurrehman commented 6 years ago

Comment by smarterclayton Wednesday Sep 28, 2016 at 16:52 GMT

When you say "add a new group version", are you proposing a transparent proxying of that group version to the backing server?

I think the federation API server, if it wants to proxy clusters, should do so explicitly rather than transparently. I'm also not sure I agree that the federation API server should be allowed to proxy clusters automatically - possibly until we sort out whether that constitutes an authorization escalation in all cases, but there are other concerns.

irfanurrehman commented 6 years ago

Comment by nikhiljindal Wednesday Sep 28, 2016 at 21:48 GMT

What I meant was that /apis/newgroup/newversion/services will return the result of fetching /api/v1/services from all underlying clusters. We can also support /apis/newgroup/newversion/clusters/mycluster1/services to return services from mycluster1 only. With a different group version it is easier for admins to disable it.

Re: proxying it: Yes federation-apiserver can proxy the request to underlying clusters or it can also maintain a cache of underlying resources (similar to what our federation controllers do). The cache of all underlying resources is definitely going to be big and we can probably start with proxying.

irfanurrehman commented 6 years ago

Comment by nikhiljindal Wednesday Sep 28, 2016 at 21:49 GMT

To clarify, its not just proxying to an underlying cluster. It also includes combining the results from multiple underlying clusters.

irfanurrehman commented 6 years ago

Comment by smarterclayton Wednesday Sep 28, 2016 at 23:40 GMT

Ok. My concern is primarily security in the short term - once we introduce this clients will always expect it, so we can't go back.

I assume we'll deal with mismatched server versions by omitting partial results - will we default the objects or let the underlying API servers provide their own defaults?

Is this proxy going to fan out in parallel? How long will we wait for dead clusters?

What about cross namespace calls? Will we allow them even though they could be much larger than the individual calls in aggregate?

Will we sort the full list like we do per cluster? If we support paged queries in the future, will we also page these results?

Will the audit log list each backing query made?

On Sep 28, 2016, at 5:50 PM, Nikhil Jindal notifications@github.com wrote:

To clarify, its not just proxying to an underlying cluster. It also includes combining the results from multiple underlying clusters.

— You are receiving this because you are on a team that was mentioned. Reply to this email directly, view it on GitHub https://github.com/kubernetes/kubernetes/issues/33622#issuecomment-250311110, or mute the thread https://github.com/notifications/unsubscribe-auth/ABG_p_PwAv-_pjhpVnkvPKB_5ZCT_xv4ks5quuESgaJpZM4KITyh .

irfanurrehman commented 6 years ago

Comment by quinton-hoole Thursday Sep 29, 2016 at 00:24 GMT

@nikhiljindal I believe that we've already agreed with SIG-API and others that your second option is the way to go (i.e. no new group version, but cluster selectors instead).

@smarterclayton Here are my proposed answers to your questions:

I assume we'll deal with mismatched server versions by omitting partial results - will we default the objects or let the underlying API servers provide their own defaults?

Don't know for sure. I'm inclined towards the latter, but open to suggestions.

Is this proxy going to fan out in parallel?

Yes, for cache misses on reads. Mostly reads will be served from local cache.

How long will we wait for dead clusters?

Not very long. Approximately a few seconds. Clusters will either be explicitly reported as offline, or their results will be included in results. Details to be spelled out in the detailed design, but that's approximately how I think it should work.

What about cross namespace calls? Will we allow them even though they could be much larger than the individual calls in aggregate?

Yes, although we need result paging. We need that with or without federation.

Will we sort the full list like we do per cluster?

I was unaware that we did this. Why do we? In theory federation should be compatible with kubernetes, but I don't think that sorting is a good idea in general (but could be convinced otherwise). Perhaps it's necessary for decent paging semantics, but I suspect not.

If we support paged queries in the future, will we also page these results?

Yes, absolutely.

Will the audit log list each backing query made?

Yes, although perhaps in the underlying clusters, not necessarily in the federation itself (i.e. the audit log should be reconstructible, one way of the other).

irfanurrehman commented 6 years ago

Comment by smarterclayton Thursday Sep 29, 2016 at 00:35 GMT

If we log in the underlying cluster we lose the actor - I'm slightly concerned by the impersonation going on which is why I asked. Only the federated server today knows which user started the fanout, because when we call through to the underlying cluster we aren't impersonating (via impersonation) but actually pretending to be the user (with their credentials). In the audit log use cases knowing true attribution is important, so I'm inclined to overlog at the federation level to compensate.

We sorted because we wanted stable results - when we move to etcd3 we get that back for free so we can drop the explicit sort, but clients do "expect" sorted results.

On Sep 28, 2016, at 8:24 PM, Quinton Hoole notifications@github.com wrote:

@nikhiljindal https://github.com/nikhiljindal I believe that we've already agreed with SIG-API and others that your second option is the way to go (i.e. no new group version, but cluster selectors instead).

@smarterclayton https://github.com/smarterclayton Here are my proposed answers to your questions:

I assume we'll deal with mismatched server versions by omitting partial results - will we default the objects or let the underlying API servers provide their own defaults?

Don't know for sure. I'm inclined towards the latter, but open to suggestions.

Is this proxy going to fan out in parallel?

Yes, for cache misses on reads. Mostly reads will be served from local cache.

How long will we wait for dead clusters?

Not very long. Approximately a few seconds. Clusters will either be explicitly reported as offline, or their results will be included in results. Details to be spelled out in the detailed design, but that's approximately how I think it should work.

What about cross namespace calls? Will we allow them even though they could be much larger than the individual calls in aggregate?

Yes, although we need result paging. We need that with or without federation.

Will we sort the full list like we do per cluster?

I was unaware that we did this. Why do we. In theory federation should be compatible with kubernetes, but I don't think that sorting is a good idea in general (but could be convinced otherwise). Perhaps it's necessary for decent paging semantics, but I suspect not.

If we support paged queries in the future, will we also page these results?

Yes, absolutely.

Will the audit log list each backing query made?

Yes, although perhaps in the underlying clusters, not necessarily in the federation itself (i.e. the audit log should be reconstructible, one way of the other).

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/kubernetes/kubernetes/issues/33622#issuecomment-250339109, or mute the thread https://github.com/notifications/unsubscribe-auth/ABG_py76ci7z7S9j1ZyKBBwPr8QtRrS_ks5quwVNgaJpZM4KITyh .

irfanurrehman commented 6 years ago

Comment by quinton-hoole Thursday Sep 29, 2016 at 22:03 GMT

Yes, agreed, we need to solve the "lack of proper impersonation problem", and are working on that elsewhere. Once we have that, I think that we can and should delegate the some of the audit logging to the underlying clusters.

Until then, the question is do temporarily overlog in the federation layer, only to remove that later (and possibly change the API slightly as a result), or simply get by with inadequate audit logging until "proper impersonation" is in place. I could be convinced either way, but lean slightly toward the latter.

On Wed, Sep 28, 2016 at 5:36 PM, Clayton Coleman notifications@github.com wrote:

If we log in the underlying cluster we lose the actor - I'm slightly concerned by the impersonation going on which is why I asked. Only the federated server today knows which user started the fanout, because when we call through to the underlying cluster we aren't impersonating (via impersonation) but actually pretending to be the user (with their credentials). In the audit log use cases knowing true attribution is important, so I'm inclined to overlog at the federation level to compensate.

We sorted because we wanted stable results - when we move to etcd3 we get that back for free so we can drop the explicit sort, but clients do "expect" sorted results.

On Sep 28, 2016, at 8:24 PM, Quinton Hoole notifications@github.com wrote:

@nikhiljindal https://github.com/nikhiljindal I believe that we've already agreed with SIG-API and others that your second option is the way to go (i.e. no new group version, but cluster selectors instead).

@smarterclayton https://github.com/smarterclayton Here are my proposed

answers to your questions:

I assume we'll deal with mismatched server versions by omitting partial results - will we default the objects or let the underlying API servers provide their own defaults?

Don't know for sure. I'm inclined towards the latter, but open to suggestions.

Is this proxy going to fan out in parallel?

Yes, for cache misses on reads. Mostly reads will be served from local cache.

How long will we wait for dead clusters?

Not very long. Approximately a few seconds. Clusters will either be explicitly reported as offline, or their results will be included in results. Details to be spelled out in the detailed design, but that's approximately how I think it should work.

What about cross namespace calls? Will we allow them even though they could be much larger than the individual calls in aggregate?

Yes, although we need result paging. We need that with or without federation.

Will we sort the full list like we do per cluster?

I was unaware that we did this. Why do we. In theory federation should be compatible with kubernetes, but I don't think that sorting is a good idea in general (but could be convinced otherwise). Perhaps it's necessary for decent paging semantics, but I suspect not.

If we support paged queries in the future, will we also page these results?

Yes, absolutely.

Will the audit log list each backing query made?

Yes, although perhaps in the underlying clusters, not necessarily in the federation itself (i.e. the audit log should be reconstructible, one way of the other).

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/kubernetes/kubernetes/issues/ 33622#issuecomment-250339109, or mute the thread https://github.com/notifications/unsubscribe-auth/ABG_ py76ci7z7S9j1ZyKBBwPr8QtRrS_ks5quwVNgaJpZM4KITyh .

— You are receiving this because you are on a team that was mentioned. Reply to this email directly, view it on GitHub https://github.com/kubernetes/kubernetes/issues/33622#issuecomment-250340577, or mute the thread https://github.com/notifications/unsubscribe-auth/AJ6NAaDpsFdXiTZi4efWObrFVbmZcyVdks5quwgggaJpZM4KITyh .

irfanurrehman commented 6 years ago

Comment by smarterclayton Thursday Sep 29, 2016 at 23:23 GMT

I don't care that much that I'd block. We tell people federation is for cluster admins or power accounts so I'm ok with it for now to be a bit under logged.

irfanurrehman commented 6 years ago

Comment by lavalamp Thursday Sep 29, 2016 at 23:56 GMT

Paging is on my list, but not near the top. When you implement this thing, keep in mind that we are going to be adding features (paging, field filtering, etc) to the generic apiserver and/or conversion stack. I guess I'm asking a couple things:

don't implement missing things in a one-off way
don't implement this feature in a way that will make it hard for you to use these things when we add them to the rest of our stack
don't get clients into a situation where we can't roll out (e.g.) paging to the main apiserver because federation-apiserver doesn't have it and clients can't deal with a difference.

I'm also not sure about your caching semantics.

Are you going to support watch? If so, there should be zero fan-out on a read call because you just need to keep a cache up to date 100% of the time.

How are you going to handle resource version? This is actually a big problem because the logical clocks from the different clusters are different, but clients may expect them to be comparable (for equality) because it's the same resource. Also, for the same reason, you won't be able to compute an accurate aggregate ResourceVersion for lists.

On Thu, Sep 29, 2016 at 4:23 PM, Clayton Coleman notifications@github.com wrote:

I don't care that much that I'd block. We tell people federation is for cluster admins or power accounts so I'm ok with it for now to be a bit under logged.

— You are receiving this because you are on a team that was mentioned. Reply to this email directly, view it on GitHub https://github.com/kubernetes/kubernetes/issues/33622#issuecomment-250620012, or mute the thread https://github.com/notifications/unsubscribe-auth/AAnglrzG4nct-FFLh1oZrYb6Xowb3ziXks5qvEiMgaJpZM4KITyh .

irfanurrehman commented 6 years ago

Comment by smarterclayton Friday Sep 30, 2016 at 00:00 GMT

We've discussed vector resource versions but I wouldn't want to rush into that.

irfanurrehman commented 6 years ago

Comment by nikhiljindal Monday Oct 03, 2016 at 21:19 GMT

Will the audit log list each backing query made?

Sure as discussed, its fine to log that at federation level for now with the aim of eventually delegating it to the underlying cluster.

@nikhiljindal I believe that we've already agreed with SIG-API and others that your second option is the way to go (i.e. no new group version, but cluster selectors instead).

I have had some initial discussions, filed this issue to make a decision. Using cluster selectors does seem more generic.

Paging is on my list, but not near the top. When you implement this thing, keep in mind that we are going to be adding features (paging, field filtering, etc) to the generic apiserver and/or conversion stack.

If we add them to genericapiserver, we will get them for federation apiserver at the same time :)

I guess I'm asking a couple things:

don't implement missing things in a one-off way

don't implement this feature in a way that will make it hard for you to use these things when we add them to the rest of our stack

don't get clients into a situation where we can't roll out (e.g.) paging to the main apiserver because federation-apiserver doesn't have it and clients can't deal with a difference.

Yes. We use the same {List,Delete}Options in federation-apiserver and kube-apiserver so will add the ClusterSelector field there so its generic and can be used in both (setting it in a request to kube-apiserver will not have any effect).

I'm also not sure about your caching semantics.

Are you going to support watch? If so, there should be zero fan-out on a read call because you just need to keep a cache up to date 100% of the time.

How are you going to handle resource version? This is actually a big problem because the logical clocks from the different clusters are different, but clients may expect them to be comparable (for equality) because it's the same resource.

The objects will have their ObjectMeta.Cluster field set. Clients should not be comparing resource versions of 2 different objects from 2 different clusters.

Also, for the same reason, you won't be able to compute an accurate aggregate ResourceVersion for lists.

Where is this aggregate ResourceVersion used?

irfanurrehman commented 6 years ago

Comment by nikhiljindal Monday Oct 03, 2016 at 22:11 GMT

1. Options for API

1.a Different group version

Pros:

Easier for admins to disable if they dont want to allow this feature.

Cons:

Supports fetching resources from all k8s clusters or a particular k8s cluster, but not from a subset of k8s clusters.
Cannot fetch resources from all k8s clusters and federation-apiserver in a single call (this is not an important Con because there is no real use case for this)

1.b Cluster selector in Options

Pros:

More flexible than the other option. Can fetch from any subset of clusters based on labels.

Cons:

Will need to add Get and UpdateOptions and update List and DeleteOptions in kubernetes. Setting this field for kube-apiserver requests will have no effect, but the field will still be visible there.

The second option (1.b) seems better to me since it is more generic.

2. Options for mechanism to fetch resources from underlying clusters

2.a Proxy to underlying clusters

Pros

No additional memory overhead for federation-apiserver.

Cons

This is going to be slower than a cache.
More susceptible to network flakes.

2.b Maintain a cache of all resources in all underlying clusters

Pros

Fast

Cons

More memory overhead.
federation-apiserver will always keep a watch for all resources on the underlying clusters even if there is no client. This introduces more load on underlying clusters.

The second options (2.b) seems better if we expect a lot of clients. We already have a lot of this logic to maintain similar caches in federation controllers (for ex: federation replicaset controller keeps a cache of all replicasets in all underlying clusters).

irfanurrehman commented 6 years ago

Comment by smarterclayton Friday Oct 14, 2016 at 14:29 GMT

If we set the performance target of one cluster to "as big as we can fit in memory" doesn't that mean that you can't then federate that cluster if you choose 2b?

irfanurrehman commented 6 years ago

Comment by smarterclayton Friday Oct 14, 2016 at 14:30 GMT

On 1b - if I specify cluster selector against a single cluster is that an error?

irfanurrehman commented 6 years ago

Comment by dims Wednesday Nov 16, 2016 at 14:43 GMT

This needs to be triaged as a release-blocker or not for 1.5 @smarterclayton @nikhiljindal @quinton-hoole

irfanurrehman commented 6 years ago

Comment by dims Friday Nov 18, 2016 at 12:34 GMT

@nikhiljindal all issues must be labeled either release blocker or non release blocking by end of day 18 November 2016 PST. (or please move it to 1.6) cc @kubernetes/sig-cluster-federation

irfanurrehman commented 6 years ago

Comment by madhusudancs Saturday Nov 19, 2016 at 19:25 GMT

This is a feature. Moving it to v1.6.

irfanurrehman commented 6 years ago

Comment by ethernetdan Monday Mar 13, 2017 at 22:32 GMT

Moving to 1.7 as late to happen in 1.6. Feel free to switch back if this is incorrect.

irfanurrehman commented 6 years ago

Comment by nikhiljindal Thursday Apr 27, 2017 at 05:08 GMT

from @smarterclayton If we set the performance target of one cluster to "as big as we can fit in memory" doesn't that mean that you can't then federate that cluster if you choose 2b?

I think @quinton-hoole had some back of the envelope calculations for this.

On 1b - if I specify cluster selector against a single cluster is that an error?

I was expecting kubernetes will just ignore that field.

irfanurrehman commented 6 years ago

Comment by nikhiljindal Thursday Apr 27, 2017 at 05:09 GMT

https://docs.google.com/document/d/1kvVP9GFop6XQiG7H7uvkMLl16TAX9WIolNrsKZXCu5Q is the design doc I had sent some time back for option 1a from my comment above.

@CindyXing is planning to write an updated doc.

Documenting some points that came up in discussions with @lavalamp: Supporting watch is going to get tricky with option 1.b (ClusterSelector in ListOptions). To support watch on a list created by federation-apiserver by aggregating list results from underlying clusters, federation-apiserver will need to create its own resource version for the list that it returns and will need to support watches based on those resource versions. So we wont be able to support watch if federation apiserver is just proxying. It will need to store the resource versions for the lists that it returns to enable clients to watch using those resource versions.

With the ClusterSelector option, it is also not possible to disable reading underlying kubernetes resources using RBAC rules. That will be possible with the other option since then we will have a separate path (/apis/group/version/clusters/c1/api/v1/services) that will require explicit authorization.

1.a option simplifies apiserver implementation and pushes aggregation operation to the client. So if we need to support kubectl get svc --all-clusters with option 1.a, then it will first need to list all clusters by calling /apis/federation/v1beta1/clusters and then call /apis/group/version/clusters/{c}/api/v1/services for all those clusters. This will require the user to have authorization to list clusters for them to be able to list resources in underlying clusters. Running kubectl get svc --cluster=clusterc1 will directly call /apis/group/version/clusters/clusterc1/api/v1/services without listing clusters first.

The path to list services from an underlying cluster c1 can be /apis/group/version/clusters/{c}/api/v1/services (where we define a new group/version) or /apis/federation/v1beta1/clusters/{c}/proxy/api/v1/services (/apis/federation/v1beta1/clusters is the existing path to CRUD clusters).

irfanurrehman commented 6 years ago

Comment by CindyXing Friday May 26, 2017 at 21:20 GMT

Based on the original design doc and above comments, the updated design is published at https://docs.google.com/document/d/1H2BkqSKvoCSifi7c8D2gASSJo20ocIEYDMSfNfcylBc/edit

irfanurrehman commented 6 years ago

Comment by marun Monday Jun 12, 2017 at 20:43 GMT

Moving to 1.8.

irfanurrehman commented 6 years ago

Comment by k8s-merge-robot Tuesday Sep 05, 2017 at 08:03 GMT

[MILESTONENOTIFIER] Milestone Labels Incomplete

@marun @nikhiljindal

Action required: This issue requires label changes. If the required changes are not made within 3 days, the issue will be moved out of the v1.8 milestone.

kind: Must specify at most one of ['kind/bug', 'kind/feature', 'kind/cleanup']. priority: Must specify at most one of ['priority/critical-urgent', 'priority/important-soon', 'priority/important-longterm'].

Additional instructions available here

irfanurrehman commented 6 years ago

Comment by nikhiljindal Tuesday Sep 05, 2017 at 18:58 GMT

Updated the labels and moved out of 1.8

irfanurrehman commented 6 years ago

cc @nikhiljindal

prakashsingh08 commented 6 years ago

@nikhiljindal while creating storageclasses and othe objects in kubernetes federation cluster, i am getting error like below. error: error validating "storage-class.yaml": error validating data: the server could not find the requested resource; if you choose to ignore these errors, turn validation off with --validate=false

below is my storageclass file where apiVersion is "storage.k8s.io/v1" kind: StorageClass apiVersion: storage.k8s.io/v1 metadata: name: mongodbs provisioner: kubernetes.io/gce-pd parameters: type: pd-ssd

then i checked enabled apis using below command kubectl api-versions i got below response: extensions/v1beta1 federation/v1beta1 v1

how to enable other apis in federation cluster. any suggestions will be helpful

irfanurrehman commented 6 years ago

@prakashsingh08, all k8s APIs are not supported in the federation API server. This is by federations original design. Federation supports federating a subset of k8s APIs alone and including support of more resources is in the roadmap. However, we so far did not get any specific user requests for federating storage classes and is not there in the immediate roadmap. What is your use case? If you have a reasonable use case, we can talk about the same. You can post your queries and suggestions here or on the slack channel (sig-multicluster)

prakashsingh08 commented 6 years ago

@irfanurrehman actually our usecase is deploy mongoDB StatefulSet cluster in k8s in multi cluster. it requires creating StorageClass and StatefulSet in multiple cluster.we want to manage all our activities(like creating storageclass, statefulset etc) through federation cluster to avoid rework of creating Kubernetes Object in different clusters, so we need to access below apis

 apps/v1beta1
 storage.k8s.io/v1

is there any way to enable these APIs in federation cluster

fejta-bot commented 6 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle stale

fejta-bot commented 6 years ago

Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle rotten /remove-lifecycle stale

fejta-bot commented 6 years ago

Rotten issues close after 30d of inactivity. Reopen the issue with /reopen. Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /close

kubernetes-retired / federation

federation: Allow reading kubernetes resources from federation-apiserver #76

1. Options for API

1.a Different group version

Pros:

Cons:

1.b Cluster selector in Options

Pros:

Cons:

2. Options for mechanism to fetch resources from underlying clusters

2.a Proxy to underlying clusters

Pros

Cons

2.b Maintain a cache of all resources in all underlying clusters

Pros

Cons