kubernetes-retired / federation

[EOL] Cluster Federation
Apache License 2.0

federation: Allow reading kubernetes resources from federation-apiserver #76

Closed irfanurrehman closed 6 years ago

irfanurrehman commented 6 years ago

Issue by nikhiljindal Wednesday Sep 28, 2016 at 00:55 GMT Originally opened as https://github.com/kubernetes/kubernetes/issues/33622


Right now, federation-apiserver only returns federated resources (resources created in the federation control plane). For example, doing a GET /api/v1/services returns federated services. To get the corresponding services from the underlying kubernetes clusters, clients need to talk directly to those clusters. We want to enable clients to get all resources (including the ones from underlying clusters) from federation-apiserver.

cc @kubernetes/sig-cluster-federation

irfanurrehman commented 6 years ago

Comment by nikhiljindal Wednesday Sep 28, 2016 at 01:10 GMT


Some options (elaborated in a later comment below):

  • A different group version that returns resources aggregated from the underlying clusters.
  • A cluster selector in the list/delete Options.

irfanurrehman commented 6 years ago

Comment by nikhiljindal Wednesday Sep 28, 2016 at 01:10 GMT


cc @kubernetes/sig-api-machinery

irfanurrehman commented 6 years ago

Comment by smarterclayton Wednesday Sep 28, 2016 at 16:52 GMT


When you say "add a new group version", are you proposing a transparent proxying of that group version to the backing server?

I think the federation API server, if it wants to proxy clusters, should do so explicitly rather than transparently. I'm also not sure I agree that the federation API server should be allowed to proxy clusters automatically, at least until we sort out whether that constitutes an authorization escalation in all cases; there are other concerns as well.

irfanurrehman commented 6 years ago

Comment by nikhiljindal Wednesday Sep 28, 2016 at 21:48 GMT


What I meant was that /apis/newgroup/newversion/services will return the result of fetching /api/v1/services from all underlying clusters. We can also support /apis/newgroup/newversion/clusters/mycluster1/services to return services from mycluster1 only. With a different group version it is easier for admins to disable it.

Re proxying: yes, federation-apiserver can proxy the request to underlying clusters, or it can maintain a cache of underlying resources (similar to what our federation controllers do). The cache of all underlying resources is definitely going to be big, so we can probably start with proxying.

irfanurrehman commented 6 years ago

Comment by nikhiljindal Wednesday Sep 28, 2016 at 21:49 GMT


To clarify, it's not just proxying to an underlying cluster. It also includes combining the results from multiple underlying clusters.

irfanurrehman commented 6 years ago

Comment by smarterclayton Wednesday Sep 28, 2016 at 23:40 GMT


Ok. My concern is primarily security in the short term - once we introduce this, clients will always expect it, so we can't go back.

I assume we'll deal with mismatched server versions by omitting partial results - will we default the objects or let the underlying API servers provide their own defaults?

Is this proxy going to fan out in parallel? How long will we wait for dead clusters?

What about cross namespace calls? Will we allow them even though they could be much larger than the individual calls in aggregate?

Will we sort the full list like we do per cluster? If we support paged queries in the future, will we also page these results?

Will the audit log list each backing query made?

irfanurrehman commented 6 years ago

Comment by quinton-hoole Thursday Sep 29, 2016 at 00:24 GMT


@nikhiljindal I believe that we've already agreed with SIG-API and others that your second option is the way to go (i.e. no new group version, but cluster selectors instead).

@smarterclayton Here are my proposed answers to your questions:

I assume we'll deal with mismatched server versions by omitting partial results - will we default the objects or let the underlying API servers provide their own defaults?

Don't know for sure. I'm inclined towards the latter, but open to suggestions.

Is this proxy going to fan out in parallel?

Yes, for cache misses on reads. Mostly reads will be served from local cache.

How long will we wait for dead clusters?

Not very long. Approximately a few seconds. Clusters will either be explicitly reported as offline, or their results will be included in the combined results. Details to be spelled out in the detailed design, but that's approximately how I think it should work.

What about cross namespace calls? Will we allow them even though they could be much larger than the individual calls in aggregate?

Yes, although we need result paging. We need that with or without federation.

Will we sort the full list like we do per cluster?

I was unaware that we did this. Why do we? In theory federation should be compatible with kubernetes, but I don't think that sorting is a good idea in general (but could be convinced otherwise). Perhaps it's necessary for decent paging semantics, but I suspect not.

If we support paged queries in the future, will we also page these results?

Yes, absolutely.

Will the audit log list each backing query made?

Yes, although perhaps in the underlying clusters, not necessarily in the federation itself (i.e. the audit log should be reconstructible, one way or the other).
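
Concretely, a minimal sketch of what that parallel fan-out with a short timeout for dead clusters could look like, assuming a hypothetical clusterClient interface; none of these names come from the actual federation-apiserver:

```go
package proxysketch

import (
	"context"
	"sync"
	"time"
)

// service is a trimmed stand-in for a Kubernetes Service object.
type service struct {
	Cluster string
	Name    string
}

// clusterClient abstracts one underlying cluster's API server.
type clusterClient interface {
	// ListServices stands in for GET /api/v1/services on one cluster.
	ListServices(ctx context.Context) ([]service, error)
}

// fanOutList queries every underlying cluster in parallel, waits at most
// timeout for slow or dead clusters, and merges whatever succeeds. Failed
// clusters are reported separately so they can be surfaced as "offline".
func fanOutList(clusters map[string]clusterClient, timeout time.Duration) ([]service, map[string]error) {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()

	var (
		mu     sync.Mutex
		wg     sync.WaitGroup
		merged []service
		failed = map[string]error{}
	)
	for name, c := range clusters {
		wg.Add(1)
		go func(name string, c clusterClient) {
			defer wg.Done()
			svcs, err := c.ListServices(ctx)
			mu.Lock()
			defer mu.Unlock()
			if err != nil {
				failed[name] = err // reported as offline / partial result
				return
			}
			merged = append(merged, svcs...)
		}(name, c)
	}
	wg.Wait()
	return merged, failed
}
```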

irfanurrehman commented 6 years ago

Comment by smarterclayton Thursday Sep 29, 2016 at 00:35 GMT


If we log in the underlying cluster we lose the actor - I'm slightly concerned by the impersonation going on, which is why I asked. Only the federated server today knows which user started the fanout, because when we call through to the underlying cluster we aren't impersonating (via the impersonation API) but actually pretending to be the user (with their credentials). In the audit log use cases knowing true attribution is important, so I'm inclined to overlog at the federation level to compensate.

We sorted because we wanted stable results - when we move to etcd3 we get that back for free so we can drop the explicit sort, but clients do "expect" sorted results.

irfanurrehman commented 6 years ago

Comment by quinton-hoole Thursday Sep 29, 2016 at 22:03 GMT


Yes, agreed, we need to solve the "lack of proper impersonation" problem, and we are working on that elsewhere. Once we have that, I think that we can and should delegate some of the audit logging to the underlying clusters.

Until then, the question is whether to temporarily overlog in the federation layer, only to remove that later (and possibly change the API slightly as a result), or simply get by with inadequate audit logging until "proper impersonation" is in place. I could be convinced either way, but lean slightly toward the latter.

irfanurrehman commented 6 years ago

Comment by smarterclayton Thursday Sep 29, 2016 at 23:23 GMT


I don't care enough about this that I'd block. We tell people federation is for cluster admins or power accounts, so I'm OK with it being a bit under-logged for now.

irfanurrehman commented 6 years ago

Comment by lavalamp Thursday Sep 29, 2016 at 23:56 GMT


Paging is on my list, but not near the top. When you implement this thing, keep in mind that we are going to be adding features (paging, field filtering, etc.) to the generic apiserver and/or conversion stack. I guess I'm asking a couple things:

  • don't implement missing things in a one-off way
  • don't implement this feature in a way that will make it hard for you to use these things when we add them to the rest of our stack
  • don't get clients into a situation where we can't roll out (e.g.) paging to the main apiserver because federation-apiserver doesn't have it and clients can't deal with a difference.

I'm also not sure about your caching semantics.

Are you going to support watch? If so, there should be zero fan-out on a read call because you just need to keep a cache up to date 100% of the time.

How are you going to handle resource version? This is actually a big problem because the logical clocks from the different clusters are different, but clients may expect them to be comparable (for equality) because it's the same resource. Also, for the same reason, you won't be able to compute an accurate aggregate ResourceVersion for lists.
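
To illustrate the zero-fan-out reading model mentioned above, here is a toy sketch that keeps a per-cluster cache current from that cluster's watch stream; the event and watcher types are hypothetical stand-ins, not client-go APIs:

```go
package cachesketch

import (
	"context"
	"sync"
)

// event is a trimmed stand-in for a watch event from one cluster.
type event struct {
	Type   string // "ADDED", "MODIFIED", "DELETED"
	Key    string // namespace/name
	Object any
}

// watcher abstracts one cluster's watch endpoint.
type watcher interface {
	// Watch streams events from one cluster until ctx is cancelled.
	Watch(ctx context.Context) (<-chan event, error)
}

// clusterCache keeps one cluster's objects up to date from its watch
// stream, so federated reads never need to fan out to the cluster.
type clusterCache struct {
	mu      sync.RWMutex
	objects map[string]any
}

func newClusterCache() *clusterCache {
	return &clusterCache{objects: map[string]any{}}
}

// run consumes the watch stream and applies each event to the cache.
func (c *clusterCache) run(ctx context.Context, w watcher) error {
	events, err := w.Watch(ctx)
	if err != nil {
		return err
	}
	for ev := range events {
		c.mu.Lock()
		switch ev.Type {
		case "DELETED":
			delete(c.objects, ev.Key)
		default: // ADDED or MODIFIED
			c.objects[ev.Key] = ev.Object
		}
		c.mu.Unlock()
	}
	return ctx.Err()
}

// get serves a read entirely from the local cache.
func (c *clusterCache) get(key string) (any, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	obj, ok := c.objects[key]
	return obj, ok
}
```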

irfanurrehman commented 6 years ago

Comment by smarterclayton Friday Sep 30, 2016 at 00:00 GMT


We've discussed vector resource versions but I wouldn't want to rush into that.
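
For context, a purely illustrative sketch of the vector idea, carrying one opaque per-cluster resourceVersion and encoding the vector into a single string; this is a hypothetical type, not an agreed design:

```go
package rvsketch

import (
	"fmt"
	"sort"
	"strings"
)

// VectorRV maps cluster name -> that cluster's opaque resourceVersion,
// since the logical clocks of different clusters are not comparable.
type VectorRV map[string]string

// Encode renders the vector deterministically so it could be handed back
// to clients as one opaque string (e.g. "c1=1234,c2=987").
func (v VectorRV) Encode() string {
	names := make([]string, 0, len(v))
	for name := range v {
		names = append(names, name)
	}
	sort.Strings(names)
	parts := make([]string, 0, len(names))
	for _, name := range names {
		parts = append(parts, fmt.Sprintf("%s=%s", name, v[name]))
	}
	return strings.Join(parts, ",")
}
```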

irfanurrehman commented 6 years ago

Comment by nikhiljindal Monday Oct 03, 2016 at 21:19 GMT


Will the audit log list each backing query made?

Sure, as discussed it's fine to log that at the federation level for now, with the aim of eventually delegating it to the underlying clusters.

@nikhiljindal I believe that we've already agreed with SIG-API and others that your second option is the way to go (i.e. no new group version, but cluster selectors instead).

I have had some initial discussions and filed this issue to make a decision. Using cluster selectors does seem more generic.

Paging is on my list, but not near the top. When you implement this thing, keep in mind that we are going to be adding features (paging, field filtering, etc) to the generic apiserver and/or conversion stack.

If we add them to genericapiserver, we will get them for federation-apiserver at the same time :)

I guess I'm asking a couple things:

  • don't implement missing things in a one-off way
  • don't implement this feature in a way that will make it hard for you to use these things when we add them to the rest of our stack
  • don't get clients into a situation where we can't roll out (e.g.) paging to the main apiserver because federation-apiserver doesn't have it and clients can't deal with a difference.

Yes. We use the same {List,Delete}Options in federation-apiserver and kube-apiserver, so we will add the ClusterSelector field there; that way it is generic and can be used in both (setting it in a request to kube-apiserver will not have any effect).
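
As a hedged sketch of that proposal, the shared options type would grow one field that kube-apiserver ignores; the struct below is abbreviated and illustrative, not the actual apimachinery type:

```go
package optionssketch

// ListOptions mirrors the shape of the shared options type; only
// ClusterSelector is new. kube-apiserver would simply ignore it, while
// federation-apiserver would use it to pick underlying clusters.
type ListOptions struct {
	// LabelSelector, FieldSelector, and ResourceVersion exist today.
	LabelSelector   string
	FieldSelector   string
	ResourceVersion string
	// ClusterSelector (new, hypothetical) selects which underlying
	// clusters a federated read fans out to, e.g. "environment=prod".
	ClusterSelector string
}
```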

I'm also not sure about your caching semantics.

Are you going to support watch? If so, there should be zero fan-out on a read call because you just need to keep a cache up to date 100% of the time.

How are you going to handle resource version? This is actually a big problem because the logical clocks from the different clusters are different, but clients may expect them to be comparable (for equality) because it's the same resource.

The objects will have their ObjectMeta.Cluster field set. Clients should not be comparing resource versions of two different objects from two different clusters.

Also, for the same reason, you won't be able to compute an accurate aggregate ResourceVersion for lists.

Where is this aggregate ResourceVersion used?

irfanurrehman commented 6 years ago

Comment by nikhiljindal Monday Oct 03, 2016 at 22:11 GMT


1. Options for API

1.a Different group version

Pros:

  • A separate path (for example /apis/newgroup/newversion/clusters/mycluster1/services) is easy for admins to disable and can be gated with explicit authorization.
  • It simplifies the apiserver implementation by pushing aggregation to the client.

Cons:

  • Aggregation moves to the client, which must list clusters first (and needs authorization to do so).

1.b Cluster selector in Options

Pros:

  • More generic; reuses the existing resource paths and the shared {List,Delete}Options.

Cons:

  • Supporting watch is tricky, since federation-apiserver must mint its own resource versions for aggregated lists.
  • Reads from underlying clusters cannot be disabled via RBAC rules, because the path is unchanged.

The second option (1.b) seems better to me since it is more generic.

2. Options for mechanism to fetch resources from underlying clusters

2.a Proxy to underlying clusters

Pros:

  • No large cache of all underlying resources to build and keep fresh.

Cons:

  • Every read fans out to every cluster, so latency is bounded by the slowest (or dead) cluster.

2.b Maintain a cache of all resources in all underlying clusters

Pros:

  • Most reads are served from the local cache, with fan-out only on cache misses; this also makes watch feasible.

Cons:

  • The cache of all underlying resources is going to be big.

The second option (2.b) seems better if we expect a lot of clients. We already have a lot of this logic to maintain similar caches in federation controllers (for example, the federation replicaset controller keeps a cache of all replicasets in all underlying clusters).

irfanurrehman commented 6 years ago

Comment by smarterclayton Friday Oct 14, 2016 at 14:29 GMT


If we set the performance target of one cluster to "as big as we can fit in memory", doesn't that mean that you can't then federate that cluster if you choose 2b?

irfanurrehman commented 6 years ago

Comment by smarterclayton Friday Oct 14, 2016 at 14:30 GMT


On 1b: if I specify a cluster selector against a single cluster, is that an error?

irfanurrehman commented 6 years ago

Comment by dims Wednesday Nov 16, 2016 at 14:43 GMT


This needs to be triaged as a release-blocker or not for 1.5 @smarterclayton @nikhiljindal @quinton-hoole

irfanurrehman commented 6 years ago

Comment by dims Friday Nov 18, 2016 at 12:34 GMT


@nikhiljindal all issues must be labeled either release-blocker or non-release-blocking by end of day 18 November 2016 PST (or please move it to 1.6). cc @kubernetes/sig-cluster-federation

irfanurrehman commented 6 years ago

Comment by madhusudancs Saturday Nov 19, 2016 at 19:25 GMT


This is a feature. Moving it to v1.6.

irfanurrehman commented 6 years ago

Comment by ethernetdan Monday Mar 13, 2017 at 22:32 GMT


Moving to 1.7 as this is too late to happen in 1.6. Feel free to switch back if this is incorrect.

irfanurrehman commented 6 years ago

Comment by nikhiljindal Thursday Apr 27, 2017 at 05:08 GMT


From @smarterclayton: If we set the performance target of one cluster to "as big as we can fit in memory", doesn't that mean that you can't then federate that cluster if you choose 2b?

I think @quinton-hoole had some back of the envelope calculations for this.

On 1b: if I specify a cluster selector against a single cluster, is that an error?

I was expecting that kubernetes would just ignore that field.

irfanurrehman commented 6 years ago

Comment by nikhiljindal Thursday Apr 27, 2017 at 05:09 GMT


https://docs.google.com/document/d/1kvVP9GFop6XQiG7H7uvkMLl16TAX9WIolNrsKZXCu5Q is the design doc I had sent some time back for option 1a from my comment above.

@CindyXing is planning to write an updated doc.

Documenting some points that came up in discussions with @lavalamp: Supporting watch is going to get tricky with option 1.b (ClusterSelector in ListOptions). To support watch on a list created by federation-apiserver by aggregating list results from underlying clusters, federation-apiserver will need to create its own resource version for the list that it returns, and will need to support watches based on those resource versions. So we won't be able to support watch if federation-apiserver is just proxying; it will need to store the resource versions for the lists that it returns, to enable clients to watch using those resource versions.

With the ClusterSelector option, it is also not possible to disable reading underlying kubernetes resources using RBAC rules. That will be possible with the other option since then we will have a separate path (/apis/group/version/clusters/c1/api/v1/services) that will require explicit authorization.

Option 1.a simplifies the apiserver implementation and pushes the aggregation operation to the client. So to support kubectl get svc --all-clusters with option 1.a, the client will first need to list all clusters by calling /apis/federation/v1beta1/clusters and then call /apis/group/version/clusters/{c}/api/v1/services for each of those clusters. This requires the user to have authorization to list clusters in order to list resources in underlying clusters. Running kubectl get svc --cluster=clusterc1 will directly call /apis/group/version/clusters/clusterc1/api/v1/services without listing clusters first.
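
A sketch of that client-side flow for a hypothetical kubectl get svc --all-clusters, using the proposed (never shipped) per-cluster paths from this thread:

```go
package clientsketch

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
)

// clusterList is a trimmed decoding target for the clusters list response.
type clusterList struct {
	Items []struct {
		Metadata struct {
			Name string `json:"name"`
		} `json:"metadata"`
	} `json:"items"`
}

// listServicesAllClusters mimics what "kubectl get svc --all-clusters"
// would do under option 1.a: list clusters first, then fetch services
// from each cluster via the proposed per-cluster path.
func listServicesAllClusters(fedHost string) (map[string][]byte, error) {
	resp, err := http.Get(fedHost + "/apis/federation/v1beta1/clusters")
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()

	var clusters clusterList
	if err := json.NewDecoder(resp.Body).Decode(&clusters); err != nil {
		return nil, err
	}

	results := map[string][]byte{}
	for _, c := range clusters.Items {
		// Proposed (not shipped) per-cluster path from this thread.
		url := fmt.Sprintf("%s/apis/group/version/clusters/%s/api/v1/services",
			fedHost, c.Metadata.Name)
		r, err := http.Get(url)
		if err != nil {
			return nil, err
		}
		body, err := io.ReadAll(r.Body)
		r.Body.Close()
		if err != nil {
			return nil, err
		}
		results[c.Metadata.Name] = body
	}
	return results, nil
}
```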

The path to list services from an underlying cluster c1 can be /apis/group/version/clusters/{c}/api/v1/services (where we define a new group/version) or /apis/federation/v1beta1/clusters/{c}/proxy/api/v1/services (/apis/federation/v1beta1/clusters is the existing path to CRUD clusters).

irfanurrehman commented 6 years ago

Comment by CindyXing Friday May 26, 2017 at 21:20 GMT


Based on the original design doc and above comments, the updated design is published at https://docs.google.com/document/d/1H2BkqSKvoCSifi7c8D2gASSJo20ocIEYDMSfNfcylBc/edit

irfanurrehman commented 6 years ago

Comment by marun Monday Jun 12, 2017 at 20:43 GMT


Moving to 1.8.

irfanurrehman commented 6 years ago

Comment by k8s-merge-robot Tuesday Sep 05, 2017 at 08:03 GMT


[MILESTONENOTIFIER] Milestone Labels Incomplete

@marun @nikhiljindal

Action required: This issue requires label changes. If the required changes are not made within 3 days, the issue will be moved out of the v1.8 milestone.

kind: Must specify at most one of ['kind/bug', 'kind/feature', 'kind/cleanup'].
priority: Must specify at most one of ['priority/critical-urgent', 'priority/important-soon', 'priority/important-longterm'].

Additional instructions available here

irfanurrehman commented 6 years ago

Comment by nikhiljindal Tuesday Sep 05, 2017 at 18:58 GMT


Updated the labels and moved out of 1.8

irfanurrehman commented 6 years ago

cc @nikhiljindal

prakashsingh08 commented 6 years ago

@nikhiljindal while creating StorageClasses and other objects in a kubernetes federation cluster, I am getting an error like the one below:

error: error validating "storage-class.yaml": error validating data: the server could not find the requested resource; if you choose to ignore these errors, turn validation off with --validate=false

Below is my StorageClass file, where apiVersion is "storage.k8s.io/v1":

```yaml
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: mongodbs
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-ssd
```

Then I checked the enabled APIs with kubectl api-versions and got this response:

```
extensions/v1beta1
federation/v1beta1
v1
```

How can I enable the other APIs in the federation cluster? Any suggestions will be helpful.

irfanurrehman commented 6 years ago

@prakashsingh08, not all k8s APIs are supported in the federation API server. This is by federation's original design: federation supports federating only a subset of k8s APIs, and support for more resources is on the roadmap. However, we have so far not received any specific user requests for federating storage classes, so it is not on the immediate roadmap. What is your use case? If you have a reasonable one, we can discuss it. You can post your queries and suggestions here or on the Slack channel (sig-multicluster).

prakashsingh08 commented 6 years ago

@irfanurrehman our use case is to deploy a MongoDB StatefulSet cluster in k8s across multiple clusters. It requires creating a StorageClass and a StatefulSet in each cluster. We want to manage all our activities (like creating StorageClasses, StatefulSets, etc.) through the federation cluster, to avoid the rework of creating Kubernetes objects in different clusters, so we need access to the APIs below:

 apps/v1beta1
 storage.k8s.io/v1

Is there any way to enable these APIs in the federation cluster?

fejta-bot commented 6 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle stale

fejta-bot commented 6 years ago

Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle rotten /remove-lifecycle stale

fejta-bot commented 6 years ago

Rotten issues close after 30d of inactivity. Reopen the issue with /reopen. Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /close