stevenctl closed this 6 months ago
Shouldn't this be working and documented with istiodless remotes before promoting?
Also oddly the doc still says this:
even though it seems to be using the default profile on remote clusters now.
Highlight in feature status that "remote clusters" and "multi network" have separate statuses.
In ToC and environments we discussed that we want to promote primary-remote separately.
This will miss another release unfortunately, but for good reason.
We need better behavior when updating a secret. Any modification to how we access a cluster means we send an EDS update removing all the endpoints for that cluster, then re-adding them. This is not what we want.
Improve support for removing a cluster, or replacing its remote secret
This has largely fallen off my radar. I will attempt to help any engineers interested in driving the last few tasks over the finish line.
@stevenctl - I may have some capacity to help on this. Is this the full list of what we think is needed? Is the primary blocking item improved support for replacing a remote secret? Are there any WIP code or doc changes you are aware of other than what's in this pull?
@stevenctl - thanks for the google doc - I spent some time looking at this and I agree the original implementation of deleting the cluster and re-adding it is bad. It is especially weak in the case where cluster certificates are rotated and that is the cause of the update. I can't remember why I chose such a lazy implementation originally. If I understand your proposal correctly, I think you are suggesting we should add a new way to interact with XDSUpdater. I am not convinced that is the best approach. In most ways the remote cluster should not be treated differently than the local cluster. The primary reason a remote secret would be updated is to handle cert rotation. That same operation can occur in the local cluster. IMO we should code this update path the same way local cluster cert rotation is handled, which essentially should just require that the clients be updated. Unfortunately, I was not able to easily find how Istio handles that case.
Swapping out the kubeclient might be problematic. We'd need to be confident in our behavior for:
Do we re-run the SyncAll logic? If so, do we diff with our current state to make sure we clean up anything that might be missing?
The proposal to leverage XDSUpdater was so that we can spin up a new client, run our k8s event handlers, and only once it's ready start using that info for XDS generation.
Ideally, we have 2 clients/sets of controllers for these clusters around during the transition and have a cut over of some sort.
My perspective is that the flow is:
kube client -> some controller(s) -> XDSUpdater that stores internal state -> config gen
XDSUpdater seems like a good place to do that swap and possibly some comparison to clean up things that aren't in the cluster anymore (as far as the new client is concerned).
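For discussion purposes, a minimal sketch of what a swap at the XDSUpdater layer could look like. All package, type, and method names here are invented for illustration and are not Istio's real APIs; it just models "build the new state off to the side, then cut over atomically":

```go
// Hypothetical sketch only. It models the flow
// "kube client -> controller(s) -> XDSUpdater-held state -> config gen"
// and shows an atomic swap of one cluster's state at the updater.
package multicluster

import "sync"

// clusterState is the per-cluster view that config generation would read.
type clusterState struct {
	endpoints map[string][]string // service -> endpoint addresses (simplified)
}

// updater stands in for the XDSUpdater-owned internal state.
type updater struct {
	mu       sync.RWMutex
	clusters map[string]*clusterState
}

// SwapCluster installs freshly synced state for clusterID in a single step, so
// config gen never observes a window between "old removed" and "new added".
// It returns services that existed before but not after, so the caller can
// clean those up and trigger one push for the difference.
func (u *updater) SwapCluster(clusterID string, fresh *clusterState) (removed []string) {
	u.mu.Lock()
	defer u.mu.Unlock()

	if old := u.clusters[clusterID]; old != nil {
		for svc := range old.endpoints {
			if _, still := fresh.endpoints[svc]; !still {
				removed = append(removed, svc)
			}
		}
	}
	u.clusters[clusterID] = fresh
	return removed
}
```

The idea would be that the new client and its controllers populate `fresh` in the background, and only once the new informers report synced do we perform the swap, using the returned difference to clean up anything that no longer exists in the cluster.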
I guess my main point is that whatever design we come up with should apply equally to the local cluster and remote clusters. The most obvious reason for an update is cert rotation, and that applies for both the local and remote API server. I thought there might already be logic or at least a specific update procedure for the local cluster, but I have not been able to find anything and posted a question on the general slack channel to see if anyone knows of one.
I did a bit more testing of the secret update path as well as some profiling of the code under different scenarios, and I don't believe any change is needed for the update. The current code will call: https://github.com/istio/istio/blob/3629470718ed1edab8bf570a7a30fdbf9bddadf7/pkg/kube/multicluster/secretcontroller.go#L372 - which stops all the kube clients and the informers. That is followed by: https://github.com/istio/istio/blob/3629470718ed1edab8bf570a7a30fdbf9bddadf7/pilot/pkg/serviceregistry/kube/controller/multicluster.go#L166 - which removes all the clients and the informers from the controller. Then an add happens to re-create the clients and re-set up the informers.
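For readers following along, a heavily simplified, hypothetical model of that delete-then-add path (names are invented; the real logic lives in the files linked above):

```go
// A simplified model of the update path described above. The secret change
// is handled as a full delete followed by a full add, so all state for the
// cluster is torn down and rebuilt by the new informers.
package multicluster

// clusterRegistry stands in for the controller that owns per-cluster clients
// and informers (hypothetical interface, not Istio's).
type clusterRegistry interface {
	StopAndRemove(clusterID string)                // stop clients/informers, drop their state
	Add(clusterID string, kubeConfig []byte) error // rebuild the client, re-register informers
}

// onRemoteSecretUpdated mirrors the delete-then-add sequence: nothing carries
// over from the old client, and endpoints reappear only after the new
// informers finish syncing.
func onRemoteSecretUpdated(r clusterRegistry, clusterID string, newKubeConfig []byte) error {
	r.StopAndRemove(clusterID)
	return r.Add(clusterID, newKubeConfig)
}
```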
With some profiling I confirmed a few things that convince me we don't need to change the behavior:
We could add an integration test to simulate some of this, but it will be both complicated and slow, so I don't think it is worth it when a remote secret update should really only occur due to security-related rotations.
If there are new clusters, endpoints, etc. in the new cache compared to what was previously there, the sync logic will properly add and delete (i.e. after the sync the xDS state matches what is currently in the remote cluster).
With a new informer starting, we would only get Add events. How do we delete?
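To illustrate the concern: a freshly started informer replays the objects that currently exist as Add events only, so anything that existed only in the old cluster never produces a Delete. A minimal client-go sketch (the kubeconfig path is a placeholder):

```go
package main

import (
	"fmt"
	"time"

	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/cache"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", "/path/to/new/kubeconfig") // placeholder path
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	factory := informers.NewSharedInformerFactory(client, 0)
	factory.Core().V1().Endpoints().Informer().AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc:    func(obj interface{}) { fmt.Println("add") },    // fired for every object in the new cluster
		DeleteFunc: func(obj interface{}) { fmt.Println("delete") }, // never fired for objects only the old cluster had
	})

	stop := make(chan struct{})
	factory.Start(stop)
	factory.WaitForCacheSync(stop)
	time.Sleep(5 * time.Second)
	close(stop)
}
```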
@howardjohn - maybe I need to re-check my tests/results, but with the cache going through a HasSynced transition I was seeing stale endpoints being deleted and new endpoints being added. Not being familiar with all the code that generates the xDS for envoy, I wasn't sure exactly how that happened; I assumed the logic was treating the new caches as the source of truth and updating the dataplane state accordingly. I will double check things.
As we discussed in the weekly meeting prior to the holiday, this remote update is analogous to rotating the certs in the local cluster. I want to avoid having a lot of complexity in our code for an operation that requires a lot of coordination at the operational level for all of the clusters.
Adding an update: I retested using a new sequence: 3 kind clusters - Primary, Remote1 (named cluster2) and Remote2 (also named cluster2). Created the original secret based on remote1, then updated the secret to use the remote2 kubeconfig. All the pertinent envoy listeners, clusters and endpoints are based on remote2 and there is nothing from remote1. I only had pods and services; I didn't try to include workload entries or other resource types.
We have a Cleanup method on kube controller that removes the endpointshards. The problem is the time between those shards being removed and them being repopulated.
Even though we don't trigger a push, it's possible that one is triggered elsewhere.
In the tests, we should make sure we see what the behavior is when the new cluster is slow to sync.
Can we lock everything, then do old.List() and new.List()? Then for each object, send an event, either Delete(old), Add(new), or Update(old, new), depending on whether the object was in old, new, or both?
Note: I think locking everything during this process is critical, or we may get invalid cross-cluster resource selection.
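A sketch of that diff-and-replay idea under the locking assumption above, with hypothetical types rather than Istio's real APIs:

```go
// Hypothetical sketch: with everything locked, list the old and new caches,
// then emit Delete(old), Add(new), or Update(old, new) per key so downstream
// state ends up matching the new cluster without a window of missing endpoints.
package multicluster

import "sync"

type object struct {
	Key     string
	Version string
}

type handler interface {
	Add(obj object)
	Update(oldObj, newObj object)
	Delete(obj object)
}

var transitionMu sync.Mutex // guards cross-cluster state during the cutover

// replayDiff compares the old and new listings and replays the difference as
// ordinary events, holding the lock so nothing reads half-swapped state.
func replayDiff(oldList, newList []object, h handler) {
	transitionMu.Lock()
	defer transitionMu.Unlock()

	oldByKey := make(map[string]object, len(oldList))
	for _, o := range oldList {
		oldByKey[o.Key] = o
	}
	seen := make(map[string]bool, len(newList))
	for _, n := range newList {
		seen[n.Key] = true
		if o, ok := oldByKey[n.Key]; ok {
			h.Update(o, n) // present in both: treat as an update
		} else {
			h.Add(n) // only in the new cluster
		}
	}
	for key, o := range oldByKey {
		if !seen[key] {
			h.Delete(o) // only in the old cluster: clean it up
		}
	}
}
```

Whoever emits these events would hold the lock for the whole replay so nothing downstream reads half-swapped state.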
I hope to have a WIP PR this week with a solution and we can use that to debate implementations on how to solve this.
Hey, the enhancement subgroup has decided to spend time going through open PRs within the enhancement repo. This is in an effort to understand what issues may exist in the graduation process and to highlight any patterns noticed when it comes to blockers on these PRs (we're mainly curious about the friction that may exist when trying to graduate features).
With that being said, is there any follow-up work in progress for this? It seems like there is still some unaddressed feedback on this PR @stevenctl @john-a-joyce
@stevenctl Hi there
The work John has started should ideally land before calling this stable. I'm not sure I'm fully satisfied with our current architecture, but I don't think it's something we'd ever fully drop support for.
Another potential model, especially as we move towards ambient, could be control planes talking to each other directly to share WDS objects.
I think we need to narrow/separate the scope of what's being called stable: Istiod reading other clusters' config from a remote API server... effectively "multi-primary".
"Primary -remote" means we're can depend on reaching Istiod in another cluster via a gateway. That's a pretty separate thing imo.
Testing:
Features/improvements (some of these were added prior to 1.12, but after the beta promotion):
x/experimental: https://github.com/istio/istio/pull/35509
meshConfig.serviceSettings