stevenctl closed this 6 months ago
Shouldn't this be working and documented with istiodless remotes before promoting?
Also oddly the doc still says this:
even though it seems to be using the default profile on remote clusters now.
Highlight in feature status that "remote clusters" and "multi network" have separate statuses.
In ToC and environments we discussed that we want to promote primary-remote separately.
This will miss another release unfortunately, but for good reason.
We need better behavior when updating a secret. Any modification to how we access a cluster means we send an EDS update removing all the endpoints for that cluster, then re-adding them. This is not what we want.
Improve support for removing a cluster, or replacing its remote secret
This has largely fallen off my radar. I will attempt to help any engineers interested in driving the last few tasks over the finish line.
@stevenctl - I may have some capacity to help on this. Is this the full list of what we think is needed? Is the primary blocking item improved support for replacing a remote secret? Are there any WIP code or doc changes you are aware of other than what's in this pull?
@stevenctl - thanks for the google doc - I spent some time looking at this and I agree the original implementation of deleting the cluster and re-adding it is bad. It is especially weak in the case where cluster certificates are rotated and that is the cause of the update. I can't remember why I chose such a lazy implementation originally. If I understand your proposal correctly, I think you are suggesting we should add a new way to interact with XDSUpdater. I am not convinced that is the best approach. In most ways the remote cluster should not be treated differently than the local cluster. The primary reason a remote secret would be updated is to handle cert rotation. That same operation can occur in the local cluster. IMO we should code this update path the same way local cluster cert rotation is handled, which essentially should just require that the clients be updated. Unfortunately, I was not able to easily find how Istio handles that case.
Swapping out the kubeclient might be problematic. We'd need to be confident in our behavior for:
Do we re-run the SyncAll logic? If so, do we diff with our current state to make sure we clean up anything that might be missing?
The proposal to leverage XDSUpdater was so that we can spin up a new client, run our k8s event handlers, and only once it's ready start using that info for XDS generation.
Ideally, we have 2 clients/sets of controllers for these clusters around during the transition and have a cut over of some sort.
My perspective is that the flow is:
kube client -> some controller(s) -> XDSUpdater that stores internal state -> config gen
XDSUpdater seems like a good place to do that swap and possibly some comparison to clean up things that aren't in the cluster anymore (as far as the new client is concerned).
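For discussion purposes, a minimal sketch of what a swap at the XDSUpdater layer could look like. All package, type, and method names here are invented for illustration and are not Istio's real APIs; it just models "build the new state off to the side, then cut over atomically":

```go
// Hypothetical sketch only. It models the flow
// "kube client -> controller(s) -> XDSUpdater-held state -> config gen"
// and shows an atomic swap of one cluster's state at the updater.
package multicluster

import "sync"

// clusterState is the per-cluster view that config generation would read.
type clusterState struct {
	endpoints map[string][]string // service -> endpoint addresses (simplified)
}

// updater stands in for the XDSUpdater-owned internal state.
type updater struct {
	mu       sync.RWMutex
	clusters map[string]*clusterState
}

// SwapCluster installs freshly synced state for clusterID in a single step, so
// config gen never observes a window between "old removed" and "new added".
// It returns services that existed before but not after, so the caller can
// clean those up and trigger one push for the difference.
func (u *updater) SwapCluster(clusterID string, fresh *clusterState) (removed []string) {
	u.mu.Lock()
	defer u.mu.Unlock()

	if old := u.clusters[clusterID]; old != nil {
		for svc := range old.endpoints {
			if _, still := fresh.endpoints[svc]; !still {
				removed = append(removed, svc)
			}
		}
	}
	u.clusters[clusterID] = fresh
	return removed
}
```

The idea would be that the new client and its controllers populate `fresh` in the background, and only once the new informers report synced do we perform the swap, using the returned difference to clean up anything that no longer exists in the cluster.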
I guess my main point is that whatever design we come up with should apply equally to the local cluster and remote clusters. The most obvious reason for an update is cert rotation, and that applies for both the local and remote API server. I thought there might already be logic or at least a specific update procedure for the local cluster, but I have not been able to find anything and posted a question on the general slack channel to see if anyone knows of one.
I did a bit more testing of the secret update path as well as some profiling of the code under different scenarios, and I don't believe any change is needed for the update. The current code will call: https://github.com/istio/istio/blob/3629470718ed1edab8bf570a7a30fdbf9bddadf7/pkg/kube/multicluster/secretcontroller.go#L372 - which stops all the kube clients and the informers. That is followed by: https://github.com/istio/istio/blob/3629470718ed1edab8bf570a7a30fdbf9bddadf7/pilot/pkg/serviceregistry/kube/controller/multicluster.go#L166 - which removes all the clients and the informers from the controller. Then an add happens to re-create the clients and re-set up the informers.
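For readers following along, a heavily simplified, hypothetical model of that delete-then-add path (names are invented; the real logic lives in the files linked above):

```go
// A simplified model of the update path described above. The secret change
// is handled as a full delete followed by a full add, so all state for the
// cluster is torn down and rebuilt by the new informers.
package multicluster

// clusterRegistry stands in for the controller that owns per-cluster clients
// and informers (hypothetical interface, not Istio's).
type clusterRegistry interface {
	StopAndRemove(clusterID string)                // stop clients/informers, drop their state
	Add(clusterID string, kubeConfig []byte) error // rebuild the client, re-register informers
}

// onRemoteSecretUpdated mirrors the delete-then-add sequence: nothing carries
// over from the old client, and endpoints reappear only after the new
// informers finish syncing.
func onRemoteSecretUpdated(r clusterRegistry, clusterID string, newKubeConfig []byte) error {
	r.StopAndRemove(clusterID)
	return r.Add(clusterID, newKubeConfig)
}
```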
With some profiling I confirmed a few things that convince me we don't need to change the behavior:
We could add an integration test to simulate some of this, but it will be both complicated and slow, so I don't think it is worth it when a remote secret update should really only occur due to security-related rotations.
If there are new clusters, endpoints, etc. in the new cache compared to what was previously there, the sync logic will properly add and delete (i.e. after the sync the xDS state matches what is currently in the remote cluster).
With a new informer starting, we would only get Add events. How do we delete?
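To illustrate the concern: a freshly started informer replays the objects that currently exist as Add events only, so anything that existed only in the old cluster never produces a Delete. A minimal client-go sketch (the kubeconfig path is a placeholder):

```go
package main

import (
	"fmt"
	"time"

	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/cache"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", "/path/to/new/kubeconfig") // placeholder path
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	factory := informers.NewSharedInformerFactory(client, 0)
	factory.Core().V1().Endpoints().Informer().AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc:    func(obj interface{}) { fmt.Println("add") },    // fired for every object in the new cluster
		DeleteFunc: func(obj interface{}) { fmt.Println("delete") }, // never fired for objects only the old cluster had
	})

	stop := make(chan struct{})
	factory.Start(stop)
	factory.WaitForCacheSync(stop)
	time.Sleep(5 * time.Second)
	close(stop)
}
```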
@howardjohn - maybe I need to re-check my tests/results, but with the cache going through a HasSynced transition I was seeing stale endpoints being deleted and new endpoints being added. Not being familiar with all the code that generates the xDS for envoy, I wasn't sure exactly how that happened; I assumed the logic was treating the new caches as the source of truth and updating the dataplane state accordingly. I will double check things.
As we discussed in the weekly meeting prior to the holiday, this remote update is analogous to rotating the certs in the local cluster. I want to avoid having a lot of complexity in our code for an operation that requires a lot of coordination at the operational level for all of the clusters.
Adding an update: I retested using a new sequence: 3 kind clusters - Primary, Remote1 (named cluster2) and Remote2 (also named cluster2). Created the original secret based on remote1, then updated the secret to use the remote2 kubeconfig. All the pertinent envoy listeners, clusters and endpoints are based on remote2 and there is nothing from remote1. I only had pods and services; I didn't try to include workload entries or other resource types.
We have a Cleanup method on kube controller that removes the endpointshards. The problem is the time between those shards being removed and them being repopulated.
Even though we don't trigger a push, it's possible that one is triggered elsewhere.
In the tests, we should make sure we see what the behavior is when the new cluster is slow to sync.
Can we lock everything, then do old.List() and new.List()? Then for each object, send an event, either Delete(old), Add(new), or Update(old, new), depending on whether the object was in old, new, or both?
Note: I think locking everything during this process is critical, or we may get invalid cross-cluster resource selection.
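A sketch of that diff-and-replay idea under the locking assumption above, with hypothetical types rather than Istio's real APIs:

```go
// Hypothetical sketch: with everything locked, list the old and new caches,
// then emit Delete(old), Add(new), or Update(old, new) per key so downstream
// state ends up matching the new cluster without a window of missing endpoints.
package multicluster

import "sync"

type object struct {
	Key     string
	Version string
}

type handler interface {
	Add(obj object)
	Update(oldObj, newObj object)
	Delete(obj object)
}

var transitionMu sync.Mutex // guards cross-cluster state during the cutover

// replayDiff compares the old and new listings and replays the difference as
// ordinary events, holding the lock so nothing reads half-swapped state.
func replayDiff(oldList, newList []object, h handler) {
	transitionMu.Lock()
	defer transitionMu.Unlock()

	oldByKey := make(map[string]object, len(oldList))
	for _, o := range oldList {
		oldByKey[o.Key] = o
	}
	seen := make(map[string]bool, len(newList))
	for _, n := range newList {
		seen[n.Key] = true
		if o, ok := oldByKey[n.Key]; ok {
			h.Update(o, n) // present in both: treat as an update
		} else {
			h.Add(n) // only in the new cluster
		}
	}
	for key, o := range oldByKey {
		if !seen[key] {
			h.Delete(o) // only in the old cluster: clean it up
		}
	}
}
```

Whoever emits these events would hold the lock for the whole replay so nothing downstream reads half-swapped state.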
I hope to have a WIP PR this week with a solution and we can use that to debate implementations on how to solve this.
Hey, the enhancement subgroup has decided to spend time going through open PRs within the enhancement repo. This is in an effort to understand what issues may exist in the graduation process and to highlight any patterns noticed when it comes to blockers on these PRs (we're mainly curious about the friction that may exist when trying to graduate features).
With that being said, is there any follow-up work in progress for this? It seems like there is still some unaddressed feedback on this PR @stevenctl @john-a-joyce
@stevenctl Hi there
The work John has started should ideally land before calling this stable. I'm not sure I'm fully satisfied with our current architecture, but I don't think it's something we'd ever fully drop support for.
Another potential model, especially as we move towards ambient, could be control planes talking to each other directly to share WDS objects.
I think we need to narrow/separate the scope of what's being called stable: Istiod reading other clusters' config from a remote API server... effectively "multi-primary".
"Primary -remote" means we're can depend on reaching Istiod in another cluster via a gateway. That's a pretty separate thing imo.
Testing:
Features/improvements (some of these were added prior to 1.12, but after the beta promotion):
x/experimental: https://github.com/istio/istio/pull/35509
meshConfig.serviceSettings