Open andrewsykim opened 5 years ago
/assign @mcrute
@mcrute is working on the initial design for this.
cc @cheftako
Here's a first draft. There's plenty more to be done but getting this out there for discussion.
Thanks @mcrute!
Thanks @mcrute! As part of this, I would also like us to discuss how to do a better job of running controllers in HA environments. Currently we do not utilize HA well. If we could stop killing the process when leader election is lost, we could get much better utilization in HA. The problem has been that controllers tend to kick off goroutines (and similar asynchronous processing), and controller actions may not be idempotent. So we end up with mutations from something other than the main controller thread that did not get shut down (or at least not in a timely manner). One thought is to attach an election token (or similar) to mutations: if the mutator is no longer the leader, the write is refused and the mutator is notified that it is no longer the leader (and should stop). While I believe this is more than we need for the KCM->CCM migration, I would like us to consider it as where we are going. It would be good to make sure we are generally heading in that direction.
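To make the election-token idea concrete, here is a minimal sketch in Go. It is not part of any KEP or of Kubernetes itself: `Store`, `Elect`, `Write`, and `ErrNotLeader` are hypothetical names, and the in-memory store stands in for whatever component would actually validate writes. The assumption is that every leadership change bumps a term counter and that each mutation carries the term under which it was issued.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// ErrNotLeader tells the caller its election token is stale and it should stop.
var ErrNotLeader = errors.New("write refused: caller is no longer the leader")

// Store is a hypothetical stand-in for the component that accepts mutations.
// It tracks the current leadership term and rejects writes from older terms.
type Store struct {
	mu   sync.Mutex
	term uint64
	data map[string]string
}

// NewStore returns an empty store with no leader elected yet.
func NewStore() *Store {
	return &Store{data: make(map[string]string)}
}

// Elect starts a new leadership term and returns its token.
func (s *Store) Elect() uint64 {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.term++
	return s.term
}

// Write applies the mutation only if the token matches the current term.
func (s *Store) Write(token uint64, key, value string) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	if token != s.term {
		return ErrNotLeader
	}
	s.data[key] = value
	return nil
}

// Get returns the stored value for key, if any.
func (s *Store) Get(key string) (string, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	v, ok := s.data[key]
	return v, ok
}

func main() {
	s := NewStore()

	oldToken := s.Elect()
	_ = s.Write(oldToken, "lb", "created-by-A")

	// Leadership moves, but a goroutine from the old leader is still running.
	newToken := s.Elect()
	err := s.Write(oldToken, "lb", "stale-write")
	fmt.Println("stale write refused:", errors.Is(err, ErrNotLeader))

	_ = s.Write(newToken, "lb", "created-by-B")
	v, _ := s.Get("lb")
	fmt.Println("value:", v)
}
```

With a check like this, a leftover goroutine from a former leader cannot mutate state, so losing the election would not have to mean killing the whole process. Where the token is validated (API server, storage layer, or cloud API) is left open here.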
/milestone v1.15
/priority critical-urgent
/assign
This is going to slip into the next release since we couldn't get the KEP reviewed in time for the KEP deadline. Further discussion is happening in https://github.com/kubernetes/enhancements/pull/979 and https://github.com/kubernetes/kubernetes/pull/77878; we hope to have an implementable KEP in time for v1.16.
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
/remove-lifecycle stale
/assign @yastij
/remove-lifecycle stale
/lifecycle frozen
We need a KEP outlining how we intend to migrate existing clusters from using the kube-controller-manager to the cloud-controller-manager for the cloud-provider-specific parts of Kubernetes.
At KubeCon NA 2018, we discussed grouping the existing cloud controllers under a single leader election that is shared by the kube-controller-manager and the cloud-controller-manager. For single-node control planes this is not needed, but for HA control planes we need a mechanism to ensure that no more than one kube-controller-manager or cloud-controller-manager is running the set of cloud controllers in a cluster at any time.
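As a rough illustration of the shared leader election described above, the following Go sketch has several managers compete for one lock, and only the holder starts the cloud controllers. Everything here is an assumption for discussion: `Lease`, `TryAcquire`, `RunIfLeader`, and the controller names are hypothetical, and the in-memory lock omits the expiry and renewal that a real leader-election lock needs.

```go
package main

import (
	"fmt"
	"sync"
)

// Lease is a hypothetical in-memory stand-in for a leader-election lock
// shared by kube-controller-manager and cloud-controller-manager instances.
// A real lock would also need expiry and renewal, which are omitted here.
type Lease struct {
	mu     sync.Mutex
	holder string
}

// TryAcquire reports whether id holds the lease after the call.
func (l *Lease) TryAcquire(id string) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.holder == "" {
		l.holder = id
	}
	return l.holder == id
}

// Release gives up the lease, but only if id is the current holder.
func (l *Lease) Release(id string) {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.holder == id {
		l.holder = ""
	}
}

// Holder returns the identity of the current lease holder ("" if none).
func (l *Lease) Holder() string {
	l.mu.Lock()
	defer l.mu.Unlock()
	return l.holder
}

// cloudControllers is an illustrative list of controllers grouped under the lock.
var cloudControllers = []string{"node", "route", "service"}

// RunIfLeader returns the controllers this manager may start,
// or nil if another manager already holds the shared lease.
func RunIfLeader(l *Lease, id string) []string {
	if !l.TryAcquire(id) {
		return nil
	}
	return cloudControllers
}

func main() {
	lease := &Lease{}
	managers := []string{
		"kube-controller-manager-0",
		"cloud-controller-manager-0",
		"cloud-controller-manager-1",
	}

	var wg sync.WaitGroup
	var mu sync.Mutex
	running := 0
	for _, id := range managers {
		wg.Add(1)
		go func(id string) {
			defer wg.Done()
			if RunIfLeader(lease, id) != nil {
				mu.Lock()
				running++
				mu.Unlock()
			}
		}(id)
	}
	wg.Wait()

	// Exactly one manager wins; which one depends on goroutine scheduling.
	fmt.Printf("managers running cloud controllers: %d (leader: %s)\n", running, lease.Holder())
}
```

Because both binaries contend for the same lock, it does not matter during migration whether the winner is a kube-controller-manager or a cloud-controller-manager. In a real cluster the lock would presumably be a resource lock such as those in client-go's leaderelection package, not an in-process mutex.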