Closed digitalronin closed 4 years ago
MANAGING MULTIPLE CLUSTERS
OVERVIEW
The purpose of this ticket was to discuss the plan and strategy for the questions raised in the ticket description. In particular, when replicating an existing cluster, how do we manage the 'interim' period in which both clusters co-exist, before the replaced cluster is torn down? The key points from the discussion with the wider team were:
The cluster being replaced needs to stay 'live' until the new cluster is tested as fully working.
The new cluster must replicate all resources, such as RDS secrets, so that existing applications consuming those secrets do not break.
The scope of replication is limited to resources provisioned using Terraform's 'kubernetes provider'. Existing AWS resources should not be affected.
The new cluster will be provisioned in the same VPC as the one being replaced.
OUTCOMES
To achieve the above requirements, the discussion concluded that 'Kubernetes Federation' or 'Anthos' could be viable solutions. However, both currently have impediments which we would need to address. When possible, the next step would be to practically test the following:
PR approved / closed: https://github.com/ministryofjustice/cloud-platform/pull/2051
If we have applications running across multiple clusters (even if that's only temporarily, while we replace a cluster), how would we manage terraform state?
Currently, every namespace in the cluster has its own terraform statefile. Some of the items in the terraform state are AWS resources, which should stay the same from one cluster to another, but other things will be kubernetes objects which may need to be created in both/all clusters where the namespace/service exists.
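As a rough illustration of the split described above, the sketch below groups the resources in a namespace's statefile into those that would be shared across clusters and those that would need to exist per cluster. It assumes the statefile has been loaded as a Terraform state JSON document (version 4 layout, with a top-level "resources" list) and that the resource type prefix ("aws_" or "kubernetes_") is a good enough signal. The function name and the example state are hypothetical, not part of any existing tooling.

```python
from collections import defaultdict


def partition_state_resources(state: dict) -> dict:
    """Group managed resource addresses from a Terraform state document.

    Assumption: resource types starting with "kubernetes_" are cluster-local,
    and types starting with "aws_" are shared between clusters.
    """
    groups = defaultdict(list)
    for resource in state.get("resources", []):
        # Data sources are read-only lookups, so they are skipped here.
        if resource.get("mode") != "managed":
            continue
        rtype = resource["type"]
        address = f"{rtype}.{resource['name']}"
        # Resources declared inside a module carry a "module" prefix.
        if resource.get("module"):
            address = f"{resource['module']}.{address}"
        if rtype.startswith("kubernetes_"):
            groups["per_cluster"].append(address)
        elif rtype.startswith("aws_"):
            groups["shared"].append(address)
        else:
            groups["other"].append(address)
    return dict(groups)


# Hypothetical, heavily trimmed statefile for one namespace.
example_state = {
    "version": 4,
    "resources": [
        {"mode": "managed", "type": "aws_db_instance", "name": "rds", "instances": []},
        {"mode": "managed", "type": "kubernetes_secret", "name": "rds_credentials", "instances": []},
        {"mode": "data", "type": "aws_caller_identity", "name": "current", "instances": []},
    ],
}

partitioned = partition_state_resources(example_state)
for group, addresses in partitioned.items():
    print(group, addresses)
```

A listing like this would only show which items in a statefile fall on each side of the split; it does not answer how the state itself should be organised across clusters.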
Even for the AWS resources, such as RDS instances, how do we ensure that the kubernetes secrets containing the access credentials are available on all required clusters in the cloud platform?
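To make the question concrete, the sketch below builds one Kubernetes Secret manifest from a set of RDS credentials and produces an identical copy for each cluster that should receive it. This is only one possible framing, not a decision from the ticket: the cluster names, namespace, secret name and credential values are all made up, and actually applying the manifests (via Terraform, kubectl or anything else) is deliberately left out.

```python
import base64
import copy


def build_secret_manifest(namespace: str, name: str, credentials: dict) -> dict:
    """Build a v1 Secret manifest; values in "data" must be base64-encoded."""
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "type": "Opaque",
        "data": {
            key: base64.b64encode(value.encode("utf-8")).decode("ascii")
            for key, value in credentials.items()
        },
    }


def plan_secret_rollout(clusters: list, namespace: str, name: str, credentials: dict) -> dict:
    """Return the same Secret manifest once per target cluster."""
    manifest = build_secret_manifest(namespace, name, credentials)
    # Deep copies keep each cluster's manifest independent of the others.
    return {cluster: copy.deepcopy(manifest) for cluster in clusters}


# Hypothetical inputs, for illustration only.
rollout = plan_secret_rollout(
    clusters=["old-cluster", "new-cluster"],
    namespace="my-app-dev",
    name="rds-credentials",
    credentials={"database_username": "example_user", "database_password": "example_password"},
)

for cluster, manifest in rollout.items():
    print(cluster, manifest["metadata"]["name"], sorted(manifest["data"]))
```

The point of the sketch is that the credentials come from a single AWS resource, while the Secret object has to be created once per cluster.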
How would this affect our concourse pipelines?