There is a similar issue tracked against CAPA here: https://github.com/kubernetes-sigs/cluster-api-provider-aws/issues/969
I do think this is something that we should probably validate and try to guard against in CAPI rather than CAPA, though.
@liztio we should probably ensure that we are validating this as part of the validating webhooks work you are doing.
@detiber shall we provide a concept table to show what CAPI and CAPA are? I assume CAPI is the cluster API provider interface and CAPA is cluster-api-provider-aws?
Ah, I saw the glossary at https://cluster-api.sigs.k8s.io/reference/glossary.html#c , thanks @detiber
I tried to reproduce this issue but ended up in a really weird state. I created two clusters but messed up the SSHKeyName so I had to delete them both and try again. But I ended up not being able to delete the duplicate cluster. You can see the state of my system here:
https://cloud.tilt.dev/snapshot/AfTE99wLlyPeKZAVyl8=
Scroll up just a bit to see some highlighted lines.
@chuckha were you ever able to make any more progress on this?
As I mentioned in the PR linked above, it is not possible in AWS to create two clusters in different namespaces with the same name. I have not attempted to fix the problem beyond the PoC linked above. I suspect there may be other components that do not respect name/namespace as primary key and only look at the name, but that should be easy enough to figure out with some dedicated testing and poking.
I'd suggest, for anyone looking to get involved here, to create a cluster, take inventory of all items that exist, then make another cluster with the same name in a different namespace and make sure all the components expected to exist, exist. Then make sure the cluster actually came up.
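For anyone picking this up, a rough sketch of that repro might look like the following; the namespace names (team-a, team-b) and the exact clusterctl flags are assumptions and may need adjusting for your version and AWS setup:

```sh
# Hypothetical repro sketch: two workload clusters with the same name in
# different namespaces, using the standard CAPI quickstart flow.
kubectl create namespace team-a
kubectl create namespace team-b

# Generate the same cluster name into both namespaces (flags may vary by
# clusterctl version; assumes a working CAPA environment).
clusterctl config cluster capi-quickstart --kubernetes-version v1.17.3 \
  --target-namespace team-a > cluster-a.yaml
clusterctl config cluster capi-quickstart --kubernetes-version v1.17.3 \
  --target-namespace team-b > cluster-b.yaml

kubectl apply -f cluster-a.yaml
kubectl apply -f cluster-b.yaml

# Take inventory: every CAPI/CAPA object should exist once per namespace,
# and both clusters should actually come up.
kubectl get clusters,machines,awsclusters,awsmachines -n team-a
kubectl get clusters,machines,awsclusters,awsmachines -n team-b
```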
Going through open unassigned issues in the v0.3.0 milestone. We have a decent amount of work left to do on features (control plane, clusterctl, etc). While this is an unfortunately ugly bug, I think we need to defer it to v0.4.
/milestone Next
/remove-priority important-soon
/priority important-longterm
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
/lifecycle frozen
This issue seems mostly a documentation / infrastructure limitation, @detiber thoughts?
/help
@vincepri: This request has been marked as needing help from a contributor.
Please ensure the request meets the requirements listed here.
If this request no longer meets these requirements, the label can be removed by commenting with the /remove-help command.
> This issue seems mostly a documentation / infrastructure limitation, @detiber thoughts?
It depends? The issue is complicated quite a bit by the kubernetes cloud provider integration, which requires a unique cluster name as well.
I think we have a couple of paths we can take here:
I believe the second path is probably the right one to take longer term, however it's also a far from trivial challenge to implement across the various providers in a backwards compatible way.
~~Also, if we create a cluster with the same name in two different namespaces, the kubeconfig and other secrets in the default namespace get overwritten, so you lose all the certs for the first cluster.~~
~~One solution would be to keep all secrets scoped to the specific cluster namespace. Ideally each cluster should have its own namespace.~~
UPDATE: https://github.com/kubernetes-sigs/cluster-api/issues/1554#issuecomment-634123240
@lomkju the secrets managed by Cluster API are scoped to the cluster's namespace. Are you seeing some different behavior?
@ncdc After testing this again, I found that the secrets are indeed created in the respective namespace, so I was wrong in the above comment. The actual problem is that if we create clusters with the same name in different namespaces, the same ELB is used for both masters. That's why I'm sometimes getting the below error: requests are being sent to the other master, which is using a different CA.
Cluster API ends up trying to use the same AWS resources (VPC, ELB, IAM, ...) for both clusters.
➜ k describe node ip-10-0-0-161.ap-south-1.compute.internal
error: You must be logged in to the server (Unauthorized)
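For reference, a quick way to check both behaviours might be something like the following; the namespaces are hypothetical and the load balancer name assumes CAPA derives it from the cluster name alone (roughly "<cluster-name>-apiserver"), so treat it as an assumption rather than the exact convention:

```sh
# Secrets are scoped per namespace: each cluster should have its own
# kubeconfig/CA secrets under its own namespace, so nothing is overwritten.
kubectl get secrets -n team-a | grep capi-quickstart
kubectl get secrets -n team-b | grep capi-quickstart

# The ELB name, however, is derived from the cluster name only, so both
# clusters end up pointing at the same load balancer.
aws elb describe-load-balancers \
  --query "LoadBalancerDescriptions[?LoadBalancerName=='capi-quickstart-apiserver']"
```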
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
/lifecycle frozen
During backlog grooming, @detiber proposed to introduce a contract for our infrastructure providers to at least use the namespaced name and document these limitations.
/cc @randomvariable @CecileRobertMichon
FYI, in CAPD we already faced problems due to the length of the machine names (see https://github.com/kubernetes-sigs/cluster-api/issues/3599), so the idea of concatenating the namespace into resource names could make that problem worse.
Finding a good trade-off between shortness, uniqueness, and meaningfulness of names is the first challenge here. The second one is to ensure a viable upgrade path for existing infrastructure if the naming scheme for infrastructure components changes.
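Purely as an illustration of that trade-off (not something agreed on in this thread), one could keep the human-readable cluster name and append a short hash of the namespaced name, which keeps names unique across namespaces without growing much longer:

```sh
# Illustrative only: derive a short, stable suffix from "<namespace>/<name>"
# so that provider resource names stay unique across namespaces but short.
NAMESPACED_NAME="team-a/capi-quickstart"
SUFFIX=$(printf '%s' "$NAMESPACED_NAME" | sha256sum | cut -c1-8)
echo "capi-quickstart-${SUFFIX}-apiserver"   # suffix value depends on the input
```

The obvious cost is that such names are less meaningful to someone scanning the AWS console, which is exactly the shortness/uniqueness/meaningfulness tension described above.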
/kind documentation
/assign @randomvariable to document the contract guidelines for providers
/milestone v1.1
/assign @yastij to reassess
/triage accepted
This is up to providers; if we want this to happen, the topic should be raised in the office hours and an agreement between provider implementers should be reached.
/help
Is this issue specific to capa? Can capz or capv create clusters of the same name in different namespaces?
CAPA was the only one I was asked to test. I haven't verified on CAPZ or CAPV.
This issue has not been updated in over 1 year, and should be re-triaged.
You can:
- Confirm that this issue is still relevant with /triage accepted (org members only)
- Close this issue with /close
For more details on the triage process, see https://www.kubernetes.dev/docs/guide/issue-triage/
/remove-triage accepted
/close
This issue has not been updated in over 1 year, and making this happen consistently across all the providers would require wide consensus plus a strong push / someone investing time in this effort.
We can re-open (or re-create) this whenever the conditions and the required community consensus to work on it are in place.
@fabriziopandini: Closing this issue.
What steps did you take and what happened: Execute the below commands:
After some time the cluster capi-quickstart is in the PROVISIONED phase but no infra is created, and deletion of the cluster also fails. What did you expect to happen: a cluster with the same name shouldn't be allowed, or the security group naming convention should be changed so that separate security groups get created for every cluster.
Anything else you would like to add: [Miscellaneous information that will assist in solving the issue.] CAPA-log: