kubernetes-sigs / cluster-api

Home for Cluster API, a subproject of sig-cluster-lifecycle
https://cluster-api.sigs.k8s.io
Apache License 2.0

cluster with same name under different namespace is provisioned but no infra created #1554

Closed sirao closed 7 months ago

sirao commented 5 years ago

What steps did you take and what happened: Execute the commands below:

  1. kind create cluster --name=test-mc
  2. export KUBECONFIG="$(kind get kubeconfig-path --name="clusterapi1")"
  3. kubectl create -f https://github.com/kubernetes-sigs/cluster-api/releases/download/v0.2.4/cluster-api-components.yaml
  4. kubectl create -f https://github.com/kubernetes-sigs/cluster-api-bootstrap-provider-kubeadm/releases/download/v0.1.0/bootstrap-components.yaml
  5. clusterawsadm alpha bootstrap create-stack
  6. aws ssm put-parameter --name "/sigs.k8s.io/cluster-api-provider-aws/ssh-key" --type SecureString --value "$(aws ec2 create-key-pair --key-name default | jq .KeyMaterial -r)"
  7. export AWS_CREDENTIALS=$(aws iam create-access-key --user-name bootstrapper.cluster-api-provider-aws.sigs.k8s.io)
  8. export AWS_ACCESS_KEY_ID=$(echo $AWS_CREDENTIALS | jq .AccessKey.AccessKeyId -r)
  9. export AWS_SECRET_ACCESS_KEY=$(echo $AWS_CREDENTIALS | jq .AccessKey.SecretAccessKey -r)
  10. export AWS_B64ENCODED_CREDENTIALS=$(clusterawsadm alpha bootstrap encode-aws-credentials)
  11. curl -L https://github.com/kubernetes-sigs/cluster-api-provider-aws/releases/download/v0.4.2/infrastructure-components.yaml | envsubst | kubectl create -f -
  12. kubectl apply -f cluster.yaml where cluster.yaml contents are below
    apiVersion: cluster.x-k8s.io/v1alpha2
    kind: Cluster
    metadata:
      name: capi-quickstart
    spec:
      clusterNetwork:
        pods:
          cidrBlocks: ["192.168.0.0/16"]
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1alpha2
        kind: AWSCluster
        name: capi-quickstart
    ---
    apiVersion: infrastructure.cluster.x-k8s.io/v1alpha2
    kind: AWSCluster
    metadata:
      name: capi-quickstart
    spec:
      # Change this value to the region you want to deploy the cluster in.
      region: us-east-2
      # Change this value to a valid SSH Key Pair present in your AWS Account.
      sshKeyName: default
  13. Wait until the cluster PHASE is Provisioned and verify that the basic infrastructure was created in the AWS account (see the verification sketch after these steps).
  14. kubectl create namespace duplicate-cluster
  15. kubectl apply -f dup-cluster.yaml where the yaml contents are below
    apiVersion: cluster.x-k8s.io/v1alpha2
    kind: Cluster
    metadata:
      name: capi-quickstart
      namespace: duplicate-cluster
    spec:
      clusterNetwork:
        pods:
          cidrBlocks: ["192.168.0.0/16"]
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1alpha2
        kind: AWSCluster
        name: capi-quickstart
    ---
    apiVersion: infrastructure.cluster.x-k8s.io/v1alpha2
    kind: AWSCluster
    metadata:
      name: capi-quickstart
      namespace: duplicate-cluster
    spec:
      # Change this value to the region you want to deploy the cluster in.
      region: us-east-2
      # Change this value to a valid SSH Key Pair present in your AWS Account.
      sshKeyName: default
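As a rough verification sketch for steps 13 and 15 (not definitive; the CAPA tag-key convention below is an assumption for this release), the cluster phase and the cluster-owned AWS resources can be checked with:

```sh
# Sketch: check the Cluster phase in every namespace, then look for the VPC
# CAPA should have tagged as owned by the cluster. The tag key format
# "sigs.k8s.io/cluster-api-provider-aws/cluster/<name>" is an assumption here.
kubectl get clusters --all-namespaces
aws ec2 describe-vpcs \
  --filters "Name=tag-key,Values=sigs.k8s.io/cluster-api-provider-aws/cluster/capi-quickstart" \
  --query 'Vpcs[].VpcId' --output text
```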

After some time the cluster capi-quickstart in the duplicate-cluster namespace reaches the PROVISIONED phase, but no infrastructure is created for it. Deleting that cluster also fails.

What did you expect to happen: A cluster with the same name in a different namespace should not be allowed, or the security group naming convention should be changed so that dedicated security groups get created for every cluster.

Anything else you would like to add: CAPA log:


```
I1016 05:21:14.332092       1 awscluster_controller.go:69] controllers/AWSCluster "msg"="Cluster Controller has not yet set OwnerRef" "awsCluster"="capi-quickstart" "namespace"="duplicate-cluster" 
I1016 05:21:14.346281       1 awscluster_controller.go:130] controllers/AWSCluster "msg"="Reconciling AWSCluster" "awsCluster"="capi-quickstart" "cluster"="capi-quickstart" "namespace"="duplicate-cluster" 
I1016 05:21:19.918843       1 awscluster_controller.go:130] controllers/AWSCluster "msg"="Reconciling AWSCluster" "awsCluster"="capi-quickstart" "cluster"="capi-quickstart" "namespace"="duplicate-cluster" 
I1016 05:26:00.620580       1 awscluster_controller.go:130] controllers/AWSCluster "msg"="Reconciling AWSCluster" "awsCluster"="capi-quickstart" "cluster"="capi-quickstart" "namespace"="default" 
I1016 05:26:00.620881       1 awscluster_controller.go:130] controllers/AWSCluster "msg"="Reconciling AWSCluster" "awsCluster"="capi-qs1" "cluster"="capi-qs1" "namespace"="test-kr" 
I1016 05:26:06.073879       1 awscluster_controller.go:130] controllers/AWSCluster "msg"="Reconciling AWSCluster" "awsCluster"="capi-quickstart" "cluster"="capi-quickstart" "namespace"="duplicate-cluster" 
I1016 05:35:59.931933       1 awscluster_controller.go:130] controllers/AWSCluster "msg"="Reconciling AWSCluster" "awsCluster"="capi-qs1" "cluster"="capi-qs1" "namespace"="test-kr" 
I1016 05:35:59.931933       1 awscluster_controller.go:130] controllers/AWSCluster "msg"="Reconciling AWSCluster" "awsCluster"="capi-quickstart" "cluster"="capi-quickstart" "namespace"="default" 
I1016 05:36:05.393440       1 awscluster_controller.go:130] controllers/AWSCluster "msg"="Reconciling AWSCluster" "awsCluster"="capi-quickstart" "cluster"="capi-quickstart" "namespace"="duplicate-cluster"
```

**Environment:**

- Cluster-api version: v0.2.4
- Minikube/KIND version: v0.5.1
- Kubernetes version: (use `kubectl version`):
- OS (e.g. from `/etc/os-release`): darwin x86_64

/kind bug
detiber commented 5 years ago

There is a similar issue tracked against CAPA here: https://github.com/kubernetes-sigs/cluster-api-provider-aws/issues/969

I do think this is something that we should probably validate and try to guard against in CAPI rather than CAPA, though.

detiber commented 5 years ago

@liztio we should probably ensure that we are validating this as part of the validating webhooks work you are doing.
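For illustration, the collision such a validating webhook would have to reject can be surfaced with a quick check against the management cluster (a sketch, assuming `jq` is available):

```sh
# Sketch: print any Cluster name that exists in more than one namespace --
# exactly the situation a validating webhook would need to guard against.
kubectl get clusters --all-namespaces -o json \
  | jq -r '.items[].metadata.name' \
  | sort | uniq -d
```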

gyliu513 commented 5 years ago

@detiber shall we provide a concept table showing what CAPI and CAPA are? I assume CAPI is the cluster API provider interface and CAPA is cluster-api-provider-aws?

gyliu513 commented 5 years ago

Ah, I saw the glossary at https://cluster-api.sigs.k8s.io/reference/glossary.html#c , thanks @detiber

chuckha commented 4 years ago

I tried to reproduce this issue but ended up in a really weird state. I created two clusters but messed up the SSHKeyName so I had to delete them both and try again. But I ended up not being able to delete the duplicate cluster. You can see the state of my system here:

https://cloud.tilt.dev/snapshot/AfTE99wLlyPeKZAVyl8=

Scroll up just a bit to see some highlighted lines.

ncdc commented 4 years ago

@chuckha were you ever able to make any more progress on this?

chuckha commented 4 years ago

As I mentioned in the PR linked above, it is not possible in AWS to create two clusters with the same name in different namespaces. I have not attempted to fix the problem beyond the PoC linked above. I suspect there may be other components that only look at the name rather than treating name/namespace as the primary key, but that should be easy enough to figure out with some dedicated testing and poking.

I'd suggest, for anyone looking to get involved here: create a cluster, take inventory of all the items that exist, then create another cluster with the same name in a different namespace and verify that every component expected to exist does exist (a starting sketch follows below). Then make sure the second cluster actually came up.
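A hedged starting point for that inventory, assuming CAPA's owned-resource tag key follows the `sigs.k8s.io/cluster-api-provider-aws/cluster/<name>` convention:

```sh
# Sketch: list the security groups tagged as owned by the cluster; run it
# before and after creating the same-named Cluster in a second namespace and
# compare the output. The tag key format is an assumption for this CAPA release.
CLUSTER_NAME=capi-quickstart
aws ec2 describe-security-groups \
  --filters "Name=tag-key,Values=sigs.k8s.io/cluster-api-provider-aws/cluster/${CLUSTER_NAME}" \
  --query 'SecurityGroups[].GroupName' --output text
```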

ncdc commented 4 years ago

Going through open unassigned issues in the v0.3.0 milestone. We have a decent amount of work left to do on features (control plane, clusterctl, etc). While this is an unfortunately ugly bug, I think we need to defer it to v0.4.

/milestone Next
/remove-priority important-soon
/priority important-longterm

fejta-bot commented 4 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.

/lifecycle stale

vincepri commented 4 years ago

/lifecycle frozen

vincepri commented 4 years ago

This issue seems mostly a documentation / infrastructure limitation, @detiber thoughts?

/help

k8s-ci-robot commented 4 years ago

@vincepri: This request has been marked as needing help from a contributor.

Please ensure the request meets the requirements listed here.

If this request no longer meets these requirements, the label can be removed by commenting with the /remove-help command.

In response to [this](https://github.com/kubernetes-sigs/cluster-api/issues/1554):

> This issue seems mostly a documentation / infrastructure limitation, @detiber thoughts?
>
> /help

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.
detiber commented 4 years ago

> This issue seems mostly a documentation / infrastructure limitation, @detiber thoughts?

It depends? The issue is complicated quite a bit by the kubernetes cloud provider integration, which requires a unique cluster name as well.

I think we have a couple of paths we can take here:

I believe the second path is probably the right one to take longer term; however, it is also far from trivial to implement across the various providers in a backwards-compatible way.

lomkju commented 4 years ago

~Also, if we create a cluster with the same name in two different namespaces, the kubeconfig and other secrets in the default namespace get overwritten. This way you lose all certs for the first cluster.~

~One solution would be to keep all secrets scoped to the specific cluster namespace. Ideally each cluster should have its own namespace.~

UPDATE: https://github.com/kubernetes-sigs/cluster-api/issues/1554#issuecomment-634123240

ncdc commented 4 years ago

@lomkju the secrets managed by Cluster API are scoped to the cluster's namespace. Are you seeing some different behavior?
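For reference, a hedged way to check this, assuming the `<cluster-name>-kubeconfig` secret naming convention:

```sh
# Sketch: the kubeconfig secret is expected to live next to its Cluster,
# i.e. one copy per namespace, not a single shared copy in "default".
kubectl get secret capi-quickstart-kubeconfig -n default
kubectl get secret capi-quickstart-kubeconfig -n duplicate-cluster
```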

lomkju commented 4 years ago

@ncdc After testing this again, I found that the secrets are indeed created in the respective namespaces; I was wrong in the comment above. The actual problem is that if we create clusters with the same name in different namespaces, the same ELB is used for both masters. That's why I sometimes get the error below: requests are being sent to the other cluster's master, which uses a different CA.

Cluster API ends up trying to use the same AWS resources for both clusters (VPC, ELB, IAM, ...).

```
➜ k describe node ip-10-0-0-161.ap-south-1.compute.internal
error: You must be logged in to the server (Unauthorized)
```
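One hedged way to confirm the sharing, assuming CAPA names the API server's classic ELB `<cluster-name>-apiserver`:

```sh
# Sketch: both Clusters named capi-quickstart would map onto one ELB, so
# kubectl traffic for one cluster can land on the other cluster's API server.
aws elb describe-load-balancers \
  --load-balancer-names capi-quickstart-apiserver \
  --query 'LoadBalancerDescriptions[].{Name:LoadBalancerName,DNS:DNSName}'
```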
fejta-bot commented 4 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.

/lifecycle stale

detiber commented 4 years ago

/lifecycle frozen

vincepri commented 4 years ago

During backlog grooming, @detiber proposed to introduce a contract for our infrastructure providers to at least use the namespaced name and document these limitations.

/cc @randomvariable @CecileRobertMichon

fabriziopandini commented 4 years ago

FYI, in CAPD we already faced problems due to the length of the machine names (see https://github.com/kubernetes-sigs/cluster-api/issues/3599), so the idea of concatenating the namespace and the cluster name could lead to problems.

Finding a good trade-off between shortness, uniqueness, and meaningfulness of names is the first challenge here. The second is ensuring a viable upgrade path for existing infrastructure if the naming scheme for infrastructure components changes.
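A small illustration of the length concern (hypothetical names, assuming the usual 63-character DNS label limit on node/host names):

```sh
# Sketch: a namespace-qualified machine name can overflow the 63-character
# DNS-1123 label limit that node/host names must satisfy (names are made up).
NAME="my-team-production-namespace-capi-quickstart-md-0-7c9f8d6b5-xk2lp"
echo -n "${NAME}" | wc -c   # prints 65, already past the 63-character limit
```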

vincepri commented 3 years ago

/kind documentation
/assign @randomvariable to document the contract guidelines for providers

vincepri commented 3 years ago

/milestone v1.1

sbueringer commented 2 years ago

/assign @yastij to reassess

fabriziopandini commented 2 years ago

/triage accepted

This is up to providers. If we want this to happen, the topic should be raised in the office hours and an agreement between provider implementers should be reached.

fabriziopandini commented 2 years ago

/help

YanzhaoLi commented 1 year ago

Is this issue specific to capa? Can capz or capv create clusters of the same name in different namespaces?

sirao commented 1 year ago

CAPA was the only one I was asked to test. I haven't verified CAPZ or CAPV.

k8s-triage-robot commented 8 months ago

This issue has not been updated in over 1 year, and should be re-triaged.

You can:

- Confirm that this issue is still relevant with /triage accepted (org members only)
- Close this issue with /close

For more details on the triage process, see https://www.kubernetes.dev/docs/guide/issue-triage/

/remove-triage accepted

fabriziopandini commented 7 months ago

/close

This issue has not been updated in over a year, and making this happen consistently across all the providers requires wide consensus plus a strong push from someone investing time in the effort.

We can re-open (or re-create) this issue whenever the conditions and the required community consensus to work on it are there.

k8s-ci-robot commented 7 months ago

@fabriziopandini: Closing this issue.

In response to [this](https://github.com/kubernetes-sigs/cluster-api/issues/1554#issuecomment-2027253628):

> /close
>
> This issue has not been updated in over a year, and making this happen consistently across all the providers requires wide consensus plus a strong push from someone investing time in the effort.
>
> We can re-open (or re-create) this issue whenever the conditions and the required community consensus to work on it are there.

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.