kubernetes-sigs / cluster-api

Home for Cluster API, a subproject of sig-cluster-lifecycle
https://cluster-api.sigs.k8s.io
Apache License 2.0

cluster with same name under different namespace is provisioned but no infra created #1554

Closed sirao closed 7 months ago

sirao commented 5 years ago

What steps did you take and what happened: Execute the commands below:

  1. kind create cluster --name=test-mc
  2. export KUBECONFIG="$(kind get kubeconfig-path --name="clusterapi1")"
  3. kubectl create -f https://github.com/kubernetes-sigs/cluster-api/releases/download/v0.2.4/cluster-api-components.yaml
  4. kubectl create -f https://github.com/kubernetes-sigs/cluster-api-bootstrap-provider-kubeadm/releases/download/v0.1.0/bootstrap-components.yaml
  5. clusterawsadm alpha bootstrap create-stack
  6. aws ssm put-parameter --name "/sigs.k8s.io/cluster-api-provider-aws/ssh-key" --type SecureString --value "$(aws ec2 create-key-pair --key-name default | jq .KeyMaterial -r)"
  7. export AWS_CREDENTIALS=$(aws iam create-access-key --user-name bootstrapper.cluster-api-provider-aws.sigs.k8s.io)
  8. export AWS_ACCESS_KEY_ID=$(echo $AWS_CREDENTIALS | jq .AccessKey.AccessKeyId -r)
  9. export AWS_SECRET_ACCESS_KEY=$(echo $AWS_CREDENTIALS | jq .AccessKey.SecretAccessKey -r)
  10. export AWS_B64ENCODED_CREDENTIALS=$(clusterawsadm alpha bootstrap encode-aws-credentials)
  11. curl -L https://github.com/kubernetes-sigs/cluster-api-provider-aws/releases/download/v0.4.2/infrastructure-components.yaml | envsubst | kubectl create -f -
  12. kubectl apply -f cluster.yaml where cluster.yaml contents are below
    apiVersion: cluster.x-k8s.io/v1alpha2
    kind: Cluster
    metadata:
      name: capi-quickstart
    spec:
      clusterNetwork:
        pods:
          cidrBlocks: ["192.168.0.0/16"]
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1alpha2
        kind: AWSCluster
        name: capi-quickstart
    ---
    apiVersion: infrastructure.cluster.x-k8s.io/v1alpha2
    kind: AWSCluster
    metadata:
      name: capi-quickstart
    spec:
      # Change this value to the region you want to deploy the cluster in.
      region: us-east-2
      # Change this value to a valid SSH Key Pair present in your AWS Account.
      sshKeyName: default
  13. Wait until the cluster PHASE is Provisioned and verify that the basic infrastructure was created in the AWS account (see the verification sketch after these steps).
  14. kubectl create namespace duplicate-cluster
  15. kubectl apply -f dup-cluster.yaml where the yaml contents are below
    apiVersion: cluster.x-k8s.io/v1alpha2
    kind: Cluster
    metadata:
      name: capi-quickstart
      namespace: duplicate-cluster
    spec:
      clusterNetwork:
        pods:
          cidrBlocks: ["192.168.0.0/16"]
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1alpha2
        kind: AWSCluster
        name: capi-quickstart
    ---
    apiVersion: infrastructure.cluster.x-k8s.io/v1alpha2
    kind: AWSCluster
    metadata:
      name: capi-quickstart
      namespace: duplicate-cluster
    spec:
      # Change this value to the region you want to deploy the cluster in.
      region: us-east-2
      # Change this value to a valid SSH Key Pair present in your AWS Account.
      sshKeyName: default
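As a rough verification sketch for steps 13 and 15 (not definitive; the CAPA tag-key convention below is an assumption for this release), the cluster phase and the cluster-owned AWS resources can be checked with:

```sh
# Sketch: check the Cluster phase in every namespace, then look for the VPC
# CAPA should have tagged as owned by the cluster. The tag key format
# "sigs.k8s.io/cluster-api-provider-aws/cluster/<name>" is an assumption here.
kubectl get clusters --all-namespaces
aws ec2 describe-vpcs \
  --filters "Name=tag-key,Values=sigs.k8s.io/cluster-api-provider-aws/cluster/capi-quickstart" \
  --query 'Vpcs[].VpcId' --output text
```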

After some time the cluster capi-quickstart in the duplicate-cluster namespace reaches the PROVISIONED phase, but no infrastructure is created for it. Deleting that cluster also fails.

What did you expect to happen: A cluster with the same name in a different namespace should not be allowed, or the security group naming convention should be changed so that dedicated security groups get created for every cluster.

Anything else you would like to add: CAPA log:


```
I1016 05:21:14.332092       1 awscluster_controller.go:69] controllers/AWSCluster "msg"="Cluster Controller has not yet set OwnerRef" "awsCluster"="capi-quickstart" "namespace"="duplicate-cluster" 
I1016 05:21:14.346281       1 awscluster_controller.go:130] controllers/AWSCluster "msg"="Reconciling AWSCluster" "awsCluster"="capi-quickstart" "cluster"="capi-quickstart" "namespace"="duplicate-cluster" 
I1016 05:21:19.918843       1 awscluster_controller.go:130] controllers/AWSCluster "msg"="Reconciling AWSCluster" "awsCluster"="capi-quickstart" "cluster"="capi-quickstart" "namespace"="duplicate-cluster" 
I1016 05:26:00.620580       1 awscluster_controller.go:130] controllers/AWSCluster "msg"="Reconciling AWSCluster" "awsCluster"="capi-quickstart" "cluster"="capi-quickstart" "namespace"="default" 
I1016 05:26:00.620881       1 awscluster_controller.go:130] controllers/AWSCluster "msg"="Reconciling AWSCluster" "awsCluster"="capi-qs1" "cluster"="capi-qs1" "namespace"="test-kr" 
I1016 05:26:06.073879       1 awscluster_controller.go:130] controllers/AWSCluster "msg"="Reconciling AWSCluster" "awsCluster"="capi-quickstart" "cluster"="capi-quickstart" "namespace"="duplicate-cluster" 
I1016 05:35:59.931933       1 awscluster_controller.go:130] controllers/AWSCluster "msg"="Reconciling AWSCluster" "awsCluster"="capi-qs1" "cluster"="capi-qs1" "namespace"="test-kr" 
I1016 05:35:59.931933       1 awscluster_controller.go:130] controllers/AWSCluster "msg"="Reconciling AWSCluster" "awsCluster"="capi-quickstart" "cluster"="capi-quickstart" "namespace"="default" 
I1016 05:36:05.393440       1 awscluster_controller.go:130] controllers/AWSCluster "msg"="Reconciling AWSCluster" "awsCluster"="capi-quickstart" "cluster"="capi-quickstart" "namespace"="duplicate-cluster"
```

**Environment:**

- Cluster-api version: v0.2.4
- Minikube/KIND version: v0.5.1
- Kubernetes version: (use `kubectl version`):
- OS (e.g. from `/etc/os-release`): darwin x86_64

/kind bug
detiber commented 5 years ago

There is a similar issue tracked against CAPA here: https://github.com/kubernetes-sigs/cluster-api-provider-aws/issues/969

I do think this is something that we should probably validate and try to guard against in CAPI rather than CAPA, though.

detiber commented 5 years ago

@liztio we should probably ensure that we are validating this as part of the validating webhooks work you are doing.
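For illustration, the collision such a validating webhook would have to reject can be surfaced with a quick check against the management cluster (a sketch, assuming `jq` is available):

```sh
# Sketch: print any Cluster name that exists in more than one namespace --
# exactly the situation a validating webhook would need to guard against.
kubectl get clusters --all-namespaces -o json \
  | jq -r '.items[].metadata.name' \
  | sort | uniq -d
```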

gyliu513 commented 5 years ago

@detiber shall we provide a concept table showing what CAPI and CAPA are? I assume CAPI is the cluster API provider interface and CAPA is cluster-api-provider-aws?

gyliu513 commented 5 years ago

Ah, I saw the glossary at https://cluster-api.sigs.k8s.io/reference/glossary.html#c , thanks @detiber

chuckha commented 4 years ago

I tried to reproduce this issue but ended up in a really weird state. I created two clusters but messed up the SSHKeyName so I had to delete them both and try again. But I ended up not being able to delete the duplicate cluster. You can see the state of my system here:

https://cloud.tilt.dev/snapshot/AfTE99wLlyPeKZAVyl8=

Scroll up just a bit to see some highlighted lines.

ncdc commented 4 years ago

@chuckha were you ever able to make any more progress on this?

chuckha commented 4 years ago

As I mentioned in the PR linked above, it is not possible in AWS to create two clusters with the same name in different namespaces. I have not attempted to fix the problem beyond the PoC linked above. I suspect there may be other components that only look at the name rather than treating name/namespace as the primary key, but that should be easy enough to figure out with some dedicated testing and poking.

I'd suggest, for anyone looking to get involved here: create a cluster, take inventory of all the items that exist, then create another cluster with the same name in a different namespace and verify that every component expected to exist does exist (a starting sketch follows below). Then make sure the second cluster actually came up.
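A hedged starting point for that inventory, assuming CAPA's owned-resource tag key follows the `sigs.k8s.io/cluster-api-provider-aws/cluster/<name>` convention:

```sh
# Sketch: list the security groups tagged as owned by the cluster; run it
# before and after creating the same-named Cluster in a second namespace and
# compare the output. The tag key format is an assumption for this CAPA release.
CLUSTER_NAME=capi-quickstart
aws ec2 describe-security-groups \
  --filters "Name=tag-key,Values=sigs.k8s.io/cluster-api-provider-aws/cluster/${CLUSTER_NAME}" \
  --query 'SecurityGroups[].GroupName' --output text
```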

ncdc commented 4 years ago

Going through open unassigned issues in the v0.3.0 milestone. We have a decent amount of work left to do on features (control plane, clusterctl, etc). While this is an unfortunately ugly bug, I think we need to defer it to v0.4.

/milestone Next
/remove-priority important-soon
/priority important-longterm

fejta-bot commented 4 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.

/lifecycle stale

vincepri commented 4 years ago

/lifecycle frozen

vincepri commented 4 years ago

This issue seems mostly a documentation / infrastructure limitation, @detiber thoughts?

/help

k8s-ci-robot commented 4 years ago

@vincepri: This request has been marked as needing help from a contributor.

Please ensure the request meets the requirements listed here.

If this request no longer meets these requirements, the label can be removed by commenting with the /remove-help command.

In response to [this](https://github.com/kubernetes-sigs/cluster-api/issues/1554):

> This issue seems mostly a documentation / infrastructure limitation, @detiber thoughts?
>
> /help

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.
detiber commented 4 years ago

> This issue seems mostly a documentation / infrastructure limitation, @detiber thoughts?

It depends? The issue is complicated quite a bit by the kubernetes cloud provider integration, which requires a unique cluster name as well.

I think we have a couple of paths we can take here:

I believe the second path is probably the right one to take longer term; however, it is also far from trivial to implement across the various providers in a backwards-compatible way.

lomkju commented 4 years ago

~Also, if we create a cluster with the same name in two different namespaces, the kubeconfig and other secrets in the default namespace get overwritten. This way you lose all certs for the first cluster.~

~One solution would be to keep all secrets scoped to the specific cluster namespace. Ideally each cluster should have its own namespace.~

UPDATE: https://github.com/kubernetes-sigs/cluster-api/issues/1554#issuecomment-634123240

ncdc commented 4 years ago

@lomkju the secrets managed by Cluster API are scoped to the cluster's namespace. Are you seeing some different behavior?
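For reference, a hedged way to check this, assuming the `<cluster-name>-kubeconfig` secret naming convention:

```sh
# Sketch: the kubeconfig secret is expected to live next to its Cluster,
# i.e. one copy per namespace, not a single shared copy in "default".
kubectl get secret capi-quickstart-kubeconfig -n default
kubectl get secret capi-quickstart-kubeconfig -n duplicate-cluster
```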

lomkju commented 4 years ago

@ncdc After testing this again, I found that the secrets are indeed created in the respective namespaces; I was wrong in the comment above. The actual problem is that if we create clusters with the same name in different namespaces, the same ELB is used for both masters. That's why I sometimes get the error below: requests are being sent to the other cluster's master, which uses a different CA.

Cluster API ends up trying to use the same AWS resources for both clusters (VPC, ELB, IAM, ...).

```
➜ k describe node ip-10-0-0-161.ap-south-1.compute.internal
error: You must be logged in to the server (Unauthorized)
```
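One hedged way to confirm the sharing, assuming CAPA names the API server's classic ELB `<cluster-name>-apiserver`:

```sh
# Sketch: both Clusters named capi-quickstart would map onto one ELB, so
# kubectl traffic for one cluster can land on the other cluster's API server.
aws elb describe-load-balancers \
  --load-balancer-names capi-quickstart-apiserver \
  --query 'LoadBalancerDescriptions[].{Name:LoadBalancerName,DNS:DNSName}'
```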
fejta-bot commented 4 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.

/lifecycle stale

detiber commented 4 years ago

/lifecycle frozen

vincepri commented 4 years ago

During backlog grooming, @detiber proposed to introduce a contract for our infrastructure providers to at least use the namespaced name and document these limitations.

/cc @randomvariable @CecileRobertMichon

fabriziopandini commented 4 years ago

FYI, in CAPD we already faced problems due to the length of the machine names (see https://github.com/kubernetes-sigs/cluster-api/issues/3599), so the idea of concatenating the namespace and the cluster name could lead to problems.

Finding a good trade-off between shortness, uniqueness, and meaningfulness of names is the first challenge here. The second is ensuring a viable upgrade path for existing infrastructure if the naming scheme for infrastructure components changes.
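A small illustration of the length concern (hypothetical names, assuming the usual 63-character DNS label limit on node/host names):

```sh
# Sketch: a namespace-qualified machine name can overflow the 63-character
# DNS-1123 label limit that node/host names must satisfy (names are made up).
NAME="my-team-production-namespace-capi-quickstart-md-0-7c9f8d6b5-xk2lp"
echo -n "${NAME}" | wc -c   # prints 65, already past the 63-character limit
```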

vincepri commented 3 years ago

/kind documentation
/assign @randomvariable to document the contract guidelines for providers

vincepri commented 3 years ago

/milestone v1.1

sbueringer commented 2 years ago

/assign @yastij to reassess

fabriziopandini commented 2 years ago

/triage accepted

This is up to providers. If we want this to happen, the topic should be raised in the office hours and an agreement between provider implementers should be reached.

fabriziopandini commented 2 years ago

/help

YanzhaoLi commented 1 year ago

Is this issue specific to capa? Can capz or capv create clusters of the same name in different namespaces?

sirao commented 1 year ago

CAPA was the only one I was asked to test. I haven't verified CAPZ or CAPV.

k8s-triage-robot commented 8 months ago

This issue has not been updated in over 1 year, and should be re-triaged.

You can:

- Confirm that this issue is still relevant with /triage accepted (org members only)
- Close this issue with /close

For more details on the triage process, see https://www.kubernetes.dev/docs/guide/issue-triage/

/remove-triage accepted

fabriziopandini commented 7 months ago

/close

This issue has not been updated in over a year, and making this happen consistently across all the providers requires wide consensus plus a strong push from someone investing time in the effort.

We can re-open (or re-create) this issue whenever the conditions and the required community consensus to work on it are there.

k8s-ci-robot commented 7 months ago

@fabriziopandini: Closing this issue.

In response to [this](https://github.com/kubernetes-sigs/cluster-api/issues/1554#issuecomment-2027253628):

> /close
>
> This issue has not been updated in over a year, and making this happen consistently across all the providers requires wide consensus plus a strong push from someone investing time in the effort.
>
> We can re-open (or re-create) this issue whenever the conditions and the required community consensus to work on it are there.

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.