Missing instructions to install aws-cloud-controller-manager

maiconrocha commented 10 months ago

/kind bug

What steps did you take and what happened: There are no instructions in the documentation at https://cluster-api-aws.sigs.k8s.io/getting-started#install-a-cloud-provider for installing the AWS cloud provider; only Azure is covered.

What did you expect to happen: I was helping a customer follow the instructions from this blog post:

https://aws.amazon.com/blogs/containers/multi-cluster-management-for-kubernetes-with-cluster-api-and-argo-cd/

The blog post provides instructions for generating an EC2 cluster template, for example:

clusterctl generate cluster capi-ec2 --kubernetes-version v1.28.0 --control-plane-machine-count=3 --worker-machine-count=3 > ./capi-cluster/aws-ec2/aws-ec2.yaml
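The `node.cloudprovider.kubernetes.io/uninitialized` taint reported later in this issue is only added when the kubelet runs with an external cloud provider, so one quick check on a freshly generated template is to look for the kubelet's `cloud-provider` setting. If the template sets `cloud-provider: external`, the workload cluster will need an out-of-tree cloud controller manager such as aws-cloud-controller-manager. A minimal sketch, assuming the output path used above; the helper name `check_cloud_provider_flag` is hypothetical:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print every line of a generated cluster template that
# configures the kubelet's cloud provider, prefixed with its line number.
# Defaults to the output path from the clusterctl command above.
check_cloud_provider_flag() {
  local template="${1:-./capi-cluster/aws-ec2/aws-ec2.yaml}"
  grep -n 'cloud-provider' "${template}"
}
```

Usage: `check_cloud_provider_flag ./capi-cluster/aws-ec2/aws-ec2.yaml`. Because it is a plain `grep`, the function exits non-zero when the template contains no `cloud-provider` setting at all.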

After deploying the cluster, I noticed that the worker nodes were provisioned, but the cluster was stuck in "WaitingForAvailableMachines":

clusterctl describe cluster capi-ec2

NAME                                                         READY  SEVERITY  REASON                       SINCE  MESSAGE
Cluster/capi-ec2                                             False  Warning   ScalingUp                    26m    Scaling up control plane to 3 replicas (actual 1)
├─ClusterInfrastructure - AWSCluster/capi-ec2                True                                          26m
├─ControlPlane - KubeadmControlPlane/capi-ec2-control-plane  False  Warning   ScalingUp                    26m    Scaling up control plane to 3 replicas (actual 1)
│ └─Machine/capi-ec2-control-plane-8s68s                     True                                          26m
└─Workers
  └─MachineDeployment/capi-ec2-md-0                          False  Warning   WaitingForAvailableMachines  30m    Minimum availability requires 3 replicas, current 0 available
    └─3 Machines...                                          False  Info      WaitingForBootstrapData      26m    See capi-ec2-md-0-tsl6z-5xctf, capi-ec2-md-0-tsl6z-8rxfj, ...

Investigating further, I found that the nodes had untolerated taints:

Warning FailedScheduling 115s (x62 over 5h7m) default-scheduler 0/4 nodes are available: 1 node(s) had untolerated taint {node.cloudprovider.kubernetes.io/uninitialized: true}, 3 node(s) had untolerated taint {node.cluster.x-k8s.io/uninitialized: }. preemption: 0/4 nodes are available: 4 Preemption is not helpful for scheduling..
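The kubelet adds `node.cloudprovider.kubernetes.io/uninitialized` when it is started with an external cloud provider, and a cloud controller manager removes it once it has initialized the node (including setting `spec.providerID`). As far as we understand, Cluster API only clears its own `node.cluster.x-k8s.io/uninitialized` taint after matching each Machine to its Node by provider ID, so both taints point at the same missing component. A minimal sketch for confirming this from the management cluster, assuming the cluster name `capi-ec2` from the blog post and that `clusterctl` and `kubectl` are on `PATH`; the function name is hypothetical:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Assumption: cluster name taken from the blog post; override via the environment.
CLUSTER_NAME="${CLUSTER_NAME:-capi-ec2}"

# Hypothetical helper: list the workload cluster's nodes with their taint keys
# and provider IDs. A PROVIDER_ID of <none> most likely means no cloud
# controller manager has initialized that node yet.
show_node_taints() {
  local kubeconfig="./${CLUSTER_NAME}.kubeconfig"

  # Export the workload cluster kubeconfig from the management cluster.
  clusterctl get kubeconfig "${CLUSTER_NAME}" > "${kubeconfig}"

  # Query the workload cluster, not the management cluster.
  kubectl --kubeconfig="${kubeconfig}" get nodes \
    -o custom-columns='NAME:.metadata.name,TAINTS:.spec.taints[*].key,PROVIDER_ID:.spec.providerID'
}
```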

The same issue has been reported here: https://github.com/kubernetes-sigs/cluster-api/issues/9151

The recommendation in this comment was to install aws-cloud-controller-manager: https://github.com/kubernetes-sigs/cluster-api/issues/9151#issuecomment-1671878672

However, the documentation at https://cluster-api-aws.sigs.k8s.io/getting-started#install-a-cloud-provider has no instructions for installing the AWS cloud provider; it only covers Azure.

Ankitasw commented 10 months ago

/triage accepted
/priority important-soon

kranurag7 commented 10 months ago

@maiconrocha it looks like the AWS cloud controller manager is only published as a Helm chart attached as a release asset on GitHub: https://github.com/kubernetes/cloud-provider-aws/releases/tag/helm-chart-aws-cloud-controller-manager-0.0.8

For now, since the chart isn't hosted in a Helm repository, installing it is a two-step process:

$ curl -LO https://github.com/kubernetes/cloud-provider-aws/releases/download/helm-chart-aws-cloud-controller-manager-0.0.8/aws-cloud-controller-manager-0.0.8.tgz
$ helm template aws-cloud-controller-manager-0.0.8.tgz | kubectl apply -f -
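With Cluster API, `kubectl apply` without a kubeconfig targets whatever context is active, which is usually the management cluster, while the controller has to run in the workload cluster. A minimal sketch of the same two steps pointed explicitly at the workload cluster, assuming the cluster name `capi-ec2` from the blog post and chart version 0.0.8 from the release above; `install_aws_ccm` is a hypothetical helper name:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Assumptions: cluster name from the blog post and chart version from the
# release linked above; override either via the environment.
CLUSTER_NAME="${CLUSTER_NAME:-capi-ec2}"
CCM_VERSION="${CCM_VERSION:-0.0.8}"
CCM_CHART="aws-cloud-controller-manager-${CCM_VERSION}.tgz"
CCM_URL="https://github.com/kubernetes/cloud-provider-aws/releases/download/helm-chart-aws-cloud-controller-manager-${CCM_VERSION}/${CCM_CHART}"

# Hypothetical helper: render the chart and apply it to the workload cluster
# rather than to the currently active kubectl context.
install_aws_ccm() {
  local kubeconfig="./${CLUSTER_NAME}.kubeconfig"

  # Export the workload cluster's kubeconfig from the management cluster.
  clusterctl get kubeconfig "${CLUSTER_NAME}" > "${kubeconfig}"

  # Step 1: download the packaged chart from the GitHub release.
  curl -fsSLO "${CCM_URL}"

  # Step 2: render it locally and apply the manifests to the workload cluster.
  helm template "${CCM_CHART}" | kubectl --kubeconfig="${kubeconfig}" apply -f -
}
```

Helm can also install a chart directly from an absolute URL (`helm install aws-cloud-controller-manager "${CCM_URL}" --kubeconfig ./capi-ec2.kubeconfig`), which would fold the download into one step and let Helm track the release; that variant has not been checked against this particular chart.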

Let me know if there's another way to handle this.

/assign

k8s-triage-robot commented 7 months ago

This issue is labeled with priority/important-soon but has not been updated in over 90 days, and should be re-triaged. Important-soon issues must be staffed and worked on either currently, or very soon, ideally in time for the next release.

You can:

Confirm that this issue is still relevant with /triage accepted (org members only)
Deprioritize it with /priority important-longterm or /priority backlog
Close this issue with /close

For more details on the triage process, see https://www.kubernetes.dev/docs/guide/issue-triage/

/remove-triage accepted

k8s-triage-robot commented 4 months ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
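The three thresholds are cumulative: if nothing happens on an issue at all, it becomes stale at 90 days, rotten at 120 days, and is closed at 150 days. A minimal sketch of that arithmetic, assuming no activity resets the clock and treating each threshold as inclusive; the function name `lifecycle_stage` is hypothetical and not part of the bot:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: map days without activity to the lifecycle state the
# triage bot would have applied, assuming nothing resets the inactivity clock.
lifecycle_stage() {
  local days_inactive=$1
  local stale_after=90                      # first threshold from the rules above
  local rotten_after=$((stale_after + 30))  # 30 more days after stale
  local close_after=$((rotten_after + 30))  # 30 more days after rotten

  if (( days_inactive >= close_after )); then
    echo "closed"
  elif (( days_inactive >= rotten_after )); then
    echo "rotten"
  elif (( days_inactive >= stale_after )); then
    echo "stale"
  else
    echo "active"
  fi
}
```

Usage: `lifecycle_stage 100` prints `stale`.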

You can:

Mark this issue as fresh with /remove-lifecycle stale
Close this issue with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot commented 3 months ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue as fresh with /remove-lifecycle rotten
Close this issue with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

kubernetes-sigs/cluster-api-provider-aws issue #4715