rancherfederal / rke2-aws-tf

MIT License

Error: want at least 1 healthy instance(s) registered to Load Balancer, have 0 (timeout: 10m0s) #87

Closed kdiji closed 10 months ago

kdiji commented 1 year ago

It seems the control-plane NLB does not see the instances from the ASG as healthy. Oddly, they all passed the health status and health checks. This is a blocker for us, and I would appreciate some help:

module.servers.aws_autoscaling_group.this: Still creating... [9m50s elapsed]
module.servers.aws_autoscaling_group.this: Still creating... [10m0s elapsed]
╷
│ Error: waiting for Auto Scaling Group (p1-il2-dev-nv-km3-server-rke2-nodepool) capacity satisfied: timeout while waiting for state to become 'ok' (last state: 'want at least 1 healthy instance(s) registered to Load Balancer, have 0', timeout: 10m0s)
│ 
│   with module.servers.aws_autoscaling_group.this,
│   on modules/nodepool/main.tf line 69, in resource "aws_autoscaling_group" "this":
│   69: resource "aws_autoscaling_group" "this" {
│ 
╵
Releasing state lock. This may take a few moments...
ERRO[0790] 1 error occurred:
        * exit status 1

The same issue has also been reported here: https://repo1.dso.mil/platform-one/distros/rancher-federal/rke2/rke2-aws-terraform/-/issues/5
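As we understand the Terraform AWS provider, this error means the `aws_autoscaling_group` resource gave up waiting (for `wait_for_capacity_timeout`, whose default is `10m`) for healthy instances registered with its attached load balancer, a wait it performs when `min_elb_capacity` or `wait_for_elb_capacity` is set. Passing EC2 status checks does not make an instance a healthy NLB target: the target group runs its own health check against the target port. The sketch below, assuming Python 3.9+ and `boto3`, queries the target group directly with `describe_target_health` and counts `healthy` targets, roughly mirroring the provider's check. The helper names are hypothetical, the target group ARN is whatever your control-plane target group is, and unlike the provider this counts every target in the group, not only this ASG's instances.

```python
from typing import Any, Iterable, Mapping, Tuple


def count_healthy(descriptions: Iterable[Mapping[str, Any]]) -> int:
    """Count targets whose TargetHealth.State is "healthy".

    `descriptions` is the `TargetHealthDescriptions` list returned by the
    elbv2 DescribeTargetHealth API.
    """
    return sum(
        1
        for d in descriptions
        if d.get("TargetHealth", {}).get("State") == "healthy"
    )


def capacity_message(want: int, have: int) -> str:
    """Format the result using the same wording as the Terraform error."""
    return (
        f"want at least {want} healthy instance(s) registered to "
        f"Load Balancer, have {have}"
    )


def check_target_group(target_group_arn: str, want: int = 1) -> Tuple[bool, str]:
    """Query live target health; needs boto3 and AWS credentials."""
    import boto3  # imported here so the pure helpers above work without boto3

    elbv2 = boto3.client("elbv2")
    resp = elbv2.describe_target_health(TargetGroupArn=target_group_arn)
    have = count_healthy(resp["TargetHealthDescriptions"])
    return have >= want, capacity_message(want, have)
```

Calling `check_target_group(arn)` with the control-plane target group's ARN returns `(ok, message)`, with the message worded like the Terraform error so the two can be compared side by side.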

ryan-mcd commented 1 year ago

@kdiji, are you still experiencing ASG failures? For some time after an SELinux update, RKE2 could not be deployed with SELinux enabled. See this issue for more information.

abhinavsingh1196 commented 1 year ago

@kdiji @ryan-mcd, do you know whether this issue has been resolved? I am seeing the same behaviour when deploying to AWS.

adamacosta commented 10 months ago

I'm going to close this because every issue I'm aware of, whether in rke2 or in this module, should now be resolved. We've now run it for customers in AWS Commercial, GovCloud, and C2S: with and without the CIS profile enabled, with and without SELinux, with and without host-level STIG compliance, and with and without host-level FIPS mode enabled.

If anyone is still seeing nodes fail to join the cluster, please investigate why. If the cause is faulty logic in our initialization scripts, open an issue in this repo explaining the logic error and how you determined that it is the problem. If rke2 itself fails to launch, open an issue in rancher/rke2.
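When a target is not healthy, the elbv2 `describe_target_health` response also carries a `TargetHealth.Reason` code, which helps narrow down where to investigate before opening an issue here or in rancher/rke2. A minimal sketch, assuming that response shape: it groups non-healthy targets by reason code and attaches a triage hint. The `HINTS` table and the helper names are our own hypothetical choices covering a few documented NLB reason codes; they are not part of this module.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Mapping

# Hypothetical triage hints for a few documented NLB target-health reason codes.
HINTS = {
    "Elb.RegistrationInProgress": "registration still in progress; wait and re-check",
    "Elb.InitialHealthChecking": "initial health checks still running; wait and re-check",
    "Target.FailedHealthChecks": (
        "registered but failing health checks; check that rke2-server is "
        "running on the node and that security groups allow the check"
    ),
    "Target.NotRegistered": "not registered; check the ASG's target group attachment",
    "Target.InvalidState": "instance is stopped or terminated; check ASG activity",
}


def triage(descriptions: Iterable[Mapping[str, Any]]) -> Dict[str, List[str]]:
    """Group the IDs of non-healthy targets by TargetHealth.Reason."""
    by_reason: Dict[str, List[str]] = defaultdict(list)
    for d in descriptions:
        health = d.get("TargetHealth", {})
        if health.get("State") == "healthy":
            continue  # healthy targets need no investigation
        reason = health.get("Reason", "Unknown")
        by_reason[reason].append(d.get("Target", {}).get("Id", "?"))
    return dict(by_reason)


def report(descriptions: Iterable[Mapping[str, Any]]) -> List[str]:
    """One line per reason code: affected target IDs plus a triage hint."""
    lines = []
    for reason, ids in sorted(triage(descriptions).items()):
        hint = HINTS.get(reason, "look up this reason code in the AWS docs")
        lines.append(f"{reason}: {', '.join(ids)} -> {hint}")
    return lines
```

Pass it `resp["TargetHealthDescriptions"]` from a `describe_target_health` call. A `Target.FailedHealthChecks` line points at the node itself (for example, rke2 not starting), which is where an SELinux-related startup failure would most likely show up.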