rancherfederal / rke2-aws-tf

MIT License

RKE2 Cluster creation fails with private subnets #101

Closed Pelmorex closed 10 months ago

Pelmorex commented 11 months ago

When I supply a list of private subnets in the "subnets" field, the resulting nodes never join each other, because communication through the load balancer seems to break. The module creates the load balancer in those same private subnets. Perhaps I am missing something, but this module seems to work only with public subnets.

My usage (perhaps I am doing something wrong here):

`
module "rke2" {
  source        = "git::https://github.com/rancherfederal/rke2-aws-tf.git"
  cluster_name  = local.cluster_name
  unique_suffix = false
  vpc_id        = module.vpc.vpc_id

  # subnets = module.vpc.public_subnets
  subnets = module.vpc.private_subnets
  ami     = data.aws_ami.ubuntu.id

  enable_ccm = true

  ssh_authorized_keys   = [tls_private_key.ssh_keygen.public_key_openssh]
  instance_type         = local.rke2_instance_type
  controlplane_internal = false
  servers               = local.rke2_servers

  associate_public_ip_address = true

  controlplane_enable_cross_zone_load_balancing = true
  extra_security_group_ids                      = [aws_security_group.nlb_ui_sg.id]
  metadata_options = {
    http_endpoint               = "enabled"
    http_tokens                 = "optional"
    instance_metadata_tags      = "disabled"
    http_put_response_hop_limit = 1
  }
  rke2_version = local.rke2_version

  rke2_config = <<-EOT
    node-label:
  EOT
}
`

adamacosta commented 10 months ago

From everything I can tell given AWS documentation and claims of other people on the Internet, this should work and I'm at a loss figuring out why it doesn't. Even totally opening up the security groups to allow all traffic from anywhere made no difference. I might try to see if there is anything to figure out from a simpler scenario tomorrow or in the coming week, just running a single nginx server on an EC2 with a private IP pointed to by a load balancer with a public IP. The health check still works. It just isn't forwarding traffic even when it has a healthy instance. If the simpler scenario still doesn't work, I'm not sure what else to say. AWS doesn't document super-well how to handle all possible network configurations.

adamacosta commented 10 months ago

The problem is that we're putting the load balancer and the cluster nodes in the same subnets. If those are private subnets, the load balancer can't be reached from the Internet, even though it has one or more public IP addresses. Those addresses are misleading and don't actually route anywhere. The same seems to go for Elastic IPs: whatever they point to has to be in a public subnet or it won't be reachable.
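For anyone else hitting this, it may help to spell out what "public subnet" means here. In AWS, a subnet is public when its route table sends `0.0.0.0/0` to an internet gateway; a private subnet typically has no such route (or routes outbound traffic through a NAT gateway instead), so inbound traffic to a public IP attached there has no path in. A minimal sketch of the public case follows. The variable and resource names are hypothetical and not taken from this module:

```hcl
# Hypothetical inputs for illustration; not part of rke2-aws-tf.
variable "vpc_id" {
  type = string
}

variable "public_subnet_id" {
  type = string
}

# The internet gateway is what gives the VPC a path to and from the Internet.
resource "aws_internet_gateway" "example" {
  vpc_id = var.vpc_id
}

# A route table whose default route points at the internet gateway.
resource "aws_route_table" "public" {
  vpc_id = var.vpc_id

  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.example.id
  }
}

# Associating that route table with a subnet is what makes the subnet "public".
resource "aws_route_table_association" "public" {
  subnet_id      = var.public_subnet_id
  route_table_id = aws_route_table.public.id
}
```

A subnet without this association can still hold resources that carry public IPs, but nothing from the Internet can reach them.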

We'll have to modify the module to accept a separate subnet list for the load balancer. It'll be up to the user, but for an Internet-facing load balancer scheme, these will have to be public subnets.
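Roughly, the shape of that change could look like the sketch below: the load balancer takes its own subnet list while the nodes stay in private subnets. This is only an illustration under stated assumptions, not the actual fix. The variable names (`lb_subnets`, `vpc_id`) and resource names are hypothetical, and the listener shows only the Kubernetes API port (6443):

```hcl
# Hypothetical inputs; the module's real variable names may differ.
variable "vpc_id" {
  type = string
}

variable "lb_subnets" {
  description = "Subnets for the load balancer. Must be public for an Internet-facing scheme."
  type        = list(string)
}

# Internet-facing network load balancer placed in the public subnets.
resource "aws_lb" "controlplane" {
  name                             = "example-rke2-cp"
  internal                         = false
  load_balancer_type               = "network"
  subnets                          = var.lb_subnets
  enable_cross_zone_load_balancing = true
}

# Target group for the Kubernetes API; nodes in private subnets register here.
resource "aws_lb_target_group" "apiserver" {
  name     = "example-rke2-6443"
  port     = 6443
  protocol = "TCP"
  vpc_id   = var.vpc_id
}

# Forward TCP 6443 on the load balancer to the registered nodes.
resource "aws_lb_listener" "apiserver" {
  load_balancer_arn = aws_lb.controlplane.arn
  port              = 6443
  protocol          = "TCP"

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.apiserver.arn
  }
}
```

With this split, the caller would pass something like `module.vpc.public_subnets` for the load balancer and `module.vpc.private_subnets` for the nodes.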

Pelmorex commented 10 months ago

Thank you adamacosta for digging into it! We reached the same conclusion about the subnet location of the LBs. We look forward to the update with public and private subnet support for the LB.

adamacosta commented 10 months ago

Fixed by #102