francardoso93 closed this issue 2 years ago.
Similar to https://github.com/terraform-aws-modules/terraform-aws-eks/issues/1748: you'll need to add a rule for the port you are using with the metrics server (443, 4443, etc.).
Thanks @bryantbiggs, that was it!
Sharing my solution:
node_security_group_additional_rules = {
  metrics_server_8443_ing = {
    description                   = "Cluster API to metrics server 8443 ingress port"
    protocol                      = "tcp"
    from_port                     = 8443
    to_port                       = 8443
    type                          = "ingress"
    source_cluster_security_group = true
  }
  metrics_server_10250_ing = {
    description = "Node to node metrics server 10250 ingress port"
    protocol    = "tcp"
    from_port   = 10250
    to_port     = 10250
    type        = "ingress"
    self        = true
  }
  metrics_server_10250_eg = {
    description = "Node to node metrics server 10250 egress port"
    protocol    = "tcp"
    from_port   = 10250
    to_port     = 10250
    type        = "egress"
    self        = true # Does not work for Fargate
  }
}
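For anyone wondering where this map goes: it is an input to the eks module block itself. A minimal sketch, assuming the module is labelled "eks"; the cluster name, version, and network IDs below are placeholders, not values from this thread:

# Sketch only: shows where node_security_group_additional_rules plugs into
# the terraform-aws-modules/eks module. Label, name, and IDs are placeholders.
module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 18.2.0"

  cluster_name    = "example"                      # placeholder
  cluster_version = "1.21"                         # placeholder
  vpc_id          = "vpc-xxxxxxxx"                 # placeholder
  subnet_ids      = ["subnet-aaaa", "subnet-bbbb"] # placeholders

  node_security_group_additional_rules = {
    # ... the rules shown above ...
  }
}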
Also, creating these additional resources is required when working with Fargate nodes (so Metrics Server can scrape them):
resource "aws_security_group_rule" "fargate_ingress" {
description = "Node to cluster - Fargate kubelet (Required for Metrics Server)"
type = "ingress"
from_port = 10250
to_port = 10250
protocol = "tcp"
source_security_group_id = module.<eks-module-name>.node_security_group_id
security_group_id = module.<eks-module-name>.cluster_primary_security_group_id
}
resource "aws_security_group_rule" "fargate_egress" {
description = "Node to cluster - Fargate kubelet (Required for Metrics Server)"
protocol = "tcp"
from_port = 10250
to_port = 10250
type = "egress"
source_security_group_id = module.<eks-module-name>.cluster_primary_security_group_id
security_group_id = module.<eks-module-name>.node_security_group_id
}
That is because Fargate nodes are assigned the cluster primary security group (cluster_primary_security_group_id), not the node security group created by this module.
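If it helps to verify that, the module exposes both security group IDs as outputs; a small sketch (again assuming a module labelled "eks") that surfaces them so you can compare against the ENIs attached to your Fargate pods:

# Debugging aid: expose the two security group IDs referenced by the rules
# above. The module label "eks" is an assumption.
output "node_security_group_id" {
  description = "Node security group created by the module"
  value       = module.eks.node_security_group_id
}

output "cluster_primary_security_group_id" {
  description = "EKS-managed primary security group, attached to Fargate pods"
  value       = module.eks.cluster_primary_security_group_id
}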
Thanks @francardoso93, I had the same issue and this solved it for me.
Thank you @francardoso93, you saved my day!
I'm going to lock this issue because it has been closed for 30 days. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.
Description
After upgrading from module version "17.24.0" to "~> 18.2.0", metrics server is now starting with a FailedDiscoveryCheck error.
Versions
Terraform: v1.0.11
Provider(s): AWS: 3.72.0
Module: eks 18.2.1
Reproduction
Code Snippet to Reproduce
....
Expected behavior
kubectl top pods or kubectl top nodes should work properly.
Actual behavior
Terminal Output Screenshot(s)
IMPORTANT: After reading this solution, I tried adding inbound access from 0.0.0.0/0 to the blank node security group created by this module (for testing purposes, of course) AND IT WORKED! So my current understanding is that there is a communication issue between the nodes and the control plane that affects metrics server, caused by how security groups are now being configured, OR I might be missing some node_security_group_additional_rules.
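For reference, the temporary test described above looks roughly like this in Terraform; this is a hypothetical, wide-open rule for debugging only (the module label "eks" is an assumption), not something to keep:

# Debug-only sketch: open all inbound traffic to the node security group to
# confirm the failure is caused by security group rules. Remove after testing.
resource "aws_security_group_rule" "debug_allow_all_ingress" {
  description       = "TEMPORARY: allow all ingress for troubleshooting"
  type              = "ingress"
  protocol          = "-1"
  from_port         = 0
  to_port           = 0
  cidr_blocks       = ["0.0.0.0/0"]
  security_group_id = module.eks.node_security_group_id
}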
Additional context
Tried the metrics-server deployment directly as well. Same results. Using the latest version.