subspace / infra

7 stars 4 forks source link

add network L4 loadbalancer (NLB) to cluster and services for TCP/UDP #317

Closed DaMandal0rian closed 4 weeks ago

DaMandal0rian commented 1 month ago

PR Type

enhancement, configuration changes


Description


Changes walkthrough ๐Ÿ“

Relevant files
Enhancement
2 files
nlb.tf
Add NLB, target groups, and security group rules for EKS (blue).

eks/eks-blue/nlb.tf
  • Added Network Load Balancer (NLB) resource.
  • Defined target groups and listeners for TCP and UDP ports.
  • Configured security group rules for EKS cluster.
  • +183/-0 
    nlb.tf
    Add NLB, target groups, and security group rules for EKS (green).

    eks/eks-green/nlb.tf
  • Added Network Load Balancer (NLB) resource.
  • Defined target groups and listeners for TCP and UDP ports.
  • Configured security group rules for EKS cluster.
  • +183/-0 
    Configuration changes
    10 files
    archival-node-service.yaml
    Update bootstrap-domain-node service to use NLB.                 

    kubernetes/devnet/base/bootstrap-domain-node/archival-node-service.yaml
  • Added annotations for NLB.
  • Changed service type to LoadBalancer.
  • +4/-1     
    archival-node-configmap.yaml
    Add namespace to bootstrap-node ConfigMap.                             

    kubernetes/devnet/base/bootstrap-node/archival-node-configmap.yaml - Added namespace to ConfigMap.
    +1/-0     
    archival-node-service.yaml
    Update bootstrap-node service to use NLB.                               

    kubernetes/devnet/base/bootstrap-node/archival-node-service.yaml
  • Added annotations for NLB.
  • Changed service type to LoadBalancer.
  • +4/-1     
    archival-node-configmap.yaml
    Add namespace to domain-node ConfigMap.                                   

    kubernetes/devnet/base/domain-node/archival-node-configmap.yaml - Added namespace to ConfigMap.
    +1/-0     
    archival-node.yaml
    Add namespace to domain-node StatefulSet.                               

    kubernetes/devnet/base/domain-node/archival-node.yaml - Added namespace to StatefulSet.
    +1/-0     
    ingress.yaml
    Update namespace for domain-node Ingress.                               

    kubernetes/devnet/base/domain-node/ingress.yaml - Changed namespace to `domain-node`.
    +1/-1     
    pvc.yaml
    Update namespace for domain-node PVC.                                       

    kubernetes/devnet/base/domain-node/pvc.yaml - Changed namespace to `domain-node`.
    +1/-1     
    service.yaml
    Update domain-node service to use NLB.                                     

    kubernetes/devnet/base/domain-node/service.yaml
  • Added annotations for NLB.
  • Changed service type to LoadBalancer.
  • +4/-1     
    service.yaml
    Update farmer service to use NLB.                                               

    kubernetes/devnet/base/farmer/service.yaml
  • Added annotations for NLB.
  • Changed service type to LoadBalancer.
  • +4/-1     
    service.yaml
    Update RPC node service to use NLB.                                           

    kubernetes/devnet/base/rpc-node/service.yaml
  • Added annotations for NLB.
  • Changed service type to LoadBalancer.
  • +4/-1     

    ๐Ÿ’ก PR-Agent usage: Comment /help on the PR to get a list of all available PR-Agent tools and their descriptions

    github-actions[bot] commented 1 month ago

    PR Review ๐Ÿ”

    โฑ๏ธ Estimated effort to review [1-5] 4, due to the complexity and breadth of the changes across multiple files and configurations, including Terraform and Kubernetes resources. The reviewer needs to understand both AWS and Kubernetes configurations to ensure that the changes are correct and secure.
    ๐Ÿงช Relevant tests No
    โšก Possible issues Security Concern: The security group rules allow traffic from any IP address (0.0.0.0/0 and ::/0) for both TCP and UDP ports. This is a broad range that could expose the services to potential security threats if not intended for public access.
    ๐Ÿ”’ Security concerns - Open Access: The ingress rules for both TCP and UDP are set to allow traffic from any IP address, which could lead to unauthorized access if the services are not meant to be publicly available.
    Code feedback:
    relevant fileeks/eks-blue/nlb.tf
    suggestion       Consider restricting the CIDR blocks in the security group rules to specific IP ranges that require access, instead of allowing all IPs (0.0.0.0/0 and ::/0). This change would enhance the security by limiting access to the services. [important]
    relevant linecidr_blocks = ["0.0.0.0/0"]

    relevant fileeks/eks-blue/nlb.tf
    suggestion       Review the necessity of opening both TCP and UDP ports for the same services. If UDP is not required, consider removing those rules to reduce the attack surface. [important]
    relevant lineprotocol = "udp"

    relevant fileeks/eks-blue/nlb.tf
    suggestion       Add logging for the Network Load Balancer to monitor traffic and troubleshoot issues. This can be done by enabling access logs in the `aws_lb` resource. [medium]
    relevant lineresource "aws_lb" "nlb" {

    relevant fileeks/eks-blue/nlb.tf
    suggestion       Implement tags consistently across all resources for better resource management and cost tracking. Ensure that all resources, including security groups and listeners, have appropriate tags. [medium]
    relevant linetags = {

    github-actions[bot] commented 1 month ago

    PR Code Suggestions โœจ

    CategorySuggestion                                                                                                                                    Score
    Security
    Restrict access to the load balancer by specifying more precise CIDR blocks ___ **Consider using a more specific CIDR range for ingress rules instead of allowing all IP
    addresses (0.0.0.0/0 and ::/0). This change enhances the security by restricting access to
    the load balancer to only necessary IP ranges.** [eks/eks-blue/nlb.tf [99-100]](https://github.com/subspace/infra/pull/317/files#diff-68204e9ef19a846dccc0136cb0d37bc74545ca75ebd99b3bf9eb7e7d312602acR99-R100) ```diff -cidr_blocks = ["0.0.0.0/0"] -ipv6_cidr_blocks = ["::/0"] +cidr_blocks = [""] +ipv6_cidr_blocks = [""] ```
    Suggestion importance[1-10]: 9 Why: This suggestion significantly enhances security by limiting access to the load balancer to specific IP ranges, reducing the risk of unauthorized access.
    9
    Best practice
    Add a 'depends_on' attribute to the load balancer resource to ensure correct provisioning order ___ **It is recommended to add a 'depends_on' attribute to ensure that the EKS cluster resources
    are fully provisioned before the load balancer starts its setup. This can prevent
    potential race conditions during infrastructure provisioning.** [eks/eks-blue/nlb.tf [1]](https://github.com/subspace/infra/pull/317/files#diff-68204e9ef19a846dccc0136cb0d37bc74545ca75ebd99b3bf9eb7e7d312602acR1-R1) ```diff resource "aws_lb" "nlb" { + depends_on = [module.eks_cluster] ```
    Suggestion importance[1-10]: 8 Why: Adding a 'depends_on' attribute is a best practice that ensures the load balancer is provisioned only after the EKS cluster resources are ready, preventing potential race conditions.
    8
    Enhancement
    Adjust health check thresholds to enhance target group stability ___ **For the target groups' health checks, consider increasing the 'healthy_threshold' and
    'unhealthy_threshold' to avoid flapping between healthy and unhealthy states. This
    adjustment can lead to more stable operations.** [eks/eks-blue/nlb.tf [31-32]](https://github.com/subspace/infra/pull/317/files#diff-68204e9ef19a846dccc0136cb0d37bc74545ca75ebd99b3bf9eb7e7d312602acR31-R32) ```diff -healthy_threshold = 2 -unhealthy_threshold = 2 +healthy_threshold = 3 +unhealthy_threshold = 3 ```
    Suggestion importance[1-10]: 7 Why: Increasing the health check thresholds can help avoid flapping between healthy and unhealthy states, leading to more stable operations. This is a useful enhancement but not critical.
    7
    Possible issue
    Increase the health check timeout to reduce the risk of premature timeouts ___ **The 'timeout' setting for TCP health checks might be too low, potentially leading to
    premature timeout errors. Consider increasing the timeout to a higher value to accommodate
    network variability.** [eks/eks-blue/nlb.tf [30]](https://github.com/subspace/infra/pull/317/files#diff-68204e9ef19a846dccc0136cb0d37bc74545ca75ebd99b3bf9eb7e7d312602acR30-R30) ```diff -timeout = 5 +timeout = 10 ```
    Suggestion importance[1-10]: 6 Why: Increasing the timeout value can help accommodate network variability and reduce premature timeout errors. This is a minor but beneficial adjustment.
    6