kubernetes-sigs / external-dns

Configure external DNS servers (AWS Route53, Google CloudDNS and others) for Kubernetes Ingresses and Services
Apache License 2.0

Support Multiple RFC2136 Hosts with Load Balancing Options #4651

Open Jeremy-Boyle opened 1 month ago

Jeremy-Boyle commented 1 month ago

Description: Enhance the RFC2136 provider to support multiple hosts and introduce load balancing options to distribute DNS update requests evenly across available DNS servers.

Background: Currently, the RFC2136 provider in ExternalDNS only supports a single host for DNS updates. In environments with multiple DNS servers, this limitation can lead to a single server becoming a bottleneck, especially when there are multiple ExternalDNS instances. We are experiencing an issue where all ExternalDNS pods are hitting a single data center (DC), causing it to become overloaded.

Proposed Solution:

  1. Allow --rfc2136-host to Accept Multiple Hosts:

    • Modify the --rfc2136-host command-line option so it can be specified multiple times, the same way --rfc2136-zone works.
    • Example: --rfc2136-host="host1.example.com" --rfc2136-host="host2.example.com"
  2. Introduce Load Balancing Options:

    • Add a new command-line option --rfc2136-load-balancing-strategy to specify the load balancing strategy.
    • Supported options:
      • round-robin: Distribute DNS updates evenly across all specified hosts in a round-robin manner.
      • random: Randomly select a host for each DNS update.
      • disabled (default): Use the first host in the list as the primary, moving to the next host only if a failure occurs (see the retry mechanism in item 3). The load balancing strategy only applies when more than one host is provided; otherwise the value is ignored.
  3. Retry Mechanism:

    • Implement a retry mechanism that moves to the next host in the list if the current host fails.
    • Allow configuration of the number of retries before moving to the next host.

Example Configuration:

external-dns \
  --provider=rfc2136 \
  --rfc2136-host="host1.example.com" \
  --rfc2136-host="host2.example.com" \
  --rfc2136-host="host3.example.com" \
  --rfc2136-load-balancing-strategy="round-robin" \
  --rfc2136-port=53 \
  --rfc2136-zone=example.com \
  --rfc2136-tsig-secret-alg=hmac-sha256 \
  --rfc2136-tsig-keyname=example-key \
  --rfc2136-tsig-secret=example-secret \
  --rfc2136-insecure

Benefits:

Tasks:

  1. Update the RFC2136 provider to parse multiple hosts from the --rfc2136-host option, similar to how --rfc2136-zone works.
  2. Implement the --rfc2136-load-balancing-strategy option with the specified strategies.
  3. Implement the retry mechanism to handle failures and move to the next host.
  4. Update documentation to reflect the new options and provide examples.
  5. Write unit tests and integration tests to ensure the new functionality works as expected.
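For task 1, a repeatable flag can be sketched with Go's standard flag package. This is an assumption made for illustration only: ExternalDNS has its own flag-parsing setup, and hostList and parseHosts are hypothetical names. The point is just that each occurrence of --rfc2136-host appends to a list, preserving order.

```go
package main

import (
	"flag"
	"fmt"
	"strings"
)

// hostList collects every occurrence of a repeated flag (hypothetical type).
type hostList []string

// String implements flag.Value.
func (h *hostList) String() string { return strings.Join(*h, ",") }

// Set implements flag.Value; it is called once per occurrence of the flag.
func (h *hostList) Set(v string) error {
	if v == "" {
		return fmt.Errorf("empty host")
	}
	*h = append(*h, v)
	return nil
}

// parseHosts parses args and returns the hosts in the order they were given.
func parseHosts(args []string) ([]string, error) {
	var hosts hostList
	fs := flag.NewFlagSet("external-dns", flag.ContinueOnError)
	fs.Var(&hosts, "rfc2136-host", "RFC2136 host; may be repeated")
	if err := fs.Parse(args); err != nil {
		return nil, err
	}
	return hosts, nil
}

func main() {
	hosts, err := parseHosts([]string{
		"--rfc2136-host=host1.example.com",
		"--rfc2136-host=host2.example.com",
	})
	fmt.Println(hosts, err)
}
```

Keeping the order matters for the disabled strategy, where the first host listed is treated as the primary.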

Workarounds:

  1. Manually Distribute Zones Across ExternalDNS Instances:

    • Description: Manually configure each ExternalDNS instance to use a different DNS server to distribute the load.
    • Steps:
      1. Deploy multiple instances of ExternalDNS.
      2. Assign each instance a different RFC2136 DNS server.
      3. Manually configure each instance to manage a specific subset of DNS zones.
    • Example Configuration:
      • Instance 1:
        external-dns \
        --provider=rfc2136 \
        --rfc2136-host="host1.example.com" \
        --rfc2136-zone=example.com \
        --rfc2136-tsig-secret-alg=hmac-sha256 \
        --rfc2136-tsig-keyname=example-key \
        --rfc2136-tsig-secret=example-secret \
        --rfc2136-insecure
      • Instance 2:
        external-dns \
        --provider=rfc2136 \
        --rfc2136-host="host2.example.com" \
        --rfc2136-zone=another-example.com \
        --rfc2136-tsig-secret-alg=hmac-sha256 \
        --rfc2136-tsig-keyname=another-key \
        --rfc2136-tsig-secret=another-secret \
        --rfc2136-insecure
  2. Use Different Subdomains for Load Distribution:

    • Description: Segment the DNS zones into different subdomains and assign each subdomain to a different ExternalDNS instance.
    • Steps:
      1. Create subdomains to segment the DNS zones.
      2. Deploy multiple ExternalDNS instances, each responsible for a specific subdomain.
    • Example Configuration:
      • Instance 1 (subdomain1.example.com):
        external-dns \
        --provider=rfc2136 \
        --rfc2136-host="host1.example.com" \
        --rfc2136-zone=subdomain1.example.com \
        --rfc2136-tsig-secret-alg=hmac-sha256 \
        --rfc2136-tsig-keyname=subdomain1-key \
        --rfc2136-tsig-secret=subdomain1-secret \
        --rfc2136-insecure
      • Instance 2 (subdomain2.example.com):
        external-dns \
        --provider=rfc2136 \
        --rfc2136-host="host2.example.com" \
        --rfc2136-zone=subdomain2.example.com \
        --rfc2136-tsig-secret-alg=hmac-sha256 \
        --rfc2136-tsig-keyname=subdomain2-key \
        --rfc2136-tsig-secret=subdomain2-secret \
        --rfc2136-insecure
  3. Manual Failover Management:

    • Description: Manually monitor and switch the ExternalDNS instances to backup DNS servers in case of failure.
    • Steps:
      1. Regularly monitor the health and performance of the DNS servers.
      2. Manually update the ExternalDNS configuration to switch to a backup DNS server if the primary server fails.
    • Example Configuration for Failover:
      • Primary Configuration:
        external-dns \
        --provider=rfc2136 \
        --rfc2136-host="primary-host.example.com" \
        --rfc2136-zone=example.com \
        --rfc2136-tsig-secret-alg=hmac-sha256 \
        --rfc2136-tsig-keyname=primary-key \
        --rfc2136-tsig-secret=primary-secret \
        --rfc2136-insecure
      • Failover Configuration (manual update required):
        external-dns \
        --provider=rfc2136 \
        --rfc2136-host="backup-host.example.com" \
        --rfc2136-zone=example.com \
        --rfc2136-tsig-secret-alg=hmac-sha256 \
        --rfc2136-tsig-keyname=backup-key \
        --rfc2136-tsig-secret=backup-secret \
        --rfc2136-insecure

Additional Notes:

These workarounds can help mitigate the immediate issue of overloading a single DNS server by distributing the load manually. However, they are not as efficient or resilient as an automated load balancing and failover solution. Implementing the proposed enhancements would significantly improve the system's reliability and performance by automating these processes.

Fixes: https://github.com/kubernetes-sigs/external-dns/issues/3470

Jeremy-Boyle commented 1 month ago

I can support the work for this in a PR.

This issue is just to discuss design strategy.