hashicorp / consul

Consul is a distributed, highly available, and data center aware solution to connect and configure applications across dynamic, distributed infrastructure.
https://www.consul.io

Terminating Gateway only populates service instance from the first external node when Node address is a DNS name #16040

Open Ranjandas opened 1 year ago

Ranjandas commented 1 year ago

Overview of the Issue

When multiple instances of an external service are registered with node.address set to a DNS name, only the service instance from the first node gets populated in the terminating gateway's Envoy clusters.
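Throughout this report, the gateway's clusters are inspected through the Envoy admin interface, which consul connect envoy binds to localhost:19000 by default:

$ curl -s localhost:19000/clusters | grep hostname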

Reproduction Steps

Steps to reproduce this issue:

  1. Run a consul dev cluster

    $ consul agent -dev
  2. Create a terminating gateway config entry

    # file: tg.hcl
    Kind = "terminating-gateway"
    Name = "legacy-services-gateway"
    Services = [
      {
        Name = "counting"
      }
    ]
    $ consul config write tg.hcl
  3. Start the terminating gateway

    $ consul connect envoy -gateway=terminating -service legacy-services-gateway -register
  4. Register two external services. We will initially use the node names b and c to show that only the instance from node b is populated.

    i. register service with node name b

    # file: b.external.json
    
    {
      "Node": "b",
      "Address": "google.com",
      "NodeMeta": {
        "external-node": "true",
        "external-probe": "true"
      },
      "Service": {
        "ID": "counting1",
        "Service": "counting",
        "Port": 9003
      }
    }
    $ curl -X PUT --data @b.external.json localhost:8500/v1/catalog/register
    true
    $ curl -s localhost:19000/clusters | grep hostname
    counting.default.dc1.internal.4b90cae1-8761-7e60-1dce-fabbe4082006.consul::142.250.70.142:9003::hostname::google.com
    local_agent::127.0.0.1:8503::hostname::

    ii. register service with node name c

    # file: c.external.json
    
    {
      "Node": "c",
      "Address": "wikipedia.org",
      "NodeMeta": {
        "external-node": "true",
        "external-probe": "true"
      },
      "Service": {
        "ID": "counting2",
        "Service": "counting",
        "Port": 9003
      }
    }
    $ curl -X PUT --data @c.external.json localhost:8500/v1/catalog/register
    true
    $ curl -s localhost:19000/clusters | grep hostname
    counting.default.dc1.internal.4b90cae1-8761-7e60-1dce-fabbe4082006.consul::142.250.70.142:9003::hostname::google.com
    local_agent::127.0.0.1:8503::hostname::

    You will find that the service registered with node name c doesn't appear in the clusters list.

  5. Register a new service with node name a

    # file: a.external.json
    
    {
      "Node": "a",
      "Address": "hashicorp.com",
      "NodeMeta": {
        "external-node": "true",
        "external-probe": "true"
      },
      "Service": {
        "ID": "counting3",
        "Service": "counting",
        "Port": 9003
      }
    }
    $ curl -X PUT --data @a.external.json localhost:8500/v1/catalog/register
    true
    $ curl -s localhost:19000/clusters | grep hostname
    local_agent::127.0.0.1:8503::hostname::
    counting.default.dc1.internal.5e540109-8fa5-2940-b277-3a717568c2e5.consul::76.76.21.21:9003::hostname::hashicorp.com

    Notice how the cluster changed to hashicorp.com after node a was registered: even though a was registered last, it now sorts first among the node names, and only its instance is populated.
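Note that all three registrations themselves succeed: the catalog holds every instance even while the gateway's clusters only ever contain one. A quick cross-check against the catalog (output abbreviated to the relevant field):

$ curl -s 'localhost:8500/v1/catalog/service/counting?pretty' | grep '"Node"'
    "Node": "a",
    "Node": "b",
    "Node": "c",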

This works as expected when the external nodes are registered with IP addresses.

Example clusters when using IP addresses for node.Address

$ curl -s localhost:19000/clusters | grep hostname
local_agent::127.0.0.1:8503::hostname::
counting.default.dc1.internal.8da69c37-1cbc-ddef-2819-7f22d0163d9f.consul::1.1.1.1:9003::hostname::
counting.default.dc1.internal.8da69c37-1cbc-ddef-2819-7f22d0163d9f.consul::2.2.2.2:9003::hostname::
counting.default.dc1.internal.8da69c37-1cbc-ddef-2819-7f22d0163d9f.consul::3.3.3.3:9003::hostname::
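For reference, the payloads used in the IP-address case differ from the earlier ones only in the Address field. A sketch of one of the three registrations (the file name is hypothetical):

# file: a.external-ip.json

{
  "Node": "a",
  "Address": "1.1.1.1",
  "NodeMeta": {
    "external-node": "true",
    "external-probe": "true"
  },
  "Service": {
    "ID": "counting1",
    "Service": "counting",
    "Port": 9003
  }
}
$ curl -X PUT --data @a.external-ip.json localhost:8500/v1/catalog/register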

Consul info for both Client and Server

Client and Server info (identical output, since the reproduction uses a single consul agent -dev process):

```
agent:
    check_monitors = 0
    check_ttls = 0
    checks = 1
    services = 1
build:
    prerelease =
    revision = bd257019
    version = 1.14.3
    version_metadata =
consul:
    acl = disabled
    bootstrap = false
    known_datacenters = 1
    leader = true
    leader_addr = 127.0.0.1:8300
    server = true
raft:
    applied_index = 277
    commit_index = 277
    fsm_pending = 0
    last_contact = 0
    last_log_index = 277
    last_log_term = 2
    last_snapshot_index = 0
    last_snapshot_term = 0
    latest_configuration = [{Suffrage:Voter ID:e7c2fe1e-0c1d-e1ad-169d-b1a46432eec2 Address:127.0.0.1:8300}]
    latest_configuration_index = 0
    num_peers = 0
    protocol_version = 3
    protocol_version_max = 3
    protocol_version_min = 0
    snapshot_version_max = 1
    snapshot_version_min = 0
    state = Leader
    term = 2
runtime:
    arch = amd64
    cpu_count = 8
    goroutines = 157
    max_procs = 8
    os = darwin
    version = go1.19.4
serf_lan:
    coordinate_resets = 0
    encrypted = false
    event_queue = 1
    event_time = 2
    failed = 0
    health_score = 0
    intent_queue = 0
    left = 0
    member_time = 1
    members = 1
    query_queue = 0
    query_time = 1
serf_wan:
    coordinate_resets = 0
    encrypted = false
    event_queue = 0
    event_time = 1
    failed = 0
    health_score = 0
    intent_queue = 0
    left = 0
    member_time = 1
    members = 1
    query_queue = 0
    query_time = 1
```


blake commented 7 months ago

@Ranjandas This is not a valid configuration. Per RFC 1034, section 3.6.2, "If a CNAME RR is present at a node, no other data should be present; this ensures that the data for a canonical name and its aliases cannot be different."

When you register the external service to a node whose address is a DNS hostname, that effectively creates a CNAME record pointing counting.service.consul to the specified hostname, e.g., google.com. Since no other records can exist alongside a CNAME RR, including other CNAMEs, the records that point to the other aliases are correctly ignored.

The solution is either to specify IP addresses in the Node or Service address instead of a hostname, or to have only a single service instance that points to a single DNS hostname.
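For what it's worth, the CNAME behavior described above is also visible through Consul's DNS interface (served on port 8600 by default) once a node is registered with a hostname address. An illustrative query against the registration from step 4.i, with the output abbreviated:

$ dig @127.0.0.1 -p 8600 counting.service.consul +noall +answer
counting.service.consul. 0 IN CNAME google.com.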