hashicorp / consul

Consul is a distributed, highly available, and data center aware solution to connect and configure applications across dynamic, distributed infrastructure.
https://www.consul.io

empty alternative domain (-alt-domain="") is not possible anymore with 1.7.1 #7346

Open sielaq opened 4 years ago

sielaq commented 4 years ago

Hello

Overview of the Issue

In Consul 1.6.3 we were able to run the Consul agent with an empty alternative domain,
either with -alt-domain=""
or with an /etc/consul/config.json that contains

{
  "alt_domain": "",
...
}

This was very handy for development environments,
since developers didn't have to remember the FQDN of the environment they were working in. For example:

$ dig @localhost -p8600 foo001.node +short
10.4.4.4

So having

server=/node/127.0.0.1#8600
server=/service/127.0.0.1#8600

in the dnsmasq configuration was very useful, because hosts and services could also be
resolved by short, human-friendly names:

dig foo001.node
10.4.4.4
dig foo.service
10.4.4.5
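The dnsmasq lines above forward any query under the `node` or `service` domain to the local Consul DNS port. Below is a minimal Python sketch of that per-domain routing, assuming dnsmasq's documented rule that the most specific matching `server=/domain/` entry wins. `parse_server_lines` and `pick_upstream` are hypothetical helpers, not part of dnsmasq, and only one domain per line is handled.

```python
# Hypothetical sketch of dnsmasq-style per-domain forwarding (not dnsmasq's code).
def parse_server_lines(lines):
    """Parse 'server=/domain/addr#port' lines into {domain: (addr, port)}.

    Simplification: only one domain per line is supported.
    """
    routes = {}
    for line in lines:
        line = line.strip()
        if not line.startswith("server=/"):
            continue  # ignore unrelated dnsmasq options
        _, domain, target = line.split("/", 2)
        addr, _, port = target.partition("#")
        routes[domain.lower()] = (addr, int(port) if port else 53)
    return routes


def pick_upstream(qname, routes):
    """Return (addr, port) for the most specific matching domain, or None."""
    labels = qname.rstrip(".").lower().split(".")
    # Check suffixes from longest to shortest, e.g. "foo001.node", then "node".
    for i in range(len(labels)):
        suffix = ".".join(labels[i:])
        if suffix in routes:
            return routes[suffix]
    return None  # no match: dnsmasq would fall back to its default upstreams


routes = parse_server_lines([
    "server=/node/127.0.0.1#8600",
    "server=/service/127.0.0.1#8600",
])
print(pick_upstream("foo001.node", routes))  # ('127.0.0.1', 8600)
print(pick_upstream("example.com", routes))  # None
```

dnsmasq forwards the name unchanged, so Consul receives `foo001.node`, not `foo001.node.consul`. Consul itself therefore has to accept the bare `node` and `service` suffixes, which is what the empty alternative domain provided.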

With 1.7.1 this is no longer possible.

Reproduction Steps

  1. Create a cluster with 1 client node and 1 server node, both started with -alt-domain=""
  2. Query the node's own hostname, e.g. dig @localhost -p8600 foo001.node +short

This works in 1.6.3 but no longer works in 1.7.1.

Consul info for both Client and Server

Client info

```
agent:
	check_monitors = 0
	check_ttls = 0
	checks = 0
	services = 1
build:
	prerelease =
	revision = 2cf0a3c8
	version = 1.7.1
consul:
	acl = disabled
	known_servers = 1
	server = false
runtime:
	arch = amd64
	cpu_count = 1
	goroutines = 48
	max_procs = 1
	os = linux
	version = go1.13.7
serf_lan:
	coordinate_resets = 0
	encrypted = true
	event_queue = 0
	event_time = 7
	failed = 0
	health_score = 0
	intent_queue = 0
	left = 0
	member_time = 164
	members = 43
	query_queue = 0
	query_time = 1
```

Operating system and Environment details

Ubuntu x86_64

Log Fragments

==> Starting Consul agent...
           Version: 'v1.7.1'
           Node ID: '851272e1-391c-133c-0499-59d88815c382'
         Node name: 'foo001'
        Datacenter: 'staging' (Segment: '')
            Server: false (Bootstrap: false)
       Client Addr: [0.0.0.0] (HTTP: 8500, HTTPS: -1, gRPC: -1, DNS: 8600)
      Cluster Addr: 10.4.4.4 (LAN: 8301, WAN: 8302)
           Encrypt: Gossip: true, TLS-Outgoing: false, TLS-Incoming: false, Auto-Encrypt-TLS: false
sielaq commented 4 years ago

This is probably related to the change in https://github.com/hashicorp/consul/pull/7323.

mkeeler commented 4 years ago

This change was intentional. Consul was never meant to handle the root zone; it was only doing so accidentally, due to a bug introduced in v1.5.2.
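To see why an empty alternative domain amounts to serving the root zone, consider suffix matching: a server that answers for a domain checks whether the query name ends with that domain, and the empty domain (the root, ".") is a suffix of every name. Below is a minimal sketch assuming a simplified matcher. `split_query` is a hypothetical helper, not Consul's actual implementation.

```python
# Hypothetical, simplified domain matcher; NOT Consul's implementation.
def split_query(qname, domains):
    """Return (labels, matched_domain) for the first configured domain that
    is a suffix of qname, or None if no domain matches."""
    # Normalise to ".label.label." so suffix checks respect label boundaries.
    name = "." + qname.strip(".").lower() + "."
    for domain in domains:
        d = domain.strip(".").lower()
        # An empty domain becomes the bare root suffix ".", which every name ends with.
        suffix = "." + d + "." if d else "."
        if name.endswith(suffix):
            rest = name[: len(name) - len(suffix)].lstrip(".")
            labels = rest.split(".") if rest else []
            return labels, d
    return None


print(split_query("foo001.node.consul.", ["consul"]))  # (['foo001', 'node'], 'consul')
print(split_query("foo001.node", ["consul"]))          # None
print(split_query("example.com", ["consul", ""]))      # (['example', 'com'], '')
```

With only `consul` configured, `foo001.node` falls through. Adding the empty domain makes it match, but it also makes the agent claim unrelated names such as `example.com`.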

I would recommend that the resolvers on the nodes making those requests be configured with a default search domain.
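The search-domain workaround relies on the stub resolver expanding short names before they are sent. Below is a minimal sketch of the candidate order described in resolv.conf(5). It assumes glibc-style `search` and `ndots` handling (default `ndots:1`) and ignores error-specific early exits. `candidate_names` is a hypothetical helper.

```python
# Hypothetical sketch of resolv.conf(5) search-list ordering (glibc-style).
def candidate_names(name, search, ndots=1):
    """Return the fully qualified names a stub resolver would try, in order."""
    if name.endswith("."):
        return [name]  # already absolute: the search list is not applied
    expanded = [f"{name}.{domain.strip('.')}." for domain in search]
    absolute = name + "."
    if name.count(".") >= ndots:
        return [absolute] + expanded  # enough dots: try the name as-is first
    return expanded + [absolute]      # too few dots: search list first


print(candidate_names("foo001.node", ["consul"]))  # ['foo001.node.', 'foo001.node.consul.']
print(candidate_names("foo001", ["node.consul"]))  # ['foo001.node.consul.', 'foo001.']
```

With `search consul`, `foo001.node` contains one dot, so it is first tried as the absolute name `foo001.node.` and only then as `foo001.node.consul.`. An application that performs its own DNS lookups without consulting the search list only ever sends the first form.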

Going forward, it could be possible to add a config switch that makes handling the root zone intentional, but this was fixed because it causes problems in other scenarios.

sielaq commented 4 years ago

I tend to disagree: it is a great feature that helps in plenty of staging environments, where developers don't have to care much about the FQDN of the staging environment they are using, so any service or node is easily accessible from the command line and from proxies. Moreover, it simplifies property handling across multiple staging environments, since without FQDNs the properties stay unified. It took development to a new level.

> be configured with a default search domain.

Unfortunately this is only partially true: not every application respects search domains, especially proxies (like Squid). An empty alternative domain (no domain) helps keep name resolution consistent at every level, for every application. We find it very useful for development.

Can we find a different solution, like a -alt-domain-empty option, so that the other scenarios are satisfied as well?

mkeeler commented 4 years ago

Right. I think the best solution is to add new configuration so operators can choose the desired behavior. This should come with documentation describing the trade-offs of allowing Consul to process short queries.

mkeeler commented 4 years ago

Also, for extra context: Consul already had the same behavior whenever any recursors were configured.