hashicorp / consul

Consul is a distributed, highly available, and data center aware solution to connect and configure applications across dynamic, distributed infrastructure.
https://www.consul.io

SystemD configuration broken with single-node / dev setups #18097

Open aft2d opened 1 year ago

aft2d commented 1 year ago

Overview of the Issue

This issue has already been observed and mentioned in https://github.com/hashicorp/consul/issues/16844. Since it's kind of separate, I think it's useful to outline and track this in a new issue.

Since https://github.com/hashicorp/consul/pull/16845 was merged, single-node/dev setups of Consul no longer work with systemd out of the box. Running systemctl start consul hangs until systemd times out. Consul itself appears to start fine, but systemd is never told that it is ready.

The online docs mention that NOTIFY_SOCKET is only notified once a LAN join has completed, and only when join or retry_join is set. Single-node setups usually don't set either, so that may be the cause.
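
For context, a unit that relies on readiness notification stays in the "activating" state until the service writes READY=1 to the datagram socket named in the NOTIFY_SOCKET environment variable. Below is a minimal Python sketch of that handshake. It is not Consul's actual Go code, and sd_notify is a hypothetical helper name chosen for illustration. If the message is never sent, as seems to happen here when neither join nor retry_join is set, systemd waits until its start timeout expires.

```python
import os
import socket


def sd_notify(state: str = "READY=1") -> bool:
    """Send a state string to the socket named by NOTIFY_SOCKET.

    Returns False when NOTIFY_SOCKET is unset (the process is not running
    under a unit that expects notifications), True once the datagram is sent.
    """
    path = os.environ.get("NOTIFY_SOCKET")
    if not path:
        return False
    # A leading "@" denotes a Linux abstract socket, which the kernel
    # addresses with a leading NUL byte instead.
    if path.startswith("@"):
        path = "\0" + path[1:]
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(state.encode(), path)
    return True
```

A service would call sd_notify() once it considers itself ready. The Type=simple workaround below sidesteps the handshake entirely, since systemd then treats the unit as started as soon as the process has been launched.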

Workaround

systemctl edit consul

[Service]
Type=simple

Reproduction Steps

Consul info for both Client and Server

Consul info

Output from 'consul info' command here

```
agent:
	check_monitors = 0
	check_ttls = 0
	checks = 0
	services = 0
build:
	prerelease =
	revision = 192df66a
	version = 1.16.0
	version_metadata =
consul:
	acl = disabled
	bootstrap = true
	known_datacenters = 1
	leader = true
	leader_addr = 10.0.2.15:8300
	server = true
raft:
	applied_index = 18
	commit_index = 18
	fsm_pending = 0
	last_contact = 0
	last_log_index = 18
	last_log_term = 2
	last_snapshot_index = 0
	last_snapshot_term = 0
	latest_configuration = [{Suffrage:Voter ID:525e7b31-91a1-1907-5715-d94bc31c73d2 Address:10.0.2.15:8300}]
	latest_configuration_index = 0
	num_peers = 0
	protocol_version = 3
	protocol_version_max = 3
	protocol_version_min = 0
	snapshot_version_max = 1
	snapshot_version_min = 0
	state = Leader
	term = 2
runtime:
	arch = amd64
	cpu_count = 2
	goroutines = 167
	max_procs = 2
	os = linux
	version = go1.20.4
serf_lan:
	coordinate_resets = 0
	encrypted = false
	event_queue = 1
	event_time = 2
	failed = 0
	health_score = 0
	intent_queue = 0
	left = 0
	member_time = 2
	members = 1
	query_queue = 0
	query_time = 1
serf_wan:
	coordinate_resets = 0
	encrypted = false
	event_queue = 0
	event_time = 1
	failed = 0
	health_score = 0
	intent_queue = 0
	left = 0
	member_time = 1
	members = 1
	query_queue = 0
	query_time = 1
```

HCL config

```
data_dir = "/opt/consul"
server=true
bootstrap_expect=1
```

Operating system and Environment details

cat /etc/lsb-release 
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=22.04
DISTRIB_CODENAME=jammy
DISTRIB_DESCRIPTION="Ubuntu 22.04.2 LTS"

Log Fragments

Consul log

```
Jul 12 12:06:26 ubuntu-jammy systemd[1]: Starting "HashiCorp Consul - A service mesh solution"...
Jul 12 12:06:26 ubuntu-jammy consul[2739]: ==> Starting Consul agent...
Jul 12 12:06:26 ubuntu-jammy consul[2739]: Version: '1.16.0'
Jul 12 12:06:26 ubuntu-jammy consul[2739]: Build Date: '2023-06-26 20:07:11 +0000 UTC'
Jul 12 12:06:26 ubuntu-jammy consul[2739]: Node ID: '525e7b31-91a1-1907-5715-d94bc31c73d2'
Jul 12 12:06:26 ubuntu-jammy consul[2739]: Node name: 'ubuntu-jammy'
Jul 12 12:06:26 ubuntu-jammy consul[2739]: Datacenter: 'dc1' (Segment: '')
Jul 12 12:06:26 ubuntu-jammy consul[2739]: Server: true (Bootstrap: true)
Jul 12 12:06:26 ubuntu-jammy consul[2739]: Client Addr: [127.0.0.1] (HTTP: 8500, HTTPS: -1, gRPC: -1, gRPC-TLS: 8503, DNS: 8600)
Jul 12 12:06:26 ubuntu-jammy consul[2739]: Cluster Addr: 10.0.2.15 (LAN: 8301, WAN: 8302)
Jul 12 12:06:26 ubuntu-jammy consul[2739]: Gossip Encryption: false
Jul 12 12:06:26 ubuntu-jammy consul[2739]: Auto-Encrypt-TLS: false
Jul 12 12:06:26 ubuntu-jammy consul[2739]: ACL Enabled: false
Jul 12 12:06:26 ubuntu-jammy consul[2739]: Reporting Enabled: false
Jul 12 12:06:26 ubuntu-jammy consul[2739]: ACL Default Policy: allow
Jul 12 12:06:26 ubuntu-jammy consul[2739]: HTTPS TLS: Verify Incoming: false, Verify Outgoing: false, Min Version: TLSv1_2
Jul 12 12:06:26 ubuntu-jammy consul[2739]: gRPC TLS: Verify Incoming: false, Min Version: TLSv1_2
Jul 12 12:06:26 ubuntu-jammy consul[2739]: Internal RPC TLS: Verify Incoming: false, Verify Outgoing: false (Verify Hostname: false), Min Version: TLSv1_2
Jul 12 12:06:26 ubuntu-jammy consul[2739]: ==> Log data will now stream in as it occurs:
Jul 12 12:06:26 ubuntu-jammy consul[2739]: 2023-07-12T12:06:26.939Z [WARN] agent: skipping file /etc/consul.d/consul.env, extension must be .hcl or .json, or config format must be set
Jul 12 12:06:26 ubuntu-jammy consul[2739]: 2023-07-12T12:06:26.939Z [WARN] agent: BootstrapExpect is set to 1; this is the same as Bootstrap mode.
Jul 12 12:06:26 ubuntu-jammy consul[2739]: 2023-07-12T12:06:26.939Z [WARN] agent: bootstrap = true: do not enable unless necessary
Jul 12 12:06:26 ubuntu-jammy consul[2739]: 2023-07-12T12:06:26.942Z [WARN] agent.auto_config: skipping file /etc/consul.d/consul.env, extension must be .hcl or .json, or config format must be set
Jul 12 12:06:26 ubuntu-jammy consul[2739]: 2023-07-12T12:06:26.942Z [WARN] agent.auto_config: BootstrapExpect is set to 1; this is the same as Bootstrap mode.
Jul 12 12:06:26 ubuntu-jammy consul[2739]: 2023-07-12T12:06:26.942Z [WARN] agent.auto_config: bootstrap = true: do not enable unless necessary
Jul 12 12:06:26 ubuntu-jammy consul[2739]: 2023-07-12T12:06:26.943Z [INFO] agent.server.raft: initial configuration: index=1 servers="[{Suffrage:Voter ID:525e7b31-91a1-1907-5715-d94bc31c73d2 Address:10.0.2.15:>
Jul 12 12:06:26 ubuntu-jammy consul[2739]: 2023-07-12T12:06:26.944Z [INFO] agent.server.raft: entering follower state: follower="Node at 10.0.2.15:8300 [Follower]" leader-address= leader-id=
Jul 12 12:06:26 ubuntu-jammy consul[2739]: 2023-07-12T12:06:26.944Z [INFO] agent.server.serf.wan: serf: EventMemberJoin: ubuntu-jammy.dc1 10.0.2.15
Jul 12 12:06:26 ubuntu-jammy consul[2739]: 2023-07-12T12:06:26.944Z [INFO] agent.server.serf.lan: serf: EventMemberJoin: ubuntu-jammy 10.0.2.15
Jul 12 12:06:26 ubuntu-jammy consul[2739]: 2023-07-12T12:06:26.944Z [INFO] agent.router: Initializing LAN area manager
Jul 12 12:06:26 ubuntu-jammy consul[2739]: 2023-07-12T12:06:26.944Z [WARN] agent.server.serf.wan: serf: Failed to re-join any previously known node
Jul 12 12:06:26 ubuntu-jammy consul[2739]: 2023-07-12T12:06:26.944Z [WARN] agent.server.serf.lan: serf: Failed to re-join any previously known node
Jul 12 12:06:26 ubuntu-jammy consul[2739]: 2023-07-12T12:06:26.945Z [INFO] agent.server: Adding LAN server: server="ubuntu-jammy (Addr: tcp/10.0.2.15:8300) (DC: dc1)"
Jul 12 12:06:26 ubuntu-jammy consul[2739]: 2023-07-12T12:06:26.945Z [INFO] agent.server.autopilot: reconciliation now disabled
Jul 12 12:06:26 ubuntu-jammy consul[2739]: 2023-07-12T12:06:26.945Z [INFO] agent.server: Handled event for server in area: event=member-join server=ubuntu-jammy.dc1 area=wan
Jul 12 12:06:26 ubuntu-jammy consul[2739]: 2023-07-12T12:06:26.945Z [INFO] agent.server.cert-manager: initialized server certificate management
Jul 12 12:06:26 ubuntu-jammy consul[2739]: 2023-07-12T12:06:26.945Z [INFO] agent: Started DNS server: address=127.0.0.1:8600 network=udp
Jul 12 12:06:26 ubuntu-jammy consul[2739]: 2023-07-12T12:06:26.946Z [INFO] agent: Started DNS server: address=127.0.0.1:8600 network=tcp
Jul 12 12:06:26 ubuntu-jammy consul[2739]: 2023-07-12T12:06:26.946Z [INFO] agent: Starting server: address=127.0.0.1:8500 network=tcp protocol=http
Jul 12 12:06:26 ubuntu-jammy consul[2739]: 2023-07-12T12:06:26.946Z [INFO] agent: started state syncer
Jul 12 12:06:26 ubuntu-jammy consul[2739]: 2023-07-12T12:06:26.946Z [INFO] agent: Consul agent running!
Jul 12 12:06:26 ubuntu-jammy consul[2739]: 2023-07-12T12:06:26.946Z [INFO] agent: Started gRPC listeners: port_name=grpc_tls address=127.0.0.1:8503 network=tcp
```

SystemD Log

```
Jul 12 12:06:26 ubuntu-jammy systemd[1]: Starting "HashiCorp Consul - A service mesh solution"...
Jul 12 12:07:57 ubuntu-jammy systemd[1]: consul.service: start operation timed out. Terminating.
Jul 12 12:07:57 ubuntu-jammy systemd[1]: consul.service: Main process exited, code=exited, status=1/FAILURE
Jul 12 12:07:57 ubuntu-jammy systemd[1]: consul.service: Failed with result 'timeout'.
Jul 12 12:07:57 ubuntu-jammy systemd[1]: Failed to start "HashiCorp Consul - A service mesh solution".
```

jbussdieker commented 11 months ago

I ended up adding this to the end of my consul.hcl, even though it wasn't needed to bootstrap my single-node cluster:

retry_join = ["127.0.0.1"]