hashicorp / consul

Consul is a distributed, highly available, and data center aware solution to connect and configure applications across dynamic, distributed infrastructure.
https://www.consul.io

Consul on macOS Sequoia is unable to connect to server #21765

Open vsarunas opened 1 week ago

vsarunas commented 1 week ago

Overview of the Issue

Consul on macOS Sequoia is effectively unusable because it cannot establish a connection:

2024-09-17T09:04:48.507+0100 [WARN]  agent: (LAN) couldn't join: number_of_nodes=0
  error=
  | 1 error occurred:
  | \t* Failed to join 192.168.67.2:8301: dial tcp 192.168.67.2:8301: connect: no route to host

This only happens when Consul is launched via launchctl; it does not occur when Consul is launched from Terminal.

The problem appears to be related to how the Go linker works and to the filtering done by macOS Local Network Privacy. macOS Sonoma previously allowed these connections, but Sequoia does not. This has also been reported in https://github.com/golang/go/issues/68678

The root cause is that the Consul binary does not have a UUID set at link time, whereas the Nomad binary does. The missing UUID causes Local Network Privacy to drop the connection attempts:

$ dwarfdump --uuid /opt/homebrew/bin/consul
$ dwarfdump --uuid /opt/homebrew/bin/nomad 
UUID: 7A0A50B8-59DC-309C-A1BE-51B591F285AA (arm64) /opt/homebrew/bin/nomad

Are you able to adjust the linker settings in order to make Consul work on Sequoia?

Reproduction Steps

Try to join a Consul server on the local network when Consul is launched via launchctl.

A minimal example showing the Go linking problem is also provided in https://github.com/golang/go/issues/68678

Consul info for both Client and Server

Consul v1.19.2

Operating system and Environment details

macOS 15.0

vsarunas commented 1 week ago

The difference between the Nomad and Consul Makefiles that causes LC_UUID to be set is the CGO_ENABLED=1 setting:

CGO_ENABLED=1 go build -ldflags "-X github.com/hashicorp/consul/version.GitCommit=ac9e694b98+CHANGES -X github.com/hashicorp/consul/version.BuildDate=2024-09-18T19:36:27Z " -tags ""

dwarfdump --uuid consul
UUID: 03455CBE-9535-F7F5-C277-FFB6D21A71CE (arm64) consul

With Consul's default (CGO_ENABLED=0):

CGO_ENABLED=0 go build -ldflags "-X github.com/hashicorp/consul/version.GitCommit=ac9e694b98+CHANGES -X github.com/hashicorp/consul/version.BuildDate=2024-09-18T19:36:27Z " -tags ""

dwarfdump --uuid consul
<empty>

Building with the external linker requires cgo to be enabled:

CGO_ENABLED=0 go build -ldflags "-linkmode=external -X github.com/hashicorp/consul/version.GitCommit=ac9e694b98+CHANGES -X github.com/hashicorp/consul/version.BuildDate=2024-09-18T19:36:27Z " -tags ""
-linkmode=external requires external (cgo) linking, but cgo is not enabled

Building with the current Go defaults results in a working binary:

go build -ldflags "-X github.com/hashicorp/consul/version.GitCommit=ac9e694b98+CHANGES -X github.com/hashicorp/consul/version.BuildDate=2024-09-18T19:36:27Z " -tags ""
dwarfdump --uuid consul
UUID: 03455CBE-9535-F7F5-C277-FFB6D21A71CE (arm64) consul

In Nomad this was introduced initially in https://github.com/hashicorp/nomad/pull/11329.

In Consul this first appears 8 years ago in https://github.com/hashicorp/consul/commit/33829cd and has survived several iterations of build script updates and the move to Makefiles.

The release build workflow also sets this flag and propagates it extensively: https://github.com/hashicorp/consul/blob/main/.github/workflows/build.yml#L135