Open esp1 opened 3 years ago
Hi @esp1,
Thanks for reporting this.
I could be wrong, but I don't think we control the Vault build in homebrew, unless you use our own tap: https://github.com/hashicorp/homebrew-tap
Regardless, from what I understand you would have the same issue with our official binaries, because we don't build them on MacOS, we cross-compile them on Linux. This means they don't include the MacOS resolver bits you need.
The good news is that there's a plan in the works to change this. I don't have a timeline, but the design doc was circulating internally just this week, so I don't think it's that far off.
@esp1 I checked in with our release engineering team, and they're going to be addressing this issue as part of an effort that should be landing sometime around September. This may not be the exact time it lands, but it's definitely coming. :)
Thanks for the reply @hsimon-hashicorp, genuinely. Have been fighting this issue for months, maybe longer. Good to know it's getting worked.
As @esp1 notes, we've seen this in Terraform. It's something to do with the underlying Go stock DNS library. It yells YOLO at the mDNS resolver, figures out what upstream DNS server mDNS would use, and calls it directly. I can't even get /etc/hosts to work correctly because of how the library behaves. I use PiHole for my upstream DNS, which has an option to basically force an A record into the cache.
That mostly works until the AWS load balancer, whose IP address is now hardcoded into PiHole (or /etc/hosts), changes without warning. Then you spend half the day blaming everything in sight: AD DNS servers, VPN server config, VPN client config, weird routing issues, probably the cat because they would do something like this to mess with us, until it dawns on you to check the PiHole config to see if there's a "rogue" entry that is overriding what you're expecting to get back from your corporate DNS servers. Because Golang. #smh
@rjhornsby That sounds like a pain, to be sure. I'll keep an eye on this one, and please feel free to come along and bump it as you need.
@hsimon-hashicorp Is there any news from the release engineering team regarding when a fix will land?
Just went through another round of "...why isn't DNS resolution behaving properly? wait ... why does it seem to be vault and consul specifically having issues?" before vaguely remembering this bug. Any progress by chance?
Yes! I think we neglected to include this in the changelog, but the latest releases should include the fix for this, as per #13728. I'll see about updating CHANGELOG.
Fixed in #13728.
I'm not sure if I'm doing something wrong, but this doesn't seem to be fixed:
$ vault --version
Vault v1.9.3 ('7dbdd57243a0d8d9d9e07cd01eb657369f8e1b8a+CHANGES')
$ vault status
Error checking seal status: Get "https://vault.mycorp.com:8200/v1/sys/seal-status": dial tcp: lookup vault.mycorp.com on 192.168.3.7:53: no such host
192.168.3.7 is the local (non VPN) DNS server.
However, immediately following up, curl behaves properly:
$ curl https://vault.mycorp.com:8200
<a href="/ui/">Temporary Redirect</a>.
For curl to get a redirect like that, it has to use the VPN DNS and go through the VPN tunnel, because vault is not on the interwebs. This suggests that the problem is in the vault binary itself, not with the network or the VPN configuration.
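One way to make this comparison concrete is to pull the DNS server out of the Go error message and check it against what macOS itself would use. The sketch below is only an illustration: the helper name dns_server_from_error is hypothetical, and it assumes the error text keeps the "lookup HOST on SERVER:PORT: ..." shape shown above.

```shell
#!/bin/sh
# Hypothetical helper (not part of Vault): read a Go network error on stdin and
# print the DNS server that the Go resolver queried.
# Assumes the message has the form "... lookup HOST on SERVER:PORT: REASON".
dns_server_from_error() {
  sed -n 's/.*lookup [^ ]* on \(.*\):[0-9][0-9]*: .*/\1/p'
}

# Example using the error from the report above.
err='Error checking seal status: Get "https://vault.mycorp.com:8200/v1/sys/seal-status": dial tcp: lookup vault.mycorp.com on 192.168.3.7:53: no such host'
printf '%s\n' "$err" | dns_server_from_error   # prints 192.168.3.7

# On macOS, the resolvers the system would use can be listed for comparison:
#   scutil --dns
# Go's net package can also report which resolver it selected:
#   GODEBUG=netdns=1 vault status
```

If the printed server is the plain LAN resolver rather than the one scutil lists for the VPN domain, that points at the binary bypassing the system resolver, which matches the curl comparison above.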
I've tried both vault 1.9.3 (homebrew) and compiling my own locally using export CGO_ENABLED=1; XC_OS="darwin" XC_ARCH="amd64" make dev [1]. I'm getting the same result for both vault binaries.
[1] go v1.17.6 produces Vault v1.10.0-dev ('057c67f969805a51e944898163aeff069d6a2e37') (cgo)
Note that we don't control the default homebrew vault, though we do have a homebrew tap: https://github.com/hashicorp/homebrew-tap
Compiling your own locally the way we do would use CGO_ENABLED=0 and add the build tag netcgo, as per https://github.com/hashicorp/vault/blob/main/.github/workflows/build.yml#L206-L206. Not sure offhand how to do it using make, since the build target is more intended for our release automation, but you could try doing what that code is doing.
Or you could go to https://releases.hashicorp.com/vault/1.9.3/.
thanks for the feedback. that helps me understand what's going on.
I tried using both the tap (==> Downloading https://releases.hashicorp.com/vault/1.9.3/vault_1.9.3_darwin_amd64.zip) and - to be sure - grabbing the same binary(?) directly from https://releases.hashicorp.com/vault/1.9.3/, but got the same failed DNS results for both.
However, looking at the build fragment you linked, I was able to compile 1.10 from master like so:
$ GO_TAGS=netcgo make dev
and that ... worked. Name resolution does what is expected. This also works for the v1.9.3 tag.
From reading through all the different threads, it seemed that CGO_ENABLED should have worked, but it's the netcgo tag that did it? It's also not clear what might be different about how I built it locally vs the HashiCorp official binary.
One of the things I did notice is that on mine I get (cgo) at the end of the version string, whereas with the HashiCorp version I don't:
$ ~/tmp/bin/vault --version
Vault v1.9.3 ('7dbdd57243a0d8d9d9e07cd01eb657369f8e1b8a') (cgo)
$ ~/Downloads/vault --version
Vault v1.9.3 (7dbdd57243a0d8d9d9e07cd01eb657369f8e1b8a)
We don't want to set CGO_ENABLED=1 as that has a bunch of consequences. The fact that you have cgo in your version string makes me wonder if maybe you had the env var populated when you ran make dev?
Re-opening since it sounds like our fix didn't work.
The fact that you have cgo in your version string makes me wonder if maybe you had the env var populated when you ran make dev?
I went back and checked, and I think you're right about my having CGO_ENABLED set in the environment. I recompiled 1.9.3 (7dbdd5724) intentionally making sure CGO_ENABLED was not set and the resulting binary failed DNS. I compiled again with both CGO_ENABLED and the netcgo tag - which succeeded DNS lookups.
I was under the impression that it was possible to get proper DNS lookups on darwin using CGO_ENABLED=0 and -tags netcgo, but that's looking to be untrue. In hindsight it seems obvious that "netcgo" requires CGO. We'll try to sort this out for the next release, sorry!
@ncabatoff I left a comment on your PR, CGO_ENABLED=1 needs to be set or it will not work.
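Putting the findings above together, a local build that resolves names through the system resolver seems to need both settings. The sketch below is an assumption pieced together from this thread (GO_TAGS=netcgo make dev plus CGO_ENABLED=1), not the official release process; build_vault_with_cgo and has_cgo_marker are hypothetical helper names.

```shell
#!/bin/sh
# Hypothetical wrapper: build a dev binary with cgo enabled and the netcgo tag.
# Meant to be run from a checkout of the Vault repository; it is only defined
# here, not invoked.
build_vault_with_cgo() {
  CGO_ENABLED=1 GO_TAGS=netcgo make dev
}

# Hypothetical check: succeed if a version string read from stdin carries the
# "(cgo)" marker seen on the locally built binary earlier in this thread.
has_cgo_marker() {
  grep -q '(cgo)'
}

# Example: inspect whichever vault binary is on PATH (override with VAULT_BIN).
vault_bin="${VAULT_BIN:-vault}"
if command -v "$vault_bin" >/dev/null 2>&1; then
  if "$vault_bin" --version | has_cgo_marker; then
    echo "built with cgo"
  else
    echo "no (cgo) marker in version string"
  fi
fi
```

The marker check only tells you how the binary was built; the real confirmation is still whether vault status resolves the VPN hostname.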
Adding a link to this comment regarding cross-compilation for ARM64 from AMD64 on macOS CI: https://github.com/golang/go/issues/12524#issuecomment-1006174901
It appears that this is having issues again.
The binaries distributed by Hashicorp and the ones installed by Homebrew all have this issue on the latest versions (v1.12.0). When building myself from source (not cross-compiling), the issue persists:
$ make
...
$ bin/vault status
Error checking seal status: Get "https://vault.service.consul:8200/v1/sys/seal-status": dial tcp: lookup vault.service.consul on [2001:558:feed::1]:53: no such host
$ otool -L bin/vault
bin/vault:
/usr/lib/libSystem.B.dylib (compatibility version 0.0.0, current version 0.0.0)
/System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation (compatibility version 0.0.0, current version 0.0.0)
/System/Library/Frameworks/Security.framework/Versions/A/Security (compatibility version 0.0.0, current version 0.0.0)
When following the CGO_ENABLED=1 workaround mentioned above, the issue is resolved:
$ CGO_ENABLED=1 GOARCH=arm64 make
...
$ bin/vault status
Key Value
--- -----
Seal Type shamir
Initialized true
Sealed false
Total Shares 1
Threshold 1
Version 1.11.2
Build Date 2022-07-29T09:48:47Z
Storage Type raft
Cluster Name vault-cluster-7d0a318b
Cluster ID 432e615e-9ca5-522a-e48d-7dc069f1a1bd
HA Enabled true
HA Cluster https://172.30.0.1:8201/
HA Mode active
Active Since 2022-08-30T07:50:53.025121063Z
Raft Committed Index 11074
Raft Applied Index 11074
$ otool -L bin/vault
bin/vault:
/System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation (compatibility version 150.0.0, current version 1858.112.0)
/System/Library/Frameworks/IOKit.framework/Versions/A/IOKit (compatibility version 1.0.0, current version 275.0.0)
/System/Library/Frameworks/Security.framework/Versions/A/Security (compatibility version 1.0.0, current version 60158.100.133)
/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1311.100.3)
Note that a quick way to see if the issue will manifest, at least on my system when not cross-compiling, is to check for the presence of IOKit.framework in the listing of otool -L. I believe that this library is only coincidentally included and doesn't have to do with the issue at hand, but it may be useful as a smoke test for verifying that the version of Vault is going to work with the system's DNS configuration.
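The smoke test described above can be scripted. This is a sketch of that heuristic only: links_library is a hypothetical helper, and the IOKit.framework signal was observed on one system when not cross-compiling, so it may not hold elsewhere.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the `otool -L` listing on stdin mentions the
# library named in $1 (matched as a fixed string).
links_library() {
  grep -q -F -- "$1"
}

# Example: run the heuristic against a locally built binary, if present.
# otool ships with Apple's developer tools, so this is skipped elsewhere.
bin="${1:-bin/vault}"
if command -v otool >/dev/null 2>&1 && [ -x "$bin" ]; then
  if otool -L "$bin" | links_library 'IOKit.framework'; then
    echo "$bin links IOKit.framework (heuristic: system DNS config likely honored)"
  else
    echo "$bin does not link IOKit.framework (heuristic: DNS issue may manifest)"
  fi
fi
```

Because the library is probably only coincidentally linked, treat a negative result as a prompt to run vault status against a VPN-only hostname rather than as proof of a broken build.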
uname -mpv: Darwin Kernel Version 21.6.0: Mon Aug 22 20:19:52 PDT 2022; root:xnu-8020.140.49~2/RELEASE_ARM64_T6000 arm64 arm
Same here with Vault v1.12.1 ('e34f8a14fb7a88af4640b09f3ddbb5646b946d9c+CHANGES'), built 2022-10-27T12:32:05Z, installed via brew on a Mac Mini M1 (macOS 13.0.1).
The version installed from Homebrew is once again failing to resolve correctly:
vault --version
Vault v1.12.2 ('415e1fe3118eebd5df6cb60d13defdc01aa17b03+CHANGES'), built 2022-11-23T12:53:46Z
@archoversight, confirmed. Also made sure I got vault from the hashicorp tap.
$ brew install hashicorp/tap/vault
==> Installing vault from hashicorp/tap
...
$ vault status # bypasses local mDNS resolver config
Error checking seal status: Get "https://vault.mycorp.com:8200/v1/sys/seal-status": dial tcp: lookup vault.mycorp.com on 192.168.3.7:53: no such host
...
$ curl https://vault.mycorp.com # uses domain-appropriate DNS servers
<html>
<head><title>301 Moved Permanently</title></head>
For now, I get vault working by maintaining a static DNS entry in the local DNS server (192.168.3.7), but that's obviously brittle.
Looks like the same issue:
➜ ~ vault version
Vault v1.12.2 ('415e1fe3118eebd5df6cb60d13defdc01aa17b03+CHANGES'), built 2022-11-23T12:53:46Z
➜ ~ cat .zshrc
...
export VAULT_ADDR=http://vault.service.consul:8200
export NOMAD_ADDR=http://nomad.service.consul:4646
export CONSUL_HTTP_ADDR=http://consul.service.consul:8500
➜ ~ dig @10.27.96.4 -p 8600 vault.service.consul. ANY
...
vault.service.consul. 0 IN A 10.27.96.3
➜ ~ vault status
Error checking seal status: Get "http://vault.service.consul:8200/v1/sys/seal-status": dial tcp: lookup vault.service.consul on 8.8.8.8:53: no such host
But Consul and Nomad work with the same setup
➜ ~ consul members
Node Address Status Type Build Protocol DC Partition Segment
consul-0 10.27.96.4:8301 alive server 1.14.3 2 dc1 default <all>
nomad-0 10.27.96.6:8301 alive client 1.14.3 2 dc1 default <default>
nomad-client-0 10.27.96.5:8301 alive client 1.14.3 2 dc1 default <default>
vault-0 10.27.96.3:8301 alive client 1.14.3 2 dc1 default <default>
➜ ~ nomad job status
No running jobs
This is still happening with 1.12.3
❯ vault status
Error checking seal status: Get "https://vault.mydomain.com/v1/sys/seal-status": dial tcp: lookup vault.mydomain.com on 8.8.8.8:53: no such host
❯
❯
❯ vault version
Vault v1.12.3 ('209b3dd99fe8ca320340d08c70cff5f620261f9b+CHANGES'), built 2023-02-02T09:07:27Z
Vault 1.13 is going to use Go 1.20, which should allow for an easy fix. I suspect it won't be fixed in 1.13.0 (though maybe?), but I aim to address it by 1.13.1 at least.
Going by this earlier comment:
"Note that a quick way to see if the issue will manifest, at least on my system when not cross-compiling, is to check for the presence of IOKit.framework in the listing of otool -L. I believe that this library is only coincidentally included and doesn't have to do with the issue at hand, but it may be useful as a smoke test for verifying that the version of Vault is going to work with the system's DNS configuration."
I've checked and it seems that we're good now:
$ which vault
/Users/ncc/go/bin/vault
$ vault version
Vault v1.15.1 (b94e275f25ccd9011146d14c00ea9e49fd5032dc), built 2023-10-20T19:16:11Z
$ otool -L ~/go/bin/vault
/Users/ncc/go/bin/vault:
/usr/lib/libSystem.B.dylib (compatibility version 0.0.0, current version 0.0.0)
/usr/lib/libresolv.9.dylib (compatibility version 0.0.0, current version 0.0.0)
/System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation (compatibility version 0.0.0, current version 0.0.0)
/System/Library/Frameworks/Security.framework/Versions/A/Security (compatibility version 0.0.0, current version 0.0.0)
Would one of the impacted people above care to verify this in their environments?
Describe the bug
The vault command line tool does not resolve VPN hosts when connected to OpenVPN. Note that OpenVPN is configured with a split DNS setup and does not modify /etc/resolv.conf to add in nameservers for VPN hosts.
To Reproduce
Steps to reproduce the behavior:
env VAULT_ADDR=https://internal.vpn.host vault login -method=ldap username=edwin
Error authenticating: Put "https://internal.vpn.host/v1/auth/ldap/login/edwin": dial tcp: lookup vault.prod.factual.com on 192.168.1.1:53: no such host
Expected behavior
Should see message: Success! You are now authenticated.
Environment:
Vault Server Version (retrieve with vault status): 1.4.2
Vault CLI Version (retrieve with vault version): v1.7.3 ('5d517c864c8f10385bf65627891bc7ef55f5e827+CHANGES')
Additional context
I am experiencing this issue on the above version of vault CLI installed via homebrew on a Mac.
I believe this is due to the same issue as https://github.com/hashicorp/terraform/issues/3536 that was fixed in https://github.com/hashicorp/terraform/pull/5925. The problem and solution are summarized in https://github.com/hashicorp/terraform/issues/3536#issuecomment-203274147.
I am able to work around this issue by manually editing /etc/resolv.conf to use the VPN nameservers, or by putting the IP address of the vault server into /etc/hosts.