christian-heusel closed this issue 2 weeks ago.
I think you forgot to add /udp in the docker compose file.
This new beta update works for me without changing the setup.
ports:
  - 3478:3478/udp
Adding the /udp did indeed solve the issue, but why did this work with the pre-beta versions?
Also, should this perhaps be in the upgrade documentation for the final release?
Ah, never mind: it just took tailscale status a moment to realize that the DERP is gone. Changing the network config does not help for me.
I'm having trouble reproducing this, and all of the tests keep passing; it has me quite puzzled.
The error about an empty DERP map only covers DERPs loaded via URL/file, so in this case it is displayed before the DERPs from the embedded server are added. If there are no DERPs at all, the whole server halts: https://github.com/juanfont/headscale/blob/main/hscontrol/app.go#L516-L518.
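A minimal Go sketch of the ordering just described, to show why the "empty DERP" message can appear even though the embedded server is added afterwards. The DERPRegion type and buildDERPMap function are hypothetical simplifications, not headscale's actual code; region 999 and the name christian-derp are taken from the client logs later in this thread.

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// DERPRegion is a hypothetical, simplified stand-in for a DERP region entry;
// it is NOT headscale's or Tailscale's real type.
type DERPRegion struct {
	ID   int
	Name string
}

// buildDERPMap (hypothetical) mirrors the ordering described above:
// URL/file regions are checked first, the embedded region is added after
// that, and only a completely empty result is treated as fatal.
func buildDERPMap(fromURLsAndFiles []DERPRegion, embedded *DERPRegion) (map[int]DERPRegion, error) {
	regions := make(map[int]DERPRegion)
	for _, r := range fromURLsAndFiles {
		regions[r.ID] = r
	}
	if len(regions) == 0 {
		// This check only sees URL/file DERPs, so it fires even when the
		// embedded server is about to be added below.
		log.Println("WARN: DERP map from URLs/files is empty")
	}
	if embedded != nil {
		regions[embedded.ID] = *embedded
	}
	if len(regions) == 0 {
		// No DERP servers at all: the caller is expected to halt.
		return nil, errors.New("no DERP servers configured")
	}
	return regions, nil
}

func main() {
	embedded := &DERPRegion{ID: 999, Name: "christian-derp"}
	m, err := buildDERPMap(nil, embedded)
	fmt.Println(len(m), err) // warning goes to stderr, then "1 <nil>"
}
```

In this sketch the warning and the final map are independent: a startup log complaining about an empty DERP map does not by itself mean the embedded server is missing.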
> Ah, never mind: it just took tailscale status a moment to realize that the DERP is gone. Changing the network config does not help for me.
Does this mean it was there initially, but then disappeared after?
I've expanded the DERP tests a bit to ensure that the embedded server isn't removed by the updater in #2030.
# Health check:
# - Tailscale could not connect to the 'Headscale Embedded DERP' relay server. Your Internet connection might be down, or the server might be temporarily unavailable.
# - Tailscale could not connect to any relay server. Check your Internet connection.
So this makes me think that this is a networking issue, because headscale sends the DERP server as part of the map update. I can't really think of anything that would have changed this in the commits between the last alpha and the beta. Could there be an external event or change to your Docker setup? (Odd, since reverting works.)
I did notice this, though:
headscale | 2024-07-23T00:58:26Z INF STUN server started at [::]:3478
This could indicate that it only listens on IPv6? However, my test logs show the same, so I would find it odd for that to be the cause, and I do not think anything related to it has changed.
> Does this mean it was there initially, but then disappeared after?
No, the way I'm testing this is that I redeploy the other version on my VPS and then run tailscale status on my client to see whether it still works or prints the error.
> So this makes me think that this is a networking issue, because headscale sends the DERP server as part of the map update. I can't really think of anything that would have changed this in the commits between the last alpha and the beta. Could there be an external event or change to your Docker setup? (Odd, since reverting works.)
This was my first thought as well, but the issue reproduces across multiple Docker versions, and really consistently with every image switch I do.
> This could indicate that it only listens on IPv6? However, my test logs show the same, so I would find it odd for that to be the cause, and I do not think anything related to it has changed.
After switching to the -debug version of the image, I was able to check this inside the container, and the output was the same for both versions:
/ # netstat -lntu | grep 3478
udp 0 0 :::3478 :::*
$ ss -tulpn | grep 3478
udp UNCONN 0 0 0.0.0.0:3478 0.0.0.0:* users:(("docker-proxy",pid=5895,fd=4))
udp UNCONN 0 0 [::]:3478 [::]:* users:(("docker-proxy",pid=5901,fd=4))
Since none of this helped, I also had a look at the tailscaled output on my client, and this looks interesting:
Jul 25 12:30:25 meterpeter tailscaled[131191]: derphttp.Client.Recv: connecting to derp-999 (christian-derp)
Jul 25 12:30:25 meterpeter tailscaled[131191]: magicsock: [0xc0035fd540] derp.Recv(derp-999): derphttp.Client.Recv connect to region 999 (christian-derp): dial tcp4: lookup vpn.heusel.eu: no such host
Jul 25 12:30:25 meterpeter tailscaled[131191]: netcheck: netcheck.runProbe: named node "999" has no v6 address
Jul 25 12:30:25 meterpeter tailscaled[131191]: netcheck: netcheck: DNS lookup error for "vpn.heusel.eu" (node "999" region 999): context canceled
Jul 25 12:30:25 meterpeter tailscaled[131191]: netcheck: netcheck.runProbe: named node "999" has no v4 address
Jul 25 12:30:27 meterpeter tailscaled[131191]: control: NetInfo: NetInfo{varies= hairpin= ipv6=false ipv6os=true udp=true icmpv4=false derp=#999 portmap=UC link="" firewallmode="ipt-default"}
So what actually seems to break is the internal DNS server (or something in that area), and the DERP problem is just fallout from that earlier failure:
# alpha12
$ resolvectl status tailscale0
Link 9 (tailscale0)
Current Scopes: DNS
Protocols: +DefaultRoute -LLMNR -mDNS -DNSOverTLS DNSSEC=no/unsupported
Current DNS Server: 100.100.100.100
DNS Servers: 100.100.100.100
DNS Domain: chris.vpn.heusel.eu ~.
$ resolvectl query --cache=NO vpn.heusel.eu
vpn.heusel.eu: 49.12.6.160 -- link: tailscale0
(christian.heusel.eu)
# extra records
$ resolvectl query --cache=NO grafana.vpn.heusel.eu
grafana.vpn.heusel.eu: 100.64.0.6 -- link: tailscale0
# node
$ resolvectl query --cache=NO scotty-the-fifth.chris.vpn.heusel.eu
scotty-the-fifth.chris.vpn.heusel.eu: 100.64.0.6 -- link: tailscale0
# beta1
$ resolvectl status tailscale0
Link 8 (tailscale0)
Current Scopes: DNS
Protocols: +DefaultRoute -LLMNR -mDNS -DNSOverTLS DNSSEC=no/unsupported
Current DNS Server: 100.100.100.100
DNS Servers: 100.100.100.100
DNS Domain: vpn.heusel.eu ~.
$ resolvectl query --cache=NO vpn.heusel.eu
vpn.heusel.eu: Name 'vpn.heusel.eu' not found
# extra records
$ resolvectl query --cache=NO grafana.vpn.heusel.eu
grafana.vpn.heusel.eu: 100.64.0.6 -- link: tailscale0
# node
$ resolvectl query --cache=NO scotty-the-fifth.chris.vpn.heusel.eu
scotty-the-fifth.chris.vpn.heusel.eu: Name 'scotty-the-fifth.chris.vpn.heusel.eu' not found
So apparently it now sets the "DNS Domain" to a different value (vpn.heusel.eu instead of chris.vpn.heusel.eu), but I'm not sure whether that causes the issue.
Since it might be of interest, here is my DNS config:
dns_config:
  override_local_dns: true
  nameservers:
    - 8.8.8.8
  restricted_nameservers:
    fritz.box:
      - 192.168.71.5
  domains: []
  extra_records:
    - name: "grafana.vpn.heusel.eu"
      type: "A"
      value: "100.64.0.6"
    - name: "prometheus.vpn.heusel.eu"
      type: "A"
      value: "100.64.0.6"
    - name: "alertmanager.vpn.heusel.eu"
      type: "A"
      value: "100.64.0.6"
    - name: "repo.vpn.heusel.eu"
      type: "A"
      value: "100.64.0.6"
  magic_dns: true
  base_domain: vpn.heusel.eu
Also, @kradalby, thanks for looking into this; it is very much appreciated! ❤️
Possible duplicates/related issues given my latest findings: #2029 #2026
Ah yes, DNS might well be the culprit. While waiting for a reply I started writing some clearly missing DNS tests, so I will continue with that. I'll post when I have an update, possibly on one of those two issues.
I think #2034 addresses this. Would it be possible for you to help me test it? It would be great to avoid another bad release like beta1.
Binary is available here: https://github.com/juanfont/headscale/actions/runs/10195837541?pr=2034
@kradalby, thanks for working on a fix!
Except for having to rename dns_config to dns, the mentioned PR did not change anything: the issues are not fixed.
Also, there was no error about the rename from restricted_nameservers to split. Setting split did not help either, and neither did adding global under the nameservers directive.
> Except for having to rename dns_config to dns, the mentioned PR did not change anything: the issues are not fixed.
Yes, sorry, that's part of the PR. Looking at your config, I have one theory: can you try setting dns.base_name to something different from the DNS name you use for headscale? So magicdns.vpn.heusel.eu as base_name, and keep vpn.heusel.eu for headscale itself?
> Also, there was no error about the rename from restricted_nameservers to split. Setting split did not help either, and neither did adding global under the nameservers directive.
Did you not get any warnings at the beginning of your logs? I've made it so that it should now fail fatally if the old key has not been replaced.
To test, you can set dns.use_username_in_magic_dns to true. That option will be removed, but it temporarily gives you back the username in the DNS names, which should have the same effect.
This might be a good thing to have discovered: using the same base_name as the headscale server's DNS name will no longer be possible, due to how Tailscale takes over DNS.
For the record, in Tailscale upstream, this is the same behaviour:
So because headscale injected the username into the names, this did not break before, but doing that prevents us from achieving some other things, so it sadly has to go.
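To make the constraint concrete, here is a small hypothetical Go check (not part of headscale) that flags a configuration where the headscale server's hostname equals base_domain or sits underneath it. It assumes server_url is https://vpn.heusel.eu, which is inferred from the DERP hostname in the client logs above rather than stated in the thread; the explanation that MagicDNS then answers for the server's own name is an interpretation of the beta1 resolvectl output, not a confirmed description of Tailscale internals.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// overlapsBaseDomain is a HYPOTHETICAL check, not headscale code. It reports
// whether the server hostname equals baseDomain or is a subdomain of it.
// Presumably, in that case the tailnet resolver (100.100.100.100) claims the
// whole domain, which would explain the "Name 'vpn.heusel.eu' not found"
// result seen with beta1.
func overlapsBaseDomain(serverURL, baseDomain string) (bool, error) {
	u, err := url.Parse(serverURL)
	if err != nil {
		return false, err
	}
	// Normalise: drop the port, a trailing dot, and case differences.
	host := strings.ToLower(strings.TrimSuffix(u.Hostname(), "."))
	base := strings.ToLower(strings.TrimSuffix(baseDomain, "."))
	return host == base || strings.HasSuffix(host, "."+base), nil
}

func main() {
	// server_url is an assumption; the base domains are the two values
	// tried in this thread.
	for _, base := range []string{"vpn.heusel.eu", "magicdns.vpn.heusel.eu"} {
		overlaps, err := overlapsBaseDomain("https://vpn.heusel.eu", base)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("base_domain=%s overlaps=%v\n", base, overlaps)
	}
}
```

With the original base_domain the check reports an overlap, while magicdns.vpn.heusel.eu does not, which lines up with the suggested workaround.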
@christian-heusel did you have an opportunity to test this?
Sorry, I forgot about this; will test and report back soon!
> To test, you can set dns.use_username_in_magic_dns to true. That option will be removed, but it temporarily gives you back the username in the DNS names, which should have the same effect.
This makes the three types of queries from above work again.
Regarding https://github.com/juanfont/headscale/issues/2025#issuecomment-2264760872:
After unsetting the previously set dns.use_username_in_magic_dns and setting the base_name as suggested, it also works as expected.
> Did you not get any warnings at the beginning of your logs? I've made it so that it should now fail fatally if the old key has not been replaced.
Maybe I'm testing this wrong, but I don't get any warnings or fatal errors with the latest version of your branch and the following DNS config snippet (which I have verified to be the active one inside the container by running docker compose exec headscale cat /etc/headscale/config.yaml):
dns:
  override_local_dns: true
  nameservers:
    # global:
    - 8.8.8.8
  restricted_nameservers:
  # split:
    fritz.box:
      - 192.168.71.5
  domains: []
  magic_dns: true
  base_domain: magicdns.vpn.heusel.eu
Instead I'm being warned about a key I don't even have set:
WARN: The "dns.use_username_in_magic_dns" configuration key is deprecated and has been removed. Please see the changelog for more details.
Edit: reverted a bogus comment here; I had tried to connect to a node of mine that went offline for unknown reasons.
> Maybe I'm testing this wrong, but I don't get any warnings or fatal errors with the latest version of your branch and the following DNS config snippet (which I have verified to be the active one inside the container by running docker compose exec headscale cat /etc/headscale/config.yaml):
Hmm, you won't really get any errors or warnings for setting the wrong keys; for example, dns.nameservers isn't checked, while dns_config.nameservers is. I suppose we could do it, but there is no good way in cobra to cover all cases, only the ones we can think of.
At the moment it only warns if you have the old key set and not the new one. If you mix them, it won't detect it.
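The detection rule described above can be sketched in Go as follows. This is a hypothetical illustration working on a plain set of key names, not headscale's real config loading; the replacement names dns.nameservers.global and dns.nameservers.split are assumptions based on the global and split keys mentioned earlier in this thread.

```go
package main

import "fmt"

// keyRename pairs a removed configuration key with its replacement.
type keyRename struct{ Old, New string }

// deprecationWarnings is a HYPOTHETICAL sketch of the rule above: it warns
// only when the old key is set and the new key is not. Keys that are simply
// wrong (such as dns.restricted_nameservers, which mixes the new top-level
// "dns" with an old sub-key) are not in the rename list and are never reported.
func deprecationWarnings(set map[string]bool, renames []keyRename) []string {
	var warnings []string
	for _, r := range renames {
		if set[r.Old] && !set[r.New] {
			warnings = append(warnings,
				fmt.Sprintf("%q is deprecated, use %q instead", r.Old, r.New))
		}
	}
	return warnings
}

func main() {
	// Assumed rename pairs, for illustration only.
	renames := []keyRename{
		{"dns_config.nameservers", "dns.nameservers.global"},
		{"dns_config.restricted_nameservers", "dns.nameservers.split"},
	}

	// Old layout only: one warning per old key that is set.
	old := map[string]bool{"dns_config.nameservers": true}
	fmt.Println(deprecationWarnings(old, renames))

	// Mixed layout (as in the snippet above): nothing is reported.
	mixed := map[string]bool{
		"dns.nameservers":            true,
		"dns.restricted_nameservers": true,
	}
	fmt.Println(len(deprecationWarnings(mixed, renames)))
}
```

Catching the mixed case would require validating against the full set of known keys rather than a list of renames, which is the broader check the reply above describes as hard to cover completely.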
Is this a support request?
Is there an existing issue for this?
Current Behavior
Expected Behavior
The built-in DERP keeps working after the update; I have configured and used this setup for a long time now.
Steps To Reproduce
headscale to version v0.23.0-beta1
I hope that I did not miss anything in the changelogs, but to me it looks like no config changes etc. were required to keep this working between the two relevant versions.
Environment
Runtime environment
Although both of the above are the case, the DERP server is publicly accessible:
Anything else?
The startup log claims that I do not have any DERPs configured:
and yet this is my DERP config (snippet), which worked with the previous versions: