tailscale / tailscale

The easiest, most secure way to use WireGuard and 2FA.
https://tailscale.com
BSD 3-Clause "New" or "Revised" License

Tailscale >= 1.66.0 breaks DNS name resolution in Docker containers on Linux #12108

Closed: ferrarimarco closed this issue 1 week ago

ferrarimarco commented 2 weeks ago

What is the issue?

After upgrading to Tailscale 1.66.0 or 1.66.1, DNS name resolution breaks inside newly created Docker containers.

I'm experiencing this on a Debian 11 host.

This doesn't occur on the same machine with Tailscale 1.64.0.

Thanks for your support!

Steps to reproduce

  1. Install Tailscale 1.64.0 on the Linux machine: apt install tailscale=1.64.0
  2. Start a Docker container: docker run --rm -it debian:12
  3. Ping an external service from inside the container: ping google.com -> WORKS
  4. Stop the Docker container: exit
  5. Install Tailscale 1.66.0 or 1.66.1 on the Linux machine: apt install tailscale=1.66.1
  6. Start a Docker container: docker run --rm -it debian:12
  7. Ping an external service from inside the container: ping google.com -> Cannot resolve name
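
For convenience, the same steps as a single script (a sketch: it assumes the apt repository still serves both versions, that you run as root, and that ping is available in the debian:12 image; install iputils-ping in the container otherwise):

#!/bin/sh
# Compare in-container DNS behavior across Tailscale versions
apt install -y --allow-downgrades tailscale=1.64.0
docker run --rm debian:12 ping -c 1 google.com   # expected: resolves and pings
apt install -y tailscale=1.66.1
docker run --rm debian:12 ping -c 1 google.com   # expected: cannot resolve name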

Are there any recent changes that introduced the issue?

Upgrade Tailscale from 1.64.0 to 1.66.1 using the system package manager.

OS

Linux

OS version

Debian 11

Tailscale version

1.66.1

Other software

Docker version 26.1.2, build 211e74b

Bug report

BUG-c7f85e39b9c72709e39b2f333de28e65a0d5875a2e351ab851dae9f245cea271-20240512175341Z-c93cbe3beb3e255b

awly commented 2 weeks ago

1.66 added stateful filtering for packets going via the tailscale0 interface, see https://tailscale.com/changelog#2024-05-08. But that should not affect packets from docker containers to the Internet.

Another notable thing is that this node is also a subnet router for the 10.0.0.0/8 range. Again, this should not affect docker containers, but noting it for others reading the issue.

Some things to check are:

  • check whether tailscale set --stateful-filtering=false on 1.66.1 resolves the issue
  • check whether your Docker configuration uses CGNAT IPs (in the 100.64.0.0/10 range). You can run something like this to view the config for all Docker networks:
    docker network ls --format '{{.Name}}' | xargs docker network inspect

chennin commented 2 weeks ago

In the container, what does getent hosts google.com print, on both versions?
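
For example, as a one-shot check (using the same debian:12 image as the repro steps):

docker run --rm debian:12 getent hosts google.com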

I also ran into this (or maybe it's a different problem). In my case, my tailnet address resolved to IPv6, but the container did not have an IPv6 address. One way to solve it is to enable IPv6 for the container.
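
A minimal sketch of that fix, done daemon-wide via /etc/docker/daemon.json (the ULA prefix fd00::/64 is an example value; pick one appropriate for your network):

{
  "ipv6": true,
  "fixed-cidr-v6": "fd00::/64"
}

Then restart the daemon with systemctl restart docker.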

danielrusco commented 2 weeks ago

I had the same issue, which showed up as Service Widgets from my Homepage instance not connecting to their sources (e.g. Proxmox, Jellyfin, etc.).

The problem was solved by running tailscale up --stateful-filtering=false on my host machine.

miroslavbucek commented 2 weeks ago

I had the same problem, which was solved by: tailscale set --stateful-filtering=false

You owe me a good night's sleep :-(

ylbeethoven commented 2 weeks ago

+1

--stateful-filtering=false solved the problem

masterwishx commented 2 weeks ago

I have the same issue, but for MagicDNS host names. tailscale set --stateful-filtering=false fixed the issue with IPs, but not for node host names.

ferrarimarco commented 2 weeks ago

1.66 added stateful filtering for packets going via the tailscale0 interface, see https://tailscale.com/changelog#2024-05-08. But that should not affect packets from docker containers to the Internet.

Another notable thing is that this node is also a subnet router for the 10.0.0.0/8 range. Again, this should not affect docker containers, but noting it for others reading the issue.

Correct, it's a subnet router for that range.

Some things to check are:

  • check whether tailscale set --stateful-filtering=false on 1.66.1 resolves the issue

It works if I update to 1.66.1 and disable stateful filtering, for both internal and external names. So stateful filtering is likely interfering here?

  • check whether your Docker configuration uses CGNAT IPs (in the 100.64.0.0/10 range). You can run something like this to view the config for all Docker networks:
    docker network ls --format '{{.Name}}' | xargs docker network inspect

I checked, and it doesn't appear to be the case. That range is not in use by any network that Docker creates.

In the container, what does getent hosts google.com print, on both versions?

Tailscale 1.64.0: 2a00:1450:4002:402::200e google.com
Tailscale 1.66.1 (before disabling stateful filtering): timeout
Tailscale 1.66.1 (after disabling stateful filtering): 2a00:1450:4002:411::200e google.com

Thanks for your support!

ferrarimarco commented 2 weeks ago

From the https://tailscale.com/security-bulletins#ts-2024-005 page:

The attack only works on a LAN because:

  • it relies on next-hop routing, which only works in a LAN
  • destination IPs are in the subnet router's approved range, or in the CGNAT range 100.64.0.0/10, which are not routable over the Internet.

This appears to be the case because both the resolvers I'm using are in the 10.0.0.0/8 range. Additionally, one of those (10.0.0.2) runs in a container.

On Linux packet-forwarding nodes we added stateful packet filtering. This means that these nodes keep track of forwarded connections and only allow return packets for existing outbound connections. Inbound packets that don't belong to an existing connection are dropped.

And from https://tailscale.com/kb/1080/cli:

--stateful-filtering Enable stateful filtering for subnet routers and exit nodes. When enabled, inbound packets with another node's destination IP are dropped, unless they are a part of a tracked outbound connection from that node. Defaults to enabled, but can be disabled for site-to-site networking use cases.

This doesn't appear to work as intended, at least in this case?

awly commented 2 weeks ago

Thanks for the details, I reproduced the issue. Right now, tailscale set --stateful-filtering=false is the only workaround. Running it on a regular node (not exit node and not subnet router) should be safe.

This happens because Docker by default uses resolv.conf from the host, which points at 100.100.100.100 (served by the local tailscaled) as the nameserver. Since that IP lands on the tailscale0 interface, DNS request packets become subject to stateful filtering. And because the source IP is not localhost, but something like 172.17.0.2 (from the Docker network range), the packet is interpreted as an incoming request from another host and is dropped.
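
That explanation also suggests a per-container workaround (a sketch, not from the thread: point the container at any resolver other than tailscaled's 100.100.100.100; 1.1.1.1 is just an example public resolver):

docker run --rm --dns 1.1.1.1 debian:12 getent hosts google.com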

blueskea commented 2 weeks ago

I had the same problem, which was solved by: tailscale set --stateful-filtering=false

You owe me a good night's sleep :-(

Thanks, and there's no need to restart tailscale or docker.

ErebusBat commented 2 weeks ago

Dropping an updated version of the command posted above (https://github.com/tailscale/tailscale/issues/12108#issuecomment-2107156997) to view Docker network names / IP ranges, if you have jqlang/jq installed:

docker network ls --format '{{.Name}}' | xargs docker network inspect | jq '.[] | [.Name, .IPAM.Config]'


masterwishx commented 2 weeks ago

@awly for me, MagicDNS node names are still not working in Docker on Unraid and on one Ubuntu 20.04 client; another Ubuntu 22.04 client works fine. A third Ubuntu 22.04 client doesn't have Docker installed yet. I'm using Headscale + 3 Ubuntu servers + an Unraid server with the plugin. But DNS is otherwise OK, and full MagicDNS node names like host.user.headscale.mysite.com work.

awly commented 2 weeks ago

@masterwishx just to clarify:

  • resolving full names, like host.headscale.mysite.com, works
  • resolving short names, like host, does not work

If that's correct, it sounds like a different issue. Can you show the contents of /etc/resolv.conf on the hosts that don't work, and on those that do?

masterwishx commented 2 weeks ago

@masterwishx just to clarify:

  • resolving full names, like host.headscale.mysite.com, works
  • resolving short names, like host, does not work

If that's correct, it sounds like a different issue. Can you show the contents of /etc/resolv.conf on the hosts that don't work, and on those that do?

Yes, that's right. Maybe I should open another issue for it. On the previous Tailscale version everything worked fine; now short host names do NOT work:

instance-mysite-cloud (Ubuntu 20.04.6 LTS), /etc/resolv.conf:

nameserver 127.0.0.53
search vcn02152127.oraclevcn.com masterwishx.headscale.mysite.com

unraidhost:2375 [screenshots: resolving the short host name fails]

When using the IP instead, it's fine.

instance-mysite-cloud2 (Ubuntu 22.04.4 LTS), /etc/resolv.conf:

nameserver 127.0.0.53
options edns0 trust-ad
search masterwishx.headscale.mysite.com vcn12281520.oraclevcn.com

http://instance-mysite-cloud:9768 for S3 backup in Portainer is working fine.

nasname (Unraid 6.12.10), /etc/resolv.conf:

# Generated by rc.inet1
nameserver 9.9.9.9
nameserver 149.112.112.112
nameserver 192.168.0.1

[screenshots]

masterwishx commented 1 week ago

Fixed on Ubuntu with tailscale set --accept-dns=false followed by tailscale set --accept-dns. On Unraid it didn't help.

masterwishx commented 1 week ago

Fixed on Ubuntu with tailscale set --accept-dns=false followed by tailscale set --accept-dns. On Unraid it didn't help.

Forgot to mention that I also tried making AdGuard Home the DNS server in Docker and then went back to the local DNS, so the DNS/stub resolver settings in resolv.conf were reloaded as well.

awly commented 1 week ago

1.66.4 is available and disables stateful filtering by default. We discussed doing more clever things, like detecting container runtimes and allowlisting their interfaces in our iptables/nftables rules, but that gets pretty hairy. Instead, @maisem pointed out that other mitigations for https://tailscale.com/security-bulletins#ts-2024-005 cover most cases except for autogroup:danger-all, so stateful filtering is not strictly required except for that case.

If you run 1.66.0-1.66.3, you still need to manually disable stateful filtering with tailscale set --stateful-filtering=false. If you install a brand new 1.66.4, or upgrade from 1.64 or older versions, stateful filtering will be off by default.
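
In command form (the set command is the workaround discussed above; tailscale version is the standard CLI subcommand for printing the installed version):

tailscale version
tailscale set --stateful-filtering=false   # only needed on 1.66.0-1.66.3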

ferrarimarco commented 1 week ago

Thanks for the update!