tailscale / tailscale

The easiest, most secure way to use WireGuard and 2FA.
https://tailscale.com
BSD 3-Clause "New" or "Revised" License

FR: Wireguard port selection for proxies spun up by tailscale operator #11908

Open · zhiling-liftoff opened this issue 2 weeks ago

zhiling-liftoff commented 2 weeks ago

What are you trying to do?

I'm currently using the Tailscale operator to route traffic between two Kubernetes clusters. The clusters' nodes have fairly restrictive firewall rules, so connections inevitably have to go through a DERP relay. For some of our services, this unfortunately means timeouts and packet drops during periods of high traffic. Since the WireGuard tunnel uses a randomly assigned port, there isn't really a way to set up firewall rules that allow inbound traffic to the proxy.

How should we solve this?

It would be great to be able to specify a range of ports for the Tailscale operator to select from, so we can set up firewall rules accordingly. Alternatively, being able to specify the WireGuard port as part of the Service annotations would also help.
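To make the port-range idea concrete, here is a minimal Go sketch of how an operator *could* give each proxy a fixed WireGuard port from an administrator-chosen range, so the firewall only has to allow that range. The `allocatePorts` helper, the proxy names and the range are hypothetical; nothing in this thread indicates the operator supports this today.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// allocatePorts is a hypothetical helper: it assigns each proxy a distinct
// UDP port from [lo, hi] so firewall rules can allow exactly that range.
// Proxy names are sorted first, so the result is deterministic for a given
// set of proxies (adding a proxy can shift later assignments).
func allocatePorts(proxies []string, lo, hi uint16) (map[string]uint16, error) {
	if lo > hi {
		return nil, errors.New("invalid port range")
	}
	size := int(hi) - int(lo) + 1
	if len(proxies) > size {
		return nil, fmt.Errorf("%d proxies do not fit in %d ports", len(proxies), size)
	}
	sorted := append([]string(nil), proxies...)
	sort.Strings(sorted)
	out := make(map[string]uint16, len(sorted))
	for i, name := range sorted {
		// i < size, so lo+i never exceeds hi.
		out[name] = lo + uint16(i)
	}
	return out, nil
}

func main() {
	ports, err := allocatePorts([]string{"ts-web", "ts-db"}, 41641, 41650)
	if err != nil {
		panic(err)
	}
	fmt.Println(ports["ts-db"], ports["ts-web"]) // 41641 41642
}
```

A real controller would more likely persist each assignment once made, so that existing proxies keep their port when new ones are added.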

What is the impact of not solving this?

I'm not aware of any workarounds other than rate-limiting my own services.

If there are other ways around this issue, I'd appreciate any input as well.

Anything else?

No response

irbekrm commented 2 weeks ago

Hi @zhiling-liftoff, thanks for opening the issue.

We're currently looking at how to support direct connections to operator proxies better and document what config knobs are needed for various CNIs etc.

> the clusters' nodes have fairly restrictive firewall rules, and inevitably connections have to go through a DERP relay

Would you be able to share a bit more about your setup: which Kubernetes distribution you are using, which CNI, whether your nodes have public or private IPs, and what kind of proxies these are (e.g. Tailscale Ingress or Services)? There could be a number of reasons why you are not getting direct connections, and WireGuard port selection alone won't necessarily give you direct connections.

See e.g. https://github.com/tailscale/tailscale/issues/3822 and https://github.com/tailscale/tailscale/issues/11427

jaxxstorm commented 2 weeks ago

As @irbekrm said, there are a lot of obstacles to getting direct connections in AWS. However, it is possible to set a specific port right now by setting the PORT environment variable.
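A minimal Go approximation of the `PORT` mechanism, assuming tailscaled reads `PORT` as its default listen port and treats 0 as "pick a random port". `defaultPort` here is an illustrative re-implementation, not tailscaled's actual source; the environment lookup is injected so the behaviour can be exercised without touching the real environment.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// defaultPort approximates how a daemon could derive its WireGuard UDP port
// from the PORT environment variable. A return value of 0 means "let the OS
// choose a random port", which is what makes firewall rules hard to write.
func defaultPort(getenv func(string) string) uint16 {
	if s := getenv("PORT"); s != "" {
		// bitSize 16 rejects values above 65535; invalid values are ignored.
		if p, err := strconv.ParseUint(s, 10, 16); err == nil {
			return uint16(p)
		}
	}
	return 0
}

func main() {
	// Whatever PORT is set to in the current environment (0 if unset).
	fmt.Println(defaultPort(os.Getenv))

	fixed := func(k string) string {
		if k == "PORT" {
			return "41641"
		}
		return ""
	}
	fmt.Println(defaultPort(fixed)) // 41641
}
```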

If you want to guarantee direct connections in EKS, you must do the following:

This removes all NAT obstacles for Tailscale and will give you direct connections in almost all circumstances.

zhiling-liftoff commented 2 weeks ago

@irbekrm thanks for the response. I'm assuming I only need to make sure the ingress proxy is reachable, not the egress one. In that case: I'm on AWS with aws-cni, we have both private and public node pools available, and I'm using Tailscale Services.

Running tailscale netcheck gives me the following:

    * UDP: true
    * IPv4: yes, <redacted>
    * IPv6: no, but OS has support
    * MappingVariesByDestIP: true
    * HairPinning: false
    * PortMapping:
    * Nearest DERP: New York City

and tailscale status confirms that connections to the downstream clients are relayed via DERP.
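The netcheck field that stands out here is `MappingVariesByDestIP: true`: the NAT hands out a different external port depending on the destination (an endpoint-dependent or "hard" NAT), which generally makes NAT traversal much harder and is a common reason traffic stays on DERP. Below is a small Go sketch that scans netcheck output for that signal. It assumes the `* Key: value` layout shown above, which is human-readable output rather than a documented, stable format; `parseNetcheck` and `hardNAT` are hypothetical helpers.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// parseNetcheck turns "* Key: value" lines into a map. Only the first ':'
// splits key from value, so values such as "yes, 1.2.3.4:41641" survive.
// Lines without ':' are skipped; empty values (e.g. "PortMapping:") are kept.
func parseNetcheck(out string) map[string]string {
	fields := map[string]string{}
	sc := bufio.NewScanner(strings.NewReader(out))
	for sc.Scan() {
		line := strings.TrimPrefix(strings.TrimSpace(sc.Text()), "* ")
		key, val, ok := strings.Cut(line, ":")
		if !ok {
			continue
		}
		fields[strings.TrimSpace(key)] = strings.TrimSpace(val)
	}
	return fields
}

// hardNAT reports whether the NAT mapping depends on the destination.
func hardNAT(fields map[string]string) bool {
	return fields["MappingVariesByDestIP"] == "true"
}

func main() {
	report := `
    * UDP: true
    * IPv4: yes, <redacted>
    * MappingVariesByDestIP: true
    * HairPinning: false
    * PortMapping:
    * Nearest DERP: New York City`
	f := parseNetcheck(report)
	fmt.Println(hardNAT(f), f["Nearest DERP"]) // true New York City
}
```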

I was attempting to use a ProxyClass to set TS_TAILSCALED_EXTRA_ARGS to --port=41234 on the proxy for the specific workload I need direct connections to, but it seems this feature has not been released yet.
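Once a fixed port is configured through TS_TAILSCALED_EXTRA_ARGS, the firewall rule has to match it. A minimal Go sketch for pulling the port back out of such a string, assuming the variable holds whitespace-separated tailscaled flags and that `--port` follows Go `flag` package syntax (`--port=N`, `--port N`, single or double dash, last occurrence wins). `portFromExtraArgs` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// portFromExtraArgs extracts the value of --port (or -port) from a
// TS_TAILSCALED_EXTRA_ARGS-style string. It splits on whitespace only and
// does not handle shell quoting. ok is false if no valid port is set.
func portFromExtraArgs(extra string) (port uint16, ok bool) {
	args := strings.Fields(extra)
	for i := 0; i < len(args); i++ {
		a := args[i]
		if !strings.HasPrefix(a, "-") {
			continue
		}
		name, val, hasEq := strings.Cut(strings.TrimLeft(a, "-"), "=")
		if name != "port" {
			continue
		}
		if !hasEq {
			if i+1 >= len(args) {
				return 0, false // "--port" with no value
			}
			i++
			val = args[i]
		}
		p, err := strconv.ParseUint(val, 10, 16)
		if err != nil {
			return 0, false
		}
		// Keep scanning: as with Go's flag package, the last --port wins.
		port, ok = uint16(p), true
	}
	return port, ok
}

func main() {
	fmt.Println(portFromExtraArgs("--port=41234"))             // 41234 true
	fmt.Println(portFromExtraArgs("--verbose=1 --port 41234")) // 41234 true
}
```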

@jaxxstorm Thanks for the suggestion! Correct me if I'm wrong, but I don't think the knobs to do this are available for operator-managed proxies yet.

irbekrm commented 2 weeks ago

@zhiling-liftoff if you are running with aws-cni, I would start by setting AWS_VPC_K8S_CNI_RANDOMIZESNAT to none (https://github.com/aws/amazon-vpc-cni-k8s?tab=readme-ov-file#cni-configuration-variables) and checking whether you get direct connections then. (See https://github.com/tailscale/tailscale/issues/3822 for more context.)
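Why SNAT port randomization matters can be shown with a toy model. In this Go sketch a NAT either keeps the pod's source port (roughly what one hopes for with `none`) or allocates a new external port for every new destination (standing in for randomized SNAT such as the `prng` setting mentioned later in this thread). The `toyNAT` type and destination names are hypothetical, and the model deliberately ignores conntrack state and port collisions that real iptables SNAT has to handle.

```go
package main

import "fmt"

// flow identifies a connection as seen by the NAT: internal source port
// plus destination.
type flow struct {
	srcPort uint16
	dst     string
}

// toyNAT is a hypothetical model, not iptables.
type toyNAT struct {
	perDest bool            // true: new external port per destination
	next    uint16          // next port handed out when perDest is set
	table   map[flow]uint16 // existing mappings are reused
}

func newToyNAT(perDest bool) *toyNAT {
	return &toyNAT{perDest: perDest, next: 30000, table: map[flow]uint16{}}
}

// externalPort returns the public source port used for a flow.
func (n *toyNAT) externalPort(f flow) uint16 {
	if p, ok := n.table[f]; ok {
		return p
	}
	p := f.srcPort // endpoint-independent: keep the pod's port
	if n.perDest {
		// Endpoint-dependent: sequential here for determinism; real
		// randomized SNAT picks unpredictable ports.
		p = n.next
		n.next++
	}
	n.table[f] = p
	return p
}

// variesByDest mirrors, in spirit, netcheck's comparison of the mapping seen
// by two different STUN servers for the same local socket.
func variesByDest(n *toyNAT) bool {
	a := n.externalPort(flow{41641, "stun-a"})
	b := n.externalPort(flow{41641, "stun-b"})
	return a != b
}

func main() {
	fmt.Println(variesByDest(newToyNAT(false))) // false
	fmt.Println(variesByDest(newToyNAT(true)))  // true
}
```

With a per-destination mapping, the port a peer learns from a STUN server is not the port it would see when talking to this node directly, which is why such setups tend to fall back to DERP.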

zhiling-liftoff commented 2 weeks ago

@irbekrm it seems the Linux bug that setting was supposed to fix isn't fixed yet, so I'm not sure I want to risk changing AWS_VPC_K8S_CNI_RANDOMIZESNAT away from prng :/ If this is indeed the root cause of the issue, does that mean even the changes suggested by @jaxxstorm would not enable direct connections?