tailscale / tailscale

Very high CPU usage when running tailscale serve in Tailscale Docker container with TS_USERSPACE=false #11372

Closed by Bostrolicious 3 weeks ago

Bostrolicious commented 8 months ago

What is the issue?

When running the Tailscale Docker container with TS_USERSPACE=false and a reverse proxy through tailscale serve (configured in a JSON file via TS_SERVE_CONFIG), CPU usage goes through the roof. If I don't use TS_SERVE_CONFIG and instead start the reverse proxy from inside the container with tailscale serve --bg localhost:8080, CPU usage is initially normal, but if I then turn serve off with tailscale serve --https=443 off, it spikes again. This leads me to think it could be related to #10693. The flamegraph of prof.cpu.gz looks very similar to the one shown in that issue.

When I comment out TS_USERSPACE=false (so it defaults to true), I don't encounter this issue regardless of what I do with serve.

I should perhaps also note that I've not had any problems reaching the services I've tried through the reverse proxy, so I don't think they're misconfigured.

Steps to reproduce

Start a docker compose stack with the following docker-compose.yml. Please note that I've tried this with other services than stirling-pdf, and even with only tailscale in the stack. stirling-pdf was just the one I was using when troubleshooting.

version: '3.3'
services:
  stirling-pdf:
    image: frooodle/s-pdf:latest
    network_mode: service:tailscale
    depends_on:
      - tailscale
    environment:
      - DOCKER_ENABLE_SECURITY=false
    restart: unless-stopped
  tailscale:
    image: tailscale/tailscale:latest
    hostname: stirling-test1
    environment:
      - TS_AUTHKEY=REDACTED
      - TS_EXTRA_ARGS=--advertise-tags=tag:container
      - TS_SERVE_CONFIG=/config/stirling-pdf.json
      - TS_STATE_DIR=/var/lib/tailscale
      - TS_USERSPACE=false
    volumes:
      - ./ts-stirling-pdf/state:/var/lib/tailscale
      - ./ts-stirling-pdf/config:/config
      - /dev/net/tun:/dev/net/tun
    cap_add:
      - net_admin
      - sys_module
    restart: unless-stopped

with the following in stirling-pdf.json:

{
  "TCP": {
    "443": {
      "HTTPS": true
    }
  },
  "Web": {
    "${TS_CERT_DOMAIN}:443": {
      "Handlers": {
        "/": {
          "Proxy": "http://127.0.0.1:8080"
        }
      }
    }
  },
  "AllowFunnel": {
    "${TS_CERT_DOMAIN}:443": false
  }
}

Alternatively, skip the serve json, and run tailscale serve --bg localhost:8080, followed by tailscale serve --https=443 off inside the container.
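For anyone who wants to sanity-check a serve JSON like the one above before starting the stack, here is a minimal Python sketch. It assumes the container entrypoint replaces ${TS_CERT_DOMAIN} with the node's certificate domain at startup, which is approximated here with a plain string replacement. The domain stirling-test1.example.ts.net and the helper names are hypothetical and only for illustration.

```python
import json

# Same structure as stirling-pdf.json above, kept as a template string.
SERVE_TEMPLATE = """
{
  "TCP": {"443": {"HTTPS": true}},
  "Web": {
    "${TS_CERT_DOMAIN}:443": {
      "Handlers": {"/": {"Proxy": "http://127.0.0.1:8080"}}
    }
  },
  "AllowFunnel": {"${TS_CERT_DOMAIN}:443": false}
}
"""


def render_serve_config(template, cert_domain):
    """Substitute the placeholder and parse the result as JSON."""
    return json.loads(template.replace("${TS_CERT_DOMAIN}", cert_domain))


def unmatched_web_ports(config):
    """Return Web "host:port" keys whose port has no entry under TCP."""
    tcp_ports = set(config.get("TCP", {}))
    return [
        hostport
        for hostport in config.get("Web", {})
        if hostport.rsplit(":", 1)[-1] not in tcp_ports
    ]


# Hypothetical cert domain, used only to exercise the helpers.
config = render_serve_config(SERVE_TEMPLATE, "stirling-test1.example.ts.net")
problems = unmatched_web_ports(config)
```

An empty problems list only means the JSON parses and every Web entry has a TCP listener on the same port. It says nothing about the CPU usage described in this issue.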

Are there any recent changes that introduced the issue?

I've only just started using the Docker container, so I don't know how long this has been an issue.

OS

Linux

OS version

Pop!_OS 22.04 (on the host)

Tailscale version

1.60.0 (in the container), 1.60.1 (host)

Other software

No response

Bug report

BUG-1706f1430b8226adf7c15d58be27a881ea85d287d4dd3155ef8bfc36c9c9595e-20240308101954Z-df3f1bc918054ede

joelheaps commented 7 months ago

Just jumping on to say I'm experiencing the same thing on a NixOS host using a similar service + Tailscale sidecar architecture. The host running the containers also runs Tailscale (in case it's relevant). Using userspace mode is an acceptable workaround for me for now, but I'm willing to change it back and gather more info if help is needed.

Bug report

BUG-cf2d45da7a773d3895e766c98b3e87e1b4b91ff679f80d31caf1fa8af9687e01-20240406165402Z-86eb3d2d40143636

OS

NixOS 23.11 (Tapir)

Tailscale version

1.62.1
  tailscale commit: 2827330c6adacd9a67940621b5f05b589527c550
  go version: go1.22.1

Docker Compose excerpt

  ts:
    image: tailscale/tailscale:latest
    environment:
      - TS_HOSTNAME=pdf
      - TS_AUTHKEY=REDACTED
      - TS_EXTRA_ARGS=--advertise-tags=tag:personal-webapps
      - TS_STATE_DIR=/var/lib/tailscale
      - TS_USERSPACE=false
      - TS_SERVE_CONFIG=/ts-serve.json
    volumes:
      - ./ts-serve.json:/ts-serve.json
      - ts-state:/var/lib/tailscale
      - /dev/net/tun:/dev/net/tun
    cap_add:
      - net_admin
      - sys_module
    restart: unless-stopped

Tailscale serve config

{
    "TCP": {
        "443": {
            "HTTPS": true
        },
        "80": {
            "HTTP": true
        }
    },
    "Web": {
        "portainer1.my-tailnet.ts.net:443": {
            "Handlers": {
                "/": {
                    "Proxy": "http://127.0.0.1:9000"
                }
            }
        },
        "portainer1.my-tailnet.ts.net:80": {
            "Handlers": {
                "/": {
                    "Proxy": "http://127.0.0.1:9000"
                }
            }
        }
    }
}

My poor CPU

All cores idle at around 1-2% when userspace mode is used.
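To compare the two modes with numbers rather than screenshots, a rough Python sketch like the following could take one CPU sample per container. It assumes the docker CLI is on the PATH and relies on docker stats with --no-stream and a --format template; the function names are hypothetical, and the container names will depend on your compose project.

```python
import subprocess


def parse_cpu_percent(stats_output):
    """Parse 'name<TAB>12.34%' lines into a {name: percent} dict."""
    usage = {}
    for line in stats_output.splitlines():
        name, _, cpu = line.partition("\t")
        try:
            usage[name.strip()] = float(cpu.strip().rstrip("%"))
        except ValueError:
            # Skip blank lines and containers without a numeric CPU value.
            continue
    return usage


def sample_container_cpu():
    """Take a single CPU sample for every running container."""
    result = subprocess.run(
        ["docker", "stats", "--no-stream", "--format", "{{.Name}}\t{{.CPUPerc}}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_cpu_percent(result.stdout)
```

Calling sample_container_cpu() once with TS_USERSPACE=false and once with it commented out gives two dicts that can be compared directly. A single sample is noisy, so taking a few readings per mode is probably wise.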

Arragon5xpwm commented 6 months ago

On a Synology NAS (even one with a weak CPU), Tailscale can be installed, but no version newer than 1.38.4 is available. So I switched the image tag from latest to v1.38.4, and CPU usage is <1% for the same task that takes 100% on 1.62. I wonder if Tailscale is aware of the problem and if this is why there is no newer package for Synology yet.

Bostrolicious commented 3 weeks ago

I think this may have been fixed. I'm no longer seeing high CPU usage with TS_USERSPACE=false. Anyone able to confirm?

It seems the related issue I mentioned initially may also be fixed (https://github.com/tailscale/tailscale/issues/10693#issuecomment-2260870220), so they likely had the same cause.

yomaq commented 3 weeks ago

Can confirm, running Tailscale Serve with userspace set to false no longer causes the high CPU usage it did before. Looks to be fixed to me.

Bostrolicious commented 3 weeks ago

I'll close the issue then. I'm encountering other problems with TS_USERSPACE=false that stop me from using it anyway for now, but I'll open a separate issue for those. Great to see this fixed!