adoolaard opened 3 months ago
In the meantime, I have also installed Headscale bare metal (in a Debian VM in Proxmox). I am experiencing the same issue here. I can connect my Mac and iPhone, but not Linux (via the tailscale up command or the Tailscale Docker container).
Did you check this:
We see this occasionally as well.
Normally restarting the headscale instance a couple of times fixes it.
This only happens after we update the routes of a subnet router, and only subnet routers are affected. Other clients connect fine. (We are running the subnet routers in Docker containers as well.)
The tailscale up command fails with no output; it just times out: https://github.com/tailscale/tailscale/blob/ac574d875c7bf6ce16e744b47ce94b74622d550b/cmd/containerboot/main.go#L704
We are unable to find any relevant logs in headscale indicating an error. In fact, headscale logs that it authenticates the node correctly.
Our tailscale client containers are configured as follows (using a container configuration on GCP Compute Engine):
- name: test-container
  image: tailscale/tailscale:v1.56.1@sha256:196044d4d339f10bef9bdd639504fb359afbbb6486608f2bc9851aa1f2014e0b
  env:
    - name: TS_EXTRA_ARGS
      value: --login-server https://{headscale} --reset
    - name: TS_ROUTES
      value: {list of routes}
    - name: TS_USERSPACE
      value: 'false'
    - name: TS_STATE_DIR
      value: /var/headscale
  securityContext:
    privileged: true
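For readers unfamiliar with the image: the Tailscale container entrypoint (containerboot) turns environment variables like these into command-line flags. The sketch below is a simplified assumption of how TS_ROUTES and TS_EXTRA_ARGS end up on the tailscale up command line; buildUpArgs is hypothetical, and the real entrypoint adds further flags of its own. TS_STATE_DIR and TS_USERSPACE configure the tailscaled daemon rather than tailscale up, so they are left out here.

```go
package main

import "strings"

// buildUpArgs is a hypothetical, simplified version of how the container
// entrypoint might turn environment variables into `tailscale up` flags.
// Only TS_AUTHKEY, TS_ROUTES and TS_EXTRA_ARGS are modeled.
func buildUpArgs(env map[string]string) []string {
	args := []string{"up"}
	if key := env["TS_AUTHKEY"]; key != "" {
		args = append(args, "--authkey="+key)
	}
	if routes := env["TS_ROUTES"]; routes != "" {
		// Subnet routes are advertised as one comma-separated list.
		args = append(args, "--advertise-routes="+routes)
	}
	// TS_EXTRA_ARGS is split on whitespace and appended as-is,
	// which is how --login-server and --reset reach the command.
	args = append(args, strings.Fields(env["TS_EXTRA_ARGS"])...)
	return args
}
```

With the configuration above, this would yield up, --advertise-routes=..., --login-server, https://{headscale}, --reset, so every container restart re-registers the subnet router with its routes (the --reset flag resets unspecified settings to their defaults).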
Here are the logs on headscale's side for that particular node:
I wonder if it's an awkward timing issue, where a machine is declared offline while it is still trying to authenticate.
Some info on timing:
- At 2024-04-04 10:14:50.000, headscale reports "Machine successfully authorized".
- At 2024-04-04 10:14:51.000, headscale reports "Machine successfully authorized".
- At 2024-04-04T14:14:51.078128612Z, the subnet router node reports "RegisterReq: got response; nodeKeyExpired=false, machineAuthorized=true; authURL=false".
- At 2024-04-04 10:15:49.845, the subnet router node reports "failed to auth tailscale: failed to auth tailscale: tailscale up failed: signal: killed".
- (The subnet router Docker container restarts.)
- At 2024-04-04 10:15:50.000, headscale reports "Machine successfully authorized".
- At 2024-04-04 10:15:50.454, the subnet router node reports "RegisterReq: got response; nodeKeyExpired=false, machineAuthorized=true; authURL=false".
- At 2024-04-04 10:15:59.000, headscale reports "Machine successfully authorized".
- At 2024-04-04 10:16:50.106, the subnet router node reports "failed to auth tailscale: failed to auth tailscale: tailscale up failed: signal: killed".
This auth + timeout behaviour loops indefinitely until we restart headscale a couple of times.
Interestingly, headscale reports "Machine successfully authorized" twice for each authentication attempt.
Between that and the fact that this only happens to us intermittently, it feels like some kind of race condition.
I have the same problem as @adoolaard. Connecting from Mac and iOS devices works, and the connection from Linux looks fine on the server side:
2024-05-25T08:58:05+02:00 DBG Registering machine from API/CLI or auth callback expiresAt=<nil> nodeKey=[iYXXZ] registrationMethod=cli userName=simonszu
2024-05-25T08:58:05+02:00 DBG Registering machine machine=naugol machine_key=b5416c5da860668ded90885d6d7a283aec8bf96dcb427f9f70f304f273babc24 node_key=8985d7673375cec652fb5956a3010419a5f6056cf9ac0dee63362a132ecf9204 user=simonszu
2024-05-25T08:58:05+02:00 INF unary dur=21.093901 md={":authority":"/var/run/headscale/headscale.sock","content-type":"application/grpc","user-agent":"grpc-go/1.54.0"} method=RegisterMachine req={"key":"nodekey:8985d7673375cec652fb5956a3010419a5f6056cf9ac0dee63362a132ecf9204","user":"simonszu"} service=headscale.v1.HeadscaleService
2024-05-25T08:58:05+02:00 DBG go/src/headscale/hscontrol/protocol_common.go:665 > Client is registered and we have the current NodeKey. All clear to /map machine=naugol noise=true
2024-05-25T08:58:05+02:00 INF go/src/headscale/hscontrol/protocol_common.go:703 > Machine successfully authorized machine=naugol noise=true
2024-05-25T08:58:05+02:00 DBG A machine is entering polling via the Noise protocol handler=NoisePollNetMap machine=naugol
2024-05-25T08:58:05+02:00 DBG Client map request processed handler=PollNetMap machine=naugol noise=true omitPeers=true readOnly=false stream=false
2024-05-25T08:58:05+02:00 INF Client sent endpoint update and is ok with a response without peer list handler=PollNetMap machine=naugol noise=true
2024-05-25T08:58:05+02:00 DBG A machine is entering polling via the Noise protocol handler=NoisePollNetMap machine=naugol
2024-05-25T08:58:05+02:00 DBG Client map request processed handler=PollNetMap machine=naugol noise=true omitPeers=false readOnly=false stream=true
2024-05-25T08:58:05+02:00 INF Client is ready to access the tailnet handler=PollNetMap machine=naugol noise=true
2024-05-25T08:58:05+02:00 INF Sending initial map handler=PollNetMap machine=naugol noise=true
2024-05-25T08:58:05+02:00 INF Notifying peers handler=PollNetMap machine=naugol noise=true
However, the client side does not seem to get the callback/response, and therefore the login command hangs indefinitely. No idea why, any help would be appreciated.
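To confirm how far the server got, one option is to pull the key=value fields out of each headscale console log line and look for the map request with stream=true followed by "Sending initial map". A rough sketch, assuming the simple key=value layout shown above; fields is a hypothetical helper, and values containing spaces (such as the md={...} JSON) are not handled.

```go
package main

import "strings"

// fields extracts simple key=value tokens from one console log line.
// Tokens without '=' (timestamp, level, message words) are skipped.
func fields(line string) map[string]string {
	out := make(map[string]string)
	for _, tok := range strings.Fields(line) {
		k, v, ok := strings.Cut(tok, "=")
		if !ok || k == "" {
			continue
		}
		out[k] = v
	}
	return out
}
```

Applied to the log above, the second PollNetMap request carries stream=true and omitPeers=false and is followed by "Sending initial map". This suggests that, from headscale's point of view, the streaming session was opened; the open question is why the Linux client never completes.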
Bug description
I have successfully installed Headscale in a Docker container running on a Proxmox LXC container. I opened ports 80, 443, and 8080 in the Proxmox firewall, forwarding them to port 8080 on the Headscale container.
I can successfully connect to Headscale using the Tailscale apps on my iPhone and Macbook. However, I am unable to connect from:
- A Tailscale Docker container running on the same LXC container as Headscale.
- A new LXC container where I installed Tailscale with apt install tailscale and ran tailscale up --login-server https://headscale.mydomain.com:443.

When attempting to connect from these containers, nothing happens for 15 minutes before the command times out. I have tried with and without the --authkey option.
For the Docker container, I have some logs, but they are not helpful in understanding the issue. I have tried using both the stable version of Headscale and "v0.23.0-alpha5." My iPhone and Macbook connect successfully with both versions, but Linux and Docker connections fail.
Environment
What I have tried:
- Opened the necessary ports in the Proxmox firewall.
- Used both stable and alpha versions of Headscale.
- Tried connecting with and without the --authkey option.
- Checked the Docker container logs (limited information).
Docker Compose configuration:
Docker logs:
I have searched for similar issues in the existing tickets and documentation but could not find a solution. Any help would be greatly appreciated!