tailscale / tailscale

The easiest, most secure way to use WireGuard and 2FA.
https://tailscale.com
BSD 3-Clause "New" or "Revised" License

Tailscale SSH does not work on Google Cloud Run, `setgroups` fails #8394

Open mattirantakomi opened 1 year ago

mattirantakomi commented 1 year ago

What is the issue?

I tried to open an SSH connection from my own computer (Ubuntu 22.04 LTS) to a container running Tailscale on a Google Cloud Run service, but the connection failed with an "operation not permitted" error message.

masa@masa-x1$ ssh www@100.127.204.106
The authenticity of host '100.127.204.106 (100.127.204.106)' can't be established.
ED25519 key fingerprint is SHA256:nXser+W1F4BwyF12llmr2OsTh78jYJa8zZDV3WP9W4M.
This key is not known by any other names
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added '100.127.204.106' (ED25519) to the list of known hosts.
operation not permitted
Connection to 100.127.204.106 closed.

From the Google Cloud Run logs I can see that the following command exited with code 1:

starting pty command: [/usr/sbin/tailscaled be-child ssh --uid=1000 --gid=1000 --groups=1000 --local-user=www --remote-user=mattirantakomi@github --remote-ip=100.111.215.88 --has-tty=true --tty-name=pts/0 --shell --login-cmd=/usr/bin/login --cmd=/bin/bash -- -l]

I debugged it with strace and found that the failing syscall is setgroups.

...

geteuid()                               = 1000
geteuid()                               = 1000
geteuid()                               = 1000
--- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=185, si_uid=1000} ---
rt_sigreturn({mask=[]})                 = 1000
--- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=185, si_uid=1000} ---
rt_sigreturn({mask=[]})                 = 1000
getegid()                               = 1000
futex(0xc000060548, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x15fbd08, FUTEX_WAIT_PRIVATE, 0, NULL) = 0
setgroups(1, [1000])                   = -1 EPERM (Operation not permitted)
futex(0x15fc0c0, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0xc000060548, FUTEX_WAKE_PRIVATE, 1) = 1
geteuid()                               = 1000
getgroups(0, NULL)                      = 0
write(2, "operation not permitted\n", 24operation not permitted
) = 24
exit_group(1)                           = ?
+++ exited with 1 +++

As far as I know, this bug should already have been fixed in https://github.com/tailscale/tailscale/pull/6904, but I'm not sure about it.

Steps to reproduce

git clone git@github.com:mattirantakomi/cloudrun-nginx-test.git
gcloud run deploy cloudrun-nginx-test \
--project <YOUR PROJECT ID> \
--region europe-north1 \
--service-account <YOUR SERVICE ACCOUNT> \
--set-env-vars TAILSCALE_AUTHKEY="authkey-1234" \
--execution-environment gen2 \
--min-instances=1 --max-instances=1 \
--source .

I tried both the gen1 and gen2 execution environments; neither works.

Are there any recent changes that introduced the issue?

-

OS

Linux

OS version

Ubuntu 22.04 LTS

Tailscale version

1.42.0

Other software

No response

Bug report

BUG-0937c81c9c29b37dc74221ea0bec337231bfcf36999b0a0ff752edde9aa2d8f2-20230621061253Z-1f619bfeaef05cc7

bradfitz commented 1 year ago

cc @maisem @andrew-d

DentonGentry commented 1 year ago

https://github.com/tailscale/tailscale/blob/ffaa6be8a4d84f3c5595328a44f3df2a8cf92e7f/ssh/tailssh/incubator.go#L792-L794 This will ignore the error if tailscaled is running as non-root and the groups were already correct. On Google Cloud Run I believe it will be running as root. GCR's container runtime is unique and not a regular Linux container, but it does allow its processes to run with UID zero (the container makes sure this doesn't let the contained processes do anything harmful).

mattirantakomi commented 1 year ago

On Google Cloud Run, the container runs with the UID defined in the Dockerfile. I can confirm that Tailscale SSH works fine on Cloud Run when the container is running with root privileges.

I think that groupsMatchCurrent(groupIDs) on incubator.go line 792 is not working as it should, because tailscaled is trying to set groups even though the gids already match.

As you can see from the opening post, the failing syscall is: setgroups(1, [1000]) = -1 EPERM (Operation not permitted)

That is why the command exits with code 1.

bradfitz commented 1 year ago

@mattirantakomi, the code is:

func setGroups(groupIDs []int) error {
    // ...
    err := syscall.Setgroups(groupIDs)
    if err != nil && os.Geteuid() != 0 && groupsMatchCurrent(groupIDs) {
        // If we're not root, ignore a Setgroups failure if all groups are the same.
        return nil
    }
    return err
}

func groupsMatchCurrent(groupIDs []int) bool {
    existing, err := syscall.Getgroups()
    if err != nil {
        return false
    }
    if len(existing) != len(groupIDs) {
        return false
    }
    groupIDs = slices.Clone(groupIDs)
    sort.Ints(groupIDs)
    sort.Ints(existing)
    return slices.Equal(groupIDs, existing)
}

So it should first fail to set the groups, then see that you're non-root, then see that the groups match (groupsMatchCurrent), and thus succeed.
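
The decision described above can be sketched as a pure function, which makes it easier to see which inputs lead to which outcome. This is only an illustration: `ignoreSetgroupsError` and `groupsMatch` are hypothetical names that are not in the Tailscale source, and the effective UID and the existing group list are passed in as parameters instead of being read via `os.Geteuid` and `syscall.Getgroups`. It assumes Go 1.21 or later for the standard `slices` package.

```go
package main

import (
	"errors"
	"fmt"
	"slices"
	"sort"
)

// groupsMatch is a hypothetical, side-effect-free variant of
// groupsMatchCurrent: the existing groups are passed in by the caller.
func groupsMatch(want, existing []int) bool {
	if len(existing) != len(want) {
		return false
	}
	// Sort copies so the caller's slices are left untouched.
	want = slices.Clone(want)
	existing = slices.Clone(existing)
	sort.Ints(want)
	sort.Ints(existing)
	return slices.Equal(want, existing)
}

// ignoreSetgroupsError reports whether a Setgroups failure would be
// swallowed under the three conditions quoted above: there was an error,
// the process is not root, and the groups already match.
func ignoreSetgroupsError(err error, euid int, want, existing []int) bool {
	return err != nil && euid != 0 && groupsMatch(want, existing)
}

func main() {
	eperm := errors.New("operation not permitted")

	// Non-root, groups already match: the error is ignored.
	fmt.Println(ignoreSetgroupsError(eperm, 1000, []int{1000}, []int{1000}))
	// Non-root, but the existing list is empty: the error is returned.
	fmt.Println(ignoreSetgroupsError(eperm, 1000, []int{1000}, []int{}))
	// Root: the error is always returned.
	fmt.Println(ignoreSetgroupsError(eperm, 0, []int{1000}, []int{1000}))
}
```

The second case may be worth comparing with the strace output in the opening post, where `getgroups(0, NULL)` returned 0. If the process has no supplementary groups at all, the length check fails and the original EPERM is returned. Whether that is what actually happens on Cloud Run is an assumption that still needs to be verified.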

What are your current groups in a Google Cloud Run container? Is syscall.Getgroups also returning permission denied?
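
One way to answer this from inside the container is a small Go program that prints what the process itself sees, rather than relying on shell tools. This is a minimal sketch: `describeGroups` is a hypothetical helper for formatting the result, and the program only reads state, it does not call `Setgroups`.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// describeGroups formats the result of a Getgroups call for logging.
// It is a hypothetical helper, not part of tailscaled.
func describeGroups(groups []int, err error) string {
	if err != nil {
		return fmt.Sprintf("getgroups failed: %v", err)
	}
	return fmt.Sprintf("getgroups returned %d group(s): %v", len(groups), groups)
}

func main() {
	// Effective IDs of the current process.
	fmt.Printf("euid=%d egid=%d\n", os.Geteuid(), os.Getegid())

	// Supplementary groups as reported by the kernel to this process.
	groups, err := syscall.Getgroups()
	fmt.Println(describeGroups(groups, err))
}
```

Running this as the same user that tailscaled runs as should show whether the supplementary group list is empty, contains 1000, or cannot be read at all.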

mattirantakomi commented 1 year ago

Full strace log attached: tailscale_strace.txt

Test user "www" with uid 1000 belongs to group "www" with gid 1000.

www@localhost:~$ id
uid=1000(www) gid=1000(www) groups=1000(www)