tinkerbell / hook

In-memory Operating System Installation Environment for Executing Tinkerbell Workflows
Apache License 2.0

Wrong /dev/null permission making ubuntu jammy deployment impossible #142

Closed lanquarden closed 9 months ago

lanquarden commented 2 years ago

When switching the sandbox project to deploy Ubuntu Jammy, running apt update with the cexec action fails because it does not have permission to write to /dev/null.

Expected Behaviour

apt update should succeed when deploying the Ubuntu Jammy image. For that to work, permissions on /dev/null need to be 666.

Current Behaviour

apt update in the cexec action fails when deploying the Ubuntu Jammy image because it can't write to /dev/null: the permissions on /dev/null are 660.

Possible Solution

First I updated the cexec container to mount /dev as rw so I could update the permissions from the template. Then I switched to a more general approach, where I updated hook-docker to set the correct permissions:

--- a/hook-docker/main.go
+++ b/hook-docker/main.go
@@ -31,6 +31,15 @@ func main() {
     fmt.Println("Starting Tink-Docker")
     go rebootWatch()
 
+    fmt.Println("Make /dev/null writeable for all users!")
+    cmd := exec.Command("chmod", "666", "/dev/null")
+    cmd.Stdout = os.Stdout
+    cmd.Stderr = os.Stderr
+    err := cmd.Run()
+    if err != nil {
+        panic(err)
+    }
+
     // Parse the cmdline in order to find the urls for the repository and path to the cert
     content, err := ioutil.ReadFile("/proc/cmdline")
     if err != nil {
@@ -65,7 +74,7 @@ func main() {
     }
 
     // Build the command, and execute
-    cmd := exec.Command("/usr/local/bin/docker-init", "/usr/local/bin/dockerd")
+    cmd = exec.Command("/usr/local/bin/docker-init", "/usr/local/bin/dockerd")
     cmd.Stdout = os.Stdout
     cmd.Stderr = os.Stderr
     err = cmd.Run()

While I got it working, I don't know if there are better ways to solve this problem.

Steps to Reproduce (for bugs)

  1. Try deploying ubuntu jammy image with the sandbox

jacobweinstock commented 1 year ago

Hey @lanquarden, thanks for reporting this. I definitely agree that /dev/null should have 666 permissions. The weird thing is that I can reproduce this on one machine but not on another: there, installing machine1 with Ubuntu Jammy works successfully, even though /dev/null has 660 permissions in the successful install. So it makes me think that there's possibly an environmental issue external to vagrant that is causing this. I'm definitely not certain, though. I will continue to investigate and update here.

lanquarden commented 1 year ago

Maybe the issue resides in apt update. These commands are invoked as the root user, so it shouldn't complain about the 660 permission, unless apt is running some parts as a different user... Maybe this behavior is linked to the environment somehow. I didn't explore that route as I had found a workaround.

Cajga commented 9 months ago

@jacobweinstock we ran into the same issue with EKS Anywhere (we opened an AWS case as well, as we have an EKS-A Subscription).

In fact, @lanquarden is right: by default apt uses the _apt user for calling out to apt-key (and for many other "sandboxed" tasks, like downloading a package), and those calls write to /dev/null.

The main issue here is that with the current hook kernel, when you mount /dev with devtmpfs, /dev/null gets created with 0660 (and this happens inside the cexec container as well, since it also mounts devtmpfs).

On Ubuntu 22.04, for example, it gets 0666:

root@kls107:~# mkdir /mnt/dev
root@kls107:~# mount -r  -t devtmpfs none /mnt/dev
root@kls107:~# ll /mnt/dev/null
crw-rw-rw- 1 root root 1, 3 Feb  6 09:29 /mnt/dev/null
root@kls107:~# mount|grep /mnt/dev
none on /mnt/dev type devtmpfs (ro,relatime,size=65716676k,nr_inodes=16429169,mode=755,inode64)
root@kls107:~# 
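The same check can be scripted. The following is a minimal sketch, not taken from hook or cexec, that stats a path (defaulting to /dev/null) and reports whether the "other" write bit is set, which is what a non-root user such as _apt needs. The function name worldWritable is hypothetical.

```go
package main

import (
	"fmt"
	"os"
)

// worldWritable (hypothetical helper) reports whether the "other" write
// bit (0o002) is set in the given mode.
func worldWritable(mode os.FileMode) bool {
	return mode.Perm()&0o002 != 0
}

func main() {
	// Default to /dev/null; an alternative path such as /mnt/dev/null
	// can be passed as the first argument.
	path := "/dev/null"
	if len(os.Args) > 1 {
		path = os.Args[1]
	}
	info, err := os.Stat(path)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	perm := info.Mode().Perm()
	fmt.Printf("%s mode=%04o world-writable=%t\n", path, uint32(perm), worldWritable(perm))
}
```

With mode 0660 the helper returns false, and with 0666 it returns true, matching the difference between the hook kernel and Ubuntu 22.04 described above.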

We would love to get a fix (or any applicable workaround that does not require a custom-built cexec container or hook OS) into EKS Anywhere.

Cajga commented 9 months ago

I've sent a PR to cexec that implements a "harmless workaround" and allows the call of apt update in the following way:

CMD_LINE: echo 'nameserver IPOFYOURNAMESERVER' > /etc/resolv.conf && export NEEDRESTART_SUSPEND=true && apt -y update && apt install -y nfs-common open-iscsi....

jacobweinstock commented 9 months ago

Hey @lanquarden and @Cajga, PR #200 should resolve this. Once it's landed, or before if you want to build from my branch, would you mind testing to validate that it's working for you both?

Cajga commented 9 months ago

@jacobweinstock, thank you for the quick fix.

I can confirm that using the hook image fixed the /dev/* permissions inside hook. I can also confirm that /dev/null has 0666 permissions inside the chrooted cexec action (this surprised me, as it is mounting devtmpfs, which had the permission issues, instead of bind mounting /dev from hook).