kairos-io / kairos

:penguin: The immutable Linux meta-distribution for edge Kubernetes.
https://kairos.io
Apache License 2.0

AuroraBoot ISO creation fails on MacOS (image pull failure) #1848

Open clanktron opened 1 year ago

clanktron commented 1 year ago

Trying to build an offline ISO fails due to an image pull failure. I don't see any difference between what I've been doing and what the docs say so I'm at a bit of a loss. Apologies if this is just some environment issue. The build succeeds when using container_image=quay.io/kairos/core-ubuntu-22-lts-k3s:v1.26.4-k3s1 or any other kairos prebuilt image from quay.

I've tried the workflow below on an Arch VM, an Ubuntu 22.04 VM, and my macOS host. The output below is from the Arch VM; the same error occurs on the Ubuntu and macOS machines (on the macOS host, /var/run/docker.sock is replaced with ~/.rd/docker.sock).

I've also tried running the auroraboot container without the build dir already existing: same issue. Different versions of auroraboot (v0.2.4, v0.2.5, latest) all gave the same error, as did removing the state_dir arg and setting the volume mount to just /tmp instead of /tmp/auroraboot.

[I] clayton@archbox ~> uname -a
Linux archbox 6.5.4-arch2-1 #1 SMP PREEMPT_DYNAMIC Thu, 21 Sep 2023 11:06:39 +0000 x86_64 GNU/Linux

[I] clayton@archbox ~/nas (master)> tree
.
├── build
└── config.yaml

1 directory, 1 file

[I] clayton@archbox ~> docker images
REPOSITORY                       TAG       IMAGE ID       CREATED          SIZE
clanktron/nas                    0.1.0     9b5c47d1cb25   50 minutes ago   3.66GB
clanktron/nas                    latest    9b5c47d1cb25   50 minutes ago   3.66GB
clanktron/custom-nas             latest    73d74937b9e1   2 days ago       3.66GB
quay.io/kairos/auroraboot        latest    7f27c63bee49   2 weeks ago      1.27GB
quay.io/kairos/auroraboot        v0.2.5    f749050aca6a   2 months ago     1.15GB
quay.io/kairos/core-rockylinux   v1.5.0    c910af2a9e32   8 months ago     763MB

[N] clayton@archbox ~/nas (master)> docker run -v "$PWD"/config.yaml:/config.yaml \
                                                     -v "$PWD"/build:/tmp/auroraboot \
                                                     -v /var/run/docker.sock:/var/run/docker.sock \
                                                     --rm -ti quay.io/kairos/auroraboot:v0.2.5 \
                                                     --set container_image=docker://clanktron/nas:0.1.0 \
                                                     --set "disable_http_server=true" \
                                                     --set "disable_netboot=true" \
                                                     --cloud-config /config.yaml \
                                                     --set "state_dir=/tmp/auroraboot"

9:07AM INF Pulling container image 'clanktron/nas:0.1.0' to '/tmp/auroraboot/temp-rootfs' (local: true)
9:09AM ERR Failed pulling container image 'clanktron/nas:0.1.0' to '/tmp/auroraboot/temp-rootfs' (local: true): signal: killed
2 errors occurred:
        * signal: killed
        * 'gen-iso' deps container-pull failed

container_image=docker://clanktron/nas also fails. Even after pushing to Docker Hub, the following fail as well:
container_image=clanktron/nas
container_image=clanktron/nas:0.1.0
container_image=docker.io/clanktron/nas
container_image=docker.io/clanktron/nas:0.1.0

clanktron commented 1 year ago

The issue seems to be write access to the mounted directory. The error actually persists regardless of whether it's a custom image or an official kairos one.

Running with volume mount fails:

[N] clayton@wifi-131-179-0-224 ~/tmp> docker run -v "$PWD"/config.yaml:/config.yaml -v "$PWD"/build:/tmp \
                                                              --rm -ti quay.io/kairos/auroraboot \
                                                              --set container_image=quay.io/kairos/core-rockylinux:v2.4.0 \
                                                              --set "disable_http_server=true" \
                                                              --set "disable_netboot=true" \
                                                              --cloud-config /config.yaml

8:29PM INF Pulling container image 'quay.io/kairos/core-rockylinux:v2.4.0' to '/tmp/temp-rootfs' (local: false)
8:29PM ERR Failed pulling container image 'quay.io/kairos/core-rockylinux:v2.4.0' to '/tmp/temp-rootfs' (local: false): exit status 1
2 errors occurred:
        * exit status 1
        * 'gen-iso' deps container-pull failed

[N] clayton@wifi-131-179-0-224 ~/tmp> docker run -v "$PWD"/config.yaml:/config.yaml -v "$PWD"/build:/tmp/auroraboot \
                                                              --rm -ti quay.io/kairos/auroraboot \
                                                              --set container_image=quay.io/kairos/core-rockylinux:v2.4.0 \
                                                              --set "disable_http_server=true" \
                                                              --set "disable_netboot=true" \
                                                              --set "state_dir=/tmp/auroraboot" \
                                                              --cloud-config /config.yaml

8:29PM INF Pulling container image 'quay.io/kairos/core-rockylinux:v2.4.0' to '/tmp/auroraboot/temp-rootfs' (local: false)
8:29PM ERR Failed pulling container image 'quay.io/kairos/core-rockylinux:v2.4.0' to '/tmp/auroraboot/temp-rootfs' (local: false): exit status 1
2 errors occurred:
        * exit status 1
        * 'gen-iso' deps container-pull failed

Running without volume mount succeeds:

[N] clayton@wifi-131-179-0-224 ~/nas (master) [1]> docker run -v "$PWD"/config.yaml:/config.yaml \
                                                                           --rm -ti quay.io/kairos/auroraboot \
                                                                           --set container_image=quay.io/kairos/core-rockylinux:v2.4.0 \
                                                                           --set "disable_http_server=true" \
                                                                           --set "disable_netboot=true" \
                                                                           --cloud-config /config.yaml

8:12PM INF Pulling container image 'quay.io/kairos/core-rockylinux:v2.4.0' to '/tmp/temp-rootfs' (local: false)
8:21PM INF Generating iso 'kairos' from '/tmp/temp-rootfs' to '/tmp/iso'

I'm assuming this is some sort of write access error but the ISO fails to build even if I pass the --privileged flag to the container.

Can anyone confirm this is replicable?

clanktron commented 1 year ago

As a side note, it would be really nice to have more comprehensive logging for the container pull process in general. When testing, the command tends to hang whether or not the pull is succeeding. More visibility into what's happening (network hang, disk write hang, or just a slow pull) would be very valuable.

clanktron commented 1 year ago

Didn't realize there was a --debug flag. After adding it, my suspicions were confirmed.
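
For anyone reproducing this, a minimal sketch of the invocation with --debug added might look like the following. The run_auroraboot_debug wrapper and the DOCKER override variable are hypothetical names introduced only for illustration, and placing --debug before the --set options is an assumption; the thread does not show where the flag was added.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of AuroraBoot): re-runs the command from
# this thread with --debug added so the pull step prints its own output.
# $1 = container image to build the ISO from.
# DOCKER may point at another CLI (for example nerdctl); it defaults to docker.
run_auroraboot_debug() {
    # Assumption: --debug is accepted alongside the other options in any order.
    "${DOCKER:-docker}" run --rm -ti \
        -v "$PWD/config.yaml:/config.yaml" \
        -v "$PWD/build:/tmp/auroraboot" \
        quay.io/kairos/auroraboot \
        --debug \
        --set "container_image=$1" \
        --set "disable_http_server=true" \
        --set "disable_netboot=true" \
        --set "state_dir=/tmp/auroraboot" \
        --cloud-config /config.yaml
}
```

It would be called as `run_auroraboot_debug quay.io/kairos/core-rockylinux:v2.4.0`.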

6:18PM DBG 1.
6:18PM DBG  <prepare-netboot> (background: false)
6:18PM DBG  <prepare-iso> (background: false)
6:18PM DBG  <prepare-temp> (background: false)
6:18PM DBG
6:18PM DBG 2.
6:18PM DBG  <container-pull> (background: false)
6:18PM DBG  <copy-cloud-config> (background: false)
6:18PM DBG
6:18PM DBG 3.
6:18PM DBG  <gen-iso> (background: false)
6:18PM DBG
6:18PM DBG 4.
6:18PM DBG
6:18PM DBG 5.
6:18PM DBG
6:18PM INF Pulling container image 'quay.io/kairos/core-rockylinux:v2.4.0' to '/tmp/auroraboot/temp-rootfs' (local: false)
6:18PM DBG Output ' INFO    Downloading  quay.io/kairos/core-rockylinux:v2.4.0  to  /tmp/auroraboot/temp-rootfs
  ERROR     lchown /tmp/auroraboot/temp-rootfs/var: permission denied
'
6:18PM ERR Failed pulling container image 'quay.io/kairos/core-rockylinux:v2.4.0' to '/tmp/auroraboot/temp-rootfs' (local: false): exit status 1
2 errors occurred:
        * exit status 1
        * 'gen-iso' deps container-pull failed

The chown fails regardless of the state_dir arg in my testing.
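
The lchown error suggests the bind-mounted directory refuses ownership changes. A small probe along these lines could check that directly, independent of AuroraBoot. can_chown is a hypothetical helper name, and the sketch only tests whether `chown -h` (which changes a symlink itself rather than its target) succeeds inside a scratch subdirectory.

```shell
#!/bin/sh
# Hypothetical probe (not part of AuroraBoot): can ownership be changed on a
# directory and a symlink created inside "$1"?
# $1 = directory to probe
# $2 = owner to try (optional; defaults to the current uid:gid)
# Returns 0 on success, 2 if the scratch directory cannot be created,
# otherwise the exit status of chown.
can_chown() {
    dir=$1
    owner=${2:-"$(id -u):$(id -g)"}
    probe=$(mktemp -d "$dir/chown-probe.XXXXXX") || return 2
    ln -s target "$probe/link"
    status=0
    # -h acts on the link itself, i.e. the lchown(2) call seen in the log.
    chown -h "$owner" "$probe" "$probe/link" 2>/dev/null || status=$?
    rm -rf "$probe"
    return "$status"
}
```

The probe is only meaningful when run from a shell that sees the same mount as the failing process. If that shell runs as root, passing an explicit owner such as `can_chown /tmp/auroraboot 1000:1000` tries a change to a different user, which is closer to what unpacking an image has to do.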

@mauromorales this seems pretty serious as it prevents ISO creation completely, regardless of the source container.

mauromorales commented 1 year ago

@clanktron this is working properly on Linux, so I suspect that there must be some permission issue between the container on the VM and the local FS.

mauromorales commented 1 year ago

In my case I see the following error:

  ERROR     chmod /tmp/auroraboot/temp-rootfs/etc/machine-id: permission denied
'
9:22PM ERR Failed pulling container image 'quay.io/kairos/core-rockylinux:v1.5.0' to '/tmp/auroraboot/temp-rootfs' (local: false): exit status 1
2 errors occurred:
    * exit status 1
    * 'gen-iso' deps container-pull failed

clanktron commented 1 year ago

> @clanktron this is working properly on Linux

Oops, you're right. I had tried it in the aforementioned VMs without the debug flag and didn't realize that neither of them had enough available disk space 🤦.
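
Since the failure on the VMs was attributed to free space, a pre-flight check can rule that out before starting a long pull. This is only a sketch: avail_kb and has_space are hypothetical helper names, and the 8 GiB figure in the usage comment is an arbitrary assumption, not a documented AuroraBoot requirement.

```shell
#!/bin/sh
# Hypothetical helpers (not part of AuroraBoot).

# Prints the space available, in KiB, on the filesystem that holds "$1".
avail_kb() {
    # df -P gives one POSIX-format line per filesystem; -k reports 1024-byte
    # blocks. Line 2, column 4 is the "Available" figure.
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Succeeds when at least "$2" KiB are available at "$1".
has_space() {
    [ "$(avail_kb "$1")" -ge "$2" ]
}

# Usage (the threshold is an arbitrary example):
#   has_space "$PWD/build" $((8 * 1024 * 1024)) || echo "low disk space" >&2
```

The image in this thread is 3.66GB, and the unpacked rootfs plus the generated ISO will need additional room, so the threshold should be chosen with some margin.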

clanktron commented 1 year ago

This seems related to #1695, though in my testing I can't seem to get it to work with or without the state_dir arg.

Itxaka commented 1 year ago

This worked for me under Linux:

docker run -v "$PWD"/config.yaml:/config.yaml -v "$PWD"/build:/tmp \
                                                              --rm -ti quay.io/kairos/auroraboot \
                                                              --set container_image=quay.io/kairos/core-rockylinux:v2.4.0 \
                                                              --set "disable_http_server=true" \
                                                              --set "disable_netboot=true" \
                                                              --cloud-config /config.yaml

@clanktron Does it fail under Linux for you too, or only under macOS?

mauromorales commented 1 year ago

IMO we should close this issue so as not to mix it with the original problem, which was related to artifact download, and open a new issue to address the permissions.

clanktron commented 1 year ago

Only macOS is failing. Linux was only failing because I was negligent and didn't notice that the VM disks I was using had too little free space.

For context, I'm using Rancher Desktop on an Intel Mac. I've tried both the docker engine and the containerd engine with nerdctl, and both give me the chown error.

3:16AM INF Pulling container image 'quay.io/kairos/core-rockylinux:v2.4.0' to '/tmp/temp-rootfs' (local: false)
3:17AM DBG Output ' INFO    Downloading  quay.io/kairos/core-rockylinux:v2.4.0  to  /tmp/temp-rootfs
  ERROR     lchown /tmp/temp-rootfs/var: permission denied
'
3:17AM ERR Failed pulling container image 'quay.io/kairos/core-rockylinux:v2.4.0' to '/tmp/temp-rootfs' (local: false): exit status 1
2 errors occurred:
        * exit status 1
        * 'gen-iso' deps container-pull failed

There seems to be at least some filesystem access, since the following is created under the build dir before the failure:

.
├── build
│   ├── auroraboot-dat2581641770
│   ├── iso
│   │   └── config.yaml
│   ├── netboot
│   └── temp-rootfs
│       └── var
└── config.yaml

6 directories, 3 files
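
Given that the run without a bind mount completed earlier in this thread, one possible workaround is to let the build write to the container's own filesystem and copy the result out afterwards. The sketch below is untested against AuroraBoot: build_iso_without_bind_mount, the DOCKER override, and the container name are hypothetical, and it assumes the ISO lands under /tmp/iso as the earlier log line suggests.

```shell
#!/bin/sh
# Hypothetical workaround sketch (not an AuroraBoot feature): build inside the
# container's own filesystem, then copy the result out with `docker cp`.
# $1 = container image, $2 = host directory to copy the ISO directory into.
build_iso_without_bind_mount() {
    docker=${DOCKER:-docker}      # override for testing or another CLI
    name="auroraboot-build-$$"    # arbitrary container name
    # --rm is left out on purpose: the stopped container must still exist
    # when `cp` runs.
    if ! "$docker" run --name "$name" \
        -v "$PWD/config.yaml:/config.yaml" \
        quay.io/kairos/auroraboot \
        --set "container_image=$1" \
        --set "disable_http_server=true" \
        --set "disable_netboot=true" \
        --cloud-config /config.yaml
    then
        "$docker" rm "$name" >/dev/null 2>&1
        return 1
    fi
    # Assumption: the generated ISO is under /tmp/iso inside the container.
    status=0
    "$docker" cp "$name:/tmp/iso" "$2" || status=$?
    "$docker" rm "$name" >/dev/null 2>&1
    return "$status"
}
```

For example, `build_iso_without_bind_mount quay.io/kairos/core-rockylinux:v2.4.0 ./build` would leave the copied directory under ./build if every step succeeds.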