pwncollege / dojo

Infrastructure powering the pwn.college dojo
https://pwn.college
BSD 2-Clause "Simplified" License

Infrastructure: Fix MacOS support #558

Open ConnorNelson opened 2 months ago

ConnorNelson commented 2 months ago

Based on discussions in https://github.com/docker/for-mac/issues/7168#issuecomment-1935284120 and https://github.com/kubernetes/minikube/issues/17700 it is clear that systemd is our enemy.

Default behavior of systemd is, for some reason, to perform destructive operations on /proc/sys/fs/binfmt_misc, and we ultimately lose /proc/sys/fs/binfmt_misc/rosetta, which means no more emulation.

This systemd behavior destructively interferes all the way up to the docker vm host. It will literally brick host MacOS docker rosetta support until docker is restarted:

$ docker run --platform linux/amd64 -it --rm --privileged ubuntu:latest bash
exec /usr/bin/bash: exec format error

It is no surprise this also bricks our inner docker support.
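
A quick way to confirm whether the handler is still registered is to look at its entry under binfmt_misc directly. The following is only a sketch: the function name `binfmt_handler_status` is made up for illustration, and the optional second argument exists just so the check can be pointed at another directory.

```shell
#!/bin/sh
# Hypothetical helper: report whether a binfmt_misc handler is registered.
# $1: handler name (e.g. rosetta)
# $2: binfmt_misc directory (defaults to the usual mount point)
binfmt_handler_status() {
    name="$1"
    dir="${2:-/proc/sys/fs/binfmt_misc}"
    entry="$dir/$name"
    if [ ! -e "$entry" ]; then
        # No entry at all: the handler was never registered or has been removed.
        echo "missing"
        return 1
    fi
    # The first line of a handler entry is "enabled" or "disabled".
    head -n 1 "$entry"
}
```

Running `binfmt_handler_status rosetta` before and after systemd starts should show whether the entry disappears.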

Removing the files inside /lib/binfmt.d has the effect of disabling the binfmt_misc systemd service. The only thing this removes is some python3.12 binfmt_misc config, which doesn't seem to affect us.
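
For reference, the removal itself can be sketched as below. This is an illustration rather than the exact change in the branch: the function name `clear_binfmt_d` is hypothetical, and where it would run (presumably at image build time) is an assumption.

```shell
#!/bin/sh
# Hypothetical helper: remove every regular file from a binfmt.d directory
# and print what was removed.
# $1: directory to clear (defaults to /lib/binfmt.d)
clear_binfmt_d() {
    dir="${1:-/lib/binfmt.d}"
    for f in "$dir"/*; do
        [ -f "$f" ] || continue   # skip when the glob matched nothing
        echo "removing $f"
        rm -f -- "$f"
    done
}
```

Printing each path first makes it easy to see exactly which configs go away (here, apparently just the python3.12 one).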

ConnorNelson commented 2 months ago

CC @adamdoupe who may be interested in unrelated binfmt_misc infrastructure changes soon. Just want this to be on your radar; we can probably find another solution that works for both. We can probably even just mount in binfmt_misc, add our stuff, and systemd (hopefully) won't hurt us when it starts up.

Smit2553 commented 2 months ago

@ConnorNelson Thank you for looking into this, I'll try building the branch and see if I have any errors.

Smit2553 commented 2 months ago

Progress, but it still fails at the end. It gets past all the installation steps and no longer crashes at the point I mentioned in #555; however, it fails while trying to get device information for /dev/kvm.

Adding the argument --device /dev/kvm to the run command does not fix the issue, and neither does running as admin, both of which were suggested in other issues relating to Docker on macOS.

Full Command:

docker run \
    --name dojo \
    --privileged \
    --device /dev/kvm \
    -v "${DOJO_PATH}:/opt/pwn.college" \
    -v "dojo-data-docker:/data/docker" \
    -v "${DATA_PATH}:/data:shared" \
    -p 22:22 -p 80:80 -p 443:443 \
    -d \
    pwncollege/dojo

I am running on an M3 MacBook Air which should support hardware virtualization and nested virtualization.
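
One way to rule out a missing device node is to pass the device through only when it actually exists. This is a rough sketch, not something the dojo does today as far as I know; the helper name `kvm_device_args` is made up.

```shell
#!/bin/sh
# Hypothetical helper: print the extra `docker run` arguments for KVM,
# or print nothing when the device node does not exist.
# $1: device path to probe (defaults to /dev/kvm)
kvm_device_args() {
    dev="${1:-/dev/kvm}"
    if [ -e "$dev" ]; then
        printf '%s\n' "--device $dev"
    fi
}
```

It could then be used as `docker run --privileged $(kvm_device_args) ... pwncollege/dojo`, left unquoted on purpose so that an empty result adds no argument.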

Logs:


Sep 05 03:48:11 adfddad8b5eb dojo[13253]:  Container pwncollege-challenge-1  Starting
Sep 05 03:48:11 adfddad8b5eb dojo[13253]:  Container pwncollege-challenge-1  Started
Sep 05 03:48:11 adfddad8b5eb dojo[13253]:  Container sshd  Started
Sep 05 03:48:11 adfddad8b5eb dojo[13253]:  Container nginx-proxy-acme  Started
Sep 05 03:48:11 adfddad8b5eb dojo[13253]: Error response from daemon: error gathering device information while adding custom device "/dev/kvm": no such file or directory
Sep 05 03:48:11 adfddad8b5eb systemd[1]: pwn.college.service: Main process exited, code=exited, status=1/FAILURE
Sep 05 03:48:11 adfddad8b5eb systemd[1]: pwn.college.service: Failed with result 'exit-code'.
Sep 05 03:48:11 adfddad8b5eb systemd[1]: Failed to start pwn.college.service - pwn.college docker compose service.

ConnorNelson commented 2 months ago

I haven't quite had time to investigate why workspace-builder still doesn't correctly finish running with these changes. If you want to investigate you can check out docker logs -f workspace-builder with this commit. For some reason we get IO errors.

Smit2553 commented 2 months ago

I am a little out of my depth trying to diagnose this, but I tried looking into it and found that it most likely has something to do with auto-optimise-store = true in the nix conf file.

Taking it out of the file gets rid of the IO errors but that leads to error: creating file '/out/nix/store/lndzdn1x3n06fpjlbabgxqfaml2n6gj2-linux-headers-6.7/include/linux/netfilter/xt_connmark.h': File exists.

The underlying problem that caused the IO errors is likely still there, but taking out the optimise store line causes it to either crash before it gets to where the IO errors occurred, or it just doesn't try to create the links that caused the IO errors.

Most issues in the nix repo about similar errors have either been closed or had fixes merged, so I'm not sure what's going on here.

Theoretically, setting filter-syscalls to false should have fixed what was happening but apparently not.
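
For anyone who wants to repeat these experiments, the two settings can be flipped with a small helper like the one below. It is only a sketch: `set_nix_option` is a made-up name, and the location of the nix conf file used by workspace-builder is left as an argument because it is not spelled out here. It avoids `sed -i` since that flag behaves differently on macOS and Linux.

```shell
#!/bin/sh
# Hypothetical helper: set `key = value` in a nix.conf-style file, replacing
# an existing line for that key or appending one if the key is absent.
# $1: path to the conf file, $2: option name, $3: option value
set_nix_option() {
    file="$1"; key="$2"; value="$3"
    if grep -q "^[[:space:]]*$key[[:space:]]*=" "$file" 2>/dev/null; then
        # Rewrite through a temporary file instead of relying on `sed -i`.
        _tmp="$(mktemp)"
        sed "s|^[[:space:]]*$key[[:space:]]*=.*|$key = $value|" "$file" > "$_tmp"
        cat "$_tmp" > "$file"
        rm -f "$_tmp"
    else
        printf '%s = %s\n' "$key" "$value" >> "$file"
    fi
}
```

For example, `set_nix_option "$NIX_CONF" auto-optimise-store false` followed by `set_nix_option "$NIX_CONF" filter-syscalls false`, where `NIX_CONF` is a placeholder for wherever the conf file lives in the build.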