oxen-io / oxen-docker

central place for docker related things

running lokinet with user namespace isolation enabled. #13

Open beardstack opened 2 years ago

beardstack commented 2 years ago

Hey there. I am experimenting with this project and was hoping to run lokinet on my restricted docker system for some of my homelab services, limit some access to lokinet only, and also run a few private exit nodes. I'm running into some funky errors despite setting up my compose file to use the host user namespace. I think I may need more cap_add entries, but I'm not quite sure what the container needs.

version: '2'
services:
  lokinet:
    image: registry.oxen.rocks/lokinet-nginx:latest
    privileged: true
    tty: true
    userns_mode: 'host'
    cap_add:
      - NET_ADMIN
      - SYS_MODULE
    tmpfs:
      - /run
      - /tmp
    volumes:
      - /sys/fs/cgroup:/sys/fs/cgroup:ro
      - /sys/fs/cgroup/systemd
      - data:/data

volumes:
  data:

ERROR: for lokinet_lokinet_1  Cannot start service lokinet: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error mounting "/var/lib/docker/165536.165536/volumes/c9fea03f5685abc039b0a22512cfa34fc6e772a8618e6697d60aa50a711b862a/_data" to rootfs at "/sys/fs/cgroup/systemd": mkdir /var/lib/docker/165536.165536/overlay2/a7089861ecc7c7f6c7046e21040a18762c4d86be28321cb07e73e99c538caa71/merged/sys/fs/cgroup/systemd: read-only file system: unknown

ERROR: for lokinet  Cannot start service lokinet: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error mounting "/var/lib/docker/165536.165536/volumes/c9fea03f5685abc039b0a22512cfa34fc6e772a8618e6697d60aa50a711b862a/_data" to rootfs at "/sys/fs/cgroup/systemd": mkdir /var/lib/docker/165536.165536/overlay2/a7089861ecc7c7f6c7046e21040a18762c4d86be28321cb07e73e99c538caa71/merged/sys/fs/cgroup/systemd: read-only file system: unknown

I've also tried tweaking the example compose file to bind-mount the host cgroup paths with ro/rw; the container starts and then fails:

    volumes:
      - /sys/fs/cgroup:/sys/fs/cgroup
      - /sys/fs/cgroup/systemd:/sys/fs/cgroup/systemd

lokinet_1  | Failed to create /init.scope control group: Read-only file system
lokinet_1  | Failed to allocate manager object: Read-only file system
lokinet_1  | [!!!!!!] Failed to allocate manager object.
lokinet_1  | Exiting PID 1...
lokinet_lokinet_1 exited with code 255
lokinet_1  | Failed to create /init.scope control group: No such file or directory
lokinet_1  | Failed to allocate manager object: No such file or directory
lokinet_1  | [!!!!!!] Failed to allocate manager object.
lokinet_1  | Exiting PID 1...
lokinet_lokinet_1 exited with code 255

majestrate commented 2 years ago

yeah i dont have enough skill in my docker-fu to know what it needs. primarily i went with running debian with systemd running lokinet jammed inside a docker image. any input on making this work is greatly appreciated.

beardstack commented 2 years ago

I'll try to figure it out and wouldn't mind helping on the docker/scripting side of things. I'm not familiar with the lokinet architecture, and I'm trying to figure out where "Failed to create /init.scope control group: No such file or directory" is coming from.

beardstack commented 2 years ago

Could you please explain why/how you use systemd inside the container? Looking into things deeper the problem seems to be because of a cgroup2/docker incompatibility and it seems a fair bit more complex than what I expected. Is there a way to execute things without systemd?

majestrate commented 2 years ago

Could you please explain why/how you use systemd inside the container?

systemd provides a bunch of nice things and is very very good at managing processes as an init.

Is there a way to execute things without systemd?

i am sure there is, i just never spent time to set it up as docker images are an experiment that used to work at one time but is unmaintained.

beardstack commented 2 years ago

Could you please link me? I am not sure if you "need" systemd, but there are lighter init systems out there. It really depends on what you are using systemd for; it has some major advantages if you want to do advanced things (watching sockets or weird things like that). I'm having a tough time sifting through the lokinet code (I'm not particularly good at C++), and some direction would be nice. I'm not finding the section of code that writes the unit files or what they do.

majestrate commented 2 years ago

oh no we absolutely dont NEED systemd, i just thought doing that would have been easier as we already have stuff for it and it was very robust in our prod setups. it wasnt easier, so i gave up and left it broken.

beardstack commented 2 years ago

So I made some progress and was able to start the container, but I still have a ton of systemd-related errors. Can you confirm which of these are needed and which I can safely disable, remove, or ignore?

lokinet_1  | dev-hugepages.mount: Mount process exited, code=exited, status=32/n/a
lokinet_1  | dev-hugepages.mount: Failed with result 'exit-code'.
lokinet_1  | [FAILED] Failed to mount Huge Pages File System.
lokinet_1  | See 'systemctl status dev-hugepages.mount' for details.
lokinet_1  | sys-kernel-debug.mount: Mount process exited, code=exited, status=32/n/a
lokinet_1  | sys-kernel-debug.mount: Failed with result 'exit-code'.
lokinet_1  | [FAILED] Failed to mount Kernel Debug File System.
lokinet_1  | See 'systemctl status sys-kernel-debug.mount' for details.
lokinet_1  | sys-kernel-tracing.mount: Mount process exited, code=exited, status=32/n/a
lokinet_1  | sys-kernel-tracing.mount: Failed with result 'exit-code'.
lokinet_1  | [FAILED] Failed to mount Kernel Trace File System.
lokinet_1  | See 'systemctl status sys-kernel-tracing.mount' for details.
lokinet_1  | [  OK  ] Finished Remount Root and Kernel File Systems.
lokinet_1  | sys-fs-fuse-connections.mount: Mount process exited, code=exited, status=32/n/a
lokinet_1  | sys-fs-fuse-connections.mount: Failed with result 'exit-code'.
lokinet_1  | [FAILED] Failed to mount FUSE Control File System.
lokinet_1  | See 'systemctl status sys-fs-fuse-connections.mount' for details.
lokinet_1  | sys-kernel-config.mount: Mount process exited, code=exited, status=32/n/a
lokinet_1  | sys-kernel-config.mount: Failed with result 'exit-code'.
lokinet_1  | [FAILED] Failed to mount Kernel Configuration File System.
lokinet_1  | See 'systemctl status sys-kernel-config.mount' for details.

The lokinet service also failed:

lokinet_1  | [FAILED] Failed to start LokiNET: A…twork layer thingydoo, client.
lokinet_1  | See 'systemctl status lokinet.service' for details.

However, running the executable directly (/usr/bin/lokinet /var/lib/lokinet/lokinet.ini), I see some more understandable errors, namely:

[ERR] [](663) 2022-11-05 08:26:30.404 GMT [+0.083s] ../llarp/vpn/linux.hpp:86   we are not allowed to use IPv6 on this system: we are not allowed to call this ioctl
[WRN] [](663) 2022-11-05 08:26:30.409 GMT [+0.088s] ../llarp/router/systemd_resolved.cpp:81 Failed to connect to system bus to set DNS: No such file or directory

majestrate commented 2 years ago

So I need to ask, do I need to enable IPv6 to use lokinet? I have it disabled by default on servers.

you dont NEED to, you just need to make sure lokinet is configured to not die hard when you have no ipv6 supported in the kernel, i forget the default behavior but i know we have a config option that does this.

As far as systemd-resolved is concerned, is there a particular reason it's needed?

in your kind of setup? not really. systemd-resolved made our lives easier when setting dns on a systemd based linux.

Can it be skipped since we can either use the dns directive in docker or just have the name server statically set in resolv.conf?

oh yeah, totally. this warning can be safely ignored if you dont use resolved to set dns.

Is this altered at runtime or when the lokinet routes are updated?

both. in your use case static dns is fine and this behavior is not relevant. only send .loki and .snode queries to lokinet's resolver; what gets toggled when exit routes are added is that, with resolved, all dns is tunneled in exit mode. if you are using dnsmasq for dns i have some config snippets for that.

majestrate commented 2 years ago

the ipv6 config snippet you need is:

[network]
ip6-range=

yes, a blank

majestrate commented 2 years ago

So I made some progress and was able to start the container but I still have a ton of systemd related errors; Can you confirm which of those are needed or that I can safely disable/remove/ignore?

you could probably just run lokinet with some super simple thing like runit if that is the literal only thing in that container.

majestrate commented 2 years ago

or... just flat out run lokinet in foreground with a docker RUN directive. the issue is that will need privs and i dont like running things as root when they CAN run without root. especially inside docker. docker's security model is... bizarre. the docker group is basically one giant root privesc. never understood how their security is supposed to work.
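
A systemd-free variant of the compose file could look roughly like the sketch below. It is untested against this image and rests on several assumptions: that the binary and config paths quoted earlier in the thread (/usr/bin/lokinet and /var/lib/lokinet/lokinet.ini) exist in the image, that NET_ADMIN plus the TUN device is enough for lokinet to bring up its interface, and that the `init` key (compose file format 2.2 or later) is available. Note that a Dockerfile `RUN` executes at build time; the runtime command is set with `CMD`/`ENTRYPOINT`, or overridden from compose as here.

```yaml
version: '2.4'
services:
  lokinet:
    image: registry.oxen.rocks/lokinet-nginx:latest
    # Let Docker's built-in minimal init be PID 1: it reaps zombies and
    # forwards signals, which is all the "init" a single daemon needs.
    init: true
    # Run lokinet in the foreground instead of booting systemd.
    # Paths are the ones quoted earlier in this thread (assumed present in the image).
    entrypoint: ["/usr/bin/lokinet", "/var/lib/lokinet/lokinet.ini"]
    cap_add:
      - NET_ADMIN                    # assumption: enough to configure the tunnel interface
    devices:
      - /dev/net/tun:/dev/net/tun    # assumption: lokinet opens the TUN device
    volumes:
      - data:/var/lib/lokinet        # assumption: config and state live here

volumes:
  data:
```

Overriding the entrypoint starts only lokinet; anything else the image normally launches through systemd (nginx, judging by the image name) would not run. Without privileged mode, `userns_mode: 'host'` should no longer be required, so the daemon's userns-remap could stay in effect for this container.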

beardstack commented 2 years ago

The more I look at it, the more I want to try to compile it on an alpine image, strip out all the systemd dependencies, and lock things down.

docker's security model is... bizarre

Yeah, it is until you wrap your head around it. My daemon config below uses the userns-remap feature, which is similar in spirit to "rootless" docker: the daemon still runs as root, but container UIDs are remapped to unprivileged host UIDs. There are a bunch of advantages, mostly when it comes to security and privilege escalation. For non-privileged containers, that means you are root inside the container, but on the host you are just a remapped unprivileged user, so breaking out of the container gains an attacker very little.

cat /etc/docker/daemon.json

{
  "userns-remap": "default",
  "features": {
    "buildkit": true
  },
  "cgroup-parent": "docker.slice",
  "exec-opts": ["native.cgroupdriver=systemd"]
}

In our context, the container will require special permissions, which may mean using privileged mode and the host user namespace. Because of that, systemd makes the attack surface much larger, with root access on the host, which pretty much defeats my entire purpose of locking down the system.

Because of the errors that I am seeing, it makes more sense to simply get rid of what's not absolutely needed. systemd is trying to "mount" the following:

  1. Huge Pages File System
  2. Kernel Debug File System
  3. Kernel Trace File System
  4. FUSE Control File System
  5. Kernel Configuration File System

Trying to tweak and reverse-engineer the whole thing is quite a pain. I've been trying multiple configuration options: mounting volumes and devices, changing cgroup parameters, and messing about with cap_add. Most of it is due to systemd wanting to do things we don't really need to do.

version: '2'
services:
  lokinet:
    image: registry.oxen.rocks/lokinet-nginx:latest
    privileged: true
    tty: true
    userns_mode: 'host'
    cap_add:
      - NET_ADMIN
      - SYS_MODULE
      - SYS_ADMIN
      - ALL
      - SYS_PTRACE
    security_opt:
      - seccomp:unconfined
    cgroup_parent: docker.slice
    dns:
      - "127.3.2.1"
    volumes:
      - data:/data
    devices:
      - /dev/hugepages
      - /dev/fuse

As you can see, my compose file is pretty messy and I haven't cleared the errors quite yet. I believe the better approach would be to build from source on a smaller image (alpine) and work entirely without systemd. This is probably going to be a fairly demanding task, since I've noticed a few things that could certainly be optimized, automated, and tightened up (e.g. the use of python and perl in scripts, running under a different user, dropping capabilities, and using wrapper scripts/environment variables for configuration, among other things).
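
The capability-dropping part of that list can be sketched at the compose level even before any rebuild. The fragment below is illustrative only: it assumes lokinet is started directly rather than through systemd, that NET_ADMIN is the only capability it needs, that it opens /dev/net/tun, and that it writes only under /var/lib/lokinet. None of this has been verified against the image.

```yaml
version: '2.4'
services:
  lokinet:
    image: registry.oxen.rocks/lokinet-nginx:latest
    # Start lokinet directly; paths are the ones quoted earlier in this thread.
    entrypoint: ["/usr/bin/lokinet", "/var/lib/lokinet/lokinet.ini"]
    cap_drop:
      - ALL                          # start from zero capabilities
    cap_add:
      - NET_ADMIN                    # assumption: the only capability lokinet needs
    security_opt:
      - no-new-privileges:true       # block privilege gain through setuid binaries
    read_only: true                  # keep the root filesystem immutable
    tmpfs:
      - /run
      - /tmp
    devices:
      - /dev/net/tun:/dev/net/tun    # assumption: lokinet opens the TUN device
    volumes:
      - data:/var/lib/lokinet        # assumption: the only path that must stay writable

volumes:
  data:
```

A non-root `user:` entry is deliberately left out: a non-root process would not hold NET_ADMIN unless the binary carried file capabilities, which would be part of the image rebuild rather than the compose file.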

@majestrate I can probably do it, but it would definitely help a lot if I could see your previous abandoned code and maybe get some assistance with building from source.