mviereck / x11docker

Run GUI applications and desktops in docker and podman containers. Focus on security.
MIT License

waiting very long for "x11docker=ready" | Xorg in rootless podman | bubblewrap setups #466

Open jonleivent opened 2 years ago

jonleivent commented 2 years ago

In Fedora CoreOS (in a virtualbox VM), with latest images of docker.io/x11docker/xserver and docker.io/x11docker/fluxbox, using the 7.4.2 version of x11docker script: x11docker -D -V --backend=podman --desktop x11docker/fluxbox loops apparently forever with:

...
DEBUGNOTE[16:27:10,027]: waitforlogentry(): tailstdout: Waiting since 703s for log entry "x11docker=ready" in store.info
DEBUGNOTE[16:27:10,027]: waitforlogentry(): tailstderr: Waiting since 703s for log entry "x11docker=ready" in store.info
...

I'm assuming docker.io/x11docker/{xserver,fluxbox} are your images. Am I wrong? Also, doesn't the x11docker script test such things?

jonleivent commented 2 years ago

Also tried x11docker/openbox and x11docker/kde-plasma, and I get the same looping behavior.

jonleivent commented 2 years ago

The above turned out to be a python issue (I was trying to use a containerized python, and it wasn't working right). After I installed python3 (uncontainerized), I get a different problem: (EE) xf86OpenConsole: Cannot open virtual console 8 (Permission denied). So I sudo chowned /dev/tty8, but that resulted in (EE) xf86OpenConsole: Switching VT failed .

mviereck commented 2 years ago

Thank you for the report!

The above turned out to be a python issue (I was trying to use a containerized python, and it wasn't working right).

Was this a custom setup of yours or something more common that x11docker should check and be able to handle?

I get a different problem: (EE) xf86OpenConsole: Cannot open virtual console 8 (Permission denied). So I sudo chowned /dev/tty8, but that resulted in (EE) xf86OpenConsole: Switching VT failed .

Likely x11docker wants to run Xorg on a different tty, but your system is not configured to allow this. You can either run with sudo x11docker [...] or configure Xorg to allow the start. Compare https://github.com/mviereck/x11docker/wiki/Setup-for-option---xorg

jonleivent commented 2 years ago

The setup is very pure: Start with Fedora CoreOS (which comes with podman), install absolutely nothing else on it (although I needed python for your script - more on that later...), and use the x11docker script with xserver container and one or more window manager or desktop environment containers to get a choice of desktop environments running on it. I can use either VirtualBox or Qemu/KVM to house the CoreOS install for experimentation, but eventually my goal is for a bare metal install of CoreOS (with no additional installs on it) with a fully podman containerized single user no remote access desktop environment.

I will try sudo x11docker and report back, but I want to run x11docker completely rootless. If I wanted to configure Xorg, but am using the x11docker/xserver container, would I need to rebuild the xserver container to do so, or is there a path into its configuration from some x11docker script arg? Note: I may want to build my own xserver container anyway as I don't need nxagent or xpra (or xfishtank!), also probably not the hacked MIT-SHM (because nothing will be remote), but would benefit from virtualbox-guest-x11. Do you have advice on doing so? I see that the x11docker script is checking for config and labeling of the xserver container.

BTW: about x11docker script requirement for python. It seems the requirement is very light. Possibly the script would work with just using podman inspect --format, or by using jq (which is available probably wherever podman or docker are, and comes installed on CoreOS). Of course, my case is extreme, as CoreOS does not have any version of python installed, and I don't want to install one (although I did so due to x11docker's requirement).

mviereck commented 2 years ago

I will try sudo x11docker and report back, but I want to run x11docker completely rootless. If I wanted to configure Xorg, but am using the x11docker/xserver container, would I need to rebuild the xserver container to do so, or is there a path into its configuration from some x11docker script arg?

Oh, right, you are already using x11docker/xserver. The configuration of Xwrapper.config or running as root for --xorg should only be needed if using Xorg from host. Here on Debian I don't have a configured Xwrapper.config on host but in the image x11docker/xserver only. The lines in the Dockerfile are:

# configure Xorg wrapper
RUN echo 'allowed_users=anybody' >/etc/X11/Xwrapper.config && \
    echo 'needs_root_rights=yes' >>/etc/X11/Xwrapper.config

TTY switching works fine without root. IIRC this succeeded in a fedora desktop VM, too. I might set up a fedora CoreOS VM to reproduce your issue.

Note: I may want to build my own xserver container anyway as I don't need nxagent or xpra (or xfishtank!), also probably not the hacked MIT-SHM (because nothing will be remote), but would benefit from virtualbox-guest-x11. Do you have advice on doing so? I see that the x11docker script is checking for config and labeling of the xserver container.

You can reduce the Dockerfile of x11docker/xserver to your needs. Below is a proposal for a Dockerfile reduced to Xorg. I've removed some of the tools, too (including the cute xfishtank); the LABEL list of available tools might be wrong now and would need a closer check:

FROM debian:bullseye

# cleanup script for use after apt-get
RUN echo '#! /bin/sh\n\
env DEBIAN_FRONTEND=noninteractive apt-get autoremove --purge -y\n\
apt-get clean\n\
find /var/lib/apt/lists -type f -delete\n\
find /var/cache -type f -delete\n\
find /var/log -type f -delete\n\
exit 0\n\
' > /apt_cleanup && chmod +x /apt_cleanup

# X servers
RUN apt-get update && \
    env DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
        xserver-xorg \
        xserver-xorg-legacy && \
    /apt_cleanup

# Window manager openbox with disabled context menu
RUN apt-get update && \
    env DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
        openbox && \
    sed -i /ShowMenu/d         /etc/xdg/openbox/rc.xml && \
    sed -i s/NLIMC/NLMC/       /etc/xdg/openbox/rc.xml && \
    /apt_cleanup

# tools
RUN apt-get update && \
    env DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
        catatonit \
        procps \
        psmisc \
        psutils \
        x11-utils \
        x11-xkb-utils \
        x11-xserver-utils \
        xauth \
        xinit && \
    /apt_cleanup

# configure Xorg wrapper
RUN echo 'allowed_users=anybody' >/etc/X11/Xwrapper.config && \
    echo 'needs_root_rights=yes' >>/etc/X11/Xwrapper.config

# HOME
RUN mkdir -p /home/container && chmod 777 /home/container
ENV HOME=/home/container

LABEL options='--xorg'
LABEL tools='catatonit cvt glxinfo setxkbmap \
             xauth xdpyinfo xdriinfo xev \
             xhost xinit xkbcomp xkill xlsclients xmessage \
             xmodmap xprop xrandr xrefresh xset xsetroot xvinfo xwininfo'
LABEL options_console='--xorg'
LABEL windowmanager='openbox'

ENTRYPOINT ["/usr/bin/catatonit", "--"]

BTW: about x11docker script requirement for python. It seems the requirement is very light. Possibly the script would work with just using podman inspect --format, or by using jq (which is available probably wherever podman or docker are, and comes installed on CoreOS). Of course, my case is extreme, as CoreOS does not have any version of python installed, and I don't want to install one (although I did so due to x11docker's requirement).

I am not entirely happy about the python dependency. x11docker also supports nerdctl, which does not support nerdctl inspect --format yet. I've also tried jq but found that it is not installed by default everywhere, whereas python seemed to be a widespread standard that can be expected. I am considering checking for jq if python is not installed and using it in that case.
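The fallback considered here could be sketched roughly as follows. This is a hypothetical helper, not code from x11docker: the function name `pick_json_tool` and the preference order (python first, jq as fallback) are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical sketch: choose a JSON helper for parsing `inspect` output.
# Prefers python3, then python, then jq; prints the name of the tool found.
pick_json_tool() {
    for tool in python3 python jq; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            return 0
        fi
    done
    echo "no JSON helper found (need python or jq)" >&2
    return 1
}

# Example use (commented out; needs podman and an existing container "$name"):
#   case "$(pick_json_tool)" in
#       jq)      podman inspect "$name" | jq -r '.[0].State.Pid' ;;
#       python*) podman inspect "$name" | "$(pick_json_tool)" -c \
#                    'import json,sys; print(json.load(sys.stdin)[0]["State"]["Pid"])' ;;
#   esac
```

Piping the full JSON through a helper, rather than relying on `inspect --format`, is what would keep such a check usable with backends that lack `--format`.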

jonleivent commented 2 years ago

Running sudo x11docker... does not work as it won't find the user's pulled images when running as root. In other words sudo podman images returns nothing even though the user running podman images has x11docker/xserver, x11docker/fluxbox among others.

mviereck commented 2 years ago

Running sudo x11docker... does not work as it won't find the user's pulled images when running as root. In other words sudo podman images returns nothing even though the user running podman images has x11docker/xserver, x11docker/fluxbox among others.

Ok, right; I got a bit confused here, sorry.

Now I remember: Running Xorg within a container works only with rootful podman. So you would need at least x11docker/xserver in rootful podman. To run containers in rootless podman, but with root for the Xorg container:

sudo x11docker --backend=podman --rootless --xorg [...]

To avoid the need of sudo, you would have to install Xorg on host. I failed to get Xorg running in rootless podman and I am not sure if it is possible at all.

jonleivent commented 2 years ago

Isn't the --xorg going to bypass my --xc=podman requirement to use the container version of xserver? It seems to do that when I try it, as it gives me the x11docker ERROR: Did not find a possibility to provide a display. error message. Dropping --xorg and keeping --xc=podman, I get back to my earlier (EE) xf86OpenConsole: Cannot open virtual console 8 (Permission denied) error in the log, even though now using sudo. Could this be a selinux issue about consoles?

Note that a wayland-based x11docker desktop such as x11docker/kde-plasma does no better. I have seen that Fedora Silverblue and Kinoite (the KDE variety of the same family) run wayland as the user, not as root. So that suggests the possibility of a container doing so on CoreOS. But, using x11docker/kde-plasma as the target desktop container did not change matters - I again get the same Cannot open virtual console 8... error in the log file.

jonleivent commented 2 years ago

BTW: if you want to get CoreOS up and running in VirtualBox quickly, I can help with that. It's not your typical install-from-ISO-as-CD distro.

mviereck commented 2 years ago

Isn't the --xorg going to bypass my --xc=podman requirement to use the container version of xserver?

You don't need to specify option --xc if image x11docker/xserver is available. x11docker uses it automatically (and tells so in a message). Regardless if the image is available or --xc is specified, you can choose a desired X server, here with --xorg.

It seems to do that when I try it, as it gives me the x11docker ERROR: Did not find a possibility to provide a display. error message. Dropping --xorg and keeping --xc=podman, I get back to my earlier (EE) xf86OpenConsole: Cannot open virtual console 8 (Permission denied) error in the log, even though now using sudo.

I am not sure yet if I understand right. Do you have x11docker/xserver now if you check sudo podman images? If not, and if Xorg is not installed on host, x11docker will find no Xorg that it could start with sudo x11docker [...]. The special example sudo x11docker --backend=podman --rootless --xorg [...] runs a rootful podman for x11docker/xserver, but a rootless podman for the desired container.

Or maybe I misunderstand you? Please show me your commands and the resulting error messages.

Note that a wayland-based x11docker desktop such as x11docker/kde-plasma does no better. I have seen that Fedora Silverblue and Kinoite (the KDE variety of the same family) run wayland as the user, not as root. So that suggests the possibility of a container doing so on CoreOS. But, using x11docker/kde-plasma as the target desktop container did not change matters - I again get the same Cannot open virtual console 8... error in the log file.

It doesn't matter which image you use because x11docker sets up Wayland or X before the container is started. Running X or Wayland from console always needs some sort of privileges. Aside from obvious sudo there are possibilities with suid X or some obscure logind privilege setups. So even if your process list shows that X or Wayland are running as an unprivileged user, something in background gave some privileges to the process. I admit that I don't understand all ways how privileges are granted.

mviereck commented 2 years ago

BTW: if you want to get CoreOS up and running in VirtualBox quickly, I can help with that. It's not your typical install-from-ISO-as-CD distro.

Thank you for the offer! I'll try to reproduce in my regular fedora VM first. But very likely it is just the issue that Xorg in a container does not run with rootless podman. x11docker should catch that case and print a message instead of producing an error.

jonleivent commented 2 years ago

Running podman images as user, I have: x11docker/xserver, x11docker/fluxbox, x11docker/openbox, and x11docker/kde-plasma. But, sudo podman images sees no images, as they're all in the user's container storage, not root's.

Running in the x11docker git clone directory:

sudo ./x11docker --backend=podman --rootless --xorg --desktop x11docker/fluxbox

This produces

x11docker ERROR: Did not find a possibility to provide a display.
...

error message. I can't copy-n-paste it or transfer it out of the VirtualBox VM easily with CoreOS, because no VirtualBox guest extensions are present. If you need the whole output and/or log file, I will work on a transfer ability.

If I instead run

sudo ./x11docker --backend=podman --rootless --xc=podman --desktop x11/fluxbox

that's when I get

(EE) xf86OpenConsole: Cannot open virtual console 8 (Permission denied)

in the ~/.cache/x11docker/x11docker.log file

If I need other privileges to run rootless, I can worry about getting them later.

mviereck commented 2 years ago

I can't copy-n-paste it or transfer it out of the VirtualBox VM easily with CoreOS, because no VirtualBox guest extensions are present. If you need the whole output and/or log file, I will work on a transfer ability.

Thank you! I don't need the full text, I just needed to sort the commands and their error messages.

The first error is correct:

Running in the x11docker git clone directory:

sudo ./x11docker --backend=podman --rootless --xorg --desktop x11docker/fluxbox

This produces

x11docker ERROR: Did not find a possibility to provide a display.

Please provide x11docker/xserver in rootful podman. Then this should work. So run sudo podman pull x11docker/xserver or build the reduced Dockerfile example above with sudo podman build -t x11docker/xserver [...].
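Since root and the user each have their own image store, a quick check of both stores can help before starting the mixed setup. The sketch below uses `podman image exists`, which exits with 0 if the image is present; the helper names `image_in_store` and `check_mixed_setup` are hypothetical and not part of x11docker.

```shell
#!/bin/sh
# Sketch: check whether an image is present in a given podman store.
# $1 is the image name; the remaining arguments are the command used to
# reach the store, e.g. `podman` (user store) or `sudo podman` (root store).
image_in_store() {
    image=$1
    shift
    "$@" image exists "$image"
}

# Hypothetical helper: report which pulls are still missing for a setup with
# Xorg in rootful podman and the desktop in rootless podman.
check_mixed_setup() {
    desktop_image=$1
    image_in_store x11docker/xserver sudo podman \
        || echo "x11docker/xserver missing in root's store: sudo podman pull x11docker/xserver"
    image_in_store "$desktop_image" podman \
        || echo "$desktop_image missing in user's store: podman pull $desktop_image"
}

# Example (commented out; needs podman and sudo):
#   check_mixed_setup x11docker/fluxbox
```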

Your second command sudo ./x11docker --backend=podman --rootless --xc=podman --desktop x11/fluxbox should have produced the same error, but it did not. This is likely an x11docker bug. It seems that it does not try to use rootful podman for the Xorg container although it should. I'll check this.

jonleivent commented 2 years ago

I did a podman pull x11docker/xserver as root, and now things are much closer to working. I am getting an X server with a gray background and mouse tracking, but no fluxbox desktop in it (which would have a root menu and a toolbar, by default). Also, no way to exit, except by ACPI shutdown. This with the sudo ./x11docker --backend=podman --rootless --xorg --desktop x11docker/fluxbox command. Same thing for x11docker/kde-plasma.

jonleivent commented 2 years ago

I'll bet it isn't finding x11docker/fluxbox or any other desktop the user has pulled. Maybe I should run two x11docker instances, one as root for the server the other as user for the desktop? Is there a way to do that?

jonleivent commented 2 years ago

Surprisingly, the fluxbox desktop appeared after a very long delay. I didn't need to start a separate x11docker instance as I thought, just waiting. Hopefully this long delay is a one-time issue.

mviereck commented 2 years ago

I've tried to reproduce the issue but now have the unrelated problem that my root partition does not have enough space left to pull image x11docker/xserver in rootful podman.

If you somehow can provide me the log file ~/.cache/x11docker/x11docker.log , I might find a hint what is blocking the process.

Also, no way to exit, except by ACPI shutdown.

At least CTRL+ALT+F(n) should be possible to switch to another console.

Maybe I should run two x11docker instances, one as root for the server the other as user for the desktop? Is there a way to do that?

That is basically possible, but should not be needed.

Surprisingly, the fluxbox desktop appeared after a very long delay. I didn't need to start a separate x11docker instance as I thought, just waiting. Hopefully this long delay is a one-time issue.

You could specify another tty with option --vt, e.g. --vt=8, and switch back to your current tty (check it with tty) with CTRL+ALT+F(n). Then you can read the x11docker terminal output. If you add --debug, this might give some useful hints.
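As a small convenience, the key combination for switching back can be derived from the output of tty before Xorg takes over the screen. This is only an illustrative sketch; `vt_key_for_tty` is a made-up helper name.

```shell
#!/bin/sh
# Sketch: translate the output of `tty` (e.g. /dev/tty3) into the key
# combination that switches back to that console.
vt_key_for_tty() {
    case "$1" in
        /dev/tty[0-9]*)
            echo "CTRL+ALT+F${1#/dev/tty}"
            ;;
        *)
            echo "not a virtual console: $1" >&2
            return 1
            ;;
    esac
}

# Example, run from a text console before starting x11docker (commented out):
#   vt_key_for_tty "$(tty)"
#   sudo x11docker --backend=podman --rootless --xorg --vt=8 --debug --desktop x11docker/fluxbox
```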


It is late here, and I am tired. I'll look at this tomorrow again.

jonleivent commented 2 years ago

I have both fluxbox and openbox working, but not kde-plasma. It looks like the x11docker/fluxbox and x11docker/openbox containers have no X client apps, such as xterm, in them. Of course they are designed to be used as base layers for other containers, so I will do that.

I noticed that on distros where I can start X as non-root (via the startx -> xinit route), they have Xorg.wrap, a setuid for doing just that. But a more secure scheme with just the necessary capabilities is probably possible: some setcap'ed way of launching the x11docker script as the user instead of root.

mviereck commented 2 years ago

I have both fluxbox and openbox working

Do you still have the long startup delay? It may also be a podman issue that I once had, too. It can be solved temporarily with podman system prune to clean up the podman storage.

but not kde-plasma

kde-plasma needs --init=systemd. Did you set this option?

Also add option --desktop for desktop environments, otherwise x11docker will run a window manager that might cause issues.

But a more secure scheme with just the necessary capabilities is probably possible: some setcap'ed way of launching the x11docker script as the user instead of root.

The ideal way would be to be able to run Xorg with rootless podman. We might give it a try again and ask the podman developers for help. x11docker already gives the bare minimum of capabilities to the container of x11docker/xserver that is needed to run Xorg. One guess: Xorg in rootless podman might fail because podman's root in container is not root on host so Xorg.wrap fails.

jonleivent commented 2 years ago

I tried podman system prune, and it did prune some things. I also pulled the x11docker/mate image to try a mid-sized X11 desktop instead of the large wayland kde-plasma desktop to see if I can get anything else beyond fluxbox and openbox. However, I'm getting mysterious errors with mate that suggest a podman bug - go backtraces, but only after a very long delay even after having done podman system prune. From the looks of things, it is something running within the desktop container or launched from podman for the desktop container called 'exe' running as the user. The behavior is strange: it isn't doing much but very slowly reading and writing disk - despite the disk being a fast SSD, the rates are 500K read/sec and 2M write/sec. With very low cpu usage. So, WTF is that?

I will keep --init=systemd in mind with kde-plasma, but all indications are that I never got close to that problem due to podman issues prior to that. I always have the --desktop option on.

I don't think Xorg.wrap would fail to run just because it is in a rootless container. But it would fail to deliver the necessary capabilities, which the rootless container didn't inherit from the user. I think that the necessary capabilities have to be delivered from the outside-in starting with the x11docker script itself. However, I know only enough about capabilities to know I don't know enough about capabilities :( But I do think this can be fixed externally to podman, assuming podman doesn't intentionally drop excess capabilities it has inherited when run rootless.

jonleivent commented 2 years ago

And mate is finally running! After over an hour of that exe process doing whatever.

mviereck commented 2 years ago

Do you have a way to send me ~/.cache/x11docker/x11docker.log? A useful logfile would be one taken after you have terminated a container that had such a long startup delay. I would like to find out what causes the delay. I've built the Dockerfile above, reduced to Xorg, as x11docker/xserver in rootful podman. Here a startup of x11docker/fvwm is pretty fast.

From the looks of things, it is something running within the desktop container or launched from podman for the desktop container called 'exe' running as the user. The behavior is strange: it isn't doing much but very slowly reading and writing disk - despite the disk being a fast SSD, the rates are 500K read/sec and 2M write/sec. With very low cpu usage. So, WTF is that?

x11docker doesn't run anything called exe. However, at some points it waits for an event that one task of x11docker writes something to a logfile so another task of x11docker can continue.
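To illustrate the kind of waiting described here, a much simplified loop might look like the following. It is not x11docker's waitforlogentry() implementation; the function name, its arguments, and the one-second polling interval are assumptions for the sketch.

```shell
#!/bin/sh
# Simplified sketch of a "wait for log entry" loop (illustrative only).
# $1: file to watch, $2: fixed string to wait for, $3: timeout in seconds
wait_for_log_entry() {
    file=$1
    entry=$2
    timeout=$3
    waited=0
    # grep -F treats the entry as a fixed string, -q suppresses output
    until grep -q -F -- "$entry" "$file" 2>/dev/null; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "gave up after ${waited}s waiting for \"$entry\" in $file" >&2
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
}

# Example (commented out; the path is a placeholder):
#   wait_for_log_entry /path/to/store.info "x11docker=ready" 60
```

A loop like this only reads the file once per second, so it would not explain sustained disk traffic by itself.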

jonleivent commented 2 years ago

Once they finally start up the first time, each desktop has pretty fast (~10sec) startup times after that. So I will have to try a new one. I will try fvwm and send you the log if it is slow. If that isn't slow, I will try XFCE or LXDE.

I figured out a way to get files out of a vbox without guest additions by using a USB drive. I would rather have a way to mount something rw in the vbox that is ro outside, and there's probably a way to do that, but the USB drive trick works without further thought, so it wins.

I think that exe app must be part of podman itself. It isn't just waiting and writing small things to a log file. It's writing at a steady 2M/sec rate for over an hour, yet not growing the vbox disk image substantially over that time, hence not appending most of that to a log. I've seen what your logging does when it waits for that event - probably only writing at a few bytes/sec rate, and that would be pure appending.

mviereck commented 2 years ago

Once they finally start up the first time, each desktop has pretty fast (~10sec) startup times after that. So I will have to try a new one. I will try fvwm and send you the log if it is slow. If that isn't slow, I will try XFCE or LXDE.

You mean, once you have waited a long time for the first startup, later startups with x11docker are fast? That sounds pretty much like a podman issue, not an x11docker issue. However, to be sure it makes sense to check the log file. It might be worth comparing with rootful podman. For example, run:

sudo podman pull x11docker/fvwm
sudo x11docker --backend=podman --desktop x11docker/fvwm

jonleivent commented 2 years ago

fvwm was fast as user. Nothing interesting in the log, but I've saved it in case it may prove useful by comparison to something slow.

You mean, once you have waited a long time for the first startup, later startups with x11docker are fast?

Yes. On to XFCE as user. If that's slow, I'll try it as root. Actually, I will pull it as root, then podman image scp it to the user, so I don't waste network bandwidth.

Also, if it is slow, I will try pstree to determine the provenance of that exe process.

mviereck commented 2 years ago

Just an idea: Maybe podman is pulling the image again instead of using the one you have pulled before? x11docker sets option --pull=never so this should not happen.

jonleivent commented 2 years ago

Maybe podman is pulling the image again instead of using the one you have pulled before?

There's no network activity, either. I just had to forcibly stop the vm that was trying to start XFCE - I was logged in on another console as root spying on it with pstree and other things, and something I did caused the vm to go crazy. But, I did see the exe process has as its only arg an image overlay file name, so it must be part of podman.

I'll take a look at the log file...

jonleivent commented 2 years ago

No log file. So I restarted XFCE, and will be a bit more careful while spying on it.

doing ls -l /proc/3744/exe where 3744 is the pid of that exe process shows that it is /usr/bin/podman itself, run with a different name (probably doing arg[0] conditionalization).
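The same lookup can be wrapped in a small helper. This is a sketch that relies on the Linux /proc filesystem; `real_exe_of_pid` is a hypothetical name, and reading another user's /proc/PID/exe generally needs root.

```shell
#!/bin/sh
# Sketch: print the binary a process is really running, regardless of the
# name it shows in ps or pstree. Relies on the Linux /proc filesystem.
real_exe_of_pid() {
    readlink "/proc/$1/exe"
}

# Example: inspect every process named "exe" (commented out; pgrep is part of procps):
#   for pid in $(pgrep -x exe); do
#       printf '%s -> %s\n' "$pid" "$(real_exe_of_pid "$pid")"
#   done
```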

jonleivent commented 2 years ago

The XFCE desktop finished loading, and appears to work. Attached is the x11docker.log file xfce-x11docker.log

jonleivent commented 2 years ago

Found this: https://github.com/containers/podman/issues/7866 which suggests my goal of a fully rootless containerized X server is a dead end, at least with kernel behavior as of two years ago. That's a bummer, if true. It surprises me that there is no way to pass a small set of capabilities into a rootless namespace container if the invoking user had those capabilities on the way in.

mviereck commented 2 years ago

Found this: https://github.com/containers/podman/issues/7866

That's really a bummer. Good that you found this. I made several attempts in the meantime to get Xorg running, but to no avail. Now I know that I should give up.

I found a ticket related to the very slow startup of your podman containers, maybe it contains a useful hint: https://github.com/containers/podman/issues/13226

I also found the ticket where I once got the hint to use podman system prune: https://github.com/containers/podman/issues/6288

The XFCE desktop finished loading, and appears to work. Attached is the x11docker.log file

Thank you for the log file! At the first glance everything seems to run correctly, but podman is just slow, so this confirms what we already found out. I'll look closer at which point exactly it slows down.

mviereck commented 2 years ago

I had a closer look at the startup delay and found that it happens when x11docker runs command podman logs $Containername ... that serves to catch container output. x11docker did this after the container is accessible, but before pid 1 of container was up. Now x11docker waits with this command until pid 1 of container is running. Maybe this makes a difference.
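The idea of delaying podman logs until pid 1 of the container is up can be sketched as below. This is not the actual change made in x11docker; `wait_until` and `container_pid1_running` are hypothetical helpers, and the sketch assumes that `podman inspect --format '{{.State.Pid}}'` reports 0 until the container's first process has been started.

```shell
#!/bin/sh
# Sketch, not x11docker's code: poll a condition once per second.
# $1: timeout in seconds; remaining arguments: command to poll.
wait_until() {
    timeout=$1
    shift
    waited=0
    until "$@"; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
}

# True once podman reports a host PID for the container's pid 1.
container_pid1_running() {
    pid=$(podman inspect --format '{{.State.Pid}}' "$1" 2>/dev/null) || return 1
    [ -n "$pid" ] && [ "$pid" -gt 0 ]
}

# Example (commented out; needs podman, "$name" is a hypothetical container name):
#   wait_until 60 container_pid1_running "$name" && podman logs -f "$name"
```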

jonleivent commented 2 years ago

I found a ticket related to the very slow startup of your podman containers, maybe it contains a useful hint: https://github.com/containers/podman/issues/13226

No, mine has graphDriverName: overlay. Which makes sense since this is Fedora CoreOS, and they wouldn't give that the wrong fs for podman.

As for podman system prune, there does seem to be quite a bit to prune after runs in the user's storage. In the root's storage, I'm getting some kind of error now.

I thought the slowdown was occurring after X11 initialized and before the desktop started, so the root's containers are not the slow ones, it's the user's that are slow. Is that correct? That's what it looks like on the console when I get one of those very slow runs (still only the first run for each desktop image): the console goes gray and I see an "X" mouse cursor tracking the mouse very quickly, and the hour+ delay is between that and the desktop environment showing up.

I might have caused that root error running podman system prune when my VM went crazy that time, and I had to hard shutdown the VM. I can rm the whole thing and reload x11docker/xserver, and see what happens.

Back to rootless container for X11. I have 2 goals for containerizing X11 (and eventually wayland): to use a stable distinct distro (Debian stable or Alpine, for example) version of X11 so that it can be updated at will separately from the base OS, and also get very good rootless sandbox security for it. I can still do the first part, but not the second. So what is the "best" sandboxing that can be done? It may not be running podman as root. X11 running in a rootful container with something setuid to start that container might be less secure than X11 running as user started by Xorg.wrap.

Now x11docker waits with this command until pid 1 of container is running. Maybe this makes a difference

I'll pull your changes and try it.

BTW: thanks for all of this help!

jonleivent commented 2 years ago

I ran your newest x11docker script, I podman rmi'ed the x11docker/xfce image and repulled it from the repo. I also cleaned out the root's image store and reinstalled the xserver image there. It ran with about a 15min delay between X starting (gray console + "X" mouse cursor tracking the mouse) and XFCE starting. The log is attached. xfce2-x11docker.log

mviereck commented 2 years ago

I thought the slowdown was occurring after X11 initialized and before the desktop started, so the root's containers are not the slow ones, it's the user's that are slow. Is that correct?

Yes, that is correct. Xorg runs in rootful podman, and if the grey screen with the mouse cursor appears reasonably fast, then this part is ok. The slowdown happens in rootless podman after podman run [...]. The second log indicates that the delay is during podman inspect $Containername. I have changed x11docker debug output a bit to make this more obvious, but I am already pretty sure. I would conclude that podman just takes a very long time until the container is accessible for a podman inspect command. podman inspect does not fail, though; it waits until the container is ready. It is interesting that you never seem to have the issue with rootful podman.

This should be reported to podman. However, they would need something easier to reproduce than the full x11docker setup. Does the delay also happen if you run e.g. podman run x11docker/lxde lxterminal? Normally you should get an error message of lxterminal pretty soon ("Display not found"). What happens if you run podman inspect CONTAINERID in parallel?
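For a report to podman it may help to put numbers on each step. The following is a minimal timing sketch; `elapsed_seconds` is a hypothetical helper, and CONTAINERID is a placeholder exactly as in the text above.

```shell
#!/bin/sh
# Sketch: run a command, discard its output, and print the wall-clock
# seconds it took (whole seconds, based on `date +%s`).
elapsed_seconds() {
    start=$(date +%s)
    "$@" >/dev/null 2>&1
    end=$(date +%s)
    echo $((end - start))
}

# Example (commented out; needs podman):
#   podman run x11docker/lxde lxterminal &
#   echo "inspect took $(elapsed_seconds podman inspect CONTAINERID)s"
#   wait
```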

Back to rootless container for X11. I have 2 goals for containerizing X11 (and eventually wayland): to use a stable distinct distro (Debian stable or Alpine, for example) version of X11 so that it can be updated at will separately from the base OS, and also get very good rootless sandbox security for it. I can still do the first part, but not the second. So what is the "best" sandboxing that can be done? It may not be running podman as root. X11 running in a rootful container with something setuid to start that container might be less secure than X11 running as user started by Xorg.wrap.

Basically two possibilities are left:

Some custom setuid wrapper to run a rootful Xorg container is possible, but does not sound great. Though, we could separate the containers and somehow make a suid script from the first line. Example:

read Xenv < <(sudo x11docker --xorg --backend=podman --printenv --xonly --desktop)
env $Xenv x11docker --backend=podman --hostdisplay x11docker/xfce

(I had to make a fix to make this work, please update before you try it.)

jonleivent commented 2 years ago

I again pulled your latest x11docker changes. I pulled the lxde image and tried podman run x11docker/lxde lxterminal, and it printed the error almost instantaneously. Not enough time to try a podman inspect in parallel. But, then when I use that lxde image to start the full x11docker experience, it again stalls in between X server and desktop, with that exe process doing its thing. There was also a go backtrace and core dump (but ulimit -c is 0 by default, so nothing got saved). But the x11docker/lxde did eventually complete. Log file attached. lxde1_x11docker.log

I am not convinced that the podman developers would find the full x11docker setup too cumbersome to deal with. They should be able to get considerable detail from debugging exe once they reproduce the case. And it should be reproducible, as there's just not much involved. Anyway, we can let them decide. I will file an issue with them, but I should first start over with a completely fresh VM to make sure it is reproducible.

More thoughts on a custom sandbox for Xorg: it occurs to me that there are alternative ways to rootless containers. One is relying on the kernel.unprivileged_userns_clone=1 flag to enable setup of namespaces by non-root users, which is what podman and bubblewrap (flatpak) do. The other is a setuid exe for setting up the namespace, like what firejail does. I wonder if these setuid sandboxers suffer the same inability to inherit capabilities into a rootless container? Probably not unless they take explicit steps to do so, as the process is root at the point when the namespaces are established. So, maybe they have an advantage (although the bubblewrap people have long stated that firejail being setuid itself is an extreme disadvantage).

Another alternative is to run Xorg as a dedicated "system" unprivileged user with no ability to read the real user's files, no ability to do networking, read mounted file systems, etc. I'm sure selinux rules can be formulated to lock it down even further. This would be a type of sandbox without namespaces, but still very well locked down. It might be possible to make this user unable to read anything in /proc for example.

jonleivent commented 2 years ago

My streamlined attempt to reproduce the delay problem failed. In this attempt, starting with a fresh VM, I pulled only the xfce desktop as user (and of course x11server pulled as root). The sudo ./x11docker ... test took only ~2min for the XFCE desktop to appear.

This could mean that the original VM's user image store was somehow corrupted, or that there is some interference when there are many images in the store, or just bad luck on my part.

So I am not going to bother reporting it until it crops up again.

I'm going to continue to pursue better security for rootless Xorg. If you'd like, I can keep you posted. If I do come up with something, I may start a github project to continue to refine it publicly.

Again, thanks so much for all of the help!

mviereck commented 2 years ago

More thoughts on a custom sandbox for Xorg: it occurs to me that there are alternative ways to rootless containers. One is relying on the kernel.unprivileged_userns_clone=1 flag to enable setup of namespaces by non-root users, which is what podman and bubblewrap (flatpak) do. The other is a setuid exe for setting up the namespace, like what firejail does. I wonder if these setuid sandboxers suffer the same inability to inherit capabilities into a rootless container?

I assume the capability restrictions only occur if the container is based on rootless userns. It seems to me that capability namespacing comes along with userns remapping. (Is there really such a thing as capability namespacing?) And rootless containers are only allowed with userns remapping. Note, I am not sure at all, it is just the picture I have so far.

Setuid setups are essentially the same as a rootful container and have no limits. Here we would have to look which namespaces are used. I doubt that e.g. firejail uses all possibilities to isolate its containers. But I don't know and might be wrong.

One quick attempt to get a rootful Xorg podman container without providing the password could be a bash script that runs Xorg with x11docker only and gets an entry in /etc/sudoers with NOPASSWD. As long as this script does not take arguments, the risk is reasonably low. (Side note: setting setuid directly on scripts is not possible; the kernel does not honor it.)
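
A minimal sketch of such a wrapper, under stated assumptions: the script path /usr/local/bin/start-xorg-container and the user name "jon" are made up for illustration, and the x11docker command line is simply the one from the example earlier in this thread. The function refuses any argument, so the sudoers entry can only ever trigger one fixed command.

```shell
# Intended as the whole body of a root-owned, non-writable script, e.g. the
# hypothetical /usr/local/bin/start-xorg-container.
start_xorg_container() {
    # Refuse all arguments so a caller cannot influence what runs as root.
    if [ "$#" -ne 0 ]; then
        echo "start-xorg-container: arguments are not accepted" >&2
        return 2
    fi
    # Fixed command line, copied from the example earlier in this thread.
    x11docker --xorg --backend=podman --printenv --xonly --desktop
}

# Matching sudoers entry (edit with visudo; "jon" is a placeholder user).
# The trailing "" tells sudo to allow the command only without arguments:
#   jon ALL=(root) NOPASSWD: /usr/local/bin/start-xorg-container ""
```

In the real script the last line would be a call to start_xorg_container "$@". Whether this is an acceptable risk still depends on who can write to the script and to x11docker itself.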

Another alternative is to run Xorg as a dedicated "system" unprivileged user with no ability to read the real user's files, no ability to do networking, read mounted file systems, etc. I'm sure selinux rules can be formulated to lock it down even further. This would be a type of sandbox without namespaces, but still very well locked down. It might be possible to make this user unable to read anything in /proc for example.

An interesting thought. I myself am not a fan of selinux; to me it feels like another layer of complexity. And Xorg would still run as root thanks to Xorg.wrap. Some logind tricks to run Xorg rootless do some magic in the background that I don't understand either. I prefer simple setups that I understand. AFAIK in BSD it is possible to drop capabilities for a process before it starts and it cannot gain them back. It even goes so far that the entire system can lock itself down to a point where a reboot is needed to regain root privileges.

My streamlined attempt to reproduce the delay problem failed. In this attempt, starting with a fresh VM, I pulled only the xfce desktop as user (and of course x11server pulled as root). The sudo ./x11docker ... test took only ~2min for the XFCE desktop to appear.

~2min still sounds a bit long, although much better than before. Here it takes ~20sec, and my computer is ~12 years old. Some slowdown might be caused by using a VM, but I doubt that this explains everything.

I'm going to continue to pursue better security for rootless Xorg. If you'd like, I can keep you posted. If I do come up with something, I may start a github project to continue to refine it publicly.

I am quite interested! Maybe I'll have some ideas, too, and I am willing to test and check out things.

Again, thanks so much for all of the help!

I quite appreciate our conversation, and thanks for your close look at what is going on! And thank you for giving x11docker a shot even after having so much trouble getting it to work.

jonleivent commented 2 years ago

I am able to get a rootless Xorg server up in an Alpine VM without using Xorg.wrap, and get a working fluxbox desktop! I bypassed Xorg.wrap using a crafted ~/.xserverrc file that directly execs /usr/libexec/Xorg, and tested that it bypasses Xorg.wrap by making /etc/X11/Xwrapper.config not allow anything but root:

$ cat ~/.xserverrc
#!/bin/sh
exec /usr/libexec/Xorg -nolisten tcp "$@"

$ cat /etc/X11/Xwrapper.config
allowed_users=rootonly
need_root_rights=yes

I tested that Xwrapper.config before creating ~/.xserverrc to make sure it prevented Xorg.wrap from launching a rootless Xorg. Also, running getcap -r / returns nothing: no files have capability extended attributes on this Alpine system.

Next, I installed bubblewrap, and changed ~/.xserverrc to be:

$ cat ~/.xserverrc
#!/bin/sh
exec /usr/bin/bwrap --dev-bind / / --cap-drop ALL --unshare-all -- /usr/libexec/Xorg -nolisten tcp "$@"

And it still works! But much slower (~15sec startup delay) for some reason, and has problems exiting, probably due to no pid1 zombie management(?). But, that's a rootless namespace sandbox without any capabilities, as /usr/bin/bwrap is not setuid on Alpine.

What would happen if you used Alpine instead of Debian for the xserver image, with these changes minus bwrap? How is Alpine getting around the need for cap_sys_tty_config? What would happen if this is tried in a Debian VM?

About selinux: it is already enabled on all Fedora, including CoreOS, and mostly stays out of the way. I was just brainstorming other ways I could lock down Xorg in CoreOS without using a namespace sandbox.

jonleivent commented 2 years ago

I am beginning to think that the only issue with a rootless Xorg container in my CoreOS setup is that the user is not in the video or input groups. And I can't add the user to those groups, either. I think this might be a CoreOS bug, so I'm asking about it in the Fedora discussion group about CoreOS: https://discussion.fedoraproject.org/t/groupadd-usermod-dont-always-work-in-coreos/41735

I have fine-tuned the Alpine bubblewrap sandbox around Xorg some more without much difficulty. It really looks as though if the necessary /devs are bound in the container, and the user is in video and input groups, a rootless Xorg container should just work. Perhaps you might add a test to a rootless Xorg container to check if the user is a member of the video and input groups?
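
A minimal sketch of what such a membership test could look like. The function name missing_groups is hypothetical and not part of x11docker; it only compares two space-separated lists, so the real group list has to come from something like id -nG.

```shell
# missing_groups REQUIRED HAVE
# Prints every group from the space-separated list REQUIRED that does not
# appear in the space-separated list HAVE, one per line.
missing_groups() {
    for want in $1; do
        case " $2 " in
            *" $want "*) ;;                 # already a member
            *) printf '%s\n' "$want" ;;     # report the missing group
        esac
    done
}

# Possible use before starting a rootless Xorg:
#   missing=$(missing_groups "video input" "$(id -nG)")
#   [ -z "$missing" ] || echo "user is missing groups:" $missing >&2
```

Note that id -nG reports the groups of the running session, so a change in /etc/group normally only shows up after logging in again.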

mviereck commented 2 years ago

Sorry for my late response! I am a bit distracted these days.

Your tests with bubblewrap are quite interesting! I'll check it out, too. I wonder if it is possible to create a container of x11docker/xserver with podman (or docker) and to use its file system with bubblewrap. x11docker might provide to bubblewrap a host Xorg as well as a bubblewrapped Xorg from an x11docker/xserver container.

What would happen if you used Alpine instead of Debian for the xserver image, with these changes minus bwrap?

I'll try this. Alpine images are a bit hard to set up because one has to figure out several dependencies (and the nxagent build has to be done differently). However, a test image with Xorg and a few tools is not too hard. Overall Alpine could provide a smaller image than the current Debian-based one.

How is Alpine getting around the need for cap_sys_tty_config?

Alpine does not use systemd/logind, so it must be different. I found that a logged in tty belongs to the logged in user. This might be the key. (I've also tried to add group tty to rootless podman container user, but to no avail. Maybe the groups are namespaced as well?)

$ tty
/dev/tty2
$ ls -l /dev/tty2
crw------- 1 lauscher tty 4, 2 27. Aug 16:35 /dev/tty2
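
The same ownership check can be scripted. This is only a sketch: owned_by_current_user is a made-up helper name, and it assumes a stat that understands -c (GNU coreutils or busybox).

```shell
# owned_by_current_user PATH
# Succeeds if PATH is owned by the user running this shell.
owned_by_current_user() {
    # Compare numeric IDs so the check also works without a passwd entry.
    [ "$(stat -c '%u' "$1")" = "$(id -u)" ]
}

# Possible use on a text console:
#   owned_by_current_user "$(tty)" && echo "this tty belongs to $(id -un)"
```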

and the user is in video and input groups, a rootless Xorg container should just work. Perhaps you might add a test to a rootless Xorg container to check if the user is a member of the video and input groups?

I found that in a rootful container with a non-root container user it is enough to add the container user to the desired groups. x11docker already adds the x11docker/xserver user to groups video and render. Groups tty and input could be added as well. (Experimentally I already tried that for Xorg in rootless podman.)

I am beginning to think that the only issue with a rootless Xorg container in my CoreOS setup is that the user is not in the video or input groups. And I can't add the user to those groups, either. I think this might be a CoreOS bug, so I'm asking about it in the Fedora discussion group about CoreOS: https://discussion.fedoraproject.org/t/groupadd-usermod-dont-always-work-in-coreos/41735

Odd. And x11docker might need to check those additional group files. An example of useless complexity. sigh

jonleivent commented 2 years ago

I wonder if it is possible to create a container of x11docker/xserver with podman (or docker) and to use its file system with bubblewrap. x11docker might provide to bubblewrap a host Xorg as well as a bubblewrapped Xorg from an x11docker/xserver container.

You can podman-image-mount it somewhere, then bind that as the root in the bubblewrap sandbox.

Alpine does not use systemd/logind, so it must be different.

Do you think sysVinit or runit systems would all allow Xorg to run without cap_sys_tty_config?

I found that a logged in tty belongs to the logged in user. This might be the key.

That occurs in my CoreOS install as well, so it's not the only key.

BTW: I directly edited /etc/group and /etc/gshadow on my CoreOS install so that the user is in video, input and render groups, but still cannot start a containerized rootless Xorg. Am I doing it right?: x11docker --vt=7 --backend=podman --xc=podman --desktop x11docker/fluxbox which gives me the error Did not find the possibility to provide a display. even though the user has the x11docker/xserver image.

I may install bubblewrap and Xorg directly in CoreOS to see if I can replicate the success I had in Alpine that way. I also have a Debian VM that I can boot either with sysVinit or systemd (the MX distro allows the choice at boot time), so I can try this there as well. But, on this Debian, none of startx, xinit or Xorg are suid or have file capabilities according to getcap. So my guess is that a rootless bubblewrap sandboxed Xorg would work there, assuming startx works as it is supposed to without a sandbox.

jonleivent commented 2 years ago

The Debian experiment worked - you can start Xorg in a rootless bubblewrap sandbox there as well. Also, same ownership of /dev/ttyN exists there. Also, this was systemd. I didn't even bother altering things so that the display manager wasn't functioning with a root Xorg running - I just switched to a different tty, logged in there, and ran startx with ~/.xserverrc written to call bwrap around Xorg.

jonleivent commented 2 years ago

I've determined that there are cases in which rootless sandboxing Xorg with bubblewrap only works without --unshare-pid (and thus without --unshare-all as well). In these cases, if --unshare-pid is used, the xf86OpenConsole: Cannot open virtual console... message we've sometimes seen shows up. I don't know what distinguishes the cases - perhaps what modules/extensions are loaded by Xorg?

I don't know if podman has the same ability as bubblewrap to unshare specific namespaces while sharing others.
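
For the bubblewrap side, a sketch of how to keep everything from --unshare-all except the pid namespace. The bwrap man page documents --unshare-all as currently equivalent to --unshare-user-try --unshare-ipc --unshare-pid --unshare-net --unshare-uts --unshare-cgroup-try; the helper name bwrap_unshare_flags is hypothetical.

```shell
# bwrap_unshare_flags SKIP...
# Prints the individual flags that make up --unshare-all, leaving out the
# namespaces named in SKIP (user, ipc, pid, net, uts, cgroup).
bwrap_unshare_flags() {
    flags=""
    for flag in user-try ipc pid net uts cgroup-try; do
        name=${flag%-try}                   # "user-try" -> "user"
        case " $* " in
            *" $name "*) continue ;;        # caller asked to keep this shared
        esac
        flags="${flags:+$flags }--unshare-$flag"
    done
    printf '%s\n' "$flags"
}

# Possible ~/.xserverrc line based on the one above (unquoted on purpose so
# the flags are split into separate words):
#   exec /usr/bin/bwrap --dev-bind / / --cap-drop ALL $(bwrap_unshare_flags pid) -- /usr/libexec/Xorg -nolisten tcp "$@"
```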

mviereck commented 2 years ago

You can podman-image-mount it somewhere, then bind that as the root in the bubblewrap sandbox.

Good to know! That allows further possibilities.

Do you think sysVinit or runit systems would all allow Xorg to run without cap_sys_tty_config?

I am not sure how systemd/logind do the magic, so I don't have an idea for SysVinit and runit. Maybe the magic is just:

I directly edited /etc/group and /etc/gshadow on my CoreOs install so that the user is in video, input and render groups

Please also check if the device files in /dev/dri belong to video and render, and if the device files in /dev/input belong to group input. Otherwise the groups would not help to allow access. Normally the kernel adds the devices to these groups.
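
A quick way to list this, as a sketch only: device_groups is a made-up helper name, and it assumes a stat that supports -c with %G, %A and %n (GNU coreutils or busybox).

```shell
# device_groups DIR...
# Prints "GROUP MODE PATH" for every entry directly inside each directory.
device_groups() {
    for dir in "$@"; do
        for node in "$dir"/*; do
            [ -e "$node" ] || continue      # empty or missing directory
            stat -c '%G %A %n' "$node"
        done
    done
}

# Possible use:
#   device_groups /dev/dri /dev/input
```

The mode column matters as well: group ownership only helps if the group actually has read/write permission on the device file.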

Am I doing it right?: x11docker --vt=7 --backend=podman --xc=podman --desktop x11docker/fluxbox which gives me the error Did not find the possibility to provide a display. even though the user has the x11docker/xserver image.

Currently x11docker denies to run Xorg in rootless podman because it is not supported/would fail. To disable the check, look at https://github.com/mviereck/x11docker/blob/master/x11docker#L3338-L3343

      --xorg|--weston|--weston-wayland)
        [ "$Xcrootless" = "yes" ] && {
          $Message "${1:-} cannot claim a new virtual terminal (option --vt) with rootless X container (option --xc)."
          Return=1
        }
      ;;

If you disable line Return=1, x11docker will run Xorg in rootless podman. Option --printcheck would show you those check messages.

I've determined that there are cases in which rootless sandboxing Xorg with bubblewrap only works without --unshare-pid

I don't know if podman has the same ability as bubblewrap to unshare specific namespaces while sharing others.

Good find! podman sets up all possible namespaces by default, but they can be disabled manually. To add this to the podman command used by x11docker to run x11docker/xserver, look at https://github.com/mviereck/x11docker/blob/master/x11docker#L4533-L4540

    --xorg)
      Xcontainercommand="$Xcontainercommand $Xc_capdrop"
      Xcontainercommand="$Xcontainercommand $Xc_user"
      Xcontainercommand="$Xcontainercommand $Xc_containerx"
      Xcontainercommand="$Xcontainercommand $Xc_hostx"
      Xcontainercommand="$Xcontainercommand $Xc_gpu"
      Xcontainercommand="$Xcontainercommand $Xc_console"
    ;;

You can add a line:

      Xcontainercommand="$Xcontainercommand --pid=host"

Other namespace options are --ipc, --uts and --network; they also accept the argument =host.
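
To try several combinations without editing the line each time, a small helper could build the options. This is a sketch; host_ns_args is a hypothetical name and not part of x11docker.

```shell
# host_ns_args NAMESPACE...
# Prints podman options that share the named namespaces with the host.
host_ns_args() {
    args=""
    for ns in "$@"; do
        case "$ns" in
            pid|ipc|uts|network) args="${args:+$args }--$ns=host" ;;
            *) echo "unknown namespace: $ns" >&2; return 1 ;;
        esac
    done
    printf '%s\n' "$args"
}

# Possible use in the --xorg case shown above:
#   Xcontainercommand="$Xcontainercommand $(host_ns_args pid ipc)"
```

Every namespace shared with the host weakens container isolation, so it makes sense to start with pid only and add others just for testing.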

I don't know what distinguishes the cases - perhaps what modules/extensions are loaded by Xorg?

I doubt that, but have no great idea either. Maybe some sort of authentication - if the pid namespace is the one from host, the system believes that Xorg is a local process and allows some access that it would deny otherwise. But I don't know, just guessing.

jonleivent commented 2 years ago

Please also check if the device files in /dev/dri belong to video and render, and if the device files in /dev/input belong to group input. Otherwise the groups would not help to allow access. Normally the kernel adds the devices to these groups.

/dev/dri/card0 is in group video and /dev/dri/renderD128 is in group render. Everything in /dev/input is in group input.

Currently x11docker denies to run Xorg in rootless podman because it is not supported/would fail. To disable the check, look at https://github.com/mviereck/x11docker/blob/master/x11docker#L3338-L3343

It looks like other parts of the script are denying the possibility. After making the above change, I am getting warnings:

Option --xc with rootful container backend needs to be started with root privileges. root needed to claim a tty.
Fallback: Disabling option --xc

and

Option --xc not possible:
- without image x11docker/xserver
- with runtime kata-runtime
- in MS Windows
Fallback: Setting --xc=no

In my case both root and user have x11docker/xserver images in their stores.

mviereck commented 2 years ago

It looks like other parts of the script are denying the possibility.

Sorry, I should have checked it better before.

I have added a new unofficial option --experimental that allows me to add some experimental code. For now it allows Xorg in rootless containers and also adds --pid=host to Xorg containers. We can add further experimental code this way.

Now you should be able to run rootless --xc=podman --xorg without an x11docker error but with Xorg errors only. As a shortcut it is enough to type --exp instead of --experimental.

jonleivent commented 2 years ago

Still similar problems. Log file attached... fluxbox-x11docker.log

mviereck commented 2 years ago

This time you get an Xorg error, same as me now:

(EE) 
Fatal server error:
(EE) xf86OpenConsole: Cannot open virtual console 7 (Permission denied)
(EE) 
(EE) 
Please consult the The X.Org Foundation support 
     at http://wiki.x.org
 for help. 
(EE) Please also check the log file at "/var/log/Xorg.101.log" for additional information.
(EE) 
(WW) xf86CloseConsole: KDSETMODE failed: Bad file descriptor
(WW) xf86CloseConsole: VT_GETMODE failed: Bad file descriptor
(EE) Server terminated with error (1). Closing log file.

At least x11docker no longer forbids running Xorg in a rootless container. That was the intention of my changes above. On this basis further experimental code might be added for tests with Xorg in rootless podman.

btw, to see all and only the Xorg messages, you can use the unofficial option --verbose=xinit.

jonleivent commented 2 years ago

On Alpine, where --unshare-all (which includes --unshare-pid) works when rootless bwrapping Xorg, the Xorg version is newer than on Debian, where --unshare-pid doesn't work. Also, Debian had loaded one extra extension: SELinux. But using Xorg's -extension arg to remove that had no impact on the inability to --unshare-pid.

The Xorg versions are:

Debian (10.12): Xorg 1.20.4, with pixman 0.36.0
Alpine: Xorg 1.21.1.4, with pixman 0.40.0

I have a feeling that the Alpine developers changed something (hence that last extra '.4') and compiled Xorg themselves, considering the other ways Alpine differs from almost every other distro by relying on busybox and musl. What that might mean is that if you want rootless capability in more places, it might be worth the effort to create an Alpine version of x11docker/xserver. That's just a theory. At the very least it might enable --unshare-pid.