mviereck / x11docker

Run GUI applications and desktops in docker and podman containers. Focus on security.
MIT License
5.65k stars 377 forks source link

Timeout waiting for log entry "containerrootrc=ready" #196

Closed phil294 closed 4 years ago

phil294 commented 4 years ago

Hi! Not really sure whats happening here. Happens with either of xephyr, xpra, nxagent and hostidsplay. As if the docker image doesnt even get started. Here is a verbose log output upload: https://waritschlager.de/share/922650f43f77a6fe.txt

Seems to have broken with a x11docker update of last few weeks/months, as I did not have any such problems beforehand. I can cycle through recent commits to find the culprit tomorrow.

until then, LG

mviereck commented 4 years ago

Thank you for the bug report!

I found this error message:

/tmp/containerrootrc: 11: /tmp/containerrootrc: cannot create /x11docker/container.log: Permission denied

It is this line in containerrootrc:

exec 1>>/x11docker/container.log 2>&1

It only serves to redirect some log output. I'll think about a possible alternative. The error message is confusing because the file exists for sure.

phil294 commented 4 years ago

I got around to check the commits already: bcb791cc9d581e1cb404761c52127ca63d26d110 is when the issue arose. The log says x11docker[14:18:12,517]: Waiting since 2s for /x11docker/containerrootrc.ready to have content, will wait up to 32000 seconds.. Do you need any more verbose logs from around this commit?

Part of the problem (seeing that this issue hasnt been opened yet) may be that I set the permissions on ~/.local and most other folders in $HOME to drwx------ 700. I think it makes the most sense as the contents of this folder arent any other user's business, be it system or human. This is what had had me open another issue once already (#131), where you fixed it by leveraging a /tmp file.

I also found that running with --cap-default fixes it. If you decide this is a wont-fix, I could personally live with using this flag, but I assume a proper error message would be great, at least.

In the end, I am not sure what exactly is causing this but I hope this will help you. Cheers

phil294 commented 4 years ago

It also did not work after a chmod -R o+rX ~/.cache, however one warning came up during this: chmod: changing permissions of '.cache/x11docker/x11docker-xfce-xfce4-terminal-55668200901/share/tini': Operation not permitted.

This is what the respective share folder looks like:

-rw-r--r-- 1 phi  phi   3054 Nov 20 14:14 container.CMD.sh
-rw-r--r-- 1 phi  phi   4412 Nov 20 14:14 containerrootrc
-rw-r--r-- 1 phi  phi      4 Nov 20 14:14 container.user
-rw-rw-rw- 1 phi  phi    293 Nov 20 14:14 environment
-rw-r--r-- 1 phi  phi      2 Nov 20 14:14 exitcode
-rw-r--r-- 1 phi  phi      0 Nov 20 14:14 journalctl.log
prw-rw-rw- 1 phi  phi      0 Nov 20 14:14 message.fifo
-rw-rw-rw- 1 phi  phi    704 Nov 20 14:14 stderr
-rw-rw-rw- 1 phi  phi      0 Nov 20 14:14 stdout
-rw-r--r-- 1 phi  phi     51 Nov 20 14:14 timetosaygoodbye
prw-rw-r-- 1 phi  phi      0 Nov 20 14:14 timetosaygoodbye.fifo
-rwxr-xr-x 1 root root     0 Nov 20 14:14 tini
-rw-r--r-- 1 phi  phi  69315 Nov 20 14:14 x11docker.log
-rw-r--r-- 1 phi  phi    104 Nov 20 14:14 Xclientcookie

which looks fine I think (?)

mviereck commented 4 years ago

Thanks for your investigation!

Part of the problem (seeing that this issue hasnt been opened yet) may be that I set the permissions on ~/.local and most other folders in $HOME to drwx------ 700. I think it makes the most sense as the contents of this folder arent any other user's business, be it system or human. This is what had had me open another issue once already (#131), where you fixed it by leveraging a /tmp file.

Ah, yes! I already thought that this bug looks somehow familar. So I reintroduced it.

As a first fix that maybe helps I've changed the file permission of x11docker.log to 666. I doubt that this is enough, but it would be nice if you try it out.

I also found that running with --cap-default fixes it.

The point is that x11docker runs the container with docker option --cap-drop=ALL. That disallows a lot of root privileges. But containerrootrc tries to access x11docker.log as root in container and cannot supersede your 700 settings on host.

Adding the capability to supersede access permissions should solve the issue (compare man capabilities):

x11docker -- --cap-add DAC_OVERRIDE -- x11docker/xfce xfce4-terminal

I am currently not sure what would be the best solution.

chmod -R o+rX ~/.cache

Sorry for asking, but what exactly does that do?

phil294 commented 4 years ago

I tried using the latest commit, still no change.

Nope, DAC_OVERRIDE on its own does not cut it, only cap-default.

I am not very sure on to the details of this. But option 1 and 2 do not sound quite elegant. And it would make sense to prevent root files in the host environment. Those are always a bummer when it comes to archives, searches, deletions etc. Like when you want to backup your home folder but get errors somewhere deep nested inside .local/share/x11docker, as it contains root files. Option 3 makes the most sense to me, but this is only intuition.

chmod -R o+rX ~/.cache Sorry for asking, but what exactly does that do?

No worries – it means recursively giving read access (+4) to "other" users (not owner or group) to everthing inside .cache. X stands for folder execution rights (+1), so they will be able to list folder contents, but it does not apply to files (so no scripts can be run etc). + means "adding permissions", so if any were present beforehand, those are combined. To set them, use =.

phil294 commented 4 years ago

Ok wait, the recent commit might have solved this (but everything is kinda laggy now). Just so you dont spend too much unnecessary digging here. I'll get back to you later.

mviereck commented 4 years ago

Ok wait, the recent commit might have solved this (but everything is kinda laggy now).

Good! Maybe there is another timeout now. Please give me a fresh logfile and I'll look through it.

And it would make sense to prevent root files in the host environment. Those are always a bummer when it comes to archives, searches, deletions etc.

x11docker already avoids root files in HOME. root only appends lines to existing files. That does not change the file ownership. But this seems to be an issue with 700 for ~/.cache


I am surprised why you have a root-owned tini in the share folder:

-rwxr-xr-x 1 root root     0 Nov 20 14:14 tini

It should not be there. It is shared with the container with:

  --volume '/usr/bin/docker-init':'/usr/local/bin/tini':ro \

In my own x11docker cache tini does not appear.

phil294 commented 4 years ago

With latest commit:

With previous commit, before your latest fix (at 25b8034f24238eed267784def8f2e33de5a489d1), everything seems the same as above. So nothing changed.

Here's a new verbose log file as requested, from latest commit's ./x11docker -v --xephyr x11docker/xfce xfce4-terminal, run from terminal: https://waritschlager.de/share/f43f639000f6f3c6.txt Nothing seems to have changed much

* but is slow, like really laggy, vscode editor is pretty much unusable. But somehow now also older versions of x11docker are as slow, when they were smooth a few days ago(?!) I will investigate this further, you may consider those cases working perfectly for now

phil294 commented 4 years ago

I am surprised why you have a root-owned tini in the share folder

I thought I'd try to reproduce it by deleting the respective cache folder. But now when I run it (with cap-default), no folder is recreated inside .cache/x11docker anymore, so I cannot give you any more details.

phil294 commented 4 years ago

Okay, regarding the lagginess: This might not be related to this issue at all, but I'll post the info here anyway:

The lags got introduced with 5a35b8107ca043d2f0dd8a2fafe97157164bbc5f. Note that this is an earlier one than the one that broke everything described above. I only encounter those lags with Xephyr. The application I tested this with is xfce, with vscode as the application running inside. I dont know what is so special about it, but thunar for instance was not lagging. But maybe it's just not as resource intensive. Please tell me if you want more info here.

While digging, I also realized that nxagent and hostdisplay seem to behave more fluent than xephyr and xpra. I dont know how nxagent works, but with hostdisplay it makes sense as no key presses are proxied.

mviereck commented 4 years ago

Thank you for your detailed investigation!

The lags got introduced with 5a35b81. Note that this is an earlier one than the one that broke everything described above. I only encounter those lags with Xephyr. The application I tested this with is xfce, with vscode as the application running inside. I dont know what is so special about it, but thunar for instance was not lagging.

Finally an issue that I can fix easily! In the commit you found I've enabled Xephyr option -glamor. From Xephyr -help:

-glamor              Enable 2D acceleration using glamor

glamor should help to speed up some things, but appearently it can be problematic. I've disabled it yet. --xephyr should not be laggy anymore.

While digging, I also realized that nxagent and hostdisplay seem to behave more fluent than xephyr and xpra

--hostdisplay is the fastest option because no additional X server is involved. Unfortunately it costs some container isolation. A malicious application could access your host system. --xpra is the slowest option. But it is a preferred default of x11docker because it provides some nice features, e.g. graphical clipboard support. Furthermore it is the only seamless solution for --gpu beside the insecure --hostdisplay. However, if security/container isolation is not a concern, --hostdisplay --gpu is the fastest setup with the lowest overhead.

I thought I'd try to reproduce it by deleting the respective cache folder. But now when I run it (with cap-default), no folder is recreated inside .cache/x11docker anymore, so I cannot give you any more details.

The cache folder only exists while the container is running. If you don't have a cache folder while x11docker is running, something very basical goes wrong. Your log shows valid pathes:

 /home/phi/.cache/x11docker/x11docker-xfce-xfce4-terminal-44151347946

I just tried to reproduce the 700 issue with chmod -R 700 ~/.cache/x11docker. Surprisingly I have no issues at all and cannot reproduce your issue. x11docker just starts up well. I'll look closer how to reproduce it.


Started from a noninteractive shell, with no special options (no cap-default), using a xfce hotkey shortcut on the host system: Works fine*. I did not expect that. It is the opposite behavior of a past issue #177 where it was the shortcut action that failed. - I checked the user from such a shortcut job, it is also the normal one, "phi". Maybe a tty issue again?

That's odd. I doubt that it is a tty issue again. I would see that in the log. Maybe you have two x11docker on your system, e.g. one in a cloned git folder and one in /usr/bin?


Here's a new verbose log file as requested, from latest commit's ./x11docker -v --xephyr x11docker/xfce xfce4-terminal, run from terminal: https://waritschlager.de/share/f43f639000f6f3c6.txt Nothing seems to have changed much

The log shows version 6.4.0, but the master version with recent commits should show:

$ x11docker --version
6.4.1-beta
phil294 commented 4 years ago

Huh, sorry - I dont know how the version mismatch happened. I redid it with the latest commit from yesterday, for sure this time: https://waritschlager.de/share/32492a8420106529.txt

That's odd. I doubt that it is a tty issue again. I would see that in the log. Maybe you have two x11docker on your system, e.g. one in a cloned git folder and one in /usr/bin?

No, this is not the case. xfce shortcut and interactive bash definitely behave differently (one working, the other not). I removed all the times and pids from it with some wild regex and skipped display numbers etc. and below are the notable log output differences when run as xfce shortcut. As expected, the only real difference seems to be that the container.log permission error is gone.

8a9
> DEBUGNOTE: check_host(): Command tty failed. Guess if running on console: no
45a47
> DEBUGNOTE: check_host(): Command tty failed. Guess if running on console: no
124c126
<   Running in a terminal:                          yes
---
>   Running in a terminal:                          no
371,377d372
< grep -x -q 'x11docker/xfce' < /home/phi/.cache/x11docker/docker.imagelist || grep -x -q 'x11docker/xfce:latest' < /home/phi/.cache/x11docker/docker.imagelist || {
<   docker inspect x11docker/xfce >>/home/phi/.cache/x11docker/x11docker-xfce-/share/container.log 2>&1 || {
<     echo 'Image x11docker/xfce not found locally.' >&2
<     echo 'Do you want to pull it from docker hub?' >&2
<     askyesno && Dockerpull=yes || error "Image 'x11docker/xfce' not available locally and not pulled from docker hub."
<   }
< }
382a378
>   env DISPLAY=':0.0' DBUS_SESSION_BUS_ADDRESS='unix:path=/run/user/1000/bus' bash           -c "notify-send 'x11docker: Pulling image x11docker/xfce from docker hub'" 2>/dev/null &
1475c1471,1495
< /tmp/containerrootrc: 11: /tmp/containerrootrc: cannot create /x11docker/container.log: Permission denied
---
> mkdir: created directory '/var/run/dbus'
> mkdir: created directory '/tmp/.ICE-unix'
> mkdir: created directory '/tmp/.X11-unix'
> mkdir: created directory '/tmp/.font-unix'
> srwxrwxrwx 1 1000 1001 0 Nov 23 08:10 /X113
> 
> ==> /home/phi/.cache/x11docker/x11docker-xfce-/message.log <==
> DEBUGNOTE: Running containerrootrc: Setup as root in container
> 
> ==> /home/phi/.cache/x11docker/x11docker-xfce-/share/container.log <==
> lrwxrwxrwx 1 root root 5 Nov 23 08:10 /tmp -> /X113
> mkdir: created directory '/fakehome'
> 
> ==> /home/phi/.cache/x11docker/x11docker-xfce-/message.log <==
> DEBUGNOTE: containerrootrc: Container libc: glibc
> 
> ==> /home/phi/.cache/x11docker/x11docker-xfce-/share/container.log <==
> removed '/etc/shadow'
> 
> ==> /home/phi/.cache/x11docker/x11docker-xfce-/message.log <==
> x11docker: Container system ID: debian
> 
> 
> ==> /home/phi/.cache/x11docker/x11docker-xfce-/share/container.log <==
> chown: changing ownership of '/tmp/chowntestfile': Operation not permitted
1551d1572
< DEBUGNOTE: waitforlogentry(): tailstderr: Found log entry "x11docker=ready" in store.info.
1553,1592c1574,1586
< DEBUGNOTE: waitforlogentry(): containerrc: Waiting since 11s for log entry "containerrootrc=ready" in store.info
< DEBUGNOTE: waitforlogentry(): containerrc: Waiting since 12s for log entry "containerrootrc=ready" in store.info
...
---
> DEBUGNOTE: waitforlogentry(): tailstderr: Found log entry "x11docker=ready" in store.info.
> DEBUGNOTE: waitforlogentry(): containerrc: Found log entry "containerrootrc=ready" in store.info.

So it fails here: https://github.com/mviereck/x11docker/blob/master/x11docker#L4282. $(convertpath share $Containerlogfile), which resolves to /x11docker/container.log isnt accessible, because /x11docker itself cannot be traversed into. I put an ls -l / at that position and here is the output:

total 16
srwxrwxrwx   1 1000 1001    0 Nov 23 08:57 X120
drwxr-xr-x   2 root root 4096 Jul 14 08:49 bin
drwxr-xr-x   2 root root    6 May 13  2019 boot
drwxr-xr-x   5 root root  360 Nov 23 08:57 dev
drwxr-xr-x  40 root root 4096 Nov 23 08:57 etc
drwxr-xr-x   2 root root    6 May 13  2019 home
drwxr-xr-x   8 root root  107 Jul 14 08:49 lib
drwxr-xr-x   2 root root   34 Jul  8 03:30 lib64
drwxr-xr-x   2 root root    6 Jul  8 03:30 media
drwxr-xr-x   2 root root    6 Jul  8 03:30 mnt
drwxr-xr-x   2 root root    6 Jul  8 03:30 opt
dr-xr-xr-x 374 root root    0 Nov 23 08:57 proc
drwx------   2 root root   37 Jul  8 03:30 root
drwxr-xr-x   3 root root   60 Nov 23 08:57 run
drwxr-xr-x   2 root root 4096 Jul  8 03:30 sbin
drwxr-xr-x   2 root root    6 Jul  8 03:30 srv
dr-xr-xr-x  13 root root    0 Nov 23 08:57 sys
drwxrwxrwt   2 root root   29 Nov 23 08:57 tmp
drwxr-xr-x  10 root root  105 Jul  8 03:30 usr
drwxr-xr-x  11 root root  139 Jul  8 03:30 var
drwxrwx---   2 1000 1001 4096 Nov 23 08:57 x11docker
/tmp
ls: cannot access '/x11docker/container.log': Permission denied

and $PWD is /tmp and $USER is phi and $UID is empty and id says uid=0(root) gid=0(root) groups=0(root) and cat /etc/passwd also does not contain phi. So the user $USER doesnt exist..?!

On my host system, the the $UID is (as on most systems) 1000.

Thanks for removing the glamor option! Everything is smooth again.

The cache folder only exists while the container is running.

Oh, huh. The cache folders I described above were present without any container running. So I guess they were leftovers from failed run attempts. Should not matter, this doesnt happen anymore right now.

mviereck commented 4 years ago

No, this is not the case. xfce shortcut and interactive bash definitely behave differently (one working, the other not).

That's really odd. I have no good idea why there is a difference. That also indicates that 700 is not the core issue. As been said, I cannot reproduce the issue if I set my own cache to 700. The only idea I have yet is some sort of >>redirection issue. But I would not know why it only happens in terminal, but not with a shortcut.

  echo "exec 1>>$(convertpath share $Containerlogfile) 2>&1"

Could you try to just disable this line? It only serves to redirect some output into the logfile, so it would not hurt essentially. Though, ls fails, too:

ls: cannot access '/x11docker/container.log': Permission denied

$PWD is /tmp and $USER is phi and $UID is empty and id says uid=0(root) gid=0(root) groups=0(root) and cat /etc/passwd also does not contain phi. So the user $USER doesnt exist..?!

The entries in /etc/passwd and etc/group are done shortly after that in containerrootrc.

The cache folders I described above were present without any container running. So I guess they were leftovers from failed run attempts. Should not matter, this doesnt happen anymore right now.

I also get those leftover folders. It seems x11docker does not get enough time to clean up if I shut down the system while x11docker is running. You can use sudo x11docker --cleanup to remove all leftovers (and currently running x11docker sessions).

mviereck commented 4 years ago

I did a test in an old Manjarao VM. I've set ~/.cache/x11docker to 700. It works well.

I could not update Manjaro due to some package conflicts. But I assume that would not change anything.

So i cannot reproduce your issue and have no idea why x11docker fails on your system in terminal only.

Would it be ok for you to just use --cap-default and close the ticket? Though, if you have an idea and want to investigate further, I am happy to look at, too.

phil294 commented 4 years ago

Sure. I'll get back to you when I come accross a meaningful hint. Thank you for the help!

mviereck commented 4 years ago

The latest commit runs containerrootrc with flag --privileged. You should not need --cap-default anymore. x11docker's root setup in container now has no restrictions. This allow to drop --cap-default and the container command will run without privileges at all. It would be nice if you try this out.

phil294 commented 4 years ago

yup, works now out of the box :-) good job

mviereck commented 4 years ago

Great! :-)

Finally solved although we did not find the very special point where it previously failed.