Closed Azkali closed 3 years ago
Looks like x11docker dropped your last "--device=" entry:
--device=/dev/nvhost-nvdec
Either that or the log you posted is from a different command.
No, it is right; you're mistaking the docker run test for the x11docker full log, which includes the x11docker command (first line). I just updated it to be clearer. Edit: the complete log file was wrong though, it was made with the --no-setup option; I just updated it using the same x11docker command as above.
Thank you for the report. I am not sure what is happening. The core error message is:
OCI runtime exec failed: exec failed: container_linux.go:349: starting container process caused "exec: \"sh\": executable file not found in $PATH": unknown
It is quite unlikely that there is no sh in the image. Please check whether you get an interactive sh shell in the container with this command:
x11docker -ti x11docker/lxde sh
I found that you use Ubuntu 20.10 on the host. Ubuntu uses snap instead of apt in some versions. That already caused several issues in the past. Is your docker installed with snap or apt?
Please try with --cap-default and/or --no-setup to see whether you get different error messages.
The device files and -v /usr/lib/aarch64-linux-gnu/tegra:/usr/lib/aarch64-linux-gnu/tegra should cause no issue here. However, please try without those.
You're welcome, thank you for your quick answer.
I'm using the docker.io package from apt.
x11docker -ti x11docker/lxde sh
(it hangs doing nothing, but another error pops up, see log file): x11docker.log
Using --no-setup or --cap-default causes the same issue (OCI runtime exec failed: exec failed: container_linux.go:349: starting container process caused "exec: \"sh\": executable file not found in $PATH": unknown).
Removing both volumes causes the same issue in every case too.
Really odd. The new error is
OCI runtime exec failed: exec failed: container_linux.go:349: starting container process caused "open /dev/ptmx: no such device": unknown
Does it work running docker directly? Please try:
docker run --rm -ti x11docker/lxde sh
and
docker run --rm -ti --user=1000:1000 x11docker/lxde sh
and
docker run --rm -ti --user=1000:1000 --cap-drop=ALL x11docker/lxde sh
All three commands worked perfectly! It is indeed strange. I am clueless at the moment
I am clueless at the moment
Me too ...
Maybe an issue with docker-init? Try:
x11docker -ti --user=root --cap-default --init=none x11docker/lxde sh
and:
docker run --rm -ti --init x11docker/lxde sh
With x11docker :
OCI runtime exec failed: exec failed: container_linux.go:349: starting container process caused "open /dev/ptmx: no such device": unknown
Docker alone works here again
Currently I am out of ideas. Just waiting for inspiration.
Unrelated to the issue itself: I've added the device files /dev/nvhost* and /dev/nvmap to x11docker option --gpu. Only the driver itself is not shared because that is rather unreliable. It is better to provide the driver in other ways.
Maybe try another image? This one is reported to work on arm64:
x11docker aptman/dbhi:bionic-octave octave
Thanks a lot for integrating the devices! (Sure, it was quicker for testing to pass the drivers as is.)
I am testing the above image as we speak. The device sharing went flawlessly using the --gpu option.
If I find anything new I'll report it here. Thanks for your time!
I've combined the generated docker command with your working example:
docker run --tty --rm --detach \
--name x11docker_X127_x11docker-lxde_17663627356 \
--user 1000:1000 \
--env USER=azkali \
--userns host \
--cap-drop ALL \
--security-opt no-new-privileges \
--security-opt label=type:container_runtime_t \
--volume '/usr/bin/docker-init':'/usr/local/bin/init':ro \
--tmpfs /run --tmpfs /run/lock \
--volume '/tmp/.X11-unix/X127':'/X127':rw \
--workdir '/tmp' \
--entrypoint env \
--env 'container=docker' \
--env 'NO_AT_BRIDGE=1' \
--env 'GTK_CSD=0' \
--env 'GTK_OVERLAY_SCROLLING=0' \
--env 'MWWM=allwm' \
--env 'MWNO_RIT=true' \
--env 'MWNOCAPTURE=true' \
--env 'QT_X11_NO_NATIVE_MENUBAR=1' \
--env 'UBUNTU_MENUPROXY=' \
--env DISPLAY=$DISPLAY \
--network=host \
-v /tmp/.X11-unix:/tmp/.X11-unix \
-- x11docker/lxde lxterminal
If this fails, the bug is encircled.
This still works ...
Also the above octave image uses weston by default with x11-backend.so, which isn't shipped in our BSP. (Thanks Nvidia .....)
So I tried -W but I couldn't get Wayland to work on my Ubuntu.
I tried with --xorg and got the same issue as we ran into earlier.
I found a bug in the experimental option --no-setup.
In the regular case x11docker runs a docker exec --user=root [...] sh in the container, mostly for container user setup.
--no-setup should suppress this behaviour, but due to a bug it did not do so correctly.
Maybe the sh error occurs in the docker exec command.
Please update and try:
x11docker -ti --no-setup --cap-default x11docker/lxde sh
Also the above octave image uses weston by default
I'll have a look to change this. You mean, x11docker uses weston for octave? Did you use option --gpu?
This worked :slightly_smiling_face: Thank you, I confirm that it launches X too ! If I see any new issues I'll let you know, closing the issue for now.
Edit: For the octave image, I used --gpu but the issue might surely be on my end ( concerning Wayland backend not working ) otherwise Tegra just missed x11-backend, it's not supplied by Nvidia in their BSP
This worked
Great!
If I see any new issues I'll let you know, closing the issue for now.
Using --no-setup is a workaround; the underlying bug is not fixed yet. So I'll reopen the ticket.
Could you run further tests?
The bug is likely in these lines / docker exec commands:
[ "$Switchcontaineruser" = "no" ] && [ "$Containersetup" = "yes" ] && {
  echo "debugnote 'dockerrc(): Starting containerrootrc with privileged docker exec'"
  echo "# copy containerrootrc inside of container to avoid possible noexec of host home."
  echo "$Dockerexe exec --privileged --tty $Containername sh -c 'cp $(convertpath share $Containerrootrc) /tmp/containerrootrc ; chmod 644 /tmp/containerrootrc' 2>&1 | rmcr >>$Containerlogfile"
  echo "# run container root setup. containerrc will wait until setup script is ready."
  echo "$Dockerexe exec --privileged --tty -u root $Containername /bin/sh /tmp/containerrootrc 2>&1 | rmcr >>$Containerlogfile"
  echo ""
}
Could you run a docker exec command in a running x11docker container?
For example, run:
x11docker --showid x11docker/lxde lxterminal
Option --showid prints the container ID.
You could try:
docker exec --privileged --tty -u root CONTAINERID sh -c "pstree"
For the octave image, I used --gpu but the issue might surely be on my end ( concerning Wayland backend not working ) otherwise Tegra just missed x11-backend, it's not supplied by Nvidia in their BSP
x11docker should not automatically use Wayland setups with --gpu if an NVIDIA card is present. In fact, NVIDIA GPU acceleration only works with --hostdisplay and --xorg.
I'll have a look at the x11docker default checks.
Oh my bad, definitely.
Running --showid alone gives the same error as before, where the container couldn't find the executable path for sh.
With --no-setup and --showid: same error.
With --no-setup, --showid, --gpu and --hostdisplay it works and spawns lxterminal. A log message appears saying that the program is waiting for the container to stop in order to show the ID of the container, so I've retrieved it using docker.
Attaching ( docker exec ... CONTAINERID ) to this x11docker-spawned container gives the same issue as previously with /dev/ptmx
That sounds like a mess.
Can you please show me the full commands? I am a bit confused now. E.g., what does "--showid alone" mean? x11docker --showid? Or a full command with image name?
Attaching ( docker exec ... CONTAINERID ) to this x11docker-spawned container gives the same issue as previously with /dev/ptmx
Can you try variations of the docker exec command to see whether something makes a difference? Changing options, using different commands.
For example:
docker exec CONTAINERID pstree
docker exec CONTAINERID sh -c "pstree"
docker exec --privileged -u root CONTAINERID sh -c "pstree"
docker exec -u root CONTAINERID sh -c "pstree"
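The variations above can also be run in one go. This is only a rough sketch: the function name try_exec_variants and the DOCKER override variable are made up for illustration and are not part of docker or x11docker. It assumes pstree exists in the image, as in the commands above.

```shell
#!/bin/sh
# try_exec_variants CONTAINERID
# Runs several `docker exec` option combinations against one container
# and reports which of them succeed. Hypothetical helper, not part of x11docker.
try_exec_variants() {
  cid="$1"
  # DOCKER is an assumed override so the loop can be dry-run with a stub.
  docker_bin="${DOCKER:-docker}"
  # Option sets taken from the commands discussed in this thread.
  set -- "" "--tty" "-u root" "--privileged -u root" "--privileged --tty -u root"
  for opts in "$@"; do
    # $opts is left unquoted on purpose so it splits into separate options.
    if $docker_bin exec $opts "$cid" sh -c "pstree" >/dev/null 2>&1; then
      echo "ok:   exec $opts"
    else
      echo "FAIL: exec $opts"
    fi
  done
}
```

Call it as try_exec_variants CONTAINERID with the ID taken from docker ps. If only the --tty lines fail, that would point at the /dev/ptmx problem rather than at docker exec in general.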
At least for the /dev/ptmx issue dropping --tty might make a difference.
Edit: I've removed the --tty option in docker exec in x11docker. Maybe that already makes a difference, maybe even fixes the bug. Please update and run a check without --no-setup.
Edit2: I've built an x11docker/lxde image based on arm64v8/debian, started with QEMU on amd64. I could not reproduce the issues.
First test, fails to spawn the container, lxterminal doesn't start:
$ x11docker --showid x11docker/lxde lxterminal
OCI runtime exec failed: exec failed: container_linux.go:349: starting container process caused "exec: \"sh\": executable file not found in $PATH": unknown
Second test, same as above, lxterminal doesn't start:
$ x11docker --no-setup --showid x11docker/lxde lxterminal
OCI runtime exec failed: exec failed: container_linux.go:349: starting container process caused "exec: \"sh\": executable file not found in $PATH": unknown
Third test, which launches lxterminal and spawns the container, but doesn't show the ID in any case:
$ x11docker --no-setup --showid --gpu --hostdisplay x11docker/lxde lxterminal
x11docker note: Option --no-setup: experimental only.
x11docker WARNING: User azkali is member of group docker.
That allows unprivileged processes on host to gain root privileges.
x11docker WARNING: Option --gpu degrades container isolation.
Container gains access to GPU hardware.
This allows reading host window content (palinopsia leak)
and GPU rootkits (compare proof of concept: jellyfish).
x11docker note: Option --gpu: To allow GPU acceleration with --hostdisplay,
x11docker will allow trusted cookies.
x11docker note: Option --hostdisplay: To allow --hostdisplay with trusted cookies,
x11docker must share host IPC namespace with container (option --hostipc)
to allow shared memory for X extension MIT-SHM.
x11docker WARNING: Option --hostdisplay with trusted cookies provides
QUITE BAD CONTAINER ISOLATION !
Keylogging and controlling host applications is possible!
Clipboard sharing is enabled (option --cliboard).
It is recommended to use another X server option like --xpra or --nxagent.
x11docker WARNING: Option --hostipc severely degrades
container isolation. IPC namespace remapping is disabled.
x11docker note: Option --init: Unknown init system
Possible: tini systemd sysvinit openrc runit s6-overlay none
Fallback: Using --init=tini instead.
x11docker WARNING: Sharing device file: /dev/dri
x11docker WARNING: Sharing device file: /dev/nvhost-as-gpu
x11docker WARNING: Sharing device file: /dev/nvhost-ctrl
x11docker WARNING: Sharing device file: /dev/nvhost-ctrl-gpu
x11docker WARNING: Sharing device file: /dev/nvhost-ctrl-isp
x11docker WARNING: Sharing device file: /dev/nvhost-ctrl-isp.1
x11docker WARNING: Sharing device file: /dev/nvhost-ctrl-nvdec
x11docker WARNING: Sharing device file: /dev/nvhost-ctxsw-gpu
x11docker WARNING: Sharing device file: /dev/nvhost-dbg-gpu
x11docker WARNING: Sharing device file: /dev/nvhost-gpu
x11docker WARNING: Sharing device file: /dev/nvhost-isp
x11docker WARNING: Sharing device file: /dev/nvhost-isp.1
x11docker WARNING: Sharing device file: /dev/nvhost-msenc
x11docker WARNING: Sharing device file: /dev/nvhost-nvdec
x11docker WARNING: Sharing device file: /dev/nvhost-nvjpg
x11docker WARNING: Sharing device file: /dev/nvhost-prof-gpu
x11docker WARNING: Sharing device file: /dev/nvhost-sched-gpu
x11docker WARNING: Sharing device file: /dev/nvhost-tsec
x11docker WARNING: Sharing device file: /dev/nvhost-tsecb
x11docker WARNING: Sharing device file: /dev/nvhost-tsg-gpu
x11docker WARNING: Sharing device file: /dev/nvhost-vic
x11docker WARNING: Sharing device file: /dev/nvmap
x11docker ERROR: Got error message from docker daemon:
OCI runtime exec failed: exec failed: container_linux.go:349: starting container process caused "exec: \"sh\": executable file not found in $PATH": unknown
Last lines of logfile:
Type 'x11docker --help' for usage information
Debug options: '--verbose' (full log) or '--debug' (log excerpt).
Logfile will be: /home/azkali/.cache/x11docker/x11docker.log
Please report issues at https://github.com/mviereck/x11docker
(lxterminal:1): dbind-WARNING **: 10:12:52.343: Couldn't connect to accessibility bus: Failed to connect to socket /tmp/dbus-AMcGzkfGpZ: Connection refused
** (lxterminal:1): WARNING **: 10:12:52.494: Bind on socket failed: No such file or directory
** (lxterminal:1): WARNING **: 10:12:52.497: Configuration file create failed: No such file or directory
DEBUGNOTE[12:12:53,184]: finish(): Container still running. Executing 'docker stop'.
Will wait up to 15 seconds for docker to finish.
DEBUGNOTE[12:12:53,215]: finish(): Waiting for container to terminate ...
DEBUGNOTE[12:12:54,254]: finish(): Waiting for container to terminate ...
DEBUGNOTE[12:12:55,286]: finish(): Waiting for container to terminate ...
DEBUGNOTE[12:12:56,316]: finish(): Waiting for container to terminate ...
DEBUGNOTE[12:12:57,350]: finish(): Waiting for container to terminate ...
DEBUGNOTE[12:12:58,380]: finish(): Waiting for container to terminate ...
DEBUGNOTE[12:12:59,417]: finish(): Waiting for container to terminate ...
DEBUGNOTE[12:13:00,492]: finish(): Container terminated successfully
DEBUGNOTE[12:13:00,708]: x11docker exit code: 64
From there I did docker ps, copied the ID and:
$ docker exec --privileged --tty -u root 1f91e19d4cb1 sh -c "pstree"
OCI runtime exec failed: exec failed: container_linux.go:349: starting container process caused "open /dev/ptmx: no such device": unknown
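Since two different docker exec failures keep showing up in this thread, a tiny helper can tell them apart when going through logs. This is just an illustrative sketch; the function name and the labels it prints are invented here, and only the matched message fragments come from the errors quoted above.

```shell
#!/bin/sh
# classify_oci_error MESSAGE
# Prints a short label for the docker exec failure signatures seen in this thread.
# Hypothetical helper; the labels are arbitrary.
classify_oci_error() {
  case "$1" in
    *'executable file not found in $PATH'*) echo "exec-not-found" ;;
    *'open /dev/ptmx: no such device'*)     echo "ptmx-missing" ;;
    *'OCI runtime exec failed'*)            echo "other-oci-error" ;;
    *)                                      echo "no-oci-error" ;;
  esac
}
```

For example, it could be fed each line of x11docker.log to see which of the two signatures appears for a given set of options.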
I'll do the above test ASAP
Hum just saw your edits, I'm starting to think that somehow installing nvidia runtime messed up my whole docker package/configs. I'll re-flash and test everything again without installing nvidia runtime package.
I'll re-flash and test everything again without installing nvidia runtime package.
That would be interesting. I have no good idea what is wrong yet.
The --tty changes might help, but that is rather a blind guess, and if they help at all they likely only cover up underlying issues.
I'm starting to think that somehow installing nvidia runtime messed up my whole docker package/configs.
Ubuntu on arm might be another source of issues. Debian might be a better choice.
Thank you, I'll prepare a plain Debian then for the next tests (I am using this board as my daily driver, so I need to find a good time to re-flash).
Closing due to inactivity. If you do further tests on this, we can reopen.
Hi, so I did further testing yesterday using Arch Linux. It turns out that we were missing some kernel options related to namespaces.
I reproduced the exact same issues with the same commands on Arch with the right options now enabled in our kernel. I think it would be good to share our kernel repository with you: https://gitlab.com/switchroot/kernel/l4t-kernel-4.9/-/blob/linux-rel32-rebase/arch/arm64/configs/tegra_linux_defconfig
Sorry for not keeping you updated with this for a while.
Thank you for the feedback! I'll reopen now.
Because the issue seems to boil down to docker exec issues, I think it might be a problem with nsenter, which allows processes to enter namespaces of other processes.
Maybe this has to be somehow allowed in your kernel configuration.
A short test without x11docker:
$ docker run --name mysleep alpine sleep 10 &
[1] 222869
$ docker exec mysleep ps aux
PID USER TIME COMMAND
1 root 0:00 sleep 10
7 root 0:00 ps aux
I assume that docker exec uses nsenter or, respectively, the kernel feature behind nsenter.
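As a quick first look at the kernel side, one can check which namespace types the running kernel exposes under /proc/self/ns. This is only a sketch under assumptions: the function name missing_namespaces is made up, and a complete list here does not prove that entering another process's namespaces actually works, only that the namespace types exist.

```shell
#!/bin/sh
# missing_namespaces [NSDIR]
# Prints every namespace type that has no entry in NSDIR
# (default: /proc/self/ns). Hypothetical helper for a first sanity check.
missing_namespaces() {
  nsdir="${1:-/proc/self/ns}"
  for ns in ipc mnt net pid uts; do
    # Entries are special symlinks; accept either an existing target or the link itself.
    if [ ! -e "$nsdir/$ns" ] && [ ! -L "$nsdir/$ns" ]; then
      echo "$ns"
    fi
  done
}
```

Running missing_namespaces without arguments on the host should print nothing if all five types are available.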
Test result :
$ docker run --name mysleep alpine sleep 10 &
[1] 6456
$ docker exec mysleep ps aux
OCI runtime exec failed: exec failed: container_linux.go:370: starting container process caused exec: "ps": executable file not found in $PATH: unknown
Ok, this indicates that there is an issue with the kernel.
I assume that your kernel configuration misses something to allow entering namespaces of other processes like nsenter or docker exec do.
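One way to narrow this down could be to check a kernel config file for the namespace-related options. The sketch below is an assumption-laden example: the function name check_ns_config is invented, and the thread does not say which option (if any) was actually missing on this board. The option names themselves are standard kernel config symbols.

```shell
#!/bin/sh
# check_ns_config FILE
# Reports which namespace-related kernel options are set to =y in FILE.
# Returns 1 if at least one of them is missing. Hypothetical helper.
check_ns_config() {
  cfg="$1"
  status=0
  for opt in CONFIG_NAMESPACES CONFIG_UTS_NS CONFIG_IPC_NS CONFIG_PID_NS CONFIG_NET_NS; do
    if grep -q "^${opt}=y" "$cfg"; then
      echo "ok      $opt"
    else
      echo "missing $opt"
      status=1
    fi
  done
  return "$status"
}
```

It can be pointed at a defconfig file from the kernel tree, or at an unpacked copy of /proc/config.gz if the running kernel provides that file.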
Thank you, I'll take a look at what can be possibly missing and get back to you as soon as I have more insight on the kernel configuration.
I think I can close here because it is not an x11docker bug but a kernel configuration issue and can be reproduced with docker exec alone.
A workaround is option --no-setup, which avoids docker exec in x11docker.
Platform: Jetson-TX1 Architecture: aarch64 OS: Ubuntu 20.10 Docker version:
Hi, I'm trying to get x11docker to pass through the GPU without using nvidia's runtime.
My tests have been successful outside of x11docker, and the GPU is correctly passed through/used inside the docker containers I used.
Working test using docker only :
Note: I'm using x11docker-Dockerfiles but built for aarch64
x11docker command :
debug log from x11docker :
Complete log file: x11docker.log