Closed: johncadengo closed this issue 3 years ago.
Please upload the /tmp/bootstrap--stdout.log and /var/log/Xorg.0.log files.
/tmp/bootstrap--stdout.log https://pastebin.com/irk8YWBA
/var/log/Xorg.0.log https://pastebin.com/AFhWZzgQ
By the way, I have 2 GPUs on this system. So I am just sharing one with the docker container in testing.
The bootstrap-stdout.log link is invalid, so I cannot deduce the whole issue, but try using DP-0 for VIDEO_PORT. I see that it is a Quadro GPU. If that doesn't work, I think it might be a driver issue, either in the compose settings or in the container toolkit.
My apologies, a few characters were cut off in the copy and paste. Here's the link (and updated the original comment): https://pastebin.com/irk8YWBA
Changing the environment variable to VIDEO_PORT=DP-0 did not change the xrandr output. It's still the same. Maybe the bootstrap log will help.
I might need to replicate stuff. I'll try using docker-compose myself. Please be a bit patient.
version: '3.8'
services:
  glx:
    image: 'ghcr.io/ehfd/nvidia-glx-desktop:latest'
    environment:
      - TZ=UTC
      - SIZEW=1920
      - SIZEH=1080
      - SHARED=TRUE
      - PASSWD=mypasswd
      - VIDEO_PORT=DFP
    ports:
      - '8080:8080'
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu, utility]
    stdin_open: true
    tty: true
I was unable to reproduce any issues with this configuration. Try using the latest docker-compose version rather than the one installed with the package manager. Also, you have to take down a container that you have started up with docker-compose before starting a new one.
I just tested it with your docker-compose configuration and am still encountering the same error. I updated docker-compose to the latest version available for my system, which is:
$ docker-compose --version
docker-compose version 1.29.2, build 5becea4c
Still experiencing the issue. I'll try other things to debug it.
I strongly suspect an issue in your driver installation or the NVIDIA container runtime.
Well, what's weird is that it works when I use docker run. It just doesn't work when I'm using docker compose, so I'm not sure what about the driver installation is different between those two commands. It seems like there might be an implicit configuration being set differently between the two commands, or a different level of privilege or access set by default?
The code is in the midst of an overhaul and will go public around next week. While it seems unrelated to this issue right now, it could change things.
@ehfd looking forward to the updates.
Just wanted to let you know, for some reason, the most recent version of this repo that works on my computer is the commit from March 12: https://github.com/ehfd/docker-nvidia-glx-desktop/commit/cec9907cf2ad826aac53946e40bb9226fc4ea5b1
The commits after that get stuck at the bootstrap phase for some reason. I could dig further and grab you the logs later, but just thought to let you know.
Ok, here are two interesting things I ran into:
1. The container installs a driver matching the host's driver version, and that version has to be available on NVIDIA's download site. My version was 460.106.00, which must have come from a PPA or something, I'm not sure. However, because it wasn't on that website, it wasn't working. I ended up installing a version that matched one of the available versions, and that worked for me.
2. The older commits of this repo download the driver from https://us.download.nvidia.com/XFree86/Linux-x86_64/, and the us subdomain was later dropped. So that's important to note when using older versions of this repo: it basically renders any of the older commits unusable unless the user updates that URL manually.

OK, I understand what the issue is. Same issue as #16, then.
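For anyone running into the same driver-version mismatch, a quick way to check whether the host's driver version actually exists on NVIDIA's download site is sketched below. The exact download path is an assumption based on NVIDIA's public driver archive layout (the non-us URL mentioned above), so adjust it if that layout differs.

# Host driver version as reported by the NVIDIA driver
DRIVER_VERSION=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n1)
echo "Host driver: ${DRIVER_VERSION}"
# HEAD request against the driver archive; success means the .run installer exists
curl -fsIL "https://download.nvidia.com/XFree86/Linux-x86_64/${DRIVER_VERSION}/NVIDIA-Linux-x86_64-${DRIVER_VERSION}.run" > /dev/null \
  && echo "Driver ${DRIVER_VERSION} is available for download" \
  || echo "Driver ${DRIVER_VERSION} is NOT on the download site"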
New release with commit 952ff0c8ca3161c7f38146d775b3b9826b4dc06a
@ehfd Thanks for the update. Looks like a lot of great changes! 💯
In the readme, you mention that you should only start up one xserver per GPU. You're referring to the guest xservers, I'm assuming, so only 1 guest container per GPU? Is there any plan to support multiple containers per GPU, either in this or the EGL repo?
Also, the gstreamer interface looks really promising. What are the advantages over novnc? Is it for audio support or is it also for better performance?
Great job, and thanks again!
In the readme, you mention that you should only start up one xserver per GPU. You're referring to the guest xservers, I'm assuming, so only 1 guest container per GPU? Is there any plan to support multiple containers per GPU, either in this or the EGL repo?
Yes, there are only guest X servers and no host X servers with the GLX container, and it supports 1 guest container per GPU out of the box. However, it is possible to create multiple screens by allocating each screen to a different physical video port, which involves changing the entrypoint.sh script and invoking a noVNC or WebRTC instance on a different port for each screen. The EGL container (to be updated to support WebRTC) supports multiple containers per GPU out of the box and also has fallback capabilities to software acceleration because it does not use an Xorg server with the NVIDIA drivers, but it still has restrictions such as no Vulkan support.
Things are expected to become more flexible when NVIDIA's Wayland compatibility matures and some new browser capabilities are implemented in the future.
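As a very rough illustration of that multi-screen idea (not something the image does out of the box, and not the actual entrypoint.sh logic), the sketch below assumes two X servers pinned to two hypothetical video ports, DP-0 and DP-2, through their own Xorg config files, with a separate VNC/noVNC pair per screen. The config paths, display numbers, and ports are all placeholders.

# Two Xorg servers, each pinned to a different physical video port via its own config
# (each xorg-*.conf would set the NVIDIA ConnectedMonitor/UseDisplayDevice option to DP-0 or DP-2)
Xorg :0 -config /etc/X11/xorg-dp0.conf &
Xorg :1 -config /etc/X11/xorg-dp2.conf &
# One VNC server per X display, on separate RFB ports
x11vnc -display :0 -rfbport 5900 -forever -shared &
x11vnc -display :1 -rfbport 5901 -forever -shared &
# One noVNC/websockify instance per screen, on separate HTTP ports
websockify --web /usr/share/novnc 8080 localhost:5900 &
websockify --web /usr/share/novnc 8081 localhost:5901 &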
Also, the gstreamer interface looks really promising. What are the advantages over novnc? Is it for audio support or is it also for better performance?
It uses the same underlying protocols as common "game streaming" services such as Parsec, Rainway, GeForce NOW, Google Stadia, and others (all of which support Windows hosts only, if they support user-provided hosts at all). It works well in conditions that require bleeding-edge graphics capabilities because it uses H.264 AVC instead of libjpeg-turbo (RFB/VNC) or libpng (Guacamole). Where frequent screen refreshes are required, performance seems to be WebRTC >>> noVNC > Guacamole, and noVNC does not support audio either. However, WebRTC is more complicated to set up when it requires a TURN server; that is the compromise for achieving latency unattainable with WebSockets (though this will change over the years).
https://cloud.google.com/architecture/gpu-accelerated-streaming-using-webrtc Selkies-gstreamer was developed by the person who wrote this.
https://dx.doi.org/10.13140/RG.2.2.29960.96005 And I wrote this (to be updated to explain the new release).
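To make the encoding difference concrete, here is a minimal GStreamer sketch that captures an X display and hardware-encodes it with NVENC H.264, the codec path that gives WebRTC-style streaming its edge over per-frame JPEG/PNG compression. It is only an illustration, not the actual selkies-gstreamer WebRTC pipeline, and it assumes gst-plugins-bad with NVENC support is installed.

# Capture the X screen and encode with NVENC (nvh264enc) instead of per-frame JPEG/PNG
gst-launch-1.0 -e ximagesrc display-name=:0 use-damage=0 \
  ! video/x-raw,framerate=60/1 \
  ! videoconvert \
  ! nvh264enc \
  ! h264parse \
  ! mp4mux \
  ! filesink location=/tmp/screen-capture.mp4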
I'm not sure I fully understand the point about a single X session per GPU, but I can report that we have been able to get 20 running X instances on a single T4 using nvidia-docker as the runtime in our Kubernetes cluster. (This was just a simple test to get some understanding of the possibilities. In this case we were actually memory bound on the VM, and for these vanilla X sessions doing nothing we could probably have put more on the T4. We will do some slightly more demanding dimensioning experiments and can put a few notes on this thread.)
@seanrmurphy thanks for sharing your experience. I'd really appreciate if you could put a few notes on this thread, and I'll share what I can after trying to replicate your results. I'm personally not using Kubernetes, just docker, but it would be great to see how you're able to get 20 X instances up at once. That sounds great.
@ehfd great work. Thanks for sharing your research. Very interesting stuff. (I'm a UCSD alum, so it's great to see my university affiliated with this work). I'm excited to see how the webRTC performs in my use cases and I'm glad to see so much progress in the development of this idea.
I'm not sure I fully understand the point about a single X session per GPU, but I can report that we have been able to get 20 running X instances on a single T4 using nvidia-docker as the runtime in our Kubernetes cluster. (This was just a simple test to get some understanding of the possibilities. In this case we were actually memory bound on the VM, and for these vanilla X sessions doing nothing we could probably have put more on the T4. We will do some slightly more demanding dimensioning experiments and can put a few notes on this thread.)
It might be a difference in behavior between datacenter GPUs and consumer GPUs. You're very welcome to share more about it, and this could be a great bonus. Please extend this in #11.
Well, what's weird is that it works when I use docker run. It just doesn't work when I'm using docker compose, so I'm not sure what about the driver installation is different between those two commands. It seems like there might be an implicit configuration being set differently between the two commands, or a different level of privilege or access set by default?
@johncadengo So, was this issue resolved?
@ehfd yes, this issue had to do with the NVIDIA driver. Thanks for helping me along with it. I might be needing some more help, but I will create another issue for it after I've done some troubleshooting.
Closing for now then.
A further note from this issue: capabilities: [gpu, utility] is not enough; either capabilities: all should be set, or graphics and display must be included, as in the sketch below.
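For reference, a minimal sketch of the corrected compose reservation, reusing the service and image from the file above; treat the exact capability list as an assumption and check the repository documentation for the authoritative set.

services:
  glx:
    image: 'ghcr.io/ehfd/nvidia-glx-desktop:latest'
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              # [gpu, utility] alone is not enough for Xorg inside the container;
              # include graphics and display explicitly, or set the capabilities to all as noted above
              capabilities: [gpu, utility, graphics, display]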
Added to the documentation.
I'm trying to convert your example of using a docker run command:
docker run --gpus 1 -it -e TZ=UTC -e SIZEW=1920 -e SIZEH=1080 -e SHARED=TRUE -e PASSWD=mypasswd -e VIDEO_PORT=DFP -p 8080:8080 ehfd/nvidia-glx-desktop:latest
into a docker compose file. Here is my file:
When I run it with your command, xrandr returns the correct value, mimicking a screen. However, when I run it with docker compose, xrandr shows a virtual screen at something like 32000 x 32000 screen size. Is there some subtle difference I'm not understanding?
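One way to narrow down such a difference is to compare what the NVIDIA runtime actually exposed inside the container in each launch mode. A rough diagnostic sketch, assuming the docker run container is named glx-run and the compose service is named glx (both names are placeholders):

# GPU-related environment injected by the NVIDIA runtime in each case
docker exec -it glx-run env | grep -i nvidia
docker-compose exec glx env | grep -i nvidia
# Check that the driver and the X screen are usable inside each container
docker exec -it glx-run nvidia-smi
docker-compose exec glx nvidia-smi
docker-compose exec glx xrandr   # relies on the DISPLAY variable set by the image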