selkies-project / docker-nvidia-glx-desktop

KDE Plasma Desktop container designed for Kubernetes, supporting OpenGL EGL and GLX, Vulkan, and Wine/Proton for NVIDIA GPUs through WebRTC and HTML5, providing an open-source remote cloud/HPC graphics or game streaming platform.
https://github.com/selkies-project/docker-nvidia-glx-desktop/pkgs/container/nvidia-glx-desktop
Mozilla Public License 2.0

Approach for using with Apptainer/Singularity in GLX Desktop #58

Open DevinBayly opened 1 month ago

DevinBayly commented 1 month ago

Hi there,

I'm a data visualization consultant at a High Performance Computing center, and we don't have configurations for hardware-accelerated displays. This limits the kinds of scientific visualization we can do on our systems, and a tool like yours would be a major breakthrough for my work if I were able to use it. The issue is that we don't support Docker and use Singularity instead.

I recently tried a naive conversion of your container but wasn't fully able to make it work. Do you have any ideas about what would need to change in order to use this container format?

Happy to provide error logs if this is a direction you'd be up for following.

Otherwise, thanks for all your hard work already and I'll keep it in mind when recommending solutions off our system.

ehfd commented 1 month ago

Hi!

The EGL desktop has current instructions for Apptainer that work with only --nv. The GLX desktop would require --nvccli even if it is eventually supported, and it currently has hiccups.

Please try the instructions specified in https://github.com/selkies-project/docker-nvidia-egl-desktop for now and see if it accommodates your needs.

ehfd commented 1 month ago

So happy that the UArizona HPC group is in touch with us :)

Yesterday, I talked with the research computing team at Waterloo.

DevinBayly commented 1 month ago

After posting this, I noticed you had an EGL container too, which got me pretty psyched. The EGL option has helped me run things like ParaView in the past, so I have good associations.

I tried it out, and when launching /usr/bin/supervisord (I read that this was the entry point, so I figured I'd try launching it manually), I got a message indicating that I'd need to be root to make it work. I'll definitely open an issue on that repo next, and if you have a chance to take a look, it would be great to troubleshoot with you.

Yay for open source projects in research computing, your tools make our lives so much better.

DevinBayly commented 1 month ago

One of the differences I had to make from your instructions was to build your container with --sandbox since my center apparently doesn't support the overlay creation command used in your example. Hopefully the main point is just to have a non read-only file system for launching the program in.

ehfd commented 1 month ago

> I got a message that indicated I'd need to be root to make it work

This should not be the case; strange. I could take a look at that message.

> One of the differences I had to make from your instructions was to build your container with --sandbox since my center apparently doesn't support the overlay creation command used in your example. Hopefully the main point is just to have a non read-only file system for launching the program in.

Yes, it needs a non read-only filesystem. It would be helpful to see the command that worked for you.

Everyone has a different environment and I'd like to fit everything into the most restrictive one so that compatibility will be maximized.

DevinBayly commented 1 month ago

OK, just got back up (time zones might have us playing telephone on this one). I'll include the information here, but will also make a post on the Discord channel, since it sounds like that's the best spot for non-technical support.

$ singularity shell --nv nvidia-egl-desktop/
INFO:    Environment variable SINGULARITY_TMPDIR is set, but APPTAINER_TMPDIR is preferred
Apptainer> whoami
baylyd
Apptainer> /usr/bin/supervisord
Error: Can't drop privilege as nonroot user
For help, use /usr/bin/supervisord -h
Apptainer> 
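The "Can't drop privilege as nonroot user" message is consistent with a supervisord configuration whose [supervisord] section sets a user= different from the invoking user (the fakeroot log further down in this thread reports "Set uid to user 1000 succeeded"). As a minimal sketch, assuming the configuration is a standard INI-style supervisord file, the hypothetical helper below prints that user= value so it can be compared with the output of `id -u` or `whoami`. The function name and the config path in the usage note are illustrative assumptions, not part of the container's documented interface.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the container): read a supervisord config
# on stdin and print the value of user= from its [supervisord] section.
# Prints nothing when the section does not set user=.
supervisord_user() {
  awk '
    # Track whether the current section header is [supervisord].
    /^\[/ { in_section = ($0 ~ /^\[supervisord\]/) }
    in_section && /^[[:space:]]*user[[:space:]]*=/ {
      sub(/^[^=]*=[[:space:]]*/, "")        # drop the "user =" prefix
      sub(/[[:space:]]*(;.*)?$/, "")        # drop trailing spaces and ";" comments
      print
      exit
    }'
}
```

Inside the container this could be run as `supervisord_user < /etc/supervisord.conf` (the path is an assumption; use whichever file supervisord actually loads). If the printed user does not resolve to the UID you are running as, supervisord would need either root (or fakeroot) or a configuration without that user= line.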

I built the container with

singularity build --sandbox nvidia-egl-desktop docker://ghcr.io/selkies-project/nvidia-egl-desktop:latest

since my system seems to have an older version of mkfs.ext3:

$ singularity overlay create --sparse --size 1536 "test_egl_desktop"
FATAL:   mkfs.ext3 seems too old as it doesn't support -d, this is required to create the overlay layout
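The FATAL message above says the installed mkfs.ext3 lacks the -d option. As a rough sketch, assuming (to my understanding, not confirmed in this thread) that -d first appeared in e2fsprogs 1.43, the hypothetical helper below picks between the overlay route and the --sandbox route from a version string. The function names are illustrative, and `sort -V` requires GNU coreutils.

```shell
#!/usr/bin/env bash
# Hypothetical helper: choose "overlay" or "sandbox" from an e2fsprogs version.
# Assumption: mke2fs/mkfs.ext3 supports -d starting with e2fsprogs 1.43.

# version_ge A B: succeeds when version A is greater than or equal to B.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# pick_build_mode VERSION: prints which approach is likely to work.
pick_build_mode() {
  if version_ge "$1" "1.43"; then
    echo overlay
  else
    echo sandbox
  fi
}

# Possible usage on a real system (mke2fs -V prints its version on stderr):
#   pick_build_mode "$(mke2fs -V 2>&1 | awk 'NR==1 {print $2}')"
```

If this prints "sandbox", the `singularity build --sandbox` command shown earlier remains the fallback; otherwise the overlay instructions from the EGL desktop repository may work as written.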

The only time I launched it successfully was with the `-f` flag, which is the fakeroot setting.

This is the command that successfully launches supervisord, but you can see that it runs into errors and never reaches the point where the process on localhost:8080 gets started.

$ singularity shell -f --nv nvidia-egl-desktop/
INFO:    Environment variable SINGULARITY_TMPDIR is set, but APPTAINER_TMPDIR is preferred
INFO:    User not listed in /etc/subuid, trying root-mapped namespace
INFO:    Environment variable SINGULARITY_TMPDIR is set, but APPTAINER_TMPDIR is preferred
INFO:    Using fakeroot command combined with root-mapped namespace
Apptainer> /usr/bin/supervisord
/usr/lib/python3/dist-packages/supervisor/options.py:474: UserWarning: Supervisord is running as root and it is searching for its configuration file in default locations (including its current working directory); you probably want to specify a "-c" argument specifying an absolute path to a configuration file for improved security.
  self.warnings.warn(
2024-08-02 06:38:55,925 WARN No file matches via include "/etc/supervisor/conf.d/*.conf"
2024-08-02 06:38:55,925 INFO Set uid to user 1000 succeeded
2024-08-02 06:38:55,936 INFO RPC interface 'supervisor' initialized
2024-08-02 06:38:55,936 CRIT Server 'unix_http_server' running without any HTTP authentication checking
2024-08-02 06:38:55,936 INFO supervisord started with pid 13876
2024-08-02 06:38:56,938 INFO spawned: 'dbus' with pid 13882
2024-08-02 06:38:56,939 INFO spawned: 'entrypoint' with pid 13883
2024-08-02 06:38:56,941 INFO spawned: 'kasmvnc' with pid 13884
2024-08-02 06:38:56,942 INFO spawned: 'selkies-gstreamer' with pid 13885
2024-08-02 06:38:56,943 INFO spawned: 'nginx' with pid 13886
2024-08-02 06:38:56,945 INFO spawned: 'pipewire' with pid 13887
2024-08-02 06:38:56,946 INFO spawned: 'pipewire-pulse' with pid 13888
2024-08-02 06:38:56,948 INFO spawned: 'wireplumber' with pid 13889
2024-08-02 06:38:57,950 INFO success: dbus entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-08-02 06:38:57,950 INFO success: entrypoint entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-08-02 06:38:57,950 INFO success: kasmvnc entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-08-02 06:38:57,950 INFO success: selkies-gstreamer entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-08-02 06:38:57,950 INFO success: nginx entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-08-02 06:38:57,950 INFO success: pipewire entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-08-02 06:38:57,950 INFO success: pipewire-pulse entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-08-02 06:38:57,950 INFO success: wireplumber entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-08-02 06:38:58,203 WARN exited: dbus (exit status 1; not expected)
2024-08-02 06:38:58,364 INFO spawned: 'dbus' with pid 13940
2024-08-02 06:38:58,558 WARN exited: dbus (exit status 1; not expected)
2024-08-02 06:38:59,780 INFO spawned: 'dbus' with pid 14137
2024-08-02 06:38:59,826 WARN exited: dbus (exit status 1; not expected)
^C2024-08-02 06:39:01,830 INFO spawned: 'dbus' with pid 14175
2024-08-02 06:39:01,830 WARN received SIGINT indicating exit request
2024-08-02 06:39:01,830 INFO waiting for dbus, entrypoint, kasmvnc, selkies-gstreamer, nginx, pipewire, pipewire-pulse, wireplumber to die
2024-08-02 06:39:01,831 WARN stopped: wireplumber (terminated by SIGINT)
2024-08-02 06:39:01,831 WARN stopped: pipewire-pulse (terminated by SIGINT)
2024-08-02 06:39:01,832 WARN stopped: pipewire (terminated by SIGINT)
2024-08-02 06:39:01,832 WARN stopped: nginx (terminated by SIGINT)
2024-08-02 06:39:01,916 WARN stopped: selkies-gstreamer (terminated by SIGINT)
2024-08-02 06:39:01,916 WARN exited: dbus (exit status 1; not expected)
2024-08-02 06:39:01,917 WARN stopped: kasmvnc (terminated by SIGINT)
2024-08-02 06:39:04,881 INFO waiting for dbus, entrypoint to die
2024-08-02 06:39:07,885 INFO waiting for dbus, entrypoint to die

2024-08-02 06:39:10,887 INFO waiting for dbus, entrypoint to die
2024-08-02 06:39:12,890 WARN killing 'entrypoint' (13883) with SIGKILL
2024-08-02 06:39:12,890 WARN stopped: entrypoint (terminated by SIGKILL)
Apptainer> 
Apptainer> 
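In the log above, dbus is the only program that keeps exiting with "not expected" while the others stay in RUNNING until the SIGINT. A small sketch for pulling that out of a longer supervisord log: the hypothetical function below reads log text on stdin and prints each program that exited unexpectedly, with a count. It assumes the line layout shown in this transcript (date, time, level, "exited:", program name).

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarize unexpected exits in a supervisord log.
# Reads the log on stdin and prints "<program> <count>" per failing program.
failed_programs() {
  awk '
    /WARN exited: .*not expected/ {
      count[$5]++            # field 5 is the program name after "exited:"
    }
    END {
      for (name in count) printf "%s %d\n", name, count[name]
    }' | sort
}
```

Saving the supervisord output to a file and running `failed_programs < supervisord.log` (the file name is only an example) narrows down which service to debug first; here that would be dbus.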

I think even the sandboxed container might not be ideal, since using the --writable flag means I can no longer run basic NVIDIA tools like nvidia-smi. I might need to figure out how to make an overlay that your container is happy using (https://apptainer.org/docs/user/main/persistent_overlays.html). Might that be the next best course of action?

ehfd commented 1 month ago

Note that persistent overlays have a bug that prevents them from working correctly above 2 GB (recently fixed for the next Apptainer release). Otherwise, I guess it's fine.

ehfd commented 1 month ago

https://github.com/apptainer/apptainer/issues/2398

Note that GLX Desktop support is dependent on this issue, and the existence of the NVIDIA container toolkit in the cluster.