alexleach opened this issue 2 months ago
Thanks for opening your first issue here! Be sure to follow the relevant issue templates, or risk having this issue marked as invalid.
I can only confirm what I can test: I run Debian Bookworm, and have also tried on Fedora 40.
Here is a test Run:
docker run --rm -it \
--shm-size=1gb \
--runtime nvidia \
--gpus all -p 3000:3000 \
linuxserver/blender bash
This is with a 3060, and I have full CUDA support and accelerated preview rendering running the 525.147.05 drivers.
Interesting, thanks for this. When I run your exact command on my machine, which has very similar hardware(!), I don't have CUDA or OptiX support...
However, it does (evidently) launch blender, with just one seemingly benign error message shown in the console:
**** adding /dev/dri/card1 to video group root with id 0 ****
**** permissions for /dev/dri/renderD128 are good ****
[custom-init] No custom files found, skipping...
/usr/bin/nvidia-smi
/usr/bin/nvidia-smi
[ls.io-init] done.
_XSERVTransmkdir: ERROR: euid != 0,directory /tmp/.X11-unix will not be created.
Xvnc KasmVNC 1.2.0 - built May 23 2024 00:22:09
Copyright (C) 1999-2018 KasmVNC Team and many others (see README.me)
See http://kasmweb.com for information on KasmVNC.
Underlying X server release 12014000, The X.Org Foundation
root@f36e0429e7f1:/# Obt-Message: Xinerama extension is not present on the server
2024-05-30 12:03:24,123 [INFO] websocket 0: got client connection from 127.0.0.1
2024-05-30 12:03:24,130 [PRIO] Connections: accepted: @192.168.1.5_1717070604.123549::websocket
In fact, when using docker run, blender always starts, so I don't get the black screen or the error message about zink at all, even after adding pretty much every flag I know of that corresponds to my compose file:
docker run --rm -it \
--shm-size=1gb \
--runtime nvidia \
--gpus all \
-p 3000:3000 \
-e NVIDIA_VISIBLE_DEVICES=all \
-v /dev/nvidia0:/dev/nvidia0 \
-v /dev/nvidiactl:/dev/nvidiactl \
-v /dev/nvidia-modeset:/dev/nvidia-modeset \
-v /dev/nvidia-uvm:/dev/nvidia-uvm \
-v /dev/nvidia-uvm-tools:/dev/nvidia-uvm-tools \
-v /media/data/blender-cache:/var/cache/blender \
-v /tmp/.X11-unix:/tmp/.X11-unix \
linuxserver/blender bash
I did note that when using docker run with -e PGID=1003 (where group 1003 is my host's vglusers group), blender segfaults when I open Edit > Preferences, with the console output showing:
**** adding /dev/dri/card1 to video group root with id 0 ****
**** permissions for /dev/dri/renderD128 are good ****
[custom-init] No custom files found, skipping...
/usr/bin/nvidia-smi
/usr/bin/nvidia-smi
[ls.io-init] done.
Xvnc KasmVNC 1.2.0 - built May 23 2024 00:22:09
Copyright (C) 1999-2018 KasmVNC Team and many others (see README.me)
See http://kasmweb.com for information on KasmVNC.
Underlying X server release 12014000, The X.Org Foundation
root@7f9776a6aa64:/# Obt-Message: Xinerama extension is not present on the server
2024-05-30 13:36:21,210 [INFO] websocket 0: got client connection from 127.0.0.1
2024-05-30 13:36:21,218 [PRIO] Connections: accepted: @192.168.1.5_1717076181.210856::websocket
Writing: /tmp/blender.crash.txt
Segmentation fault (core dumped)
ERROR: openbox-xdg-autostart requires PyXDG to be installed
I also reproduced this with a minimal compose.yaml file, producing the same segfault and console output:
services:
  blender:
    image: linuxserver/blender:latest
    restart: unless-stopped
    container_name: blender
    environment:
      - PGID=1003
    runtime: nvidia
    ports:
      - 0.0.0.0:3000:3000/tcp
    volumes:
      - /media/data/blender-config:/config
      - /media/data/blender-cache:/var/cache/blender
    # Add all the GPU capabilities
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [compute, gpu, graphics, utility, video, display]
    # Increase shared memory size
    shm_size: '1gb'
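With a compose file like this up, a quick way to confirm the runtime actually exposed the GPU is to exec into the container (a sketch, not from the original report; the container name matches the compose file above, and the output obviously depends on your host having the NVIDIA driver loaded):

```shell
# bring the stack up in the background
docker compose up -d
# should list the RTX card if the nvidia runtime injected the driver
docker exec blender nvidia-smi
# the DRI render node should also be present for OpenGL/Zink to use
docker exec blender ls -l /dev/dri
```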
I've just been playing with a bunch of additional combinations, and have noted the following. These are additive to the compose.yaml just above...

1. Add the /tmp/.X11-unix:/tmp/.X11-unix mount. CUDA not available.
2. Add PGID=1003 (host's vglusers). Blender does not launch: black screen, with the originally reported console error.
3. Replace startwm.sh with one without those environment variables set. Everything works...

I've then worked backwards:

1. Remove the /tmp/.X11-unix:/tmp/.X11-unix mount.. Hmm, everything still works.
2. Remove PGID=1003. CUDA not available.
3. Restore PGID=1003, and don't overwrite startwm.sh. Black screen, blender doesn't start.

In summary, I for some reason need to set the group to my host's vglusers, and remove those environment variables...
So, I'm a bit confused about this if I'm honest, especially as you're basically on the same hardware architecture, with the main difference being that you're running Debian instead of Arch...
Can I ask you to share your nvidia-container-runtime configuration file? Mine is at /etc/nvidia-container-runtime/config.toml. The contents of mine are below...
The other thing that crosses my mind is cgroups, maybe I should look into that again.
#accept-nvidia-visible-devices-as-volume-mounts = false
#accept-nvidia-visible-devices-envvar-when-unprivileged = true
disable-require = false
supported-driver-capabilities = "compat32,compute,display,graphics,ngx,utility,video"
#swarm-resource = "DOCKER_RESOURCE_GPU"
[nvidia-container-cli]
debug = "/var/log/nvidia-container-toolkit.log"
environment = []
#ldcache = "/etc/ld.so.cache"
ldconfig = "@/sbin/ldconfig"
load-kmods = true
#no-cgroups = true
#path = "/usr/bin/nvidia-container-cli"
#root = "/run/nvidia/driver"
user = "root:root"
[nvidia-container-runtime]
#debug = "/var/log/nvidia-container-runtime.log"
log-level = "info"
mode = "auto"
runtimes = ["docker-runc", "runc", "crun"]
[nvidia-container-runtime.modes]
[nvidia-container-runtime.modes.cdi]
annotation-prefixes = ["cdi.k8s.io/"]
default-kind = "nvidia.com/gpu"
spec-dirs = ["/etc/cdi", "/var/run/cdi"]
[nvidia-container-runtime.modes.csv]
mount-spec-path = "/etc/nvidia-container-runtime/host-files-for-container.d"
[nvidia-container-runtime-hook]
path = "nvidia-container-runtime-hook"
skip-mode-detection = false
[nvidia-ctk]
path = "nvidia-ctk"
Sure. One quick thing: when you say "working", keep in mind that yes, for rendering it will use CUDA etc., but the actual on-screen preview of the model is rendered in OpenGL, which is why we want to automatically inject the Zink override. Otherwise all your rotations and previews use LLVMpipe, which is a GPU emulated on your CPU. It is night and day when you get it working.
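For anyone wanting to check which of the two they're getting, the active renderer can be queried from inside the container (assuming glxinfo from mesa-utils is installed there, which it may not be by default, and that DISPLAY points at the Xvnc server):

```shell
# "llvmpipe" in the output means software rendering on the CPU;
# a zink/NVIDIA string means the GPU override took effect
glxinfo | grep "OpenGL renderer"
```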
Here is my runtime config:
#accept-nvidia-visible-devices-as-volume-mounts = false
#accept-nvidia-visible-devices-envvar-when-unprivileged = true
disable-require = false
supported-driver-capabilities = "compat32,compute,display,graphics,ngx,utility,video"
#swarm-resource = "DOCKER_RESOURCE_GPU"
[nvidia-container-cli]
#debug = "/var/log/nvidia-container-toolkit.log"
environment = []
#ldcache = "/etc/ld.so.cache"
ldconfig = "@/sbin/ldconfig"
load-kmods = true
#no-cgroups = false
#path = "/usr/bin/nvidia-container-cli"
#root = "/run/nvidia/driver"
#user = "root:video"
[nvidia-container-runtime]
#debug = "/var/log/nvidia-container-runtime.log"
log-level = "info"
mode = "auto"
runtimes = ["docker-runc", "runc", "crun"]
[nvidia-container-runtime.modes]
[nvidia-container-runtime.modes.cdi]
annotation-prefixes = ["cdi.k8s.io/"]
default-kind = "nvidia.com/gpu"
spec-dirs = ["/etc/cdi", "/var/run/cdi"]
[nvidia-container-runtime.modes.csv]
mount-spec-path = "/etc/nvidia-container-runtime/host-files-for-container.d"
[nvidia-container-runtime-hook]
path = "nvidia-container-runtime-hook"
skip-mode-detection = false
[nvidia-ctk]
path = "nvidia-ctk"
Also, I know Arch is further ahead, but I run a backport kernel (6.6.13) and a current nvidia runtime:
nvidia-container-cli --version
cli-version: 1.15.0
lib-version: 1.15.0
Maybe it is the root:root perms for the device in your config?
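One way to see what the device permissions actually look like on the host (a sketch; these paths only exist on a machine with the NVIDIA driver loaded, and the group names depend on your distro):

```shell
# show owner:group and mode for the NVIDIA and DRI device nodes
ls -l /dev/nvidia* /dev/dri/
# the groups relevant to a root:video vs root:root setting, if they exist
getent group video vglusers
```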
Thanks for that 🙂
I've had to go out, so am on my phone now. I did comment out that permissions line, rebooted, and got a runtime permissions error... I also tried changing it to root:vglusers, but no joy there either, nor when running as sudo.
Can I ask if you set up rootless mode?
The way I run my containers is by adding my user to the docker group, but I never set up rootless mode. I guess I should try that next?
Cheers, Alex
No, we use S6v3 init and require root for all our images. Everything runs in userspace as the abc user in the container, but we have hooks that run on init that require root in the container, like chowning the video device.
Does this work? (given your error)
https://github.com/slint-ui/slint/issues/4828#issuecomment-1992596539
Hi, yeah, I've already got those nvidia_drm kernel mode setting options set actually, which I got from the Arch wiki, at https://wiki.archlinux.org/title/NVIDIA#DRM_kernel_mode_setting
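For anyone else landing here, whether the nvidia_drm modeset option actually took effect can be confirmed at runtime on the host (a sketch; the sysfs path exists only with the nvidia_drm module loaded):

```shell
# prints Y when nvidia_drm.modeset=1 was applied at module load
cat /sys/module/nvidia_drm/parameters/modeset
# modprobe config is one of the usual places the option is set
grep -r "nvidia" /etc/modprobe.d/ 2>/dev/null
```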
I had a brief look at S6 a couple of days ago. So you run your containers as root then?
This issue has been automatically marked as stale because it has not had recent activity. This might be due to missing feedback from OP. It will be closed if no further activity occurs. Thank you for your contributions.
Is there an existing issue for this?
Current Behavior
I just updated the container image to the latest tag (currently 4.1.1), and blender does not launch. It appears to be something to do with X trying to use MESA and zink?
When accessing the web server, KasmVNC loads fine, but I just get a black screen. Running ps in the container shows that blender is not launched. However, if I launch blender manually within the container, then it shows up in my web browser.
Expected Behavior
Blender should start with the container.
Steps To Reproduce
I have an NVIDIA GPU and am running with docker compose, using the nvidia-container-runtime. I also extend the base image by installing nvidia-cuda-toolkit. This has worked fine for several months, allowing me to render on my RTX 3070 graphics card.
However, since updating to the latest image, version 4.1.1, bringing up the container won't bring up blender...
I found that I can fix the behaviour by first installing python3-xdg and then editing /defaults/startwm.sh and commenting out the environment variables which are being set. I then searched for what commit and what repository added these environment variables to startwm.sh...
So, it was in the docker-baseimage-kasmvnc repository where these environment variables were added... However, on the same day that a commit added these environment variables (https://github.com/linuxserver/docker-baseimage-kasmvnc/commit/421ff46d876f2a7ff15fc6bea91accb4537ff958), they were removed in a later commit (https://github.com/linuxserver/docker-baseimage-kasmvnc/commit/c8a520dea990966c0e10be0b8307d197420cfe3c).
So, perhaps the bug should be reported there, but the thing is, they've already fixed it... Just maybe not in a release? I've not quite figured that part out, but either way, please can you update your latest release to include that commit?
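A rough sketch of that manual edit, applied here to a throwaway copy of the file (the real file is /defaults/startwm.sh inside the container, and the variable name below is a placeholder — the report doesn't list the exact variables the commit added, so adjust the sed pattern to match the real lines):

```shell
# throwaway stand-in for /defaults/startwm.sh with a placeholder variable
cat > /tmp/startwm.sh <<'EOF'
#!/bin/bash
export SOME_ZINK_VAR=1
/usr/bin/blender
EOF
# comment out every "export" line, mirroring the manual edit described above
sed -i 's/^export /#export /' /tmp/startwm.sh
grep '^#export' /tmp/startwm.sh
# prints: #export SOME_ZINK_VAR=1
```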
Environment
CPU architecture
x86-64
Docker creation
Container logs