linuxserver / docker-blender

Responsive web accessible Blender Docker container with hardware acceleration.
GNU General Public License v3.0
55 stars · 14 forks

[BUG] Blender does not start with container #10

Open alexleach opened 2 months ago

alexleach commented 2 months ago

Is there an existing issue for this?

Current Behavior

I just updated the container image to the latest tag (currently 4.1.1), and blender does not launch. It appears to be something to do with X trying to use MESA and zink?

When accessing the web server, KasmVNC loads fine, but I just get a black screen. Running ps in the container shows that blender has not been launched. However, if I launch blender manually within the container, it then shows up in my web browser.

Expected Behavior

Blender should start with the container.

Steps To Reproduce

I have an NVIDIA GPU and am running with docker compose, using the nvidia-container-runtime. I also extend the base image by installing nvidia-cuda-toolkit. This has worked fine for several months, allowing me to render on my RTX 3070 graphics card.

However, since updating to the latest image, version 4.1.1, bringing up the container won't bring up blender...

I found that I can fix the behaviour by first installing python3-xdg and then editing /defaults/startwm.sh and commenting out the environment variables which are being set. I then searched for what commit and what repository added these environment variables to startwm.sh...
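For anyone else hitting this, the workaround can be sketched as below. The sed pattern is illustrative (the exact variables added to startwm.sh may differ in your image), and it is demonstrated here on a local stand-in file rather than inside the container:

```shell
# Hypothetical sketch of the workaround: comment out the environment
# exports that /defaults/startwm.sh sets before launching the session.
# The GALLIUM_DRIVER line below is an illustrative stand-in; check your
# image for the actual exports.
cat > /tmp/startwm.sh <<'EOF'
export GALLIUM_DRIVER=zink
exec dbus-launch --exit-with-session /usr/bin/openbox-session
EOF

# Comment out every top-level export, leaving other lines untouched
sed -i 's/^export /#export /' /tmp/startwm.sh
grep '^#export' /tmp/startwm.sh   # → #export GALLIUM_DRIVER=zink
```

Inside the container, the same sed would target /defaults/startwm.sh itself (followed by a container restart), and installing python3-xdg separately clears the openbox-xdg-autostart PyXDG error.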

So, it was in the docker-baseimage-kasmvnc repository where these environment variables were added... However, on the same day that a commit added these environment variables (https://github.com/linuxserver/docker-baseimage-kasmvnc/commit/421ff46d876f2a7ff15fc6bea91accb4537ff958), they were removed in a later commit (https://github.com/linuxserver/docker-baseimage-kasmvnc/commit/c8a520dea990966c0e10be0b8307d197420cfe3c).

So, perhaps the bug should be reported there, but the thing is, they've already fixed it... Just maybe not in a release? I've not quite figured that part out, but either way, please can you update your latest release to include that commit?

Environment

- OS: Arch Linux, with `nvidia-open-dkms` drivers.
- How docker service was installed: pacman -S docker docker-compose

CPU architecture

x86-64

Docker creation

`docker compose up -d`

My compose.yaml:

services:
  blender:
    image: local/blender:latest
    build:
      context: .
      dockerfile_inline: |
        FROM linuxserver/blender:latest
        RUN apt-get update && \
          apt-get install --no-install-recommends -y nvidia-cuda-toolkit python3-xdg && \
          rm -rf /var/lib/apt/lists/*

      tags:
        - local/blender:latest

    restart: unless-stopped
    container_name: blender
    environment:
      - NVIDIA_VISIBLE_DEVICES=all
      # Run as the host vglusers group (1003)
      - PGID=1003

    runtime: nvidia

    volumes:
      # Pass-through support for nvidia GPU
      - "/dev/nvidia0:/dev/nvidia0"
      - "/dev/nvidiactl:/dev/nvidiactl"
      - "/dev/nvidia-modeset:/dev/nvidia-modeset"
      - "/dev/nvidia-uvm:/dev/nvidia-uvm"
      - "/dev/nvidia-uvm-tools:/dev/nvidia-uvm-tools"

      # For passing through host X11 server. Does it help?
      - /tmp/.X11-unix:/tmp/.X11-unix

      - /media/data/blender-config:/config
      - /media/data/blender-cache:/var/cache/blender

    # Add all the GPU capabilities
    deploy:
      resources:
        reservations:
          devices:
          - driver: nvidia
            count: all
            capabilities: [gpu, compute, utility, graphics]

Container logs

The container logs show the following:

blender  | ───────────────────────────────────────
blender  |
blender  |       ██╗     ███████╗██╗ ██████╗
blender  |       ██║     ██╔════╝██║██╔═══██╗
blender  |       ██║     ███████╗██║██║   ██║
blender  |       ██║     ╚════██║██║██║   ██║
blender  |       ███████╗███████║██║╚██████╔╝
blender  |       ╚══════╝╚══════╝╚═╝ ╚═════╝
blender  |
blender  |    Brought to you by linuxserver.io
blender  | ───────────────────────────────────────
blender  |
blender  | To support LSIO projects visit:
blender  | https://www.linuxserver.io/donate/
blender  |
blender  | ───────────────────────────────────────
blender  | GID/UID
blender  | ───────────────────────────────────────
blender  |
blender  | User UID:    911
blender  | User GID:    1003
blender  | ───────────────────────────────────────
blender  |
blender  | **** permissions for /dev/dri/card1 are good ****
blender  | **** permissions for /dev/dri/renderD128 are good ****
blender  | [custom-init] No custom files found, skipping...
blender  | /usr/bin/nvidia-smi
blender  | /usr/bin/nvidia-smi
blender  |
blender  | Xvnc KasmVNC 1.2.0 - built May 23 2024 00:22:09
blender  | Copyright (C) 1999-2018 KasmVNC Team and many others (see README.me)
blender  | See http://kasmweb.com for information on KasmVNC.
blender  | Underlying X server release 12014000, The X.Org Foundation
blender  |
blender  | [ls.io-init] done.
blender  | Obt-Message: Xinerama extension is not present on the server
blender  | MESA: error: zink: could not create swapchain
blender  | X Error of failed request:  GLXBadCurrentWindow
blender  |   Major opcode of failed request:  149 (GLX)
blender  |   Minor opcode of failed request:  11 (X_GLXSwapBuffers)
blender  |   Serial number of failed request:  175
blender  |   Current serial number in output stream:  175
blender  | Read prefs: "/config/.config/blender/4.1/config/userpref.blend"
github-actions[bot] commented 2 months ago

Thanks for opening your first issue here! Be sure to follow the relevant issue templates, or risk having this issue marked as invalid.

thelamer commented 2 months ago

I can only confirm what I can test: I run Debian Bookworm, and I have tried on both it and Fedora 40.

Here is a test run:

docker run --rm -it \
 --shm-size=1gb \
 --runtime nvidia \
 --gpus all -p 3000:3000 \
 linuxserver/blender bash

This is with a 3060, and I have full CUDA support and accelerated preview rendering, running the 525.147.05 drivers. [Screenshot: Blender showing CUDA render devices available]

alexleach commented 2 months ago

Interesting, thanks for this. When I run your exact command on my machine, which has very similar hardware(!), I don't have CUDA or OptiX support...

[Screenshot 2024-05-30 13:03:54: Blender preferences showing no CUDA or OptiX devices]

However, it does (evidently) launch blender, with just one seemingly benign error message shown in the console:

**** adding /dev/dri/card1 to video group root with id 0 ****
**** permissions for /dev/dri/renderD128 are good ****
[custom-init] No custom files found, skipping...
/usr/bin/nvidia-smi
/usr/bin/nvidia-smi
[ls.io-init] done.
_XSERVTransmkdir: ERROR: euid != 0,directory /tmp/.X11-unix will not be created.

Xvnc KasmVNC 1.2.0 - built May 23 2024 00:22:09
Copyright (C) 1999-2018 KasmVNC Team and many others (see README.me)
See http://kasmweb.com for information on KasmVNC.
Underlying X server release 12014000, The X.Org Foundation

root@f36e0429e7f1:/# Obt-Message: Xinerama extension is not present on the server
 2024-05-30 12:03:24,123 [INFO] websocket 0: got client connection from 127.0.0.1
 2024-05-30 12:03:24,130 [PRIO] Connections: accepted: @192.168.1.5_1717070604.123549::websocket

In fact, when using docker run [...], blender always starts, so I don't get the black screen or the error message about zink at all, even after adding pretty much every flag I know of that corresponds to my compose file:

docker run --rm -it \
  --shm-size=1gb \
  --runtime nvidia \
  --gpus all \
  -p 3000:3000 \
  -e NVIDIA_VISIBLE_DEVICES=all \
  -v /dev/nvidia0:/dev/nvidia0 \
  -v /dev/nvidiactl:/dev/nvidiactl \
  -v /dev/nvidia-modeset:/dev/nvidia-modeset \
  -v /dev/nvidia-uvm:/dev/nvidia-uvm \
  -v /dev/nvidia-uvm-tools:/dev/nvidia-uvm-tools \
  -v /media/data/blender-cache:/var/cache/blender \
  -v /tmp/.X11-unix:/tmp/.X11-unix \
  linuxserver/blender bash

I did note that when using docker run with -e PGID=1003 (where group 1003 is my host's vglusers group), blender segfaults when I open Edit > Preferences, with the console output showing:

**** adding /dev/dri/card1 to video group root with id 0 ****
**** permissions for /dev/dri/renderD128 are good ****
[custom-init] No custom files found, skipping...
/usr/bin/nvidia-smi
/usr/bin/nvidia-smi
[ls.io-init] done.

Xvnc KasmVNC 1.2.0 - built May 23 2024 00:22:09
Copyright (C) 1999-2018 KasmVNC Team and many others (see README.me)
See http://kasmweb.com for information on KasmVNC.
Underlying X server release 12014000, The X.Org Foundation

root@7f9776a6aa64:/# Obt-Message: Xinerama extension is not present on the server
 2024-05-30 13:36:21,210 [INFO] websocket 0: got client connection from 127.0.0.1
 2024-05-30 13:36:21,218 [PRIO] Connections: accepted: @192.168.1.5_1717076181.210856::websocket
Writing: /tmp/blender.crash.txt
Segmentation fault (core dumped)

ERROR: openbox-xdg-autostart requires PyXDG to be installed

I also reproduced this with a minimal compose.yaml file, producing the same segfault and console output:

services:
  blender:
    image: linuxserver/blender:latest

    restart: unless-stopped

    container_name: blender
    environment:
      - PGID=1003

    runtime: nvidia

    ports:
      - 0.0.0.0:3000:3000/tcp

    volumes:
      - /media/data/blender-config:/config
      - /media/data/blender-cache:/var/cache/blender

    # Add all the GPU capabilities
    deploy:
      resources:
        reservations:
          devices:
          - driver: nvidia
            count: all
            capabilities: [compute, gpu, graphics, utility, video, display]

    # Increase shared memory size
    shm_size: '1gb'

I've just been playing with a bunch of additional combinations, and have noted the following. These are additive to the compose.yaml, just above...

[Screenshot 2024-05-30 13:40:40: results of the additional combinations tested]

I've then worked backwards:

In summary, I for some reason need to set the group to my host's vglusers, and remove those environment variables...
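A hypothetical consolidation of that working combination as a single compose file (not tested as-is; the sed pattern is illustrative, and paths and the GID are specific to my host):

```yaml
services:
  blender:
    build:
      dockerfile_inline: |
        FROM linuxserver/blender:latest
        RUN apt-get update && \
            apt-get install --no-install-recommends -y python3-xdg && \
            sed -i 's/^export /#export /' /defaults/startwm.sh && \
            rm -rf /var/lib/apt/lists/*
    environment:
      - PGID=1003  # host vglusers group
    runtime: nvidia
    ports:
      - 3000:3000
    volumes:
      - /media/data/blender-config:/config
    shm_size: '1gb'
```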

So, I'm a bit confused about this if I'm honest, especially as you're basically on the same hardware architecture, with the main difference being that you're running Debian instead of Arch...

Can I ask you to share your nvidia-container-runtime configuration file? Mine is at /etc/nvidia-container-runtime/config.toml; its contents are below...

The other thing that crosses my mind is cgroups, maybe I should look into that again.

#accept-nvidia-visible-devices-as-volume-mounts = false
#accept-nvidia-visible-devices-envvar-when-unprivileged = true
disable-require = false
supported-driver-capabilities = "compat32,compute,display,graphics,ngx,utility,video"
#swarm-resource = "DOCKER_RESOURCE_GPU"

[nvidia-container-cli]
debug = "/var/log/nvidia-container-toolkit.log"
environment = []
#ldcache = "/etc/ld.so.cache"
ldconfig = "@/sbin/ldconfig"
load-kmods = true
#no-cgroups = true
#path = "/usr/bin/nvidia-container-cli"
#root = "/run/nvidia/driver"
user = "root:root"

[nvidia-container-runtime]
#debug = "/var/log/nvidia-container-runtime.log"
log-level = "info"
mode = "auto"
runtimes = ["docker-runc", "runc", "crun"]

[nvidia-container-runtime.modes]

[nvidia-container-runtime.modes.cdi]
annotation-prefixes = ["cdi.k8s.io/"]
default-kind = "nvidia.com/gpu"
spec-dirs = ["/etc/cdi", "/var/run/cdi"]

[nvidia-container-runtime.modes.csv]
mount-spec-path = "/etc/nvidia-container-runtime/host-files-for-container.d"

[nvidia-container-runtime-hook]
path = "nvidia-container-runtime-hook"
skip-mode-detection = false

[nvidia-ctk]
path = "nvidia-ctk"
thelamer commented 2 months ago

Sure. One quick thing: when you say "working", keep in mind that yes, for rendering it will use CUDA etc., but the actual on-screen preview of the model is rendered in OpenGL, which is why we want to automatically inject the Zink override. Otherwise all your rotations and previews use llvmpipe, which is a GPU emulated on your CPU. It is night and day when you get it working.
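One way to check which path is active is the OpenGL renderer string inside the container, e.g. via `glxinfo -B` (glxinfo comes from mesa-utils and may need installing; DISPLAY=:1 is an assumption worth verifying for your image). A small sketch of how to read that string:

```shell
# Sketch: classify the "OpenGL renderer string" the way you would read
# `glxinfo -B` output inside the container, e.g.
#   docker exec -e DISPLAY=:1 blender glxinfo -B | grep renderer
# (glxinfo is in mesa-utils; DISPLAY=:1 is an assumption -- check your image.)
classify_renderer() {
  case "$1" in
    *llvmpipe*)      echo "software rendering (CPU)" ;;
    *zink*|*NVIDIA*) echo "GPU-accelerated" ;;
    *)               echo "unknown" ;;
  esac
}

classify_renderer "OpenGL renderer string: llvmpipe (LLVM 15.0.7, 256 bits)"  # → software rendering (CPU)
classify_renderer "OpenGL renderer string: zink (NVIDIA GeForce RTX 3070)"    # → GPU-accelerated
```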

Here is my runtime config:

#accept-nvidia-visible-devices-as-volume-mounts = false
#accept-nvidia-visible-devices-envvar-when-unprivileged = true
disable-require = false
supported-driver-capabilities = "compat32,compute,display,graphics,ngx,utility,video"
#swarm-resource = "DOCKER_RESOURCE_GPU"

[nvidia-container-cli]
#debug = "/var/log/nvidia-container-toolkit.log"
environment = []
#ldcache = "/etc/ld.so.cache"
ldconfig = "@/sbin/ldconfig"
load-kmods = true
#no-cgroups = false
#path = "/usr/bin/nvidia-container-cli"
#root = "/run/nvidia/driver"
#user = "root:video"

[nvidia-container-runtime]
#debug = "/var/log/nvidia-container-runtime.log"
log-level = "info"
mode = "auto"
runtimes = ["docker-runc", "runc", "crun"]

[nvidia-container-runtime.modes]

[nvidia-container-runtime.modes.cdi]
annotation-prefixes = ["cdi.k8s.io/"]
default-kind = "nvidia.com/gpu"
spec-dirs = ["/etc/cdi", "/var/run/cdi"]

[nvidia-container-runtime.modes.csv]
mount-spec-path = "/etc/nvidia-container-runtime/host-files-for-container.d"

[nvidia-container-runtime-hook]
path = "nvidia-container-runtime-hook"
skip-mode-detection = false

[nvidia-ctk]
path = "nvidia-ctk"

Also, I know Arch is more bleeding edge, but I run a backports kernel (6.6.13) and a current nvidia runtime:

nvidia-container-cli --version
cli-version: 1.15.0
lib-version: 1.15.0

Maybe it is the root:root perms for the device in your config?

alexleach commented 2 months ago

Thanks for that 🙂

I've had to go out, so I'm on my phone now. I did comment out that permissions line, rebooted, and got a runtime permissions error... I also tried changing it to root:vglusers, but no joy there either, nor when running with sudo.

Can I ask if you set up rootless mode?

The way I run my containers is by adding my user to the docker group, but I never set up rootless mode. I guess I should try that next?

Cheers, Alex



thelamer commented 2 months ago

No, we use S6v3 init and require root for all our images. Everything runs in userspace as the abc user in the container, but we have hooks that run on init that require root in the container, like chowning the video device.

thelamer commented 2 months ago

Does this work? (given your error)

https://github.com/slint-ui/slint/issues/4828#issuecomment-1992596539

alexleach commented 2 months ago

Hi, yeah I've already got those nvidia_drm kernel mode options set actually, which I got from the arch wiki, at https://wiki.archlinux.org/title/NVIDIA#DRM_kernel_mode_setting
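For reference, the nvidia_drm KMS setting from that wiki page is typically a modprobe option like the fragment below (the file name is just the conventional location; the initramfs needs regenerating and the machine rebooting for it to apply):

```
# /etc/modprobe.d/nvidia-drm.conf
options nvidia_drm modeset=1
```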

I had a brief look at S6 a couple of days ago. So you run your containers as root then?



LinuxServer-CI commented 1 month ago

This issue has been automatically marked as stale because it has not had recent activity. This might be due to missing feedback from OP. It will be closed if no further activity occurs. Thank you for your contributions.