netblue30 / firejail

Linux namespaces and seccomp-bpf sandbox
https://firejail.wordpress.com
GNU General Public License v2.0
5.75k stars 563 forks source link

Torch cuda not working with firejail --noprofile #5623

Open achsvg opened 1 year ago

achsvg commented 1 year ago

Description

I have installed torch 1.13.1 with cuda support on a GPU machine. The following command fails:

firejail --noprofile -- python -c "import torch; print(torch.cuda.is_available())"
Parent pid 3594, child pid 3595
Child process initialized in 2.08 ms
/xxx/.venv/lib/python3.9/site-packages/torch/cuda/__init__.py:88: UserWarning: CUDA initialization: CUDA unknown error - this may be due to an incorrectly set up environment, e.g. changing env variable CUDA_VISIBLE_DEVICES after program start. Setting the available devices to be zero. (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:109.)
  return torch._C._cuda_getDeviceCount() > 0
False

Parent is shutting down, bye...

Whereas running it outside of firejail gives:

python -c "import torch; print(torch.cuda.is_available())"
True

Strange thing is that once it has ran outside of firejail once, it will succeed in firejail thereafter. I'm guessing it has to do with lazy initialization of CUDA in Torch. Is there any configuration I can make to make it work in firejail without priorly having to run it outside ?

Steps to Reproduce

All described above

Expected behavior

firejail --noprofile -- python -c "import torch; print(torch.cuda.is_available())" should return True

Actual behavior

It returns False with error message

/xxx/.venv/lib/python3.9/site-packages/torch/cuda/__init__.py:88: UserWarning: CUDA initialization: CUDA unknown error - this may be due to an incorrectly set up environment, e.g. changing env variable CUDA_VISIBLE_DEVICES after program start. Setting the available devices to be zero. (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:109.)
  return torch._C._cuda_getDeviceCount() > 0

Behavior without a profile

as described in the description.

Additional context

Environment

Checklist

Log

Output of LC_ALL=C firejail /path/to/program

``` Reading profile /etc/firejail/default.profile Reading profile /etc/firejail/disable-common.inc Reading profile /etc/firejail/disable-passwdmgr.inc Reading profile /etc/firejail/disable-programs.inc Warning: networking feature is disabled in Firejail configuration file ** Note: you can use --noprofile to disable default.profile ** Parent pid 1470, child pid 1471 Child process initialized in 36.45 ms /xxx/.venv/lib/python3.9/site-packages/torch/cuda/__init__.py:88: UserWarning: CUDA initialization: CUDA unknown error - this may be due to an incorrectly set up environment, e.g. changing env variable CUDA_VISIBLE_DEVICES after program start. Setting the available devices to be zero. (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:109.) return torch._C._cuda_getDeviceCount() > 0 False ```

Output of LC_ALL=C firejail --debug /path/to/program

``` Autoselecting /bin/bash as shell Building quoted command line: 'python' '-c' 'import torch; print(torch.cuda.is_available())' Command name #python# Attempting to find default.profile... Found default.profile profile in /etc/firejail directory Reading profile /etc/firejail/default.profile Found disable-common.inc profile in /etc/firejail directory Reading profile /etc/firejail/disable-common.inc Found disable-passwdmgr.inc profile in /etc/firejail directory Reading profile /etc/firejail/disable-passwdmgr.inc Found disable-programs.inc profile in /etc/firejail directory Reading profile /etc/firejail/disable-programs.inc Warning: networking feature is disabled in Firejail configuration file ** Note: you can use --noprofile to disable default.profile ** DISPLAY is not set Using the local network stack Parent pid 1486, child pid 1487 Initializing child process Host network configured PID namespace installed Mounting tmpfs on /run/firejail/mnt directory Creating empty /run/firejail/mnt/seccomp directory Creating empty /run/firejail/mnt/seccomp/seccomp.protocol file Creating empty /run/firejail/mnt/seccomp/seccomp.postexec file Build protocol filter: unix,inet,inet6 sbox run: /run/firejail/lib/fseccomp protocol build unix,inet,inet6 /run/firejail/mnt/seccomp/seccomp.protocol (null) Dropping all capabilities Drop privileges: pid 2, uid 1000, gid 1000, nogroups 1 No supplementary groups Mounting /proc filesystem representing the PID namespace Basic read-only filesystem: Mounting read-only /etc Mounting noexec /etc Mounting read-only /var Mounting noexec /var Mounting read-only /bin Mounting read-only /sbin Mounting read-only /lib Mounting read-only /lib64 Mounting read-only /lib32 Mounting read-only /libx32 Mounting read-only /usr Mounting tmpfs on /var/lock Mounting tmpfs on /var/tmp Mounting tmpfs on /var/log Mounting tmpfs on /var/lib/dhcp Mounting tmpfs on /var/lib/sudo Create the new utmp file Mount the new utmp file Cleaning /home directory Cleaning /run/user directory Sanitizing /etc/passwd, UID_MIN 1000 Sanitizing /etc/group, GID_MIN 1000 Disable /run/firejail/network Disable /run/firejail/bandwidth Disable /run/firejail/name Disable /run/firejail/profile Disable /run/firejail/x11 Mounting read-only /proc/sys Remounting /sys directory Disable /sys/firmware Disable /sys/hypervisor Disable /sys/power Disable /sys/kernel/debug Disable /sys/kernel/vmcoreinfo Disable /sys/kernel/uevent_helper Disable /proc/sys/fs/binfmt_misc Disable /proc/sys/kernel/core_pattern Disable /proc/sys/kernel/modprobe Disable /proc/sysrq-trigger Disable /proc/sys/kernel/hotplug Disable /proc/sys/vm/panic_on_oom Disable /proc/irq Disable /proc/bus Disable /proc/timer_list Disable /proc/kcore Disable /proc/kallsyms Disable /usr/lib/modules (requested /lib/modules) Disable /boot Disable /dev/port Disable /run/user/1000/gnupg Disable /run/user/1000/systemd Disable /dev/kmsg Disable /proc/kmsg Disable /home/ubuntu/.bash_history Disable /etc/X11/Xsession.d Disable /etc/xdg/autostart Disable /var/lib/systemd Disable /var/cache/apt Disable /var/lib/apt Disable /var/lib/dkms Disable /var/mail Disable /var/opt Disable /run/acpid.socket (requested /var/run/acpid.socket) Disable /run/docker.sock (requested /var/run/docker.sock) Disable /run/rpcbind.sock (requested /var/run/rpcbind.sock) Disable /var/spool/cron Disable /var/mail (requested /var/spool/mail) Disable /etc/cron.weekly Disable /etc/cron.hourly Disable /etc/crontab Disable /etc/cron.d Disable /etc/cron.monthly Disable /etc/cron.daily Disable /etc/profile.d Disable /etc/rc6.d Disable /etc/rc3.d Disable /etc/rcS.d Disable /etc/rc2.d Disable /etc/rc5.d Disable /etc/rc4.d Disable /etc/rc1.d Disable /etc/rc0.d Disable /etc/kernel-img.conf Disable /etc/kernel Disable /etc/grub.d Disable /etc/dkms Disable /etc/apparmor Disable /etc/apparmor.d Disable /etc/selinux Disable /etc/modules Disable /etc/modules-load.d Disable /etc/logrotate.d Disable /etc/logrotate.conf Disable /etc/adduser.conf Mounting read-only /home/ubuntu/.bash_logout 918 861 259:2 /home/ubuntu/.bash_logout /home/ubuntu/.bash_logout ro,relatime master:1 - ext4 /dev/root rw,discard mountid=918 fsname=/home/ubuntu/.bash_logout dir=/home/ubuntu/.bash_logout fstype=ext4 Mounting read-only /home/ubuntu/.bashrc 919 861 259:2 /home/ubuntu/.bashrc /home/ubuntu/.bashrc ro,relatime master:1 - ext4 /dev/root rw,discard mountid=919 fsname=/home/ubuntu/.bashrc dir=/home/ubuntu/.bashrc fstype=ext4 Mounting read-only /home/ubuntu/.profile 920 861 259:2 /home/ubuntu/.profile /home/ubuntu/.profile ro,relatime master:1 - ext4 /dev/root rw,discard mountid=920 fsname=/home/ubuntu/.profile dir=/home/ubuntu/.profile fstype=ext4 Mounting read-only /home/ubuntu/.ssh/authorized_keys 921 861 259:2 /home/ubuntu/.ssh/authorized_keys /home/ubuntu/.ssh/authorized_keys ro,relatime master:1 - ext4 /dev/root rw,discard mountid=921 fsname=/home/ubuntu/.ssh/authorized_keys dir=/home/ubuntu/.ssh/authorized_keys fstype=ext4 Disable /home/ubuntu/.ssh Disable /etc/group- Disable /etc/gshadow Disable /etc/gshadow- Disable /etc/passwd- Disable /etc/shadow Disable /etc/shadow- Disable /etc/ssh Disable /usr/sbin (requested /sbin) Disable /usr/local/sbin Disable /usr/sbin Disable /usr/bin/at Disable /usr/bin/at (requested /bin/at) Disable /usr/bin/chage Disable /usr/bin/chage (requested /bin/chage) Disable /usr/bin/chfn Disable /usr/bin/chfn (requested /bin/chfn) Disable /usr/bin/chsh Disable /usr/bin/chsh (requested /bin/chsh) Disable /usr/bin/crontab Disable /usr/bin/crontab (requested /bin/crontab) Disable /usr/bin/expiry Disable /usr/bin/expiry (requested /bin/expiry) Disable /usr/bin/fusermount Disable /usr/bin/fusermount (requested /bin/fusermount) Disable /usr/bin/gpasswd Disable /usr/bin/gpasswd (requested /bin/gpasswd) Disable /usr/bin/mount Disable /usr/bin/mount (requested /bin/mount) Disable /usr/bin/nc.openbsd (requested /usr/bin/nc) Disable /usr/bin/nc.openbsd (requested /bin/nc) Disable /usr/bin/newgrp Disable /usr/bin/newgrp (requested /bin/newgrp) Disable /usr/bin/ntfs-3g Disable /usr/bin/ntfs-3g (requested /bin/ntfs-3g) Disable /usr/bin/pkexec Disable /usr/bin/pkexec (requested /bin/pkexec) Disable /usr/bin/newgrp (requested /usr/bin/sg) Disable /usr/bin/newgrp (requested /bin/sg) Disable /usr/bin/strace Disable /usr/bin/strace (requested /bin/strace) Disable /usr/bin/su Disable /usr/bin/su (requested /bin/su) Disable /usr/bin/sudo Disable /usr/bin/sudo (requested /bin/sudo) Disable /usr/bin/umount Disable /usr/bin/umount (requested /bin/umount) Disable /usr/bin/bwrap Disable /usr/bin/bwrap (requested /bin/bwrap) Disable /home/ubuntu/.nv Disable /home/ubuntu/.wget-hsts Disable /sys/fs Disable /sys/module Mounting noexec /run/firejail/mnt/pulse Mounting /run/firejail/mnt/pulse on /home/ubuntu/.config/pulse 1029 861 0:57 /pulse /home/ubuntu/.config/pulse rw,nosuid,nodev,noexec - tmpfs tmpfs rw,mode=755,inode64 mountid=1029 fsname=/pulse dir=/home/ubuntu/.config/pulse fstype=tmpfs Current directory: /home/ubuntu DISPLAY is not set Install protocol filter: unix,inet,inet6 configuring 14 seccomp entries in /run/firejail/mnt/seccomp/seccomp.protocol sbox run: /usr/lib/x86_64-linux-gnu/firejail/fsec-print /run/firejail/mnt/seccomp/seccomp.protocol (null) Dropping all capabilities Drop privileges: pid 3, uid 1000, gid 1000, nogroups 1 No supplementary groups line OP JT JF K ================================= 0000: 20 00 00 00000004 ld data.architecture 0001: 15 01 00 c000003e jeq ARCH_64 0003 (false 0002) 0002: 06 00 00 7fff0000 ret ALLOW 0003: 20 00 00 00000000 ld data.syscall-number 0004: 15 01 00 00000029 jeq socket 0006 (false 0005) 0005: 06 00 00 7fff0000 ret ALLOW 0006: 20 00 00 00000010 ld data.args[0] 0007: 15 00 01 00000001 jeq 1 0008 (false 0009) 0008: 06 00 00 7fff0000 ret ALLOW 0009: 15 00 01 00000002 jeq 2 000a (false 000b) 000a: 06 00 00 7fff0000 ret ALLOW 000b: 15 00 01 0000000a jeq a 000c (false 000d) 000c: 06 00 00 7fff0000 ret ALLOW 000d: 06 00 00 0005005f ret ERRNO(95) configuring 54 seccomp entries in /run/firejail/mnt/seccomp/seccomp.32 sbox run: /usr/lib/x86_64-linux-gnu/firejail/fsec-print /run/firejail/mnt/seccomp/seccomp.32 (null) Dropping all capabilities Drop privileges: pid 4, uid 1000, gid 1000, nogroups 1 No supplementary groups line OP JT JF K ================================= 0000: 20 00 00 00000004 ld data.architecture 0001: 15 01 00 40000003 jeq ARCH_32 0003 (false 0002) 0002: 06 00 00 7fff0000 ret ALLOW 0003: 20 00 00 00000000 ld data.syscall-number 0004: 15 30 00 00000015 jeq 15 0035 (false 0005) 0005: 15 2f 00 00000034 jeq 34 0035 (false 0006) 0006: 15 2e 00 0000001a jeq 1a 0035 (false 0007) 0007: 15 2d 00 0000011b jeq 11b 0035 (false 0008) 0008: 15 2c 00 00000155 jeq 155 0035 (false 0009) 0009: 15 2b 00 00000156 jeq 156 0035 (false 000a) 000a: 15 2a 00 0000007f jeq 7f 0035 (false 000b) 000b: 15 29 00 00000080 jeq 80 0035 (false 000c) 000c: 15 28 00 0000015e jeq 15e 0035 (false 000d) 000d: 15 27 00 00000081 jeq 81 0035 (false 000e) 000e: 15 26 00 0000006e jeq 6e 0035 (false 000f) 000f: 15 25 00 00000065 jeq 65 0035 (false 0010) 0010: 15 24 00 00000121 jeq 121 0035 (false 0011) 0011: 15 23 00 00000057 jeq 57 0035 (false 0012) 0012: 15 22 00 00000073 jeq 73 0035 (false 0013) 0013: 15 21 00 00000067 jeq 67 0035 (false 0014) 0014: 15 20 00 0000015b jeq 15b 0035 (false 0015) 0015: 15 1f 00 0000015c jeq 15c 0035 (false 0016) 0016: 15 1e 00 00000087 jeq 87 0035 (false 0017) 0017: 15 1d 00 00000095 jeq 95 0035 (false 0018) 0018: 15 1c 00 0000007c jeq 7c 0035 (false 0019) 0019: 15 1b 00 00000157 jeq 157 0035 (false 001a) 001a: 15 1a 00 000000fd jeq fd 0035 (false 001b) 001b: 15 19 00 00000150 jeq 150 0035 (false 001c) 001c: 15 18 00 00000152 jeq 152 0035 (false 001d) 001d: 15 17 00 0000015d jeq 15d 0035 (false 001e) 001e: 15 16 00 0000011e jeq 11e 0035 (false 001f) 001f: 15 15 00 0000011f jeq 11f 0035 (false 0020) 0020: 15 14 00 00000120 jeq 120 0035 (false 0021) 0021: 15 13 00 00000056 jeq 56 0035 (false 0022) 0022: 15 12 00 00000033 jeq 33 0035 (false 0023) 0023: 15 11 00 0000007b jeq 7b 0035 (false 0024) 0024: 15 10 00 000000d9 jeq d9 0035 (false 0025) 0025: 15 0f 00 000000f5 jeq f5 0035 (false 0026) 0026: 15 0e 00 000000f6 jeq f6 0035 (false 0027) 0027: 15 0d 00 000000f7 jeq f7 0035 (false 0028) 0028: 15 0c 00 000000f8 jeq f8 0035 (false 0029) 0029: 15 0b 00 000000f9 jeq f9 0035 (false 002a) 002a: 15 0a 00 00000101 jeq 101 0035 (false 002b) 002b: 15 09 00 00000112 jeq 112 0035 (false 002c) 002c: 15 08 00 00000114 jeq 114 0035 (false 002d) 002d: 15 07 00 00000126 jeq 126 0035 (false 002e) 002e: 15 06 00 0000013d jeq 13d 0035 (false 002f) 002f: 15 05 00 0000013c jeq 13c 0035 (false 0030) 0030: 15 04 00 0000003d jeq 3d 0035 (false 0031) 0031: 15 03 00 00000058 jeq 58 0035 (false 0032) 0032: 15 02 00 000000a9 jeq a9 0035 (false 0033) 0033: 15 01 00 00000082 jeq 82 0035 (false 0034) 0034: 06 00 00 7fff0000 ret ALLOW 0035: 06 00 00 00000000 ret KILL Dual 32/64 bit seccomp filter configured configuring 72 seccomp entries in /run/firejail/mnt/seccomp/seccomp sbox run: /usr/lib/x86_64-linux-gnu/firejail/fsec-print /run/firejail/mnt/seccomp/seccomp (null) Dropping all capabilities Drop privileges: pid 5, uid 1000, gid 1000, nogroups 1 No supplementary groups line OP JT JF K ================================= 0000: 20 00 00 00000004 ld data.architecture 0001: 15 01 00 c000003e jeq ARCH_64 0003 (false 0002) 0002: 06 00 00 7fff0000 ret ALLOW 0003: 20 00 00 00000000 ld data.syscall-number 0004: 35 01 00 40000000 jge X32_ABI 0006 (false 0005) 0005: 35 01 00 00000000 jge read 0007 (false 0006) 0006: 06 00 00 00050001 ret ERRNO(1) 0007: 15 3f 00 0000009f jeq adjtimex 0047 (false 0008) 0008: 15 3e 00 00000131 jeq clock_adjtime 0047 (false 0009) 0009: 15 3d 00 000000e3 jeq clock_settime 0047 (false 000a) 000a: 15 3c 00 000000a4 jeq settimeofday 0047 (false 000b) 000b: 15 3b 00 0000009a jeq modify_ldt 0047 (false 000c) 000c: 15 3a 00 000000d4 jeq lookup_dcookie 0047 (false 000d) 000d: 15 39 00 0000012a jeq perf_event_open 0047 (false 000e) 000e: 15 38 00 00000137 jeq process_vm_writev 0047 (false 000f) 000f: 15 37 00 000000b0 jeq delete_module 0047 (false 0010) 0010: 15 36 00 00000139 jeq finit_module 0047 (false 0011) 0011: 15 35 00 000000af jeq init_module 0047 (false 0012) 0012: 15 34 00 0000009c jeq _sysctl 0047 (false 0013) 0013: 15 33 00 000000b7 jeq afs_syscall 0047 (false 0014) 0014: 15 32 00 000000ae jeq create_module 0047 (false 0015) 0015: 15 31 00 000000b1 jeq get_kernel_syms 0047 (false 0016) 0016: 15 30 00 000000b5 jeq getpmsg 0047 (false 0017) 0017: 15 2f 00 000000b6 jeq putpmsg 0047 (false 0018) 0018: 15 2e 00 000000b2 jeq query_module 0047 (false 0019) 0019: 15 2d 00 000000b9 jeq security 0047 (false 001a) 001a: 15 2c 00 0000008b jeq sysfs 0047 (false 001b) 001b: 15 2b 00 000000b8 jeq tuxcall 0047 (false 001c) 001c: 15 2a 00 00000086 jeq uselib 0047 (false 001d) 001d: 15 29 00 00000088 jeq ustat 0047 (false 001e) 001e: 15 28 00 000000ec jeq vserver 0047 (false 001f) 001f: 15 27 00 000000ad jeq ioperm 0047 (false 0020) 0020: 15 26 00 000000ac jeq iopl 0047 (false 0021) 0021: 15 25 00 000000f6 jeq kexec_load 0047 (false 0022) 0022: 15 24 00 00000140 jeq kexec_file_load 0047 (false 0023) 0023: 15 23 00 000000a9 jeq reboot 0047 (false 0024) 0024: 15 22 00 000000a7 jeq swapon 0047 (false 0025) 0025: 15 21 00 000000a8 jeq swapoff 0047 (false 0026) 0026: 15 20 00 00000130 jeq open_by_handle_at 0047 (false 0027) 0027: 15 1f 00 0000012f jeq name_to_handle_at 0047 (false 0028) 0028: 15 1e 00 000000fb jeq ioprio_set 0047 (false 0029) 0029: 15 1d 00 00000067 jeq syslog 0047 (false 002a) 002a: 15 1c 00 0000012c jeq fanotify_init 0047 (false 002b) 002b: 15 1b 00 00000138 jeq kcmp 0047 (false 002c) 002c: 15 1a 00 000000f8 jeq add_key 0047 (false 002d) 002d: 15 19 00 000000f9 jeq request_key 0047 (false 002e) 002e: 15 18 00 000000ed jeq mbind 0047 (false 002f) 002f: 15 17 00 00000100 jeq migrate_pages 0047 (false 0030) 0030: 15 16 00 00000117 jeq move_pages 0047 (false 0031) 0031: 15 15 00 000000fa jeq keyctl 0047 (false 0032) 0032: 15 14 00 000000ce jeq io_setup 0047 (false 0033) 0033: 15 13 00 000000cf jeq io_destroy 0047 (false 0034) 0034: 15 12 00 000000d0 jeq io_getevents 0047 (false 0035) 0035: 15 11 00 000000d1 jeq io_submit 0047 (false 0036) 0036: 15 10 00 000000d2 jeq io_cancel 0047 (false 0037) 0037: 15 0f 00 000000d8 jeq remap_file_pages 0047 (false 0038) 0038: 15 0e 00 00000143 jeq userfaultfd 0047 (false 0039) 0039: 15 0d 00 000000a3 jeq acct 0047 (false 003a) 003a: 15 0c 00 00000141 jeq bpf 0047 (false 003b) 003b: 15 0b 00 000000a1 jeq chroot 0047 (false 003c) 003c: 15 0a 00 000000a5 jeq mount 0047 (false 003d) 003d: 15 09 00 000000b4 jeq nfsservctl 0047 (false 003e) 003e: 15 08 00 0000009b jeq pivot_root 0047 (false 003f) 003f: 15 07 00 000000ab jeq setdomainname 0047 (false 0040) 0040: 15 06 00 000000aa jeq sethostname 0047 (false 0041) 0041: 15 05 00 000000a6 jeq umount2 0047 (false 0042) 0042: 15 04 00 00000099 jeq vhangup 0047 (false 0043) 0043: 15 03 00 00000065 jeq ptrace 0047 (false 0044) 0044: 15 02 00 00000087 jeq personality 0047 (false 0045) 0045: 15 01 00 00000136 jeq process_vm_readv 0047 (false 0046) 0046: 06 00 00 7fff0000 ret ALLOW 0047: 06 00 01 00000000 ret KILL seccomp filter configured Mounting read-only /run/firejail/mnt/seccomp Dropping all capabilities noroot user namespace installed Dropping all capabilities NO_NEW_PRIVS set Drop privileges: pid 1, uid 1000, gid 1000, nogroups 0 Supplementary groups: 29 44 starting application LD_PRELOAD=(null) Running 'python' '-c' 'import torch; print(torch.cuda.is_available())' command through /bin/bash execvp argument 0: /bin/bash execvp argument 1: -c execvp argument 2: -- execvp argument 3: 'python' '-c' 'import torch; print(torch.cuda.is_available())' Child process initialized in 29.28 ms Installing /run/firejail/mnt/seccomp/seccomp seccomp filter Installing /run/firejail/mnt/seccomp/seccomp.32 seccomp filter Installing /run/firejail/mnt/seccomp/seccomp.protocol seccomp filter monitoring pid 6 /xxx/.venv/lib/python3.9/site-packages/torch/cuda/__init__.py:88: UserWarning: CUDA initialization: CUDA unknown error - this may be due to an incorrectly set up environment, e.g. changing env variable CUDA_VISIBLE_DEVICES after program start. Setting the available devices to be zero. (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:109.) return torch._C._cuda_getDeviceCount() > 0 False Sandbox monitor: waitpid 6 retval 6 status 0```

kmk3 commented 1 year ago

@achsvg on Jan 27:

I have installed torch 1.13.1 with cuda support on a GPU machine. The following command fails:

firejail --noprofile -- python -c "import torch; print(torch.cuda.is_available())"
Parent pid 3594, child pid 3595
Child process initialized in 2.08 ms
/xxx/.venv/lib/python3.9/site-packages/torch/cuda/__init__.py:88: UserWarning: CUDA initialization: CUDA unknown error - this may be due to an incorrectly set up environment, e.g. changing env variable CUDA_VISIBLE_DEVICES after program start. Setting the available devices to be zero. (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:109.)
  return torch._C._cuda_getDeviceCount() > 0
False

Parent is shutting down, bye...

Whereas running it outside of firejail gives:

python -c "import torch; print(torch.cuda.is_available())"
True

Interesting; does it work with firejail --profile=noprofile?

noprofile.profile removes even more restrictions than using --noprofile.

  • firejail version 0.9.62
  • Python 3.9.5
  • torch==1.13.1
  • CUDA 11.7
  • Ubuntu 20.04

Note that we do not maintain that firejail version:

See also:

What happens with the latest released firejail version?

achsvg commented 1 year ago

Some options in noprofile are not supported by the version I'm using but after removing them it works! So I managed to bisect which option affects the CUDA loading. And it looks like when allow-debuggers is in the profile it works. Not sure why CUDA would need this though...

I'm not able to test the latest released version yet but will try later.

rusty-snake commented 1 year ago

If I have to guess https://github.com/netblue30/firejail/blob/c2b6b6b1a348d70b776983051851e42ba66ab271/src/firejail/fs.c#L774-L780