shkhln / libc6-shim

Cheap glibc knockoff
MIT License

No devices were found on 13.2 #10

Closed fsmv closed 1 year ago

fsmv commented 1 year ago

I'm able to use https://gist.github.com/shkhln/40ef290463e78fb2b0000c60f4ad797e to load pytorch via linux-miniconda-installer, and when I run /compat/linux/bin/nvidia-smi it finds CUDA. But when I run nvidia-smi with nv-sglrun, it finds no devices (plain nvidia-smi also works, but without CUDA).

I'm using version 20230629 and I'm running FreeBSD 13.2-RELEASE-p2

Here's the log:

% SHIM_DEBUG=1 nv-sglrun nvidia-smi
shim init
[17999:243994] shim_getpid()
[17999:243994] shim_getpid -> 17999
[17999:243994] shim_getenv("__NVML_DBG_LVL")
[17999:243994] shim_getenv -> 0x0
[17999:243994] shim_getenv("__NVML_DBG_APPEND")
[17999:243994] shim_getenv -> 0x0
[17999:243994] shim_getenv("__NVML_DBG_FILE")
[17999:243994] shim_getenv -> 0x0
[17999:243994] shim_gettimeofday(0x82e080410, 0x0)
[17999:243994] shim_gettimeofday -> 0
[17999:243994] shim_memset(0x82d488ba0, 0, 12509464)
[17999:243994] shim_memset -> 0x82d488ba0
[17999:243994] shim_getenv("__NVML_CRAY_PSTATE")
[17999:243994] shim_getenv -> 0x0
[17999:243994] shim_getenv("__NVIDIA_NVML_3373")
[17999:243994] shim_getenv -> 0x0
[17999:243994] shim_getenv("__NVML_ONLY_DAEMON_PERSISTENCE_MODE")
[17999:243994] shim_getenv -> 0x0
[17999:243994] shim_getenv("__RM_ENABLE_VERBOSE_OUTPUT")
[17999:243994] shim_getenv -> 0x0
[17999:243994] shim_fopen("/proc/modules", "r")
[17999:243994] shim_fopen -> 0x0
[17999:243994] shim___xstat(1, "/sys/bus/pci/devices", 0x820da2800)
[17999:243994] shim___xstat -> -1
[17999:243994] shim___errno_location()
[17999:243994] shim___errno_location -> 0x823c96e10
[17999:243994] shim_geteuid()
[17999:243994] shim_geteuid -> 1001
[17999:243994] shim_getenv("__RM_ENABLE_VERBOSE_OUTPUT")
[17999:243994] shim_getenv -> 0x0
[17999:243994] shim___xstat(1, "/usr/bin/nvidia-modprobe", 0x820da2d20)
[17999:243994] shim___xstat -> -1
[17999:243994] shim_fopen("/proc/driver/nvidia/params", "r")
[17999:243994] shim_fopen -> 0x823ca24f0
[17999:243994] shim___isoc99_fscanf(0x823ca24f0, "%31[^:]: %u
", ...)
[17999:243994] shim___isoc99_fscanf -> 2
[17999:243994] shim___isoc99_fscanf(0x823ca24f0, "%31[^:]: %u
", ...)
[17999:243994] shim___isoc99_fscanf -> 1
[17999:243994] shim_fclose(0x823ca24f0)
[17999:243994] shim_fclose -> 0
[17999:243994] shim_snprintf(0x820da2a60, 128, "/dev/char/%d:%d", ...)
[17999:243994] shim_snprintf -> 17
[17999:243994] shim___xstat(1, "/dev/nvidiactl", 0x820da2bf0)
[17999:243994] shim___xstat -> 0
[17999:243994] shim_snprintf(0x820da2ae0, 128, "../%s", ...)
[17999:243994] shim_snprintf -> 12
[17999:243994] shim_remove("/dev/char/195:255")
[17999:243994] shim_remove -> -1
[17999:243994] shim_symlink("../nvidiactl", "/dev/char/195:255")
[17999:243994] shim_symlink -> -1
[17999:243994] shim___xstat(1, "/dev/char/195:255", 0x820da2b60)
[17999:243994] shim___xstat -> -1
[17999:243994] shim___errno_location()
[17999:243994] shim___errno_location -> 0x823c96e10
[17999:243994] shim_snprintf(0x820da2d90, 32, "-c=%d", ...)
[17999:243994] shim_snprintf -> 6
[17999:243994] shim_getenv("__RM_ENABLE_VERBOSE_OUTPUT")
[17999:243994] shim_getenv -> 0x0
[17999:243994] shim___xstat(1, "/usr/bin/nvidia-modprobe", 0x820da2ce0)
[17999:243994] shim___xstat -> -1
[17999:243994] shim_fopen("/proc/driver/nvidia/params", "r")
[17999:243994] shim_fopen -> 0x823ca24f0
[17999:243994] shim___isoc99_fscanf(0x823ca24f0, "%31[^:]: %u
", ...)
[17999:243994] shim___isoc99_fscanf -> 2
[17999:243994] shim___isoc99_fscanf(0x823ca24f0, "%31[^:]: %u
", ...)
[17999:243994] shim___isoc99_fscanf -> 1
[17999:243994] shim_fclose(0x823ca24f0)
[17999:243994] shim_fclose -> 0
[17999:243994] shim___xstat(1, "/dev/nvidiactl", 0x820da2c20)
[17999:243994] shim___xstat -> 0
[17999:243994] shim_open64("/dev/nvidiactl", 2, ...)
[17999:243994] shim_open64 -> 5
[17999:243994] shim_fcntl(5, 2, ...)
[17999:243994] shim_fcntl_impl: cmd = F_SETFD, arg = 0x1
[17999:243994] shim_fcntl -> 0
[17999:243994] shim_getenv("__RM_NO_VERSION_CHECK")
[17999:243994] shim_getenv -> 0x0
[17999:243994] shim_ioctl(5, 0xc04846d2, ...)
[17999:243994] shim_ioctl -> 0
[17999:243994] shim_open("/sys/devices/system/memory/block_size_bytes", 0, ...)
[17999:243994] shim_open -> -1
[17999:243994] shim___errno_location()
[17999:243994] shim___errno_location -> 0x823c96e10
[17999:243994] shim_ioctl(5, 0xc90046c8, ...)
[17999:243994] shim_ioctl -> 0
[17999:243994] shim_time(0x0)
[17999:243994] shim_time -> 1693076141
[17999:243994] shim_ioctl(5, 0xc020462b, ...)
[17999:243994] shim_ioctl -> 0
[17999:243994] shim_fopen("/proc/devices", "r")
[17999:243994] shim_fopen -> 0x0
[17999:243994] shim_snprintf(0x820da2690, 260, "-f=%s", ...)
[17999:243994] shim_snprintf -> 46
[17999:243994] shim___xstat(1, "/usr/bin/nvidia-modprobe", 0x820da24b0)
[17999:243994] shim___xstat -> -1
[17999:243994] shim_fopen("/proc/devices", "r")
[17999:243994] shim_fopen -> 0x0
[17999:243994] shim_fopen("/proc/driver/nvidia/capabilities/mig/config", "r")
[17999:243994] shim_fopen -> 0x0
[17999:243994] shim___xstat(1, "", 0x820da23e0)
[17999:243994] shim___xstat -> -1
[17999:243994] shim_fopen("/proc/devices", "r")
[17999:243994] shim_fopen -> 0x0
[17999:243994] shim_snprintf(0x820da2690, 260, "-f=%s", ...)
[17999:243994] shim_snprintf -> 47
[17999:243994] shim___xstat(1, "/usr/bin/nvidia-modprobe", 0x820da24b0)
[17999:243994] shim___xstat -> -1
[17999:243994] shim_fopen("/proc/devices", "r")
[17999:243994] shim_fopen -> 0x0
[17999:243994] shim_fopen("/proc/driver/nvidia/capabilities/mig/monitor", "r")
[17999:243994] shim_fopen -> 0x0
[17999:243994] shim___xstat(1, "", 0x820da23e0)
[17999:243994] shim___xstat -> -1
[17999:243994] shim_time(0x0)
[17999:243994] shim_time -> 1693076141
[17999:243994] shim_ioctl(5, 0xc020462a, ...)
[17999:243994] shim_ioctl -> 0
[17999:243994] shim_time(0x0)
[17999:243994] shim_time -> 1693076141
[17999:243994] shim_ioctl(5, 0xc020462a, ...)
[17999:243994] shim_ioctl -> 0
[17999:243994] shim_qsort(0x820da3150, 1, 12, 0x82d1885e0)
[17999:243994] shim_qsort -> void
[17999:243994] shim_time(0x0)
[17999:243994] shim_time -> 1693076141
[17999:243994] shim_ioctl(5, 0xc020462a, ...)
[17999:243994] shim_ioctl -> 0
[17999:243994] shim_time(0x0)
[17999:243994] shim_time -> 1693076141
[17999:243994] shim_ioctl(5, 0xc020462a, ...)
[17999:243994] shim_ioctl -> 0
[17999:243994] shim_snprintf(0x820da2300, 128, "/dev/nvidia%d", ...)
[17999:243994] shim_snprintf -> 12
[17999:243994] shim_fopen("/proc/driver/nvidia/params", "r")
[17999:243994] shim_fopen -> 0x823ca24f0
[17999:243994] shim___isoc99_fscanf(0x823ca24f0, "%31[^:]: %u
", ...)
[17999:243994] shim___isoc99_fscanf -> 2
[17999:243994] shim___isoc99_fscanf(0x823ca24f0, "%31[^:]: %u
", ...)
[17999:243994] shim___isoc99_fscanf -> 1
[17999:243994] shim_fclose(0x823ca24f0)
[17999:243994] shim_fclose -> 0
[17999:243994] shim_snprintf(0x820da2070, 128, "/dev/char/%d:%d", ...)
[17999:243994] shim_snprintf -> 15
[17999:243994] shim___xstat(1, "/dev/nvidia1", 0x820da2200)
[17999:243994] shim___xstat -> 0
[17999:243994] shim_snprintf(0x820da20f0, 128, "../%s", ...)
[17999:243994] shim_snprintf -> 10
[17999:243994] shim_remove("/dev/char/195:1")
[17999:243994] shim_remove -> -1
[17999:243994] shim_symlink("../nvidia1", "/dev/char/195:1")
[17999:243994] shim_symlink -> -1
[17999:243994] shim___xstat(1, "/dev/char/195:1", 0x820da2170)
[17999:243994] shim___xstat -> -1
[17999:243994] shim___errno_location()
[17999:243994] shim___errno_location -> 0x823c96e10
[17999:243994] shim_snprintf(0x820da23a0, 32, "-c=%d", ...)
[17999:243994] shim_snprintf -> 4
[17999:243994] shim_getenv("__RM_ENABLE_VERBOSE_OUTPUT")
[17999:243994] shim_getenv -> 0x0
[17999:243994] shim___xstat(1, "/usr/bin/nvidia-modprobe", 0x820da22f0)
[17999:243994] shim___xstat -> -1
[17999:243994] shim_snprintf(0x820da2300, 128, "/dev/nvidia%d", ...)
[17999:243994] shim_snprintf -> 12
[17999:243994] shim_fopen("/proc/driver/nvidia/params", "r")
[17999:243994] shim_fopen -> 0x823ca24f0
[17999:243994] shim___isoc99_fscanf(0x823ca24f0, "%31[^:]: %u
", ...)
[17999:243994] shim___isoc99_fscanf -> 2
[17999:243994] shim___isoc99_fscanf(0x823ca24f0, "%31[^:]: %u
", ...)
[17999:243994] shim___isoc99_fscanf -> 1
[17999:243994] shim_fclose(0x823ca24f0)
[17999:243994] shim_fclose -> 0
[17999:243994] shim___xstat(1, "/dev/nvidia1", 0x820da2230)
[17999:243994] shim___xstat -> 0
[17999:243994] shim_getenv("__RM_ENABLE_VERBOSE_OUTPUT")
[17999:243994] shim_getenv -> 0x0
[17999:243994] shim_memset(0x820da0d30, 0, 8576)
[17999:243994] shim_memset -> 0x820da0d30
[17999:243994] shim_time(0x0)
[17999:243994] shim_time -> 1693076141
[17999:243994] shim_ioctl(5, 0xc020462a, ...)
[17999:243994] shim_ioctl -> 0
[17999:243994] shim_qsort(0x820da0bb0, 0, 12, 0x82d1885e0)
[17999:243994] shim_qsort -> void
[17999:243994] shim_calloc(1544, 1)
[17999:243994] shim_calloc -> 0x829e14700
[17999:243994] shim_getpid()
[17999:243994] shim_getpid -> 17999
No devices were found
[17999:243994] shim_getpid()
[17999:243994] shim_getpid -> 17999
[17999:243994] shim_free(0x829e14700)
[17999:243994] shim_free -> void
[17999:243994] shim_time(0x0)
[17999:243994] shim_time -> 1693076141
[17999:243994] shim_ioctl(5, 0xc020462a, ...)
[17999:243994] shim_ioctl -> 0
[17999:243994] shim_time(0x0)
[17999:243994] shim_time -> 1693076141
[17999:243994] shim_ioctl(5, 0xc0104629, ...)
[17999:243994] shim_ioctl -> 0
[17999:243994] shim_close(5)
[17999:243994] shim_close -> 0
[17999:243994] shim_memset(0x82d488ba0, 0, 12509464)
[17999:243994] shim_memset -> 0x82d488ba0
[17999:243994] shim___cxa_finalize(0x82d488600)
[17999:243994] shim___cxa_finalize -> void

shkhln commented 1 year ago

What's the driver version? GPU model?

fsmv commented 1 year ago

I have nvidia-driver version 525.116.03

The GPU is RTX 3060

shkhln commented 1 year ago

Does /dev/nvidia1 actually exist?

fsmv commented 1 year ago

It does.

% ls -l /compat/linux/dev/nvidia1
crw-rw-rw-  1 root  wheel  0x71 Aug 24 23:10 /compat/linux/dev/nvidia1
% ls -l /dev/nvidia1
crw-rw-rw-  1 root  wheel  0x71 Aug 24 23:10 /dev/nvidia1

shkhln commented 1 year ago

Would you mind checking d545115c27d19300f2f0ef6ba5c95f02af6648f4?

fsmv commented 1 year ago

Too bad I got busy and couldn't check the patch earlier. Is this supposed to be fixed in the 20230916 tag? I've just installed that version and I got the same output (previously I had 20230629 apparently).

fsmv commented 1 year ago

I checked the work file and I do have the code in the commit you mentioned and I'm still seeing no device found. @shkhln I should have more time to help debug now.

Here's the new log but I think it's the same:

% SHIM_DEBUG=1 nv-sglrun nvidia-smi
shim init
[10059:144091] shim_getpid()
[10059:144091] shim_getpid -> 10059
[10059:144091] shim_getenv("__NVML_DBG_LVL")
[10059:144091] shim_getenv -> 0x0
[10059:144091] shim_getenv("__NVML_DBG_APPEND")
[10059:144091] shim_getenv -> 0x0
[10059:144091] shim_getenv("__NVML_DBG_FILE")
[10059:144091] shim_getenv -> 0x0
[10059:144091] shim_gettimeofday(0x82d0bd410, 0x0)
[10059:144091] shim_gettimeofday -> 0
[10059:144091] shim_memset(0x82c4c5ba0, 0, 12509464)
[10059:144091] shim_memset -> 0x82c4c5ba0
[10059:144091] shim_getenv("__NVML_CRAY_PSTATE")
[10059:144091] shim_getenv -> 0x0
[10059:144091] shim_getenv("__NVIDIA_NVML_3373")
[10059:144091] shim_getenv -> 0x0
[10059:144091] shim_getenv("__NVML_ONLY_DAEMON_PERSISTENCE_MODE")
[10059:144091] shim_getenv -> 0x0
[10059:144091] shim_getenv("__RM_ENABLE_VERBOSE_OUTPUT")
[10059:144091] shim_getenv -> 0x0
[10059:144091] shim_fopen("/proc/modules", "r")
[10059:144091] shim_fopen -> 0x0
[10059:144091] shim___xstat(1, "/sys/bus/pci/devices", 0x820870950)
[10059:144091] shim___xstat -> -1
[10059:144091] shim___errno_location()
[10059:144091] shim___errno_location -> 0x822c70e10
[10059:144091] shim_geteuid()
[10059:144091] shim_geteuid -> 1001
[10059:144091] shim_getenv("__RM_ENABLE_VERBOSE_OUTPUT")
[10059:144091] shim_getenv -> 0x0
[10059:144091] shim___xstat(1, "/usr/bin/nvidia-modprobe", 0x820870e70)
[10059:144091] shim___xstat -> -1
[10059:144091] shim_fopen("/proc/driver/nvidia/params", "r")
[10059:144091] shim_fopen -> 0x822c7c4f0
[10059:144091] shim___isoc99_fscanf(0x822c7c4f0, "%31[^:]: %u
", ...)
[10059:144091] shim___isoc99_fscanf -> 2
[10059:144091] shim___isoc99_fscanf(0x822c7c4f0, "%31[^:]: %u
", ...)
[10059:144091] shim___isoc99_fscanf -> 1
[10059:144091] shim_fclose(0x822c7c4f0)
[10059:144091] shim_fclose -> 0
[10059:144091] shim_snprintf(0x820870bb0, 128, "/dev/char/%d:%d", ...)
[10059:144091] shim_snprintf -> 17
[10059:144091] shim___xstat(1, "/dev/nvidiactl", 0x820870d40)
[10059:144091] shim___xstat -> 0
[10059:144091] shim_snprintf(0x820870c30, 128, "../%s", ...)
[10059:144091] shim_snprintf -> 12
[10059:144091] shim_remove("/dev/char/195:255")
[10059:144091] shim_remove -> -1
[10059:144091] shim_symlink("../nvidiactl", "/dev/char/195:255")
[10059:144091] shim_symlink -> -1
[10059:144091] shim___xstat(1, "/dev/char/195:255", 0x820870cb0)
[10059:144091] shim___xstat -> -1
[10059:144091] shim___errno_location()
[10059:144091] shim___errno_location -> 0x822c70e10
[10059:144091] shim_snprintf(0x820870ee0, 32, "-c=%d", ...)
[10059:144091] shim_snprintf -> 6
[10059:144091] shim_getenv("__RM_ENABLE_VERBOSE_OUTPUT")
[10059:144091] shim_getenv -> 0x0
[10059:144091] shim___xstat(1, "/usr/bin/nvidia-modprobe", 0x820870e30)
[10059:144091] shim___xstat -> -1
[10059:144091] shim_fopen("/proc/driver/nvidia/params", "r")
[10059:144091] shim_fopen -> 0x822c7c4f0
[10059:144091] shim___isoc99_fscanf(0x822c7c4f0, "%31[^:]: %u
", ...)
[10059:144091] shim___isoc99_fscanf -> 2
[10059:144091] shim___isoc99_fscanf(0x822c7c4f0, "%31[^:]: %u
", ...)
[10059:144091] shim___isoc99_fscanf -> 1
[10059:144091] shim_fclose(0x822c7c4f0)
[10059:144091] shim_fclose -> 0
[10059:144091] shim___xstat(1, "/dev/nvidiactl", 0x820870d70)
[10059:144091] shim___xstat -> 0
[10059:144091] shim_open64("/dev/nvidiactl", 2, ...)
[10059:144091] shim_open64 -> 5
[10059:144091] shim_fcntl(5, 2, ...)
[10059:144091] shim_fcntl_impl: cmd = F_SETFD, arg = 0x1
[10059:144091] shim_fcntl -> 0
[10059:144091] shim_getenv("__RM_NO_VERSION_CHECK")
[10059:144091] shim_getenv -> 0x0
[10059:144091] shim_ioctl(5, 0xc04846d2, ...)
[10059:144091] shim_ioctl -> 0
[10059:144091] shim_open("/sys/devices/system/memory/block_size_bytes", 0, ...)
[10059:144091] shim_open -> -1
[10059:144091] shim___errno_location()
[10059:144091] shim___errno_location -> 0x822c70e10
[10059:144091] shim_ioctl(5, 0xc90046c8, ...)
[10059:144091] shim_ioctl -> 0
[10059:144091] shim_time(0x0)
[10059:144091] shim_time -> 1695078816
[10059:144091] shim_ioctl(5, 0xc020462b, ...)
[10059:144091] shim_ioctl -> 0
[10059:144091] shim_fopen("/proc/devices", "r")
[10059:144091] shim_fopen -> 0x0
[10059:144091] shim_snprintf(0x8208707e0, 260, "-f=%s", ...)
[10059:144091] shim_snprintf -> 46
[10059:144091] shim___xstat(1, "/usr/bin/nvidia-modprobe", 0x820870600)
[10059:144091] shim___xstat -> -1
[10059:144091] shim_fopen("/proc/devices", "r")
[10059:144091] shim_fopen -> 0x0
[10059:144091] shim_fopen("/proc/driver/nvidia/capabilities/mig/config", "r")
[10059:144091] shim_fopen -> 0x0
[10059:144091] shim___xstat(1, "", 0x820870530)
[10059:144091] shim___xstat -> -1
[10059:144091] shim_fopen("/proc/devices", "r")
[10059:144091] shim_fopen -> 0x0
[10059:144091] shim_snprintf(0x8208707e0, 260, "-f=%s", ...)
[10059:144091] shim_snprintf -> 47
[10059:144091] shim___xstat(1, "/usr/bin/nvidia-modprobe", 0x820870600)
[10059:144091] shim___xstat -> -1
[10059:144091] shim_fopen("/proc/devices", "r")
[10059:144091] shim_fopen -> 0x0
[10059:144091] shim_fopen("/proc/driver/nvidia/capabilities/mig/monitor", "r")
[10059:144091] shim_fopen -> 0x0
[10059:144091] shim___xstat(1, "", 0x820870530)
[10059:144091] shim___xstat -> -1
[10059:144091] shim_time(0x0)
[10059:144091] shim_time -> 1695078816
[10059:144091] shim_ioctl(5, 0xc020462a, ...)
[10059:144091] shim_ioctl -> 0
[10059:144091] shim_time(0x0)
[10059:144091] shim_time -> 1695078816
[10059:144091] shim_ioctl(5, 0xc020462a, ...)
[10059:144091] shim_ioctl -> 0
[10059:144091] shim_qsort(0x8208712a0, 1, 12, 0x82c1c55e0)
[10059:144091] shim_qsort -> void
[10059:144091] shim_time(0x0)
[10059:144091] shim_time -> 1695078816
[10059:144091] shim_ioctl(5, 0xc020462a, ...)
[10059:144091] shim_ioctl -> 0
[10059:144091] shim_time(0x0)
[10059:144091] shim_time -> 1695078816
[10059:144091] shim_ioctl(5, 0xc020462a, ...)
[10059:144091] shim_ioctl -> 0
[10059:144091] shim_snprintf(0x820870450, 128, "/dev/nvidia%d", ...)
[10059:144091] shim_snprintf -> 12
[10059:144091] shim_fopen("/proc/driver/nvidia/params", "r")
[10059:144091] shim_fopen -> 0x822c7c4f0
[10059:144091] shim___isoc99_fscanf(0x822c7c4f0, "%31[^:]: %u
", ...)
[10059:144091] shim___isoc99_fscanf -> 2
[10059:144091] shim___isoc99_fscanf(0x822c7c4f0, "%31[^:]: %u
", ...)
[10059:144091] shim___isoc99_fscanf -> 1
[10059:144091] shim_fclose(0x822c7c4f0)
[10059:144091] shim_fclose -> 0
[10059:144091] shim_snprintf(0x8208701c0, 128, "/dev/char/%d:%d", ...)
[10059:144091] shim_snprintf -> 15
[10059:144091] shim___xstat(1, "/dev/nvidia1", 0x820870350)
[10059:144091] shim___xstat -> 0
[10059:144091] shim_snprintf(0x820870240, 128, "../%s", ...)
[10059:144091] shim_snprintf -> 10
[10059:144091] shim_remove("/dev/char/195:1")
[10059:144091] shim_remove -> -1
[10059:144091] shim_symlink("../nvidia1", "/dev/char/195:1")
[10059:144091] shim_symlink -> -1
[10059:144091] shim___xstat(1, "/dev/char/195:1", 0x8208702c0)
[10059:144091] shim___xstat -> -1
[10059:144091] shim___errno_location()
[10059:144091] shim___errno_location -> 0x822c70e10
[10059:144091] shim_snprintf(0x8208704f0, 32, "-c=%d", ...)
[10059:144091] shim_snprintf -> 4
[10059:144091] shim_getenv("__RM_ENABLE_VERBOSE_OUTPUT")
[10059:144091] shim_getenv -> 0x0
[10059:144091] shim___xstat(1, "/usr/bin/nvidia-modprobe", 0x820870440)
[10059:144091] shim___xstat -> -1
[10059:144091] shim_snprintf(0x820870450, 128, "/dev/nvidia%d", ...)
[10059:144091] shim_snprintf -> 12
[10059:144091] shim_fopen("/proc/driver/nvidia/params", "r")
[10059:144091] shim_fopen -> 0x822c7c4f0
[10059:144091] shim___isoc99_fscanf(0x822c7c4f0, "%31[^:]: %u
", ...)
[10059:144091] shim___isoc99_fscanf -> 2
[10059:144091] shim___isoc99_fscanf(0x822c7c4f0, "%31[^:]: %u
", ...)
[10059:144091] shim___isoc99_fscanf -> 1
[10059:144091] shim_fclose(0x822c7c4f0)
[10059:144091] shim_fclose -> 0
[10059:144091] shim___xstat(1, "/dev/nvidia1", 0x820870380)
[10059:144091] shim___xstat -> 0
[10059:144091] shim_getenv("__RM_ENABLE_VERBOSE_OUTPUT")
[10059:144091] shim_getenv -> 0x0
[10059:144091] shim_memset(0x82086ee80, 0, 8576)
[10059:144091] shim_memset -> 0x82086ee80
[10059:144091] shim_time(0x0)
[10059:144091] shim_time -> 1695078816
[10059:144091] shim_ioctl(5, 0xc020462a, ...)
[10059:144091] shim_ioctl -> 0
[10059:144091] shim_qsort(0x82086ed00, 0, 12, 0x82c1c55e0)
[10059:144091] shim_qsort -> void
[10059:144091] shim_calloc(1544, 1)
[10059:144091] shim_calloc -> 0x827eb9700
[10059:144091] shim_getpid()
[10059:144091] shim_getpid -> 10059
No devices were found
[10059:144091] shim_getpid()
[10059:144091] shim_getpid -> 10059
[10059:144091] shim_free(0x827eb9700)
[10059:144091] shim_free -> void
[10059:144091] shim_time(0x0)
[10059:144091] shim_time -> 1695078816
[10059:144091] shim_ioctl(5, 0xc020462a, ...)
[10059:144091] shim_ioctl -> 0
[10059:144091] shim_time(0x0)
[10059:144091] shim_time -> 1695078816
[10059:144091] shim_ioctl(5, 0xc0104629, ...)
[10059:144091] shim_ioctl -> 0
[10059:144091] shim_close(5)
[10059:144091] shim_close -> 0
[10059:144091] shim_memset(0x82c4c5ba0, 0, 12509464)
[10059:144091] shim_memset -> 0x82c4c5ba0
[10059:144091] shim___cxa_finalize(0x82c4c5600)
[10059:144091] shim___cxa_finalize -> void

shkhln commented 1 year ago

> Is this supposed to be fixed in the 20230916 tag? I've just installed that version

There is no installation procedure/script in the repo.

> % SHIM_DEBUG=1 nv-sglrun nvidia-smi

Is the proper version actually in $PATH?

fsmv commented 1 year ago

I installed it by editing the Makefile in /usr/ports/: I changed the version to that tag, updated the checksums, and ran make install.

Proper version of nv-sglrun or something else?

% which nv-sglrun
/usr/local/bin/nv-sglrun
% pkg which /usr/local/bin/nv-sglrun
/usr/local/bin/nv-sglrun was installed by package libc6-shim-20230916

shkhln commented 1 year ago

Perhaps we need different ids for different nvidia%d nodes: make_dev_id(195, 0) for nvidia0, make_dev_id(195, 1) for nvidia1 and so on. Want to try your hand at patching that?

fsmv commented 1 year ago

It worked! Awesome! You just need to add an atoi call to this to parse the device ID.

This is the patch I used

--- src/libc/sys/stat.c.orig    2023-09-19 04:38:42 UTC
+++ src/libc/sys/stat.c
@@ -114,7 +114,7 @@ static uint64_t make_dev_id(uint32_t major, uint32_t m
     switch (path[sizeof("/dev/nvidia") - 1]) {                    \
       case 'c': stat_buf->st_rdev = make_dev_id(195, 255); break; \
       case '-': stat_buf->st_rdev = make_dev_id(195, 254); break; \
-      default:  stat_buf->st_rdev = make_dev_id(195, 0);          \
+      default:  stat_buf->st_rdev = make_dev_id(195, 1);          \
     }                                                             \
   }
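
The hardcoded `1` in the patch matches this machine, where the GPU node is /dev/nvidia1. The more general idea from the discussion, one id per nvidia%d node obtained by parsing the number out of the path, could be sketched roughly as follows. This is an illustration only: `nvidia_minor_from_path` and `linux_style_dev_id` are hypothetical names, not functions from libc6-shim. The bit layout in `linux_style_dev_id` is glibc's `makedev` encoding, which is assumed here rather than taken from the shim's own `make_dev_id`. The sketch uses `strtol` instead of `atoi` so that malformed suffixes can be rejected.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not the shim's code: derive a Linux minor number
 * from a /dev/nvidia* path. Returns -1 if the path is not recognized. */
static int nvidia_minor_from_path(const char *path) {
    static const char prefix[] = "/dev/nvidia";
    assert(path != NULL);

    if (strncmp(path, prefix, sizeof(prefix) - 1) != 0)
        return -1;

    const char *suffix = path + sizeof(prefix) - 1;
    if (strcmp(suffix, "ctl") == 0)
        return 255;                 /* same constant as the 'c' case in the patch */
    if (suffix[0] == '-')
        return 254;                 /* same constant as the '-' case in the patch */
    if (suffix[0] < '0' || suffix[0] > '9')
        return -1;                  /* reject signs, whitespace, empty suffix */

    char *end;
    long n = strtol(suffix, &end, 10);
    if (*end != '\0' || n > 253)
        return -1;                  /* trailing junk or collision with 254/255 */
    return (int)n;
}

/* Assumption: glibc's makedev() bit layout for a 64-bit dev_t. */
static uint64_t linux_style_dev_id(uint32_t major, uint32_t minor) {
    return (((uint64_t)major & 0x00000fffu) << 8)
         | (((uint64_t)major & 0xfffff000u) << 32)
         | (((uint64_t)minor & 0x000000ffu))
         | (((uint64_t)minor & 0xffffff00u) << 12);
}
```

With these definitions, /dev/nvidia1 maps to minor 1, /dev/nvidiactl to 255, and a path starting with /dev/nvidia- to 254, mirroring the constants in the patch.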

🎉🎉🎉🎉🎉🎉

% nv-sglrun nvidia-smi 
shim init
Mon Sep 18 21:41:49 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.116.03   Driver Version: 525.116.03   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:65:00.0 Off |                  N/A |
| 53%   41C    P0    33W / 170W |      0MiB / 12288MiB |      1%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

shkhln commented 1 year ago

Oh well, fixed by 3764b213b76826abd77bc275afd1faff3eb89236.