Log? I didn't just amend the assertion check, I also verified that the whole thing works with 525.
Oh shoot, I forgot to add it. Here it is:
# SHIM_DEBUG=1 nv-sglrun nvidia-smi
shim init
[2525:112432] shim_getpid()
[2525:112432] shim_getpid -> 2525
[2525:112432] shim_getenv("__NVML_DBG_LVL")
[2525:112432] shim_getenv -> 0x0
[2525:112432] shim_getenv("__NVML_DBG_APPEND")
[2525:112432] shim_getenv -> 0x0
[2525:112432] shim_getenv("__NVML_DBG_FILE")
[2525:112432] shim_getenv -> 0x0
[2525:112432] shim_gettimeofday(0x8025b4410, 0x0)
[2525:112432] shim_gettimeofday -> 0
[2525:112432] shim_memset(0x8019bcba0, 0, 12509464)
[2525:112432] shim_memset -> 0x8019bcba0
[2525:112432] shim_getenv("__NVML_CRAY_PSTATE")
[2525:112432] shim_getenv -> 0x0
[2525:112432] shim_getenv("__NVIDIA_NVML_3373")
[2525:112432] shim_getenv -> 0x0
[2525:112432] shim_getenv("__NVML_ONLY_DAEMON_PERSISTENCE_MODE")
[2525:112432] shim_getenv -> 0x0
[2525:112432] shim_getenv("__RM_ENABLE_VERBOSE_OUTPUT")
[2525:112432] shim_getenv -> 0x0
[2525:112432] shim_fopen("/proc/modules", "r")
[2525:112432] shim_fopen -> 0x0
[2525:112432] shim___xstat(1, "/sys/bus/pci/devices", 0x7fffffffb400)
[2525:112432] shim___xstat -> -1
[2525:112432] shim___errno_location()
[2525:112432] shim___errno_location -> 0x80090e890
[2525:112432] shim_geteuid()
[2525:112432] shim_geteuid -> 0
[2525:112432] shim_fopen("/proc/sys/kernel/modprobe", "r")
[2525:112432] shim_fopen -> 0x0
[2525:112432] shim___xstat(1, "/sbin/modprobe", 0x7fffffffb500)
[2525:112432] shim___xstat -> -1
[2525:112432] shim_getenv("__RM_ENABLE_VERBOSE_OUTPUT")
[2525:112432] shim_getenv -> 0x0
[2525:112432] shim___xstat(1, "/usr/bin/nvidia-modprobe", 0x7fffffffb920)
[2525:112432] shim___xstat -> -1
[2525:112432] shim_fopen("/proc/driver/nvidia/params", "r")
[2525:112432] shim_fopen -> 0x800919f70
[2525:112432] shim___isoc99_fscanf(0x800919f70, "%31[^:]: %u
", ...)
[2525:112432] shim___isoc99_fscanf -> 2
[2525:112432] shim___isoc99_fscanf(0x800919f70, "%31[^:]: %u
", ...)
[2525:112432] shim___isoc99_fscanf -> 1
[2525:112432] shim_fclose(0x800919f70)
[2525:112432] shim_fclose -> 0
[2525:112432] shim_snprintf(0x7fffffffb660, 128, "/dev/char/%d:%d", ...)
[2525:112432] shim_snprintf -> 17
[2525:112432] shim___xstat(1, "/dev/nvidiactl", 0x7fffffffb7f0)
[2525:112432] shim___xstat -> 0
[2525:112432] shim_snprintf(0x7fffffffb6e0, 128, "../%s", ...)
[2525:112432] shim_snprintf -> 12
[2525:112432] shim_remove("/dev/char/195:255")
[2525:112432] shim_remove -> -1
[2525:112432] shim_symlink("../nvidiactl", "/dev/char/195:255")
[2525:112432] shim_symlink -> -1
[2525:112432] shim___xstat(1, "/dev/char/195:255", 0x7fffffffb760)
[2525:112432] shim___xstat -> -1
[2525:112432] shim___errno_location()
[2525:112432] shim___errno_location -> 0x80090e890
[2525:112432] shim_snprintf(0x7fffffffb990, 32, "-c=%d", ...)
[2525:112432] shim_snprintf -> 6
[2525:112432] shim_getenv("__RM_ENABLE_VERBOSE_OUTPUT")
[2525:112432] shim_getenv -> 0x0
[2525:112432] shim___xstat(1, "/usr/bin/nvidia-modprobe", 0x7fffffffb8e0)
[2525:112432] shim___xstat -> -1
[2525:112432] shim_fopen("/proc/driver/nvidia/params", "r")
[2525:112432] shim_fopen -> 0x800919f70
[2525:112432] shim___isoc99_fscanf(0x800919f70, "%31[^:]: %u
", ...)
[2525:112432] shim___isoc99_fscanf -> 2
[2525:112432] shim___isoc99_fscanf(0x800919f70, "%31[^:]: %u
", ...)
[2525:112432] shim___isoc99_fscanf -> 1
[2525:112432] shim_fclose(0x800919f70)
[2525:112432] shim_fclose -> 0
[2525:112432] shim___xstat(1, "/dev/nvidiactl", 0x7fffffffb820)
[2525:112432] shim___xstat -> 0
[2525:112432] shim_getenv("__RM_ENABLE_VERBOSE_OUTPUT")
[2525:112432] shim_getenv -> 0x0
[2525:112432] shim_fopen("/dev/nvidiactl", "r")
[2525:112432] shim_fopen -> 0x800919f70
[2525:112432] shim_fclose(0x800919f70)
[2525:112432] shim_fclose -> 0
Failed to initialize NVML: GPU access blocked by the operating system
[2525:112432] shim___cxa_finalize(0x8019bc600)
[2525:112432] shim___cxa_finalize -> void
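A trace like the one above is easier to scan if the file-related calls that failed are pulled out. Below is a minimal Python sketch for that, assuming the `SHIM_DEBUG=1` output keeps the exact `[pid:tid] shim_name(args)` / `[pid:tid] shim_name -> result` layout shown here. The helper name `failed_path_calls` is made up for illustration and is not part of the shim. It treats `-1` and `0x0` as failures, which is only a heuristic: many of these misses (for example `/proc/modules`) are probably harmless and not the cause of the NVML error.

```python
import re

# Call line, e.g.:   [2525:112432] shim_fopen("/proc/modules", "r")
CALL_RE = re.compile(r'^\[\d+:\d+\] (shim_\w+)\((.*)\)\s*$')
# Return line, e.g.: [2525:112432] shim_fopen -> 0x0
RET_RE = re.compile(r'^\[\d+:\d+\] (shim_\w+) -> (\S+)\s*$')
# First double-quoted argument that looks like an absolute path
PATH_RE = re.compile(r'"(/[^"]*)"')


def failed_path_calls(trace_text):
    """Return (function, path, result) for path-taking calls that returned -1 or 0x0.

    Hypothetical helper: assumes each call line is followed by its own
    return line, as in the single-threaded trace above.
    """
    failures = []
    pending = None  # (function, path) of the most recent path-taking call
    for line in trace_text.splitlines():
        call = CALL_RE.match(line)
        if call:
            path = PATH_RE.search(call.group(2))
            pending = (call.group(1), path.group(1)) if path else None
            continue
        ret = RET_RE.match(line)
        if ret and pending and ret.group(1) == pending[0]:
            if ret.group(2) in ("-1", "0x0"):
                failures.append((pending[0], pending[1], ret.group(2)))
            pending = None
    return failures
```

To use it, save the trace to a file (the name `shim.log` is just an example) and call `failed_path_calls(open("shim.log").read())`. Calls whose format string is split across two lines, like the `fscanf` ones above, are simply skipped.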
I updated from 13.1 to 13.2 using freebsd-update and reinstalled all my ports. I also used ktrace to see whether it was picking up any dangling libraries, but I couldn't find any. I don't have anything special set in sysctl.conf either.
Okay, sorry for the noise. I have no idea what happened. I decided to wipe all the nvidia libraries manually, removed the ports, removed libc6 and reinstalled it, and now it works. So either something was dangling or I messed up the original install from source. Strange.
The assertion was fixed in c954193, but now I get the error in the title.
I've been reading up on what causes this error, but it doesn't seem straightforward. I've of course tried this as root.
I'll try downgrading the driver version later today. I'm running 525.105.17 on 13.2 with an RTX 3060.
Thanks.