ilya-zlobintsev / LACT

Linux AMDGPU Configuration Tool
MIT License

Dual RX6600s - Only one GPU detected #313

Open: GIJack opened this issue 2 months ago

GIJack commented 2 months ago


Bug description

I have a pair of GPUs, but LACT detects only one.

The amdcovc tool sees both GPUs:

$ amdcovc 
Adapter 0: PCI 11:0:0: Device 0000
  Core: 0 MHz, Mem: 96 MHz, CoreOD: 0, MemOD: 0, Vddc: 6 mV
  SOC: 640 MHz, DCEF: 480 MHz, FClock: 942 MHz
  PerfCtrl: auto, Load: 0%, MemLoad: 0%
  Temp: 32°C, T2: 32°C, T3: 32°C, Fan: 34.1176%
  Power: 3 W (cap: 120 W)
  Core Clocks: 0 0
  Memory Clocks: 96 541 675 875
  SOC Clocks: 417 640 1200
  DCEF Clocks: 417 480 1200
  F Clocks: 500 942 1801
Adapter 1: PCI 68:0:0: Device 0000
  Core: 800 MHz, Mem: 875 MHz, CoreOD: 0, MemOD: 0, Vddc: 718 mV
  SOC: 800 MHz, DCEF: 685 MHz, FClock: 1221 MHz
  PerfCtrl: auto, Load: 3%, MemLoad: 0%
  Temp: 39°C, T2: 39°C, T3: 44°C, Fan: 42.3529%
  Power: 18 W (cap: 100 W)
  Core Clocks: 500 800 2750
  Memory Clocks: 96 541 675 875
  SOC Clocks: 417 800 1200
  DCEF Clocks: 417 685 1200
  F Clocks: 500 1221 1801

LACT only sees one:

$ lact cli list-gpus
1002:73FF-1EAE:6505-0000:0b:00.0 (Navi 23 [Radeon RX 6600/6600 XT/6600M

System info

- LACT version:
$ pacman -Q lact
lact 0.5.4-2

- GPU model:
RX6600

- Kernel version:
$ uname -a
Linux iron 6.8.7-hardened1-2-hardened #1 SMP PREEMPT_DYNAMIC Wed, 17 Apr 2024 22:21:16 +0000 x86_64 GNU/Lin

- Distribution:
Arch Linux
ilya-zlobintsev commented 2 months ago

Could you include a debug snapshot?

GIJack commented 2 months ago

LACT-sysfs-snapshot-20240428-184224.tar.gz

Debug snapshot

ilya-zlobintsev commented 2 months ago

The snapshot only contains a single GPU in /sys, which is weird. Could you show the output of

ls -la /sys/class/drm/

And also: does restarting the service (sudo systemctl restart lactd) change anything?
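A small sketch for this kind of check, assuming the usual sysfs layout where each /sys/class/drm/cardN has a device symlink to its PCI device directory (named after the PCI address) and that directory has a driver symlink. It prints one line per card so the PCI addresses can be compared with the adapters amdcovc reports. The function name list_drm_cards is hypothetical and is not part of LACT; readlink -f is the GNU/Linux extension rather than POSIX.

```shell
#!/bin/sh
# Hypothetical helper (not part of LACT): print "cardN <pci-address> <driver>"
# for every DRM card node. The directory is a parameter (default
# /sys/class/drm) so the function can also be pointed at a copy of the tree.
list_drm_cards() {
    drm_dir="${1:-/sys/class/drm}"
    for card in "$drm_dir"/card*; do
        # Skip connector entries such as card1-DP-1; keep only cardN.
        case "$(basename "$card")" in
            card*-*) continue ;;
        esac
        # Also skips the literal pattern when nothing matched the glob.
        [ -e "$card/device" ] || continue
        # The device link resolves to the PCI device directory, whose name
        # is the PCI address (e.g. 0000:0b:00.0).
        pci_addr=$(basename "$(readlink -f "$card/device")")
        driver="none"
        if [ -e "$card/device/driver" ]; then
            driver=$(basename "$(readlink -f "$card/device/driver")")
        fi
        echo "$(basename "$card") $pci_addr $driver"
    done
}
```

Running list_drm_cards once right after boot and again after restarting lactd would show whether the second card's node is missing early or present but skipped.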

GIJack commented 1 month ago


It does, which is weird. But it doesn't see both of them as enabled.

ilya-zlobintsev commented 1 month ago

By "it does", do you mean that both GPUs are detected in LACT? And what do you mean by "doesn't see both of them as enabled"?

GIJack commented 1 month ago

Yes. When LACT is restarted while the system is running, both GPUs are found. When it starts at boot, only one is.

ilya-zlobintsev commented 1 month ago

This seems to be another manifestation of the issue with LACT starting too early in the boot process, before all the sysfs entries are initialized. The current logic waits until 10 seconds have passed since system startup and at least one GPU is available, which I guess isn't enough in your case.

I'll look into making it more reliable on multi-GPU systems.
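One possible way to make a startup wait more robust on multi-GPU systems is to wait until the number of DRM cards stops changing, rather than returning once the first GPU appears. The sketch below is hypothetical and is not what LACT or the commit linked below actually does; the function names, the number of stable polls, the poll limit, and the interval are illustrative assumptions. It also counts every cardN node regardless of driver, which is a simplification.

```shell
#!/bin/sh
# Hypothetical sketch (not LACT's implementation).

# Count cardN nodes in a DRM directory, ignoring connectors like card1-DP-1.
count_cards() {
    n=0
    for card in "$1"/card*; do
        case "$(basename "$card")" in
            card*-*) continue ;;
        esac
        # Also filters out the literal pattern when nothing matched.
        [ -e "$card" ] && n=$((n + 1))
    done
    echo "$n"
}

# Wait until the card count is non-zero and unchanged for several polls.
#   $1 DRM directory (default /sys/class/drm)
#   $2 consecutive identical polls required (default 3)
#   $3 maximum number of polls (default 30)
#   $4 seconds between polls (default 1)
# Prints the last count; returns 0 if it stabilised, 1 on timeout.
wait_for_stable_gpu_count() {
    drm_dir="${1:-/sys/class/drm}"
    needed="${2:-3}"
    max_polls="${3:-30}"
    interval="${4:-1}"
    last=-1
    stable=0
    polls=0
    while [ "$polls" -lt "$max_polls" ]; do
        current=$(count_cards "$drm_dir")
        if [ "$current" -eq 0 ]; then
            stable=0            # no GPU yet: keep waiting
        elif [ "$current" -eq "$last" ]; then
            stable=$((stable + 1))
        else
            stable=1            # count changed: restart the streak
        fi
        last=$current
        if [ "$stable" -ge "$needed" ]; then
            echo "$current"
            return 0
        fi
        polls=$((polls + 1))
        sleep "$interval"
    done
    echo "$last"
    return 1
}
```

The stability window trades a few seconds of extra startup delay for not needing to know in advance how many GPUs the system has; a card that appears only after the window closes would still be missed.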

ilya-zlobintsev commented 1 month ago

https://github.com/ilya-zlobintsev/LACT/commit/ea633220835b83df807e485aa887219b711fe388 should help with this. Please update to the latest commit, set log_level to debug in /etc/lact/config.yaml, and tell me whether this solves the problem. If it doesn't, post the LACT startup log from journalctl -u lactd -e.