miskcoo / ugreen_leds_controller

An LED Controller for UGREEN's DX/DXP NAS Series
179 stars 26 forks source link

Disk mapping broken on DXP6800 Pro #9

Closed b-reich closed 5 months ago

b-reich commented 5 months ago

I tried the hctl and serial mapping methods

# led-disk mapping
mapping_method=hctl  # hctl, serial
led_map=(disk1 disk2 disk3 disk4 disk5 disk6 disk7 disk8)
# hctl, $> lsblk -S -o name,hctl
hctl_map=("0:0:0:0" "1:0:0:0" "2:0:0:0" "3:0:0:0" "4:0:0:0" "5:0:0:0" "6:0:0:0" "7:0:0:0")
# serial number, $> lsblk -S -o name,serial
serial_map=("ZR5DG9XX" "ZR5F8FXX" "ZR5EW2XX" "ZR5F8AX" "ZR5DH0XX" "ZR5F8GNXX" "placeholder6" "placeholder7")

hctl: changes by everytime I reboot or add/remove a disk serial: Disk6 is off all the time, and I think there is bug when one disk is missing. I test it more later.

running on Debian 12, Linux 6.1.0-21 using dkms

miskcoo commented 5 months ago

Can you check that whether disk6 appears in /sys/class/leds? If not, the issue is in the module.

Regarding the script, it is unfortunate to hear that the hctl mapping does not work on your model. For the disk6 issue, it may be because the script detects and creates the disk map at startup, so it can only handle disk removal. I may rewrite it to better handle the disk insertion event when I find some time.

If the issue lies in the problem mentioned above, a workaround could be adding a udev rule to restart the systemd service / the script when a disk removal or insertion event occurs.

b-reich commented 5 months ago

maybe mapping by/dev/disk/by-path/ is a thing?

b-reich commented 5 months ago

@miskcoo the kernel modules itself works.

grafik

Disk6 doenst work only with serial and hctl is totaly random everytime

miskcoo commented 5 months ago

maybe mapping by/dev/disk/by-path/ is a thing?

That seems work and is equivalent to HTCL in my model. How about yours? Would it be stable? I guess using the alphabetical order of these paths would be a possible solution.

# /dev/disks/by-path
lrwxrwxrwx 1 root root   9 Jun  9 16:41 pci-0000:00:17.0-ata-1 -> ../../sda
lrwxrwxrwx 1 root root   9 Jun  9 16:41 pci-0000:00:17.0-ata-1.0 -> ../../sda
lrwxrwxrwx 1 root root   9 Jun 10 18:59 pci-0000:00:17.0-ata-2 -> ../../sdc
lrwxrwxrwx 1 root root   9 Jun 10 18:59 pci-0000:00:17.0-ata-2.0 -> ../../sdc
lrwxrwxrwx 1 root root   9 Jun  9 16:20 pci-0000:01:00.0-ata-1 -> ../../sdd
lrwxrwxrwx 1 root root   9 Jun  9 16:20 pci-0000:01:00.0-ata-1.0 -> ../../sdd
lrwxrwxrwx 1 root root   9 Jun 15 15:23 pci-0000:01:00.0-ata-2 -> ../../sdb
lrwxrwxrwx 1 root root   9 Jun 15 15:23 pci-0000:01:00.0-ata-2.0 -> ../../sdb

# output of lsblk
NAME HCTL
sda  0:0:0:0
sdb  3:0:0:0
sdc  1:0:0:0
sdd  2:0:0:0
miskcoo commented 5 months ago

For the disk6 issue in the serial mapping mode, can you apply this patch and run the script ugreen-diskiomon manually to observe the output?

diff --git a/scripts/ugreen-diskiomon b/scripts/ugreen-diskiomon
index 1c00117..76152ec 100755
--- a/scripts/ugreen-diskiomon
+++ b/scripts/ugreen-diskiomon
@@ -43,6 +43,7 @@ done <<< "$(lsblk -S -o name,${mapping_method} | tail -n +2)"
 for i in "${!led_map[@]}"; do
     led=${led_map[i]} 
     if [[ -d /sys/class/leds/$led ]]; then
+        echo "set ${led} to the oneshot mode"
         echo oneshot > /sys/class/leds/$led/trigger
         echo 1 > /sys/class/leds/$led/invert
         echo 100 > /sys/class/leds/$led/delay_on
@@ -57,6 +58,7 @@ for i in "${!led_map[@]}"; do
             devices[$led]=${dev}
         else
             # turn off the led if no disk installed on this slot
+            echo "${dev} not found, turn off ${led}"
             echo 0 > /sys/class/leds/$led/brightness
             echo none > /sys/class/leds/$led/trigger
         fi
b-reich commented 5 months ago
root@r5d4:~# ls -lah /dev/disk/by-path/
total 0
drwxr-xr-x 2 root root 440 Jun 14 21:13 .
drwxr-xr-x 8 root root 160 Jun 14 21:02 ..
lrwxrwxrwx 1 root root   9 Jun 14 21:13 pci-0000:00:17.0-ata-1 -> ../../sdc
lrwxrwxrwx 1 root root   9 Jun 14 21:13 pci-0000:00:17.0-ata-1.0 -> ../../sdc
lrwxrwxrwx 1 root root   9 Jun 14 21:13 pci-0000:00:17.0-ata-2 -> ../../sda
lrwxrwxrwx 1 root root   9 Jun 14 21:13 pci-0000:00:17.0-ata-2.0 -> ../../sda
lrwxrwxrwx 1 root root  13 Jun 14 21:13 pci-0000:01:00.0-nvme-1 -> ../../nvme0n1
lrwxrwxrwx 1 root root  15 Jun 14 21:13 pci-0000:01:00.0-nvme-1-part1 -> ../../nvme0n1p1
lrwxrwxrwx 1 root root  15 Jun 14 21:13 pci-0000:01:00.0-nvme-1-part2 -> ../../nvme0n1p2
lrwxrwxrwx 1 root root  15 Jun 14 21:13 pci-0000:01:00.0-nvme-1-part3 -> ../../nvme0n1p3
lrwxrwxrwx 1 root root  13 Jun 14 21:13 pci-0000:56:00.0-nvme-1 -> ../../nvme1n1
lrwxrwxrwx 1 root root  15 Jun 14 21:13 pci-0000:56:00.0-nvme-1-part1 -> ../../nvme1n1p1
lrwxrwxrwx 1 root root  15 Jun 14 21:13 pci-0000:56:00.0-nvme-1-part2 -> ../../nvme1n1p2
lrwxrwxrwx 1 root root  15 Jun 14 21:13 pci-0000:56:00.0-nvme-1-part3 -> ../../nvme1n1p3
lrwxrwxrwx 1 root root   9 Jun 14 21:13 pci-0000:59:00.0-ata-1 -> ../../sdf
lrwxrwxrwx 1 root root   9 Jun 14 21:13 pci-0000:59:00.0-ata-1.0 -> ../../sdf
lrwxrwxrwx 1 root root   9 Jun 14 21:13 pci-0000:59:00.0-ata-2 -> ../../sdb
lrwxrwxrwx 1 root root   9 Jun 14 21:13 pci-0000:59:00.0-ata-2.0 -> ../../sdb
lrwxrwxrwx 1 root root   9 Jun 14 21:13 pci-0000:59:00.0-ata-3 -> ../../sdd
lrwxrwxrwx 1 root root   9 Jun 14 21:13 pci-0000:59:00.0-ata-3.0 -> ../../sdd
lrwxrwxrwx 1 root root   9 Jun 14 21:13 pci-0000:59:00.0-ata-4 -> ../../sde
lrwxrwxrwx 1 root root   9 Jun 14 21:13 pci-0000:59:00.0-ata-4.0 -> ../../sde

root@r5d4:~# lsblk -S -o name,hctl
NAME HCTL
sda  1:0:0:0
sdb  3:0:0:0
sdc  0:0:0:0
sdd  4:0:0:0
sde  5:0:0:0
sdf  2:0:0:0
miskcoo commented 5 months ago

It seems they are also the same?

b-reich commented 5 months ago

not sure, if i understand the arch wiki correct https://wiki.archlinux.org/title/persistent_block_device_naming#by-id_and_by-path it should be the correct mapping between the leds and disks.

I have some spare time later, I can give it a try

meyergru commented 5 months ago

I do not see anything that has not been said before. The mapping of sdX to physical devices (as denoted by /dev/disk/by-path or HCTL) is indeed random - that is why we do not rely on sdX enumeration for locating the disks.

In your cited example, we have:

0:0:0:0 = pci-0000:00:17.0-ata-1 = sdc = disk1
1:0:0:0 = pci-0000:00:17.0-ata-2 = sda = disk2
2:0:0:0 = pci-0000:59:00.0-ata-1 = sdf = disk3
3:0:0:0 = pci-0000:59:00.0-ata-2 = sdb = disk4
4:0:0:0 = pci-0000:59:00.0-ata-3 = sdd = disk5
5:0:0:0 = pci-0000:59:00.0-ata-4 = sde = disk6

So what is wrong, actually with using HCTL for location?

What I care more about is what happens when you remove a disk in the middle, say, disk4. If in that case, the list above was missing 3:0:0:0, everything is fine. If only 5:0:0:0 was missing, we would be back to square one.

yooooki commented 5 months ago

Your serial of disk 6 seems 1 letter longer than other disks. Is that a typo in your post or in your script?

b-reich commented 5 months ago

Your serial of disk 6 seems 1 letter longer than other disks. Is that a typo in your post or in your script?

u are right... there was a 5 from placeholder5 left damit

b-reich commented 5 months ago

I do not see anything that has not been said before. The mapping of sdX to physical devices (as denoted by /dev/disk/by-path or HCTL) is indeed random - that is why we do not rely on sdX enumeration for locating the disks.

In your cited example, we have:

0:0:0:0 = pci-0000:00:17.0-ata-1 = sdc = disk1
1:0:0:0 = pci-0000:00:17.0-ata-2 = sda = disk2
2:0:0:0 = pci-0000:59:00.0-ata-1 = sdf = disk3
3:0:0:0 = pci-0000:59:00.0-ata-2 = sdb = disk4
4:0:0:0 = pci-0000:59:00.0-ata-3 = sdd = disk5
5:0:0:0 = pci-0000:59:00.0-ata-4 = sde = disk6

So what is wrong, actually with using HCTL for location?

What I care more about is what happens when you remove a disk in the middle, say, disk4. If in that case, the list above was missing 3:0:0:0, everything is fine. If only 5:0:0:0 was missing, we would be back to square one.

Disk6 is actual sda at the moment. sda == Disk6 sdb == Disk2 sdc == Disk5 sdd == Disk3 sde == Disk4 sdf == Disk1

serial seems to work now

meyergru commented 5 months ago

As I said: sdX is random with every reboot - what would be more interesting is if the slot numbers did not correlate with the HCTL order or are - as you claim - random, too (which I doubt).

Iff that is the case, we would have to reorder the HCTL values in hctl_map to fix it - BUT I have just verified for my DXP8800, the HCTL order does in fact correlate with the order of the disk slots and the by-path correlates with HCTL:

# /dev/disk/by-path
lrwxrwxrwx 1 root root  9 Jun 15 14:32 pci-0000:59:00.0-ata-1 -> ../../sda
lrwxrwxrwx 1 root root  9 Jun 15 14:32 pci-0000:59:00.0-ata-2 -> ../../sdc
lrwxrwxrwx 1 root root  9 Jun 15 14:32 pci-0000:59:00.0-ata-3 -> ../../sdb
lrwxrwxrwx 1 root root  9 Jun 15 14:32 pci-0000:59:00.0-ata-4 -> ../../sdd
lrwxrwxrwx 1 root root  9 Jun 15 14:32 pci-0000:5a:00.0-ata-1 -> ../../sde
lrwxrwxrwx 1 root root  9 Jun 15 14:32 pci-0000:5a:00.0-ata-2 -> ../../sdf
lrwxrwxrwx 1 root root  9 Jun 15 14:32 pci-0000:5a:00.0-ata-3 -> ../../sdg
lrwxrwxrwx 1 root root  9 Jun 15 14:32 pci-0000:5a:00.0-ata-4 -> ../../sdh

# lsblk -S -x hctl -o name,hctl,serial
NAME HCTL       SERIAL
sda  0:0:0:0    XXKERENY
sdc  1:0:0:0    XXKG2BXY
sdb  2:0:0:0    XXGMU62K
sdd  3:0:0:0    XXKJEZBY
sde  4:0:0:0    XXKJHBYY
sdf  5:0:0:0    XXGT2Z9L
sdg  6:0:0:0    XXKH3SGZ
sdh  7:0:0:0    XXJDB19L

As you can see, sdb and sdc are out of order (i.e. they can be random), but HCTL is not - plus, I have verified the slot order by looking at each serial. I think @miskcoo did the same for his DXP4800.

In order to check if the DXP6800 is special, please show that HCTL order does not fit your slot order by showing us the output of "lsblk -S -x hctl -o name,hctl,serial" (we assume that the verified slot order of your serial numbers in the initial post is 1-6).

meyergru commented 5 months ago

@miskcoo: You should probably change the comment:

# serial number, $> lsblk -S -o name,serial

to:

# serial number, $> lsblk -S -x hctl -o hctl,serial,name

Otherwise, people will just execute that and list the serial numbers in the random sdX order (which will be wrong in almost all cases).

b-reich commented 5 months ago

I am fine with the serial implementation at the moment. I am currently trying to find out how UGOS solved it. Running it in a VM.

Also taking an eye on /dev/disk/by-path. Just because it's out of order doesn't mean it's random. But admittedly it looks very much like it. And unfortunately probably true.

Thanks for your work and support! :)

b-reich commented 5 months ago

I can check hctl after a fresh reboot. Running badblocks at the moment (24h left)

b-reich commented 5 months ago

hctl seems to work, maybe the readme should include a step to configure the mapping?

#1st reboot
root@r5d4:~# lsblk -S -x hctl -o hctl,serial,name
HCTL       SERIAL   NAME
0:0:0:0    ZR5DH0xx sda
1:0:0:0    ZR5F8Gxx sdb
2:0:0:0    ZR5DG9xx sdf
3:0:0:0    ZR5F8Fxx sdc
4:0:0:0    ZR5EW2xx sdd
5:0:0:0    ZR5F8Axx sde

#2nd reboot 
HCTL       SERIAL   NAME
0:0:0:0    ZR5DH0xx sdc
1:0:0:0    ZR5F8Gxx sdb
2:0:0:0    ZR5DG9xx sda
3:0:0:0    ZR5F8Fxx sde
4:0:0:0    ZR5EW2xx sdd
5:0:0:0    ZR5F8Axx sdf

Correct order is

ZR5DG9xx 
ZR5F8Fxx 
ZR5EW2xx
ZR5F8Axx 
ZR5DH0xx
ZR5F8Gxx

Think we can close this issue.

meyergru commented 5 months ago

O.K., so HCTL is predictable, but in the case of your DXP6800, the mapping between HCTL and slots is:

0:0:0:0 -> disk5 1:0:0:0 -> disk6 2:0:0:0 -> disk1 3:0:0:0 -> disk2 4:0:0:0 -> disk3 5:0:0:0 -> disk4

Which is unexpected and makes HCTL mapping for that device useless as-is.

Whereas with the DXP4800 it is (expected):

0:0:0:0 -> disk1 1:0:0:0 -> disk2 2:0:0:0 -> disk3 3:0:0:0 -> disk4

and with the DXP8800 it is:

0:0:0:0 -> disk1 1:0:0:0 -> disk2 2:0:0:0 -> disk3 3:0:0:0 -> disk4 4:0:0:0 -> disk5 5:0:0:0 -> disk6 6:0:0:0 -> disk7 7:0:0:0 -> disk8

So, the issue can be closed, but:

  1. Documentation should be changed
  2. HCTL order must be made model-dependent
b-reich commented 5 months ago

Maybe, my dropbear-initramfs mess this up (my root volume is encrypted). But thats just a random idea.

meyergru commented 5 months ago

Nope, most probably not. This has to do with how the SATA controllers are connected to the PCIe bus and then to the disk slots. There are up to 4 SATA channels on each controller chip, but yours is not in this configuration:

first controller -> disk1-4 second controller -> disk5-6

but:

first controller -> disk5-6 second controller -> disk1-4

which results in the HCTL order we see.

For the DXP8800, the configuration is:

first controller -> disk1-4 second controller -> disk5-8

miskcoo commented 5 months ago

Interesting. I think I should mention this in the document or script.