intel / ledmon

Enclosure LED Utilities
GNU General Public License v2.0

[BUG]: locate E3.S NVMe SSD show many abnormal output (SCSI: Unable to locate...) #246

Open jackeichen opened 2 weeks ago

jackeichen commented 2 weeks ago

Description

When trying to locate an NVMe disk with ledctl, the command prints many unexpected messages.

Steps to reproduce bug

  1. Start ledmon.service
  2. Locate the NVMe disk: # ledctl locate=/dev/nvme2n1

Expected behavior

No extra output is expected, but the command prints many unexpected messages:

[root@localhost ~]# ledctl locate=/dev/nvme2n1
ledctl: SCSI: Unable to locate slot in enclosure 0
ledctl: SCSI: Unable to locate slot in enclosure 1
ledctl: SCSI: Unable to locate slot in enclosure 2
ledctl: SCSI: Unable to locate slot in enclosure 3
ledctl: SCSI: Unable to locate slot in enclosure 4
ledctl: SCSI: Unable to locate slot in enclosure 5
ledctl: SCSI: Unable to locate slot in enclosure 6
ledctl: SCSI: Unable to locate slot in enclosure 7
ledctl: SCSI: Unable to locate slot in enclosure 8
ledctl: SCSI: Unable to locate slot in enclosure 9
ledctl: SCSI: Unable to locate slot in enclosure 10
ledctl: SCSI: Unable to locate slot in enclosure 11
ledctl: SCSI: Unable to locate slot in enclosure 12
ledctl: SCSI: Unable to locate slot in enclosure 13
ledctl: SCSI: Unable to locate slot in enclosure 0
ledctl: SCSI: Unable to locate slot in enclosure 1
ledctl: SCSI: Unable to locate slot in enclosure 2
ledctl: SCSI: Unable to locate slot in enclosure 3
ledctl: SCSI: Unable to locate slot in enclosure 4
ledctl: SCSI: Unable to locate slot in enclosure 5
ledctl: SCSI: Unable to locate slot in enclosure 6
ledctl: SCSI: Unable to locate slot in enclosure 7
ledctl: SCSI: Unable to locate slot in enclosure 8
ledctl: SCSI: Unable to locate slot in enclosure 9
ledctl: SCSI: Unable to locate slot in enclosure 10
ledctl: SCSI: Unable to locate slot in enclosure 11
ledctl: SCSI: Unable to locate slot in enclosure 12
ledctl: SCSI: Unable to locate slot in enclosure 13
ledctl: SCSI: Unable to locate slot in enclosure 0
ledctl: SCSI: Unable to locate slot in enclosure 1
ledctl: SCSI: Unable to locate slot in enclosure 2
ledctl: SCSI: Unable to locate slot in enclosure 3
ledctl: SCSI: Unable to locate slot in enclosure 4
ledctl: SCSI: Unable to locate slot in enclosure 5
ledctl: SCSI: Unable to locate slot in enclosure 6
ledctl: SCSI: Unable to locate slot in enclosure 7
ledctl: SCSI: Unable to locate slot in enclosure 8
ledctl: SCSI: Unable to locate slot in enclosure 9
ledctl: SCSI: Unable to locate slot in enclosure 10
ledctl: SCSI: Unable to locate slot in enclosure 11
ledctl: SCSI: Unable to locate slot in enclosure 12
ledctl: SCSI: Unable to locate slot in enclosure 13
ledctl: SCSI: Unable to locate slot in enclosure 0
ledctl: SCSI: Unable to locate slot in enclosure 1
ledctl: SCSI: Unable to locate slot in enclosure 2
ledctl: SCSI: Unable to locate slot in enclosure 3
ledctl: SCSI: Unable to locate slot in enclosure 4
ledctl: SCSI: Unable to locate slot in enclosure 5
ledctl: SCSI: Unable to locate slot in enclosure 6
ledctl: SCSI: Unable to locate slot in enclosure 7
ledctl: SCSI: Unable to locate slot in enclosure 8
ledctl: SCSI: Unable to locate slot in enclosure 9
ledctl: SCSI: Unable to locate slot in enclosure 10
ledctl: SCSI: Unable to locate slot in enclosure 11
ledctl: SCSI: Unable to locate slot in enclosure 12
ledctl: SCSI: Unable to locate slot in enclosure 13

Actual behavior

In the past, this output did not appear.

Environment

[root@localhost ~]# mdadm --detail-platform
       Platform : Intel(R) Virtual RAID on CPU
        Version : 9.0.0.1088
    RAID Levels : raid0 raid1 raid5 raid10
    Chunk Sizes : 4k 8k 16k 32k 64k 128k
    2TB volumes : supported
      2TB disks : supported
      Max Disks : 96
    Max Volumes : 2 per array, 24 per controller
 3rd party NVMe : supported
 I/O Controller : /sys/devices/pci0000:f2/0000:f2:00.5 (VMD)
 NVMe under VMD : /dev/nvme2n1 (XXXXXXXXXXX)
                  Encryption(Ability|Status): SED|Unencrypted
 NVMe under VMD : /dev/nvme4n1 (XXXXXXXXXXX)
                  Encryption(Ability|Status): SED|Unencrypted
 I/O Controller : /sys/devices/pci0000:0a/0000:0a:00.5 (VMD)
 I/O Controller : /sys/devices/pci0000:75/0000:75:00.5 (VMD)
 NVMe under VMD : /dev/nvme3n1 (XXXXXXXXXXX)
                  Encryption(Ability|Status): SED|Unencrypted
 NVMe under VMD : /dev/nvme5n1 (XXXXXXXXXXX)
                  Encryption(Ability|Status): SED|Unencrypted

[root@localhost ~]# ls -l /sys/block
total 0
lrwxrwxrwx. 1 root root 0 Aug 26 22:31 dm-0 -> ../devices/virtual/block/dm-0
lrwxrwxrwx. 1 root root 0 Aug 27 02:31 dm-1 -> ../devices/virtual/block/dm-1
lrwxrwxrwx. 1 root root 0 Aug 27 02:32 dm-2 -> ../devices/virtual/block/dm-2
lrwxrwxrwx. 1 root root 0 Aug 27 02:32 nvme0n1 -> ../devices/pci0000:af/0000:af:08.0/0000:b0:00.0/nvme/nvme0/nvme0n1
lrwxrwxrwx. 1 root root 0 Aug 27 02:32 nvme1n1 -> ../devices/pci0000:af/0000:af:09.0/0000:b1:00.0/nvme/nvme1/nvme1n1
lrwxrwxrwx. 1 root root 0 Aug 27 03:08 nvme2n1 -> ../devices/pci0000:f2/0000:f2:00.5/pci10002:00/10002:00:02.0/10002:0a:00.0/nvme/nvme2/nvme2n1
lrwxrwxrwx. 1 root root 0 Aug 26 22:31 nvme3n1 -> ../devices/pci0000:75/0000:75:00.5/pci10001:80/10001:80:02.0/10001:8a:00.0/nvme/nvme3/nvme3n1
lrwxrwxrwx. 1 root root 0 Aug 27 02:32 nvme4n1 -> ../devices/pci0000:f2/0000:f2:00.5/pci10002:00/10002:00:06.0/10002:0b:00.0/nvme/nvme4/nvme4n1
lrwxrwxrwx. 1 root root 0 Aug 27 02:32 nvme5n1 -> ../devices/pci0000:75/0000:75:00.5/pci10001:80/10001:80:06.0/10001:8b:00.0/nvme/nvme5/nvme5n1
lrwxrwxrwx. 1 root root 0 Aug 27 02:32 sda -> ../devices/pci0000:6c/0000:6c:02.0/0000:6d:00.0/host0/target0:2:2/0:2:2:0/block/sda
lrwxrwxrwx. 1 root root 0 Aug 27 02:32 sdb -> ../devices/pci0000:6c/0000:6c:02.0/0000:6d:00.0/host0/target0:2:0/0:2:0:0/block/sdb
lrwxrwxrwx. 1 root root 0 Aug 27 02:32 sdc -> ../devices/pci0000:6c/0000:6c:02.0/0000:6d:00.0/host0/target0:2:1/0:2:1:0/block/sdc
lrwxrwxrwx. 1 root root 0 Aug 27 02:32 sdd -> ../devices/pci0000:6c/0000:6c:02.0/0000:6d:00.0/host0/target0:2:3/0:2:3:0/block/sdd
lrwxrwxrwx. 1 root root 0 Aug 27 02:32 sde -> ../devices/pci0000:6c/0000:6c:02.0/0000:6d:00.0/host0/target0:2:6/0:2:6:0/block/sde
lrwxrwxrwx. 1 root root 0 Aug 27 02:32 sdf -> ../devices/pci0000:6c/0000:6c:02.0/0000:6d:00.0/host0/target0:2:5/0:2:5:0/block/sdf
lrwxrwxrwx. 1 root root 0 Aug 27 02:32 sdg -> ../devices/pci0000:6c/0000:6c:02.0/0000:6d:00.0/host0/target0:2:4/0:2:4:0/block/sdg
lrwxrwxrwx. 1 root root 0 Aug 27 02:32 sdh -> ../devices/pci0000:6c/0000:6c:02.0/0000:6d:00.0/host0/target0:2:8/0:2:8:0/block/sdh
lrwxrwxrwx. 1 root root 0 Aug 27 02:32 sdi -> ../devices/pci0000:6c/0000:6c:02.0/0000:6d:00.0/host0/target0:2:7/0:2:7:0/block/sdi
lrwxrwxrwx. 1 root root 0 Aug 27 02:32 sdj -> ../devices/pci0000:6c/0000:6c:02.0/0000:6d:00.0/host0/target0:2:9/0:2:9:0/block/sdj
lrwxrwxrwx. 1 root root 0 Aug 27 02:32 sdk -> ../devices/pci0000:6c/0000:6c:02.0/0000:6d:00.0/host0/target0:2:10/0:2:10:0/block/sdk
lrwxrwxrwx. 1 root root 0 Aug 27 02:32 sdl -> ../devices/pci0000:6c/0000:6c:02.0/0000:6d:00.0/host0/target0:2:11/0:2:11:0/block/sdl
lrwxrwxrwx. 1 root root 0 Aug 27 02:32 sdm -> ../devices/pci0000:6c/0000:6c:02.0/0000:6d:00.0/host0/target0:2:12/0:2:12:0/block/sdm
lrwxrwxrwx. 1 root root 0 Aug 27 02:32 sdn -> ../devices/pci0000:6c/0000:6c:02.0/0000:6d:00.0/host0/target0:2:14/0:2:14:0/block/sdn
lrwxrwxrwx. 1 root root 0 Aug 27 02:32 sdo -> ../devices/pci0000:6c/0000:6c:02.0/0000:6d:00.0/host0/target0:2:15/0:2:15:0/block/sdo
lrwxrwxrwx. 1 root root 0 Aug 27 02:32 sdp -> ../devices/pci0000:6c/0000:6c:02.0/0000:6d:00.0/host0/target0:2:13/0:2:13:0/block/sdp

[root@localhost ~]# cat /sys/module/nvme_core/parameters/multipath
Y

Ledmon version

[root@localhost ~]# ledmon --version
Intel(R) Enclosure LED Monitor Service 1.0.0
Copyright (C) 2009-2024 Intel Corporation.

ledmon[13088]: exit status is STATUS_SUCCESS.

Ledmon logs

No response

Ledctl logs

No response

Ledmon supported controllers

[root@localhost ~]# ledctl --list-controllers
ledctl: SCSI: Unable to locate slot in enclosure 0
ledctl: SCSI: Unable to locate slot in enclosure 1
ledctl: SCSI: Unable to locate slot in enclosure 2
ledctl: SCSI: Unable to locate slot in enclosure 3
ledctl: SCSI: Unable to locate slot in enclosure 4
ledctl: SCSI: Unable to locate slot in enclosure 5
ledctl: SCSI: Unable to locate slot in enclosure 6
ledctl: SCSI: Unable to locate slot in enclosure 7
ledctl: SCSI: Unable to locate slot in enclosure 8
ledctl: SCSI: Unable to locate slot in enclosure 9
ledctl: SCSI: Unable to locate slot in enclosure 10
ledctl: SCSI: Unable to locate slot in enclosure 11
ledctl: SCSI: Unable to locate slot in enclosure 12
ledctl: SCSI: Unable to locate slot in enclosure 13
ledctl: SCSI: Unable to locate slot in enclosure 0
ledctl: SCSI: Unable to locate slot in enclosure 1
ledctl: SCSI: Unable to locate slot in enclosure 2
ledctl: SCSI: Unable to locate slot in enclosure 3
ledctl: SCSI: Unable to locate slot in enclosure 4
ledctl: SCSI: Unable to locate slot in enclosure 5
ledctl: SCSI: Unable to locate slot in enclosure 6
ledctl: SCSI: Unable to locate slot in enclosure 7
ledctl: SCSI: Unable to locate slot in enclosure 8
ledctl: SCSI: Unable to locate slot in enclosure 9
ledctl: SCSI: Unable to locate slot in enclosure 10
ledctl: SCSI: Unable to locate slot in enclosure 11
ledctl: SCSI: Unable to locate slot in enclosure 12
ledctl: SCSI: Unable to locate slot in enclosure 13
ledctl: SCSI: Unable to locate slot in enclosure 0
ledctl: SCSI: Unable to locate slot in enclosure 1
ledctl: SCSI: Unable to locate slot in enclosure 2
ledctl: SCSI: Unable to locate slot in enclosure 3
ledctl: SCSI: Unable to locate slot in enclosure 4
ledctl: SCSI: Unable to locate slot in enclosure 5
ledctl: SCSI: Unable to locate slot in enclosure 6
ledctl: SCSI: Unable to locate slot in enclosure 7
ledctl: SCSI: Unable to locate slot in enclosure 8
ledctl: SCSI: Unable to locate slot in enclosure 9
ledctl: SCSI: Unable to locate slot in enclosure 10
ledctl: SCSI: Unable to locate slot in enclosure 11
ledctl: SCSI: Unable to locate slot in enclosure 12
ledctl: SCSI: Unable to locate slot in enclosure 13
ledctl: SCSI: Unable to locate slot in enclosure 0
ledctl: SCSI: Unable to locate slot in enclosure 1
ledctl: SCSI: Unable to locate slot in enclosure 2
ledctl: SCSI: Unable to locate slot in enclosure 3
ledctl: SCSI: Unable to locate slot in enclosure 4
ledctl: SCSI: Unable to locate slot in enclosure 5
ledctl: SCSI: Unable to locate slot in enclosure 6
ledctl: SCSI: Unable to locate slot in enclosure 7
ledctl: SCSI: Unable to locate slot in enclosure 8
ledctl: SCSI: Unable to locate slot in enclosure 9
ledctl: SCSI: Unable to locate slot in enclosure 10
ledctl: SCSI: Unable to locate slot in enclosure 11
ledctl: SCSI: Unable to locate slot in enclosure 12
ledctl: SCSI: Unable to locate slot in enclosure 13
/sys/devices/pci0000:f2/0000:f2:00.5/pci10002:00/10002:00:02.0 (NPEM)
/sys/devices/pci0000:0a/0000:0a:02.0/0000:0b:00.0/0000:0c:00.0 (NPEM)
/sys/devices/pci0000:8a/0000:8a:02.0/0000:8b:00.0/0000:8c:04.0 (NPEM)
/sys/devices/pci0000:b6/0000:b6:02.0/0000:b7:00.0/0000:b8:03.0 (NPEM)
/sys/devices/pci0000:0a/0000:0a:02.0/0000:0b:00.0/0000:0c:03.0 (NPEM)
/sys/devices/pci0000:8a/0000:8a:02.0/0000:8b:00.0/0000:8c:00.0 (NPEM)
/sys/devices/pci0000:f2/0000:f2:00.5 (VMD)
/sys/devices/pci0000:37/0000:37:02.0/0000:38:00.0/0000:39:02.0 (NPEM)
/sys/devices/pci0000:8a/0000:8a:02.0/0000:8b:00.0/0000:8c:03.0 (NPEM)
/sys/devices/pci0000:b6/0000:b6:02.0/0000:b7:00.0/0000:b8:02.0 (NPEM)
/sys/devices/pci0000:37/0000:37:02.0/0000:38:00.0/0000:39:1f.0/0000:3f:00.0 (SCSI)
/sys/devices/pci0000:0a/0000:0a:02.0/0000:0b:00.0/0000:0c:02.0 (NPEM)
/sys/devices/pci0000:75/0000:75:00.5/pci10001:80/10001:80:06.0 (NPEM)
/sys/devices/pci0000:b6/0000:b6:02.0/0000:b7:00.0/0000:b8:1f.0/0000:be:00.0 (SCSI)
/sys/devices/pci0000:6c/0000:6c:02.0/0000:6d:00.0 (SCSI)
/sys/devices/pci0000:37/0000:37:02.0/0000:38:00.0/0000:39:01.0 (NPEM)
/sys/devices/pci0000:0a/0000:0a:00.5 (VMD)
/sys/devices/pci0000:8a/0000:8a:02.0/0000:8b:00.0/0000:8c:02.0 (NPEM)
/sys/devices/pci0000:b6/0000:b6:02.0/0000:b7:00.0/0000:b8:01.0 (NPEM)
/sys/devices/pci0000:75/0000:75:00.5/pci10001:80/10001:80:02.0 (NPEM)
/sys/devices/pci0000:37/0000:37:02.0/0000:38:00.0/0000:39:04.0 (NPEM)
/sys/devices/pci0000:0a/0000:0a:02.0/0000:0b:00.0/0000:0c:01.0 (NPEM)
/sys/devices/pci0000:b6/0000:b6:02.0/0000:b7:00.0/0000:b8:04.0 (NPEM)
/sys/devices/pci0000:f2/0000:f2:00.5/pci10002:00/10002:00:06.0 (NPEM)
/sys/devices/pci0000:0a/0000:0a:02.0/0000:0b:00.0/0000:0c:04.0 (NPEM)
/sys/devices/pci0000:75/0000:75:00.5 (VMD)
/sys/devices/pci0000:37/0000:37:02.0/0000:38:00.0/0000:39:00.0 (NPEM)
/sys/devices/pci0000:8a/0000:8a:02.0/0000:8b:00.0/0000:8c:01.0 (NPEM)
/sys/devices/pci0000:b6/0000:b6:02.0/0000:b7:00.0/0000:b8:00.0 (NPEM)
/sys/devices/pci0000:37/0000:37:02.0/0000:38:00.0/0000:39:03.0 (NPEM)
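
The listing above mixes the repeated SCSI messages with the actual controller list. A minimal Python sketch, assuming the output has been captured as text in exactly the format shown here, separates the two and tallies controllers by type. The function name `summarize_list_controllers` is hypothetical and is not part of ledmon.

```python
import re
from collections import Counter

# Controller lines look like "/sys/devices/... (NPEM)".
CONTROLLER_RE = re.compile(r"^(/sys/devices/\S+) \((\w+)\)$")
# The repeated message seen in this report.
NOISE_RE = re.compile(r"^ledctl: SCSI: Unable to locate slot in enclosure (\d+)$")


def summarize_list_controllers(text):
    """Return (controller count per type, message count per enclosure index)."""
    controllers = Counter()
    noise = Counter()
    for raw in text.splitlines():
        line = raw.strip()
        match = CONTROLLER_RE.match(line)
        if match:
            controllers[match.group(2)] += 1
            continue
        match = NOISE_RE.match(line)
        if match:
            noise[int(match.group(1))] += 1
    return controllers, noise
```

Feeding the captured output to this function shows how many controllers of each type were detected and how often each enclosure index was reported, which makes it easier to compare runs between ledmon versions.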

Additional information

No response

jackeichen commented 2 weeks ago

Additional information:

  1. Although the command prints many unexpected messages for the E3.S NVMe SSD, the LED blinks as expected.
  2. The test passes with a U.2 NVMe SSD but fails with an E3.S NVMe SSD.

bkucman commented 2 weeks ago

Hi @jackeichen, thanks for reporting the issue. This noisy error message comes from the block device initialization stage in ledctl. The messages relate to a different controller, so they have no effect on setting the LED status on the given NVMe disk.

Could you check whether setting the LED status on SATA drives works? These error messages seem to come from this controller:

lrwxrwxrwx. 1 root root 0 Aug 27 02:32 sda -> ../devices/pci0000:6c/0000:6c:02.0/0000:6d:00.0/host0/target0:2:2/0:2:2:0/block/sda
lrwxrwxrwx. 1 root root 0 Aug 27 02:32 sdb -> ../devices/pci0000:6c/0000:6c:02.0/0000:6d:00.0/host0/target0:2:0/0:2:0:0/block/sdb
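
To see which controller each block device sits under, the symlink targets from `ls -l /sys/block` can be matched against the controller paths printed by `ledctl --list-controllers`. The following is only a sketch of that matching, not how ledctl does it internally; `group_by_controller` is a hypothetical helper name, and the inputs are assumed to be in the formats shown in this report.

```python
import posixpath
from collections import defaultdict


def group_by_controller(links, controllers):
    """Map each controller path to the block devices located under it.

    links: dict of device name -> symlink target from /sys/block
    controllers: list of controller sysfs paths
    """
    groups = defaultdict(list)
    for name, target in links.items():
        # Resolve "../devices/..." relative to /sys/block.
        path = posixpath.normpath(posixpath.join("/sys/block", target))
        # Match only on whole path components.
        matches = [c for c in controllers
                   if path.startswith(c.rstrip("/") + "/")]
        if matches:
            # Prefer the most specific (longest) controller path,
            # e.g. an NPEM port below a VMD controller.
            groups[max(matches, key=len)].append(name)
    return dict(groups)
```

On a live system the `links` dictionary could be filled by calling `os.readlink` on each entry in `/sys/block`.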

Please also provide logs from the command with increased logging level and SCSI/SATA controller information:

Thanks, Blazej

mtkaczyk commented 2 weeks ago

@jackeichen please also let us know in which ledmon version this was introduced. I see that the migration to the library didn't change it: https://github.com/intel/ledmon/commit/38ac67fc5b85d4a85a49a9274e387ffd095e7516#diff-9f162c028bce3fdebfebdc7089698c976cc43c8a49e6b04b6feb8fc2137ce4eb

ktanska commented 1 week ago

@jackeichen were you able to retest it? Can you attach the mentioned logs, please?

jackeichen commented 1 day ago

@mtkaczyk @ktanska hi guys, my server is running tests until next week. Could you wait a few days?