Closed tasleson closed 11 months ago
This is caused by NVMe multipath support. We'll need to add some logic to handle this. /dev/nvme0n1 is a virtual path which can have 1 or more physical controllers. The other block devices for each of the redundant paths are "hidden".
More details here
In this case I would check whether we can report the controller earlier, to avoid multipath-specific paths, but I don't know if that is possible and I don't have the hardware to check:
/sys/devices/pci0000:e2/0000:e2:02.0/0000:e3:00.0/0000:e4:04.0/0000:ef:00.0 (Dell SSD)
/sys/devices/pci0000:64/0000:64:02.0/0000:65:00.0/0000:66:00.0/0000:67:00.0 (Dell SSD)
/sys/devices/pci0000:e2/0000:e2:02.0/0000:e3:00.0/0000:e4:00.0/0000:e5:00.0 (Dell SSD)
/sys/devices/pci0000:64/0000:64:02.0/0000:65:00.0/0000:66:04.0/0000:71:00.0 (Dell SSD)
Theoretically, if the controller is reported for the real NVMe device, the problem should disappear (assuming I am right that these paths are NVMe devices). We will need to do this anyway to hide duplicates.
I would prefer not to hide controllers, but I understand that in this case it may not be possible.
The issue is that the normal /dev/nvme0n1 is the "virtual" one. There is nothing in /dev/ that refers to the physical one. It can be seen in sysfs at /sys/block/nvme0c0n1, but figuring that out in code is a little weird IMHO.
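To make the naming relationship concrete, here is a minimal sketch of how the hidden per-path device could be found from the visible name. It assumes the kernel's multipath naming scheme is nvme<subsystem>n<namespace> for the visible device and nvme<subsystem>c<controller>n<namespace> for each hidden path, which is consistent with nvme0n1 and nvme0c0n1 above. The function names are hypothetical and this is not ledmon code.

```python
import re

# Assumed naming scheme (not taken from ledmon):
#   nvme<subsys>n<ns>          visible, "virtual" multipath device
#   nvme<subsys>c<ctrl>n<ns>   hidden per-path device
HEAD_RE = re.compile(r"^nvme(\d+)n(\d+)$")
PATH_RE = re.compile(r"^nvme(\d+)c(\d+)n(\d+)$")


def is_path_of(head_name, path_name):
    """Return True if path_name is a hidden per-path device of head_name."""
    head = HEAD_RE.match(head_name)
    path = PATH_RE.match(path_name)
    if not head or not path:
        return False
    # Subsystem instance and namespace number must both agree.
    return head.group(1) == path.group(1) and head.group(2) == path.group(3)


def find_path_devices(head_name, sys_block_entries):
    """Filter a listing of /sys/block down to the paths behind head_name."""
    return [entry for entry in sys_block_entries if is_path_of(head_name, entry)]


if __name__ == "__main__":
    # The listing is illustrative; on a real system it would come from
    # os.listdir("/sys/block").
    entries = ["nvme0n1", "nvme0c0n1", "nvme1c1n1", "sda"]
    print(find_path_devices("nvme0n1", entries))
```

Matching on names alone is fragile, so a real fix would probably want to confirm the result against sysfs rather than rely on the pattern.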
> /sys/devices/pci0000:e2/0000:e2:02.0/0000:e3:00.0/0000:e4:04.0/0000:ef:00.0 (Dell SSD)
> /sys/devices/pci0000:64/0000:64:02.0/0000:65:00.0/0000:66:00.0/0000:67:00.0 (Dell SSD)
> /sys/devices/pci0000:e2/0000:e2:02.0/0000:e3:00.0/0000:e4:00.0/0000:e5:00.0 (Dell SSD)
> /sys/devices/pci0000:64/0000:64:02.0/0000:65:00.0/0000:66:04.0/0000:71:00.0 (Dell SSD)
> Theoretically, if the controller is reported for the real NVMe device, the problem should disappear (assuming I am right that these paths are NVMe devices). We will need to do this anyway to hide duplicates.
All these paths are real NVMe devices. There are no duplicates, as this system only has 1 path to each of the devices. I'm not sure if you can have multiple paths for directly attached PCI-based NVMe devices. I would think this only applies if you're using NVMe-oF. In that case I don't believe we can control the LEDs anyway.
> I would prefer not to hide controllers, but I understand that in this case it may not be possible.
I don't think we need to hide anything yet.
You're right, my bad. I was surprised by the huge list of controllers, so at first I wanted to make it smaller, but that is not possible. Those links are PCI devices (physical ones) and there are no nvme-subsystem entries. At first glance, I thought there were duplicates in the controller list (multiple links to the same NVMe device).
Let me review the change you proposed. I need to set up a platform with multipath drives to see how it will affect ledmon, so I need a few days.
Should be addressed with: https://github.com/intel/ledmon/pull/167
I built the latest code, which exhibits the same error. After adding some debug output, the issue appears to be that on this particular Dell we translate from /dev/nvme0n1 to /sys/dev/block/259:7 to /sys/devices/virtual/nvme-subsystem/nvme-subsys0/nvme0n1 and compare that with our list of controllers. We fail to match:
/sys/devices/pci0000:64/0000:64:02.0/0000:65:00.0/0000:66:00.0/0000:67:00.0/0000:68:04.0/0000:6a:00.0/nvme/nvme0/nvme0c0n1
to
/sys/devices/virtual/nvme-subsystem/nvme-subsys0/nvme0n1
Not sure what the correction for this is at the moment.
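For anyone following along, the translation and the failed comparison can be reproduced with a small sketch. It assumes the controller lookup is a simple "is the block device's sysfs path underneath a controller path" check, which may not be exactly what ledmon does. The function names are hypothetical. The paths are the ones from this thread.

```python
import os


def sysfs_path_for_devnode(devnode, sys_root="/sys"):
    """Resolve a block device node to its canonical sysfs path.

    /dev/nvme0n1 -> /sys/dev/block/<major>:<minor> -> resolved symlink target.
    """
    st = os.stat(devnode)
    link = os.path.join(
        sys_root, "dev", "block",
        "{}:{}".format(os.major(st.st_rdev), os.minor(st.st_rdev)),
    )
    return os.path.realpath(link)


def find_controller(block_path, controller_paths):
    """Return the controller path that is a parent of block_path, or None."""
    for ctrl in controller_paths:
        prefix = ctrl.rstrip("/") + "/"
        if block_path == ctrl or block_path.startswith(prefix):
            return ctrl
    return None


# Paths copied from the debug output and controller list in this thread.
CONTROLLERS = [
    "/sys/devices/pci0000:64/0000:64:02.0/0000:65:00.0/0000:66:00.0/0000:67:00.0",
]
VIRTUAL = "/sys/devices/virtual/nvme-subsystem/nvme-subsys0/nvme0n1"
HIDDEN = (
    "/sys/devices/pci0000:64/0000:64:02.0/0000:65:00.0/0000:66:00.0/"
    "0000:67:00.0/0000:68:04.0/0000:6a:00.0/nvme/nvme0/nvme0c0n1"
)

if __name__ == "__main__":
    print(find_controller(VIRTUAL, CONTROLLERS))  # no controller is a parent
    print(find_controller(HIDDEN, CONTROLLERS))   # the PCI path is a parent
```

Under that assumption the virtual path can never match, because it lives under /sys/devices/virtual rather than under any PCI device, while the hidden nvme0c0n1 path does sit below the controller. One possible direction, untested here, would be to map the virtual device to its hidden path device before doing the comparison.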