Closed mtkaczyk closed 12 months ago
This change will not pass the updated unit test as:
slot: 4-1 led state: LOCATE device: /dev/nvme6c6n1
device refers to a device that doesn't exist.
@tasleson
Yes:
[root@gklab-108-180 tests]# ./lib_unit_test
Running suite(s): ledmon
60%: Checks: 5, Failures: 2, Errors: 0
lib_unit_test.c:108:F:lib_unit_test:test_list_slots:0: stat failed for No such file or directory, errno: (null)
lib_unit_test.c:266:F:lib_unit_test:test_led_by_path:0: led_device_name_lookup 8
[root@gklab-108-180 tests]#
I wanted to left it but after rethinking I see that I can fix it easily, so will do.
@tasleson done:
[root@gklab-108-180 ledmon]# ./src/ledctl/ledctl --list-slots -c vmd
/dev/shm/ledmon.conf: does not exist, using global config file
/etc/ledmon.conf: does not exist, using built-in defaults
slot: 4-2 led state: LOCATE device: (empty)
slot: 2-1 led state: NORMAL device: /dev/nvme4n1
slot: 3 led state: LOCATE device: (empty)
slot: 3-2 led state: NORMAL device: /dev/nvme9n1
slot: 1 led state: LOCATE device: (empty)
slot: 1-1 led state: NORMAL device: /dev/nvme3n1
slot: 4-1 led state: LOCATE device: (empty)
slot: 2-2 led state: LOCATE device: /dev/nvme8n1
slot: 4 led state: LOCATE device: /dev/nvme2n1
slot: 2 led state: LOCATE device: /dev/nvme0n1
slot: 3-1 led state: LOCATE device: /dev/nvme5n1
slot: 1-2 led state: LOCATE device: (empty)
[root@gklab-108-180 ledmon]# ./test
test-driver tests/
[root@gklab-108-180 ledmon]# ./tests/
.deps/ .libs/ lib_unit_test __pycache__/ runtests.sh
[root@gklab-108-180 ledmon]# ./tests/lib_unit_test
Running suite(s): ledmon
100%: Checks: 5, Failures: 0, Errors: 0
[root@gklab-108-180 ledmon]#
@mtkaczyk
This passes the unit test, but I see the code removes the device block if it doesn't represent an actual device node. It would be better if we could convert back to the virtual device node that the user interacts with, eg: slot: 4-1
-> /dev/nvme6n1
@mtkaczyk I think we could do something like:
diff --git a/src/lib/libled.c b/src/lib/libled.c
index 0c5083f..d0e5891 100644
--- a/src/lib/libled.c
+++ b/src/lib/libled.c
@@ -159,11 +159,32 @@ void led_flush(struct led_ctx *ctx)
}
}
+static void set_slot_device_name(const char *sysfs_path, struct led_slot_list_entry *se)
+{
+ char temp[PATH_MAX];
+ struct stat st;
+
+ snprintf(temp, PATH_MAX, "/dev/%s", basename(sysfs_path));
+ if (stat(temp, &st) < 0) {
+ int nvme_num, ns;
+
+ /* device node not present, check for physical path for virt nvme
+ * eg. nvme12c12n1 -> nvme12n1
+ */
+ if (sscanf(temp, "/dev/nvme%dc%*dn%d", &nvme_num, &ns) != 2)
+ return;
+
+ snprintf(temp, PATH_MAX, "/dev/nvme%dn%d", nvme_num, ns);
+ if (stat(temp, &st) < 0)
+ return;
+ }
+
+ str_cpy(se->device_name, temp, PATH_MAX);
+}
+
static struct led_slot_list_entry *init_slot(struct slot_property *slot)
{
struct led_slot_list_entry *s = NULL;
- struct stat st;
- char temp[PATH_MAX];
if (!slot)
return NULL;
@@ -180,10 +201,7 @@ static struct led_slot_list_entry *init_slot(struct slot_property *slot)
* Assumption that "/sys/block/" device has a devnode is no longer true with nvme multipath.
*/
if (slot->bl_device) {
- snprintf(temp, PATH_MAX, "/dev/%s", basename(slot->bl_device->sysfs_path));
- if (stat(temp, &st) < 0)
- return s;
- str_cpy(s->device_name, temp, PATH_MAX);
+ set_slot_device_name(slot->bl_device->sysfs_path, s);
}
return s;
See: https://github.com/intel/ledmon/pull/171
This is untested as I don't have a way to test it!
HI @tasleson, Sorry for the delay. I need to focus on other high priority task for a while. I will back to this as soon as possible.
Hi @tasleson,
Please take a look and let me know what do you think about implementing it this way.
The change is not complete yet, I need to add some tests, and lock possibility to use locate=/whatever_here/nvme1n1
but this is they way I would like to implement it.
All cases handled and works correctly for me, lib tests passed too. This is a draft, I'm looking for approval :)
Hi @tasleson, Please take a look and let me know what do you think about implementing it this way. The change is not complete yet, I need to add some tests, and lock possibility to use
locate=/whatever_here/nvme1n1
but this is they way I would like to implement it. All cases handled and works correctly for me, lib tests passed too. This is a draft, I'm looking for approval :)
My initial simplistic review is that this change seems OK. However, after I fetched your branch and re-based it I tried running it on a couple of different systems and this latest change breaks our Dell system which doesn't support slots API, we simply get:
# ./ledctl locate=/dev/nvme20n1
/dev/shm/ledmon.conf: does not exist, using global config file
/etc/ledmon.conf: does not exist, using built-in defaults
ledctl: /dev/nvme20n1: device not supported
ledctl: IBPI LOCATE: missing block device(s)... pattern ignored.
I'll try to look into why this is broken. Our SES system ran make check
fine.
@mtkaczyk I located the bug in your current proposal:
bool is_virt_nvme(const char * const name)
{
/* Simple match by name */
if (strncmp(name, "nvme", 4) == 0 && name[5] == 'c')
return true;
return false;
}
This function doesn't work when the number of NVMe devices gets into double digits or more, eg. nvme15c15n1
fails to be recognized as a virtual NVMe device as the c
is in index 6, not 5.
This function doesn't work when the number of NVMe devices gets into double digits or more, eg.
nvme15c15n1
fails to be recognized as a virtual NVMe device as thec
is in index 6, not 5.
Yes, of course matching by specific position is bad approach and bug prone. Thanks for catching this! I will replace this by strchr()
FYI all interested, I made a general clean up int ledctl tests #190 . I need to wrote few tests for --set-slot and --get-slot by device and test this with MP drives. WIP
@tasleson here my proposal.
It works on my setup for nvme and not nvme drives:
I works with AHCI too, we don't have slots for SATA, so I checked it manually: