intel / ledmon

Enclosure LED Utilities
GNU General Public License v2.0
73 stars 47 forks source link

tests/runtests.sh failure when testing SES #201

Closed tasleson closed 7 months ago

tasleson commented 8 months ago

When using the latest commit 3cbc9c9a4fa69997d0d9b8b929ec72be2bc5d099 and running tests/runtests.sh I run into the following failures

@pytest.mark.parametrize("cntrl", LedctlCmd.slot_mgmt_ctrls)
    def test_ibpi(ledctl_binary, slot_filters, controller_filters, cntrl):
        """
        Test setting the led status by using IBPI syntax for the disk under the chosen controller.
        Limited to controllers with slots feature support.
        """

        cmd = LedctlCmd(ledctl_binary, slot_filters, controller_filters)
        slots_with_device_node = get_slots_with_device_or_skip(cmd, cntrl)

        for slot in slots_with_device_node:
            for state in LedctlCmd.base_states:
                cmd.set_ibpi(slot.device_node, state)
                cur = cmd.get_slot(slot)
>               assert cur.state == state,\
                    f"unable to set \"{slot.device_node}\" to \"{state}\", current = \"{cur}\" using ibpi syntax"
E               AssertionError: unable to set "/dev/sdj" to "rebuild", current = "slot: sg18-0 device: /dev/sdj" using ibpi syntax
E               assert 'normal' == 'rebuild'
E                 - rebuild
E                 + normal

tests/ledctl/slot_test.py:44: AssertionError
 @pytest.mark.parametrize("cntrl", LedctlCmd.slot_mgmt_ctrls)
    def test_set_slot_by_slot(ledctl_binary, slot_filters, controller_filters,
                              cntrl):
        """
        Test that we can set slots to different states and verify that they reported a change, using --slot.
        """

        cmd = LedctlCmd(ledctl_binary, slot_filters, controller_filters)
        try:
            slots = [s for s in cmd.get_slots(cntrl)]
        except AssertionError as e:
            pytest.skip(str(e))

        if len(slots) == 0:
            pytest.skip("No slot found")

        for slot in slots:
            for state in LedctlCmd.base_states:
                cmd.set_slot_state(slot, state)
                cur = cmd.get_slot(slot)
>               assert cur.state == state,\
                    f"unable to set \"{slot}\" to \"{state}\", current = \"{cur}\" using slot"
E               AssertionError: unable to set "slot: sg18-0 device: /dev/sdj" to "rebuild", current = "slot: sg18-0 device: /dev/sdj" using slot
E               assert 'normal' == 'rebuild'
E                 - rebuild
E                 + normal

tests/ledctl/slot_test.py:68: AssertionError
  def slot_set_and_get_by_device_all(cmd: LedctlCmd, slot):
        for state in LedctlCmd.base_states:
            cmd.set_device_state(slot, state)
            cur = cmd.get_slot_by_device(slot)
>           assert cur.state == state,\
                f"unable to set \"{slot}\" to \"{state}\", current = \"{cur}\" using --device"
E           AssertionError: unable to set "slot: sg18-0 device: /dev/sdj" to "rebuild", current = "slot: sg18-0 device: /dev/sdj" using --device
E           assert 'normal' == 'rebuild'
E             - rebuild
E             + normal

tests/ledctl/slot_test.py:76: AssertionError

which are caused by SES not handling "rebuild". If I remove the "rebuild" from the following, it passes.

# These base states should be supported by all controllers
 base_states = ["failure", "locate", "normal", "rebuild"]

What should the LED status/behavior be for a single LED? I'm thinking we need to add a translation in somewhere from rebuild to -> locate or failure?

pawpiatko commented 7 months ago

Hello tasleson, could you please rerun this test after replacing rebuild to ses_rebuild?

base_states = ["failure", "locate", "normal", "ses_rebuild"]

tasleson commented 7 months ago

Hello tasleson, could you please rerun this test after replacing rebuild to ses_rebuild?

base_states = ["failure", "locate", "normal", "ses_rebuild"]

diff --git a/tests/ledctl/ledctl_cmd.py b/tests/ledctl/ledctl_cmd.py
index 41594bb..7582d5d 100644
--- a/tests/ledctl/ledctl_cmd.py
+++ b/tests/ledctl/ledctl_cmd.py
@@ -29,7 +29,7 @@ class LedctlCmd:
     slot_mgmt_ctrls = ["SCSI", "VMD", "NPEM"]

     # These base states should be supported by all controllers
-    base_states = ["failure", "locate", "normal", "rebuild"]
+    base_states = ["failure", "locate", "normal", "ses_rebuild"]

     def __init__(self,
                  ledctl_bin=["none"],
    def slot_set_and_get_by_device_all(cmd: LedctlCmd, slot):
        for state in LedctlCmd.base_states:
            cmd.set_device_state(slot, state)
            cur = cmd.get_slot_by_device(slot)
>           assert cur.state == state,\
                f"unable to set \"{slot}\" to \"{state}\", current = \"{cur}\" using --device"
E           AssertionError: unable to set "slot: sg42-0 device: /dev/sdw" to "ses_rebuild", current = "slot: sg42-0 device: /dev/sdw" using --device
E           assert 'normal' == 'ses_rebuild'
E             - ses_rebuild
E             + normal

tests/ledctl/slot_test.py:76: AssertionError
tasleson commented 7 months ago

From looking at the IBPI specification I'm wondering if it makes sense to include the states that have a frequency involved with them in the API? Is the hardware suppose to provide the blinking or the software controlling them? If the hardware doesn't have support, then it would seem to make sense that the library API portion of ledmon simply provides the ability to turn the locate/fail LEDs off or on and leave the IBPI states that needed on and off with some frequency to users of the library, like ledmon daemon?

mtkaczyk commented 7 months ago

Hey @tasleson, sorry for bringing the regression. It added it this way because those states are mandatory in NPEM spec and works for VMD too. I didn't take SCSI into consideration. Well, I even put comment that those states "should be supported". Feel free to just remove rebuild from test. It should be fine for now.

First, I don't know why we have to export ses specific patterns. I think that IBPI pattern list is enough and there is no need for them, I see comment:

    /* Below are SES-2 codes. Note that by default most IBPI messages are
     * translated into SES when needed but SES codes can be added also. */
    LED_SES_REQ_ABORT,

and I think that we can just remove them, we don't need them in API. We can clear up manual too (we can keep mapping internally). We are not going to add NPEM patterns, it should be same for SES. Comment here?

From looking at the IBPI specification I'm wondering if it makes sense to include the states that have a frequency involved with them in the API? Is the hardware suppose to provide the blinking or the software controlling them? If the hardware doesn't have support, then it would seem to make sense that the library API portion of ledmon simply provides the ability to turn the locate/fail LEDs off or on and leave the IBPI states that needed on and off with some frequency to users of the library, like ledmon daemon?

Honestly I don't know. As not experienced user I would like to just have a possibility to impose something which is commonly understanded as "rebuild", even if it is hardcoded inside our library. Generally user is not interested in implementation details, result matters. I see it as not a problem, maybe someone will find it useful. It is not big deal I think. In my opinion exporting SES patterns is more controversial because ledctl should keep user interface for every controller unified.

Anyway, going back to the problem.. We should consider adding API to get supported patterns list to avoid problems like this one. First we should ask controller to return list of supported patterns (hardcoded inside or retrieved from hardware) to test only patterns which are really testable. That seems to be best solution. What do you think?

mtkaczyk commented 7 months ago

@tasleson is the root cause of issue that we cannot read "rebuild" status from LED? I mean that it blinks "rebuild" pattern but there is not way to read from userspace if "rebuild" is blinking. If yes then this one is really tricky.

For sure, we can just remove this controversial pattern from testing to make it working for you.

tasleson commented 7 months ago

@tasleson is the root cause of issue that we cannot read "rebuild" status from LED? I mean that it blinks "rebuild" pattern but there is not way to read from userspace if "rebuild" is blinking. If yes then this one is really tricky.

For sure, we can just remove this controversial pattern from testing to make it working for you.

Yes, we could translate the rebuild to the correct for SES, but we would have no way to return that to the user if they queried. We could change the test case to accept rebuild or the equivalent value(s).

I'm experimenting with this a bit more. I have a camera in front of one of the JBODs (recent add) to remote view the LEDs and I've learned that for our specific hardware, the locate LED blinks. ~I follow-up some more on this tomorrow.~

The SES hardware I have access too, has 2 LEDs per slot. One of them is the ident/fault LED, the other is the activity. The ident turn on, the LED blinks green. If you set FAULT, the LED turns red. If both ident & fault are set the LED blinks red. I'll try to come up with a PR that can handle this test a bit better. I'll also see if we can reduce/remove the SES enumerations from the API.