intel / ledmon

Enclosure LED Utilities
GNU General Public License v2.0
73 stars 47 forks source link

SES slot get/set/list support and bug fixes #109

Closed tasleson closed 1 year ago

tasleson commented 2 years ago

This PR is a work in progress, posting early to get some feedback before I spend more time on it. I've found some minor issues when working on this change which are included at the beginning of the change set. However, in my test environment I'm finding it functional.

Some open questions and areas that need a bit of work:

  1. When a SES slot has both the locate and fail LEDs illuminated, what is the IBPI pattern?
  2. I'm thinking the code added to retrieve the LED status is not totally correct, need to read more SES documentation.
  3. I have an enclosure that I believe doesn't have any trays (system is sitting in a remote lab). sg_ses reports a check condition when I try to get/set a slot, ledctl operates like it was successful. I need to dig into this.
  4. I believe it would be beneficial if the slot specifier was persistent, that it would survive reboots and dynamic reconfiguration. I believe this could be attained with SES by using an identifier in the enclosure that persists?
  5. ledctl has no knowledge of multi-path, thus if you don't specify -x when doing something like ./ledctl locate=/dev/sdk the code clears LEDs of every device twice. I'll write up an issue on this.

Sample output of two of the enclosures I have attached to a test system.

  # ./ledctl --list-slots --controller SCSI 
/dev/shm/ledmon.conf: does not exist, using global config file
/etc/ledmon.conf: does not exist, using built-in defaults
slot: /dev/sg43/0     led state: LOCATE          device: /dev/sdw       
slot: /dev/sg43/1     led state: FAILURE         device: /dev/sdx       
slot: /dev/sg43/2     led state: NORMAL          device: /dev/sdy       
slot: /dev/sg43/3     led state: NORMAL          device: /dev/sdz       
slot: /dev/sg43/4     led state: LOCATE          device: /dev/sdaa      
slot: /dev/sg43/5     led state: NORMAL          device: /dev/sdab      
slot: /dev/sg43/6     led state: NORMAL          device: /dev/sdac      
slot: /dev/sg43/7     led state: NORMAL          device: /dev/sdad      
slot: /dev/sg43/8     led state: NORMAL          device: /dev/sdae      
slot: /dev/sg43/9     led state: NORMAL          device: /dev/sdaf      
slot: /dev/sg43/10    led state: NORMAL          device: /dev/sdag      
slot: /dev/sg43/11    led state: NORMAL          device: /dev/sdah      
slot: /dev/sg43/12    led state: NORMAL          device: /dev/sdai      
slot: /dev/sg43/13    led state: NORMAL          device: /dev/sdaj      
slot: /dev/sg43/14    led state: NORMAL          device: /dev/sdak      
slot: /dev/sg43/15    led state: NORMAL          device: /dev/sdal      
slot: /dev/sg48/0     led state: NORMAL          device: /dev/sdam      
slot: /dev/sg48/1     led state: NORMAL          device: /dev/sdan      
slot: /dev/sg48/2     led state: NORMAL          device:                
slot: /dev/sg48/3     led state: NORMAL          device:                
slot: /dev/sg48/4     led state: NORMAL          device:                
slot: /dev/sg48/5     led state: NORMAL          device:                
slot: /dev/sg48/6     led state: FAILURE         device:                
slot: /dev/sg48/7     led state: LOCATE          device:                
slot: /dev/sg48/8     led state: NORMAL          device:                
slot: /dev/sg48/9     led state: NORMAL          device:                
slot: /dev/sg48/10    led state: NORMAL          device:                
slot: /dev/sg48/11    led state: NORMAL          device:                
slot: /dev/sg48/12    led state: NORMAL          device: /dev/sdao      
slot: /dev/sg48/13    led state: NORMAL          device: /dev/sdap      
slot: /dev/sg48/14    led state: NORMAL          device:                
slot: /dev/sg48/15    led state: NORMAL          device:                     
...
mtkaczyk commented 2 years ago

When a SES slot has both the locate and fail LEDs illuminated, what is the IBPI pattern?

Is it "3 LEDs per slot" system?

I believe it would be beneficial if the slot specifier was persistent, that it would survive reboots and dynamic reconfiguration. I believe this could be attained with SES by using an identifier in the enclosure that persists?

I cannot help you with that. We are not using SES configurations now.

ledctl has no knowledge of multi-path, thus if you don't specify -x when doing something like ./ledctl locate=/dev/sdk the code clears LEDs of every device twice. I'll write up an issue on this.

Yes, we are not supporting multipath right now. Ledctl should do only what is intended, so we are going to make -x as default. That should resolve issue.

I will review your changes later this week.

Thanks, Mariusz

tasleson commented 1 year ago

@mtkaczyk Thanks for the review, I'll re-work the PR.

tasleson commented 1 year ago

When a SES slot has both the locate and fail LEDs illuminated, what is the IBPI pattern?

Is it "3 LEDs per slot" system?

The ses specification has separate bits for locate and failed, and they can be independently set.

tasleson commented 1 year ago

@mtkaczyk I believe I've made the requested changes. Please take a look and see what you think, thanks!

mtkaczyk commented 1 year ago

When a SES slot has both the locate and fail LEDs illuminated, what is the IBPI pattern?

Is it "3 LEDs per slot" system?

The ses specification has separate bits for locate and failed, and they can be independently set.

Probably we don't care of this case yet. If drive in this slot was failed then we were unable to clear led from ledmon or ledctl context because we referred to block device which was gone. Situation has been changed when slot management was added. This is a new problem now, so we need adopt our solution to support multi-led controllers i.e. when many states can be set at once. According to specification NPEM can set every state concurrently, so we shouldn't limit our implementation to 2 states only.

tasleson commented 1 year ago

@mtkaczyk I would like to declare that the JSON output in PR is tentative and should not be declared final yet as we may need to make some changes to support the notion of persistent identifiers.

mtkaczyk commented 1 year ago

@mtkaczyk I would like to declare that the JSON output in PR is tentative and should not be declared final yet as we may need to make some changes to support the notion of persistent identifiers.

OK, so if we can move json support to new PR that will be perfect. Can you handle this? For SCSI/SES part you have my approve :)

tasleson commented 1 year ago

@mtkaczyk I would like to declare that the JSON output in PR is tentative and should not be declared final yet as we may need to make some changes to support the notion of persistent identifiers.

OK, so if we can move json support to new PR that will be perfect. Can you handle this?

JSON change removed from this PR.

For SCSI/SES part you have my approve :)

Great

tasleson commented 1 year ago

@mtkaczyk

https://github.com/intel/ledmon/pull/109#discussion_r1007755668

_enclosure_get_slot updated to take device parameter.

mtkaczyk commented 1 year ago

@tasleson please see #106 We got last approval- change will be merged soon.

tasleson commented 1 year ago

@tasleson please see #106 We got last approval- change will be merged soon.

This is great news. However, I would still like to get this change merged too :smile:

tasleson commented 1 year ago

@mtkaczyk @mgrzonka Where are we at with regards to this? Thanks

mtkaczyk commented 1 year ago

Hello, Sorry for late response, we had some day-offs in Poland this week :) I will review your pull last time. Next week we will run some static code analysis and our internal testing scope eventually (it we determine that there is a risk of regression). So, we will probably ask for some minor soon. After that change will be merged. I think that we will be able to close this in 3 weeks.

Mariusz

mtkaczyk commented 1 year ago

Could you please rebase your change to latest upstream? There are conflicts with statuses changed by last pull merged.

Thanks!

tasleson commented 1 year ago

Could you please rebase your change to latest upstream? There are conflicts with statuses changed by last pull merged.

Thanks!

Done

mgrzonka commented 1 year ago

I haven't found any issues.

mgrzonka commented 1 year ago

@tasleson I'll additionally schedule some internal testing to ensure there aren't any regressions.

tasleson commented 1 year ago

Re-based