intel / ledmon

Enclosure LED Utilities
GNU General Public License v2.0
72 stars 44 forks source link

ledmon/amd: Add a new NVMe LED controlling method #193

Closed qiyuzhu2 closed 3 months ago

qiyuzhu2 commented 6 months ago

The purpose of this patch is to make AMD's NVMe led controlling methods more diverse and meet the secondary development needs of all customers as much as possible. The way of original ledmon to control NVMe led is: At present, there is only "out-of-band" NVMe control method, this method also requires maintaining a mapping table and customers want to control NVMe leds in a way similar to Intel VPP. For patch, a method is also proposed to control the PCIe slot crtl register directly to control the NVMe Led. In this way, the customer does not have the above limitation of controlling the NVMe led through IPMI.

mtkaczyk commented 6 months ago

hello @qiyuzhu2, Thanks for pull, glad to hear that slot's management feature is good and wanted by AMD customers. We are working on release now #186 so please be aware that merging will be done in Feb-March.

@nfont could you please review these changes from AMD perspective first?

qiyuzhu2 commented 6 months ago

Hi Nathan, This is already mentioned in the description of patch: 1) Customer want a NVMe led controlling method like INTEL vpp; 2) The AMD’s existing NVMe led control method is to go OOB(CPU-> ipmi -> BMC ->CPLD) ,

Unfortunately, customer needs to maintain an i2c mapping table on BMC to map the i2c address of CPLD to the NVMe slot nr, this table is difficult to maintain because of hot plugging of disks, the new patch’s i2c mapping table is maintained by the fw.

Qiyu

@@ -38,6 +38,8 @@ enum amd_led_interfaces {

    AMD_INTF_UNSET,

    AMD_INTF_SGPIO,

    AMD_INTF_IPMI,

It would be nice to expand this comment to explain why there is less dependence.

qiyuzhu2 commented 6 months ago

Hi Nathon,

   YES, your are right,  it’s an offset to the PCIe config space.
   The reason I don't use “xxx_offset” to name the variable here is from the point of view that each cap is an independent structure.

BRs Qiyu

qiyuzhu2 commented 6 months ago

Hi Nathan,

   I don't know if I understand what you mean, but the real logic is:

(1) If the customer do know the platform supports the PCIe(NVMe ) slot cap, he can use the “--amd-nvme-slot” option to control the NVMe led, not other led, and in patch, it will first to see if it is an nvme controller and then to check the slot cap register.

(2) If the customer do not know the platform supports PCIe(NVMe ) slot cap, he can not use the “--amd-nvme-slot” option, all thing keep default.

My view is that we need to be conservative here, instead of everything being done without the customer's awareness, which may easily cause confusion and problems. Perhaps it also should not be possible to control two different types of leds in one control path.

For example, if the option "amd-nvme-slot" is not available, we add this control method to the known control path. the customer does not know what we have done, but also spends the effort to maintain the i2c mapping table on BMC as I replied in another email.

BRs Qiyu

  • / Universal to nvme led ctl /
  • if (using_amd_nvme_slot) {
  • amd_interface = AMD_INTF_NVME_SLOT;
  • amd_ipmi_platform = AMD_PLATFORM_UNSET;
  • }

I have a question about the introduction of the new command option to specify using the nvme slot capabilities. You provide a function to determine if this is a nvme controller, is_nvme_controller(), that is used in checking for enclosure management enabled and other routines. If the is_nvme_controller() call fails an error will be returned even if the new option is specified.

Is there a reason we can't just use the is_nvme_controller() function to determine if we should use nvme slot control methods instead of requiring a user to specify this with the new option? At this point you could set amd_interface and amd_ipmi_platform based on the call to is_nvme_controller().

mtkaczyk commented 4 months ago

Hello @qiyuzhu2,

(1) If the customer do know the platform supports the PCIe(NVMe ) slot cap, he can use the “--amd-nvme-slot” option to control the NVMe led, not other led, and in patch, it will first to see if it is an nvme controller and then to check the slot cap register.

(2) If the customer do not know the platform supports PCIe(NVMe ) slot cap, he can not use the “--amd-nvme-slot” option, all thing keep default.

My design for slot interface was to keep it simple and reusable without additional params needed. I don't see a need for parameter in this case. Customer should choose controller by --controller=amd or amd_ipmi (I'm not well familiar with AMD implementation). I cannot agree for --amd-nvme-slot param. For example, for other controllers list-slots command works this way:

ledctl --list--slots --controller-type=npem
ledctl --list--slots --controller-type=vmd
ledctl --list--slots --controller-type=scsi

AMD interface must stay same.

My view is that we need to be conservative here, instead of everything being done without the customer's awareness, which may easily cause confusion and problems. Perhaps it also should not be possible to control two different types of leds in one control path.

slots interface is made to complete only requested command, --list-slots and --get-slot is used only to read data (no risk). --set-slot updates state only slot referenced by --device or --slot. I don't see a risk for customer.

If you need to filter amd ipmi and amd sgpio them you should use these as the controller types. It is fine:

ledctl --list--slots --controller-type=amd_sgpio
ledctl --list--slots --controller-type=amd_ipmi

For example, if the option "amd-nvme-slot" is not available, we add this control method to the known control path. the customer does not know what we have done, but also spends the effort to maintain the i2c mapping table on BMC as I replied in another email.

I will not comment this because I'm not familiar with AMD protocols . I just cannot accept --amd-nvme-slot because it doesn't fit to slot feature design. Please consider more detailed controller types.

mtkaczyk commented 4 months ago

Hey @qiyuzhu2 Do you plan to rework this pull request? Just let me know because I would like to know if I should keep it open.

qiyuzhu2 commented 3 months ago

AMD will discuss the code internally and then resubmit the PR.

mtkaczyk commented 3 months ago

Thanks, closing for now.