intel / ledmon

Enclosure LED Utilities
GNU General Public License v2.0
71 stars 48 forks source link

Gauging interest in a C library API #97

Closed tasleson closed 1 year ago

tasleson commented 2 years ago

Has there been any interest in a C library API which provides the ability to list/get/set etc. individual LEDs? Would folks be amenable to integrating a change which adds this if one was supplied?

mtkaczyk commented 2 years ago

I'm not aware of interest in this area. ledctl will provide this functionality now (via #94).

We met problems around ledctl and ledmon namespaces isolation, making core led management part as library could be useful , all standard methods will be part of the lib api. I'm open on that, especially if you can give me an example of other monitor app which (I believe) you want to develop using the library. The change could make ledmon project more flexible, useful and popular but it will require to modify project structure, probably name (our main product will be the library, not the "ledmon" and all other tools will be just a library consumers).

It will bring a lot of harm but I consider it as a good chance to increase project visibility :)

tasleson commented 2 years ago

I'm open on that, especially if you can give me an example of other monitor app which (I believe) you want to develop using the library.

Thank you for being open to this possibility.

The ceph project is currently using libStorageMgmt python bindings for LED management. LibStorageMgmt only supports SES. IMHO it would be better to refactor a C API into ledmon code base than expand libStorageMgmt to cover more LED hardware.

mtkaczyk commented 2 years ago

I did a research small research. As a start point we should create static library. It should be separate target in automake. Then, we will try to propagate ledmon-devel package in distros.

Does it fit your needs?

mtkaczyk commented 2 years ago

We need to have library API draft too. Will be great if you can share it with us before implementation start. We need to discus shared library content together.

I leave the issue open for now.

Thanks, Mariusz

tasleson commented 2 years ago

@mtkaczyk Sorry for the slow response, I was out of office.

As a start point we should create static library. It should be separate target in automake. Then, we will try to propagate ledmon-devel package in distros.

Any specific reason for static vs. dynamic? Fedora's guidelines on static https://docs.fedoraproject.org/en-US/packaging-guidelines/#packaging-static-libraries

We need to have library API draft too.

Yes, the API interface itself is quite important. I'll start the discussion with what we currently have in libStorageMgmt, but please note that this was designed around SES.

int lsm_local_disk_ident_led_on(const char *disk_path, lsm_error **lsm_err);
int lsm_local_disk_ident_led_off(const char *disk_path, lsm_error **lsm_err);
int lsm_local_disk_fault_led_on(const char *disk_path, lsm_error **lsm_err);
int lsm_local_disk_fault_led_off(const char *disk_path, lsm_error **lsm_err);

Notes: The disk_path is the device node. The library has a opaque error type with the following methods:

lsm_error_number LSM_DLL_EXPORT lsm_error_number_get(lsm_error_ptr e);
char LSM_DLL_EXPORT *lsm_error_message_get(lsm_error_ptr e);
char LSM_DLL_EXPORT *lsm_error_exception_get(lsm_error_ptr e);
char LSM_DLL_EXPORT *lsm_error_debug_get(lsm_error_ptr e);
void LSM_DLL_EXPORT *lsm_error_debug_data_get(lsm_error_ptr e, uint32_t *size);

Some of the initial discussion questions with this would be:

  1. How do we want to handle 2-LEDs systems and 3-LEDs in the same API?
  2. Do we want to have an opaque error type to allow returning detailed error information, or use an error code only?
  3. Should the API be idempotent?

Possible API additions:

  1. Function to enumerate all controllers
  2. Function to enumerate all slots for a given controller with associated device node (if present)

One of the enhancement requests for libStorageMgmt is when a device stops responding. In this case the kernel may remove the device which causes the device node to get removed. When this occurs we don't have a good way of correlating where the device was to illuminate the associated LED. It's similar to this issue: https://github.com/intel/ledmon/issues/81

In general I try to use https://git.kernel.org/pub/scm/linux/kernel/git/kay/libabc.git when crafting a library interface. It would be great to keep the API as simple as possible.

I'll spend some time this week investigating and propose an API for discussion. I'll likely have questions.

Thanks

mtkaczyk commented 2 years ago

Any specific reason for static vs. dynamic? Fedora's guidelines on static https://docs.fedoraproject.org/en-US/packaging-guidelines/#packaging-static-libraries

Static library is more comfortable for us because -devel package is optional. Dynamic will require more changes in code and in packages (ledmon and ledctl will require new lib dependency). -devel package will be necessary only to build ledmon. I want to avoid maintenance hell because beside developing we are responsible for tracking packages in distros. I would like to make the change as simple as possible and give distributions time to adopt it.

Yes, the API interface itself is quite important. I'll start the discussion with what we currently have in libStorageMgmt, but please note that this was designed around SES.

int lsm_local_disk_ident_led_on(const char *disk_path, lsm_error **lsm_err);
int lsm_local_disk_ident_led_off(const char *disk_path, lsm_error **lsm_err);
int lsm_local_disk_fault_led_on(const char *disk_path, lsm_error **lsm_err);
int lsm_local_disk_fault_led_off(const char *disk_path, lsm_error **lsm_err);

Notes: The disk_path is the device node.

Ok, we can stay with devnode, that is similar to ledctl implementation. I can see something like:

enum ibpi_pattern {
    IBPI_PATTERN_NORMAL,
    IBPI_PATTERN_ONESHOT_NORMAL,
    IBPI_PATTERN_DEGRADED,
    IBPI_PATTERN_HOTSPARE,
    IBPI_PATTERN_REBUILD,
    IBPI_PATTERN_FAILED_ARRAY,
    IBPI_PATTERN_PFA,
    IBPI_PATTERN_FAILED_DRIVE,
    IBPI_PATTERN_LOCATE,
    IBPI_PATTERN_LOCATE_OFF,
    IBPI_PATTERN_ADDED,
    IBPI_PATTERN_REMOVED,
};

enum led_status{
    STATUS_OK = 0,
        STATUS_GENERAL_ERROR,
       /* and more detailed errors comes here */
};

enum led_status set_led(const char *disk_path, enum ibpi_pattern ibpi);
bool is_led_supported(const char *disk_path);

struct dev_properties{
    /* content here, I'm not sure what could be needed */
};

enum led_status get_properties(const char *disk_path, struct dev_properties *props);

Such minimal interface, should be sufficient for start. The question here is if we need something else (like access to internal ledmon properties, lists, etc.)

  • How do we want to handle 2-LEDs systems and 3-LEDs in the same API?

Yes, we should export IBPI_STATE enum and caller should use those states. The hardware interpretation is on controller side so LED system shoudn't matter.

  • Do we want to have an opaque error type to allow returning detailed error information, or use an error code only?

I my opinion meaningful error code is enough. We should be able to determine it by error code. Ledmon is simple app. However if you opinion is different and you an see show me example where it will be useful, I'm ready to change my opinion.

  • Should the API be idempotent?

I'm not sure what it means, could you elaborate more? I want to be on same page.

Possible API additions:

  1. Function to enumerate all controllers

I'm not sure if it is necessary for library. Basically, you will have your set of drives, just verify if they are supported first and then blink. Our lib should determine controller automatically no need to export this internal property now.

  1. Function to enumerate all slots for a given controller with associated device node (if present)

Our new blinking via slot support. For now it is out of scope, because our changes are not merged. We don't need to discuss yet. Do you agree?

One of the enhancement requests for libStorageMgmt is when a device stops responding. In this case the kernel may remove the device which causes the device node to get removed. When this occurs we don't have a good way of correlating where the device was to illuminate the associated LED. It's similar to this issue: #81

And this gap will be addressed soon in ledctl and later, we will add support for that in lib. So as above, it is worth to have it on roadmap but it is to early to consider it. Let us test and merge pull first :)

In general I try to use https://git.kernel.org/pub/scm/linux/kernel/git/kay/libabc.git when crafting a library interface. It would be great to keep the API as simple as possible.

I'll spend some time this week investigating and propose an API for discussion. I'll likely have questions.

I will be please to help you but please note that this is my first experience with library so I'm not an expert. I will invite more Intel folk to participate in this discussion later.

tasleson commented 2 years ago

Ok, we can stay with devnode, that is similar to ledctl implementation. I can see something like:


enum ibpi_pattern {
  IBPI_PATTERN_NORMAL,
  IBPI_PATTERN_ONESHOT_NORMAL,
  IBPI_PATTERN_DEGRADED,
  IBPI_PATTERN_HOTSPARE,
  IBPI_PATTERN_REBUILD,
  IBPI_PATTERN_FAILED_ARRAY,
  IBPI_PATTERN_PFA,
  IBPI_PATTERN_FAILED_DRIVE,
  IBPI_PATTERN_LOCATE,
  IBPI_PATTERN_LOCATE_OFF,
  IBPI_PATTERN_ADDED,
  IBPI_PATTERN_REMOVED,
};

enum led_status{
  STATUS_OK = 0,
        STATUS_GENERAL_ERROR,
       /* and more detailed errors comes here */
};

enum led_status set_led(const char *disk_path, enum ibpi_pattern ibpi);
bool is_led_supported(const char *disk_path);

As C doesn't have namespaces, it's common to prefix all the library public parts with a consistent prefix, eg. led.

bool led_is_supported(const char *disk_path);
enum led_status_set(const char *disk_path, enum ibpi_pattern ibpi);

struct dev_properties{ / content here, I'm not sure what could be needed / };

We shouldn't expose the structure internals, keep them opaque and provide functions to extract data from them. This allows the structure to change over time internally. You can always add functions for the opaque type, you can never remove.

enum led_status get_properties(const char disk_path, struct dev_properties props);



Such minimal interface, should be sufficient for start. The question here is if we need something else (like access to internal ledmon properties, lists, etc.)

I don't think that would be needed.

Our new blinking via slot support. For now it is out of scope, because our changes are not merged. We don't need to discuss yet. Do you agree?

Agreed

  • Should the API be idempotent?

I'm not sure what it means, could you elaborate more? I want to be on same page.

Basically this would mean you could repeatedly call the same function with the same values to set an LED and it would not be an error if the LED was already in that state, ref. https://en.wikipedia.org/wiki/Idempotence for more. I think this would be desired.

Also I also think it would be useful for the API to have a generic context that the caller would instantiate first and it would then be the first parameter in eacy API call. This provides a way to associate arbitrary data, locks etc. with the library caller. This would allow us to provide an interface to allow the API user to register file descriptors for stdout, stderr, set log levels etc. Thus ledctl could simply pass the default FDs and log level and get the same output they have today.

mtkaczyk commented 2 years ago

Basically this would mean you could repeatedly call the same function with the same values to set an LED and it would not be an error if the LED was already in that state, ref. https://en.wikipedia.org/wiki/Idempotence for more. I think this would be desired.

If we want to return error, we should consider adding led_get() method but it is under review now ( in slot's support). We don't have such interface yet, so some controllers may verify current state and return if requested is the same, others don't. We should return success it such case and it shouldn't be blocked. I agree.

Also I also think it would be useful for the API to have a generic context that the caller would instantiate first and it would then be the first parameter in eacy API call. This provides a way to associate arbitrary data, locks etc. with the library caller. This would allow us to provide an interface to allow the API user to register file descriptors for stdout, stderr, set log levels etc. Thus ledctl could simply pass the default FDs and log level and get the same output they have today.

Yes, agree. We should allow caller to control global properties ( seestruct ledmon_conf). So user should be able to use set:

Others are ledmon parameters. We need to divide current struct ledmon_conf into global and ledmon structs. We should export led_read_config() to give external caller api to read config.

On led_initilize() we need to call sysfs_scan(). Perhaps, it could be useful for caller to get access to sysfs_block_list- the list contains all supported devices, but also, in simplified version, not all properties are needed.

Definitely, we need to initialize context.

tasleson commented 2 years ago

One thing I forgot about is library licensing. I see that the existing code base is GPLv2. If the library is GPLv2, then it limits the library linking to only GPLv2 code. One of the use cases I was thinking about is ceph to use this new library, but ceph is a combination of LGPL, MIT, GPLv3.

I'm thinking that perhaps the best way would be to allow a "GPL linking exception". What are peoples thoughts on this?

I'm not sure how feasible this is as I believe we would need every contributor to be OK with a change like this.

mtkaczyk commented 2 years ago

@tasleson I did research and I think that we are able to change the license to LGPL. We will create the pull request with the change.

tasleson commented 2 years ago

@tasleson I did research and I think that we are able to change the license to LGPL. We will create the pull request with the change.

Any updates? Thanks

bkucman commented 2 years ago

@tasleson We are in process of reviewing the change internally. Slight changes in code structure must be done alongside license change.

After internal processing, the pull request will have to get approval from all code contributors, so, it's definitely gonna be a while longer.

mtkaczyk commented 2 years ago

106 submitted.

mtkaczyk commented 2 years ago

@tasleson unfortunately it seems to be impossible to contact one contributor (I hope that Dell will answer soon). He is an onwer of part of smp code and some minor fixes in other stack. If we want to proceed with changing licensing, those parts needs to be rewritten or we need to exclude smp from library.

tasleson commented 2 years ago

@tasleson unfortunately it seems to be impossible to contact one contributor (I hope that Dell will answer soon). He is an onwer of part of smp code and some minor fixes in other stack. If we want to proceed with changing licensing, those parts needs to be rewritten or we need to exclude smp from library.

Well let's hope we hear back on this. It would really be great to not lose functionality.

If we are unable to change license, perhaps we should consider adding structured output (JSON?) instead from the ledctl tool, to allow easier re-use by other frameworks?

mtkaczyk commented 2 years ago

Yes, we will accept it but it must be configurable, disabled by default- backward compatibility reasons.

mtkaczyk commented 2 years ago

I'm closing this ticket now. Unfortunately, we need to drop library functionality idea due to license problem.

mtkaczyk commented 1 year ago

@tasleson do you plan to implement library API? We updated license to make that possible.

tasleson commented 1 year ago

@tasleson do you plan to implement library API? We updated license to make that possible.

I started working on it last week.