Closed tasleson closed 1 year ago
@tasleson, now I looked into that and I see that you wrote: "that PR should be merged first."
Should I revert SES support then?
@mtkaczyk I'll fix this PR, no need to revert SES PR.
Hi, I will review that all commits at once. It will hard to review each patch separately. Feel free to push next patches on top of this- I will squash it all to one commit at the end. There is no need to push-force. I hope that it will help you.
Thanks, Mariusz
@tasleson I would like to present you my point of view of the library implementation. First, I would like to illustrate you how I understand conception of controllers, slots and disks and how they are connected. I created an illustration to visualize led management complexity:
Legend: | TYPE | DEFINITION |
---|---|---|
CTRL | controller | |
DISK | SSD/NVME/HDD device connected (called as block device) | |
LEDS | led system per disk | |
SLOT | the pointer/ID of led system inside controller. |
As you can see, there is no direct reference between CTRL and DISK but you can use DISK to retrieve slot identifier for a controller to blink appropriative led.
There isn't any functional difference between set_slot
and _write
. Some implementation defined behaviors like:
may be different but the concept is same. If you want to update led for device using particular controller you need to determine slot. You can send just block device (if any) and slot will be detected automatically for you.
I also determined two problematic scenarios:
block_get_controller()
That is why for library I want to force user to determine controller for device. We can provide something like get_default_controller()
but I want to include controller in _set
or _get
command:
For me, that is all what library should provide. We should allow user to just set LED for proper slot. Other functionalities are ledmon and ledctl related. I've spent some time on diagram to illustrate my expectations somehow:
The library implementation goals I determined:
strcut block_device
and put in in the struct controller
(because the function is controller related).add
and remove
functions)
_determine()
logic should stay in ledmon/ledctlI understand that the concept I provided will require more changes. We should separate library content and ledmon/ledctl content more. Let me know what do you think.
I understand that the concept I provided will require more changes. We should separate library content and ledmon/ledctl content more. Let me know what do you think.
In my opinion we need to make the library as simple as possible for someone to use. If there is complexity, it should be in the library implementation, not pushed to the application developer. When I first approached the project I mentioned a use case where you supply a device node and a pattern, discussed here: https://github.com/intel/ledmon/issues/97#issuecomment-1123796793 . Without that functionality I cannot simply drop it into libStorageMgmt as a replacement. This is a use case which any developer should be able to utilize very quickly. In your latest discussion you outline removing the functionality of the get/set by device node. Is it your intention that we are then removing support for all the others? So only NPEM, SES, VMD would be supported?
The notion of a controller and slot is useful for changing the state of LEDs when the block device no longer has a device node. The API should be simple to understand and utilize for this use case too.
You mention:
CTRL1->slot2 and CTRL2->slot0. Points to same LEDS0_0. That is real issue between VMD and NPEM controllers. It is resolved by giving NPEM higher priority, see block_get_controller()
The function block_get_controller
appears to only be used by ledmon. This was part of my question asking if existing ledctl is flawed. I was simply asking if ledctl does what it states it does. Which is having the ability to control LEDs by device node and by the controller/slot functionality.
You also mention:
The _determine() logic should stay in ledmon/ledctl
I believe I actually removed this from ledctl. I'm thinking that maybe I've introduced a bug in doing so.
Maybe the API isn't the best fit for ledctl/ledmon, but I think we should be striving for something that doesn't require the API user to (If I'm understanding your diagram correctly)
Ultimately what I was trying to achieve is something simple, with enough functionality that people would find it useful and have broad hardware support. I also wanted to minimize code churn in the code base, to hopefully reduce the introduction of bugs.
Re-based and moved the commit for removing exclusionary language to beginning of changes. When https://github.com/intel/ledmon/pull/118 gets integrated I'll remove that change from this PR. I've not addressed any of other other comments at this time.
In my opinion we need to make the library as simple as possible for someone to use. If there is complexity, it should be in the library implementation, not pushed to the application developer. When I first approached the project I mentioned a use case where you supply a device node and a pattern, discussed here: #97 (comment) .
Yeah, the initial discussion took place before we developed "get/set-slots" and I deeply realized how it all is connected. The current approach I proposed is based on my current experience. But no offence, we need all to be on the same page to make library implementation possible. I don't want to repeat myself and discuss things many times because my current view is quite different. I did the schema to visualize my mind to myself too.
Without that functionality I cannot simply drop it into libStorageMgmt as a replacement. This is a use case which any developer should be able to utilize very quickly. In your latest discussion you outline removing the functionality of the get/set by device node.
Yup, in my approach I proposed to use opaqued block device
. That makes your implementation complicated. You want to have it as simple as possible. As a maintainer, I'm thinking of making it the most flexible, not the simplest for developer.
See that my approach wants to totally isolate ledctl and ledmon code from library code.
Beside the my personal opinion about the controllers I can see that all current code bases on default one. It seems that I want to create dead api. So, maybe instead giving possibility to choose controller now, we can create set of functions prefixed by _def_
to highlight that they are using default controller and later, based on needs we can export more API around the controller management.
So, to meet my and your expectations, feel free to add endpoint like led_def_set_by_path(chat *path, enum ibpi val)
and appropriative getter. Unfortunately, I still want to isolate ledctl and ledmon code more from the library so I'm wondering how we can agree on that.
Is it your intention that we are then removing support for all the others? So only NPEM, SES, VMD would be supported?
No, but they have _write
only so we can support _set_by_block
and _set_by_path
only. On other endpoints we will be legible to returns error, until appropriative support will be added. Do you agree here?
You mention:
CTRL1->slot2 and CTRL2->slot0. Points to same LEDS0_0. That is real issue between VMD and NPEM controllers. It is resolved by giving NPEM higher priority, see block_get_controller()
The function
block_get_controller
appears to only be used by ledmon. This was part of my question asking if existing ledctl is flawed. I was simply asking if ledctl does what it states it does. Which is having the ability to control LEDs by device node and by the controller/slot functionality.
sysfs_scan()
is shared between ledmon and ledctl. There is call to block_scan
and later to block_device_init
and this function calls block_get_controller
and sets it in device->send_fn
You also mention:
The _determine() logic should stay in ledmon/ledctl
I believe I actually removed this from ledctl. I'm thinking that maybe I've introduced a bug in doing so.
Similar to above, it is called by sysfs_scan
and determine_slaves
.
Maybe the API isn't the best fit for ledctl/ledmon, but I think we should be striving for something that doesn't require the API user to (If I'm understanding your diagram correctly)
- Separately scan controllers and block devices
- Iterate over every block device to find out which controller it should use (default)
- Use the default controller and the block device to see if it has LED support
Agree now, as above- we shuld just put _def
in method.
Ultimately what I was trying to achieve is something simple, with enough functionality that people would find it useful and have broad hardware support. I also wanted to minimize code churn in the code base, to hopefully reduce the introduction of bugs.
Sure and I will support you with that but I also know if I don't handle this and don't force some good requirements from the start then we will end with a huge and understandable library with set of not needed functionalities inside. I know this code well so I'm trying to describe you what is for applications and what is for library.
No worries about bugs, we don't have unit test they are unavoidable. We need to deal with that. Perhaps, we should start with unit tests but it also huge input. There is no simple way now...
functions prefixed by def to highlight that they are using default controller
I'm trying to understand better why we need to expose the idea of default controller. From your perspective, it seems like we must expose it and not handle it internally. Is VMD & NPEM the only one that has the requirement of using NPEM? In this case NPEM is the default yes? Are you wanting the ability for the library user to have the choice to use VMD or NPEM to change LED states instead of allowing the library to select the best one for a given hardware configuration?
In SES with multi-path, there are multiple controllers and paths to change the LED state for a specified slot in a given enclosure, but I wouldn't say one is better than another, unless one of the paths is down.
I'm trying to wrap my head around why ledctl works with all supported hardware when you do something like:
# ledctl normal=/dev/sdr
and the user doesn't need to worry about a controller. I need more of an explanation to why we can't have the library do that too? Thanks
functions prefixed by def to highlight that they are using default controller
I'm trying to understand better why we need to expose the idea of default controller. From your perspective, it seems like we must expose it and not handle it internally. Is VMD & NPEM the only one that has the requirement of using NPEM? In this case NPEM is the default yes? Are you wanting the ability for the library user to have the choice to use VMD or NPEM to change LED states instead of allowing the library to select the best one for a given hardware configuration?
I'm trying to wrap my head around why ledctl works with all supported hardware when you do something like:
# ledctl normal=/dev/sdr
and the user doesn't need to worry about a controller. I need more of an explanation to why we can't have the library do that too? Thanks
Initially I wanted to give possibility to choose the controller explicitly as you said. But during discussion I lowered this requirement because it is not needed now, let me link all my comment:
Beside the my personal opinion about the controllers I can see that all current code bases on default one. It seems that I want to create dead api. So, maybe instead giving possibility to choose controller now, we can create set of functions prefixed by
_def_
to highlight that they are using default controller and later, based on needs we can export more API around the controller management.
Here explanation: We are doing software here. I cannot predict how it all will be connected on hardware level. Someone may come with not trivial case where it will be needed. It comes from "flexibility" approach. It may be never needed but since I've realized that something like that is possible I wanted to make our library API flexible to meet more business goal with minimal input later.
We have allowed/ denied controllers list- that gives some flexibility, we are unable to decide how to blink for particular disk but we can exclude the controller- should be sufficient in standard cases.
When someone will come with this problem that will require to explicitly set the controller we will add new methods to library API. I agree that we don't need to care about that right now. I will update the draft I created if we agree here to give us both reference for later.
When someone will come with this problem that will require to explicitly set the controller we will add new methods to library API. I agree that we don't need to care about that right now. I will update the draft I created if we agree here to give us both reference for later.
Agreed. The API can always be added/extended, I fully support the idea of not adding something until it's needed.
This is not a complete representation, I focused here on ledmon mostly but I also wanted to list all expected library functions. This is why some of them doesn't have references. I omitted slot management functionality implementation in ledctl- I believe that it does not require detailed view. Let me know if something is not clear for you.
Thanks!
@mtkaczyk Please take a look at updated PR and see if this is starting to look closer to something you like.
The next step (after addressing current issues) should be removing ledmon and ledctl lists from library ctx (slaves, volumes ledmon/ledctl block_list).
This may very well be a good change, but from the perspective of the public API, it's an implementation detail. As such, I would like to defer it until we get this PR merged.
As the last step, I would like to propose code dividing into subdirectories:
I'm not an autoconf wizzard, I don't muck with it enough to be super proficient. I'll spend some time on it, but if it takes to long I would like to defer and place it a separate PR after this gets merged.
~Update: I started working on the re-org a bit. What occurred to me is we will break packaging with this change.~ Also list.c
is used by library code and ledmon/ledctl
, so it should be in the lib dir too. I've added a commit to the end which does the re-org, but I'll likely remove it with future changes or simply squash the entire PR to one commit to address any needed changes.
@mtkaczyk I build a local SRPM and ran it through our code analysis tools. It found ~34 defects which I'm attempting to work through. Some look like false positives, but some look like genuine bugs (memory leaks, possibility of de-referencing null, failure to check return codes etc.)
I'll try to get that completed tomorrow and do testing as you suggested on the command line.
@mtkaczyk The directory re-work is in the PR
@mtkaczyk
I need around one month. Led me know if that works for you.
That works, thank you for giving a time estimate.
Example of using library API
#include <stdlib.h>
#include <stdio.h>
#include <led/libled.h>
static inline void print_slot(struct led_slot_list_entry *s)
{
printf("cntrl type: %d: slot: %-15s led state: %-15d device: %-15s\n",
led_slot_cntrl(s), led_slot_id(s), led_slot_state(s), led_slot_device(s));
}
static inline void print_cntrl(struct led_cntrl_list_entry *s)
{
printf("cntrl type: %d: path: %s\n", led_cntrl_type(s), led_cntrl_path(s));
}
int main() {
struct led_slot_list_entry *slot = NULL;
struct led_ctx *ctx = NULL;
struct led_slot_list *slot_list = NULL;
struct led_cntrl_list *cntrl_list = NULL;
struct led_cntrl_list_entry *cntrl = NULL;
led_status_t status;
status = led_new(&ctx);
if (LED_STATUS_SUCCESS != status){
printf("Failed to initialize library\n");
return 1;
}
status = led_scan(ctx);
if (LED_STATUS_SUCCESS != status){
printf("led_scan failed %d\n", status);
return 1;
}
status = led_slots_get(ctx, &slot_list);
if (LED_STATUS_SUCCESS != status) {
printf("led_slots_get failed %d\n", status);
return 1;
}
printf("Printing slots in order returned\n");
while ((slot = led_slot_next(slot_list))) {
print_slot(slot);
}
led_slot_list_reset(slot_list);
printf("Printing slots in reverse order\n");
while ((slot = led_slot_prev(slot_list))) {
print_slot(slot);
}
led_slot_list_free(slot_list);
printf("Printing controllers\n");
status = led_cntrls_get(ctx, &cntrl_list);
if (LED_STATUS_SUCCESS != status) {
printf("led_cntrls_get failed %d\n", status);
return 1;
}
printf("Printing controllers in order returned\n");
while((cntrl = led_cntrl_next(cntrl_list))) {
print_cntrl(cntrl);
}
led_cntrl_list_reset(cntrl_list);
printf("Printing controllers in reverse order\n");
while((cntrl = led_cntrl_prev(cntrl_list))) {
print_cntrl(cntrl);
}
led_cntrl_list_free(cntrl_list);
led_free(ctx);
return 0;
}
Hi @tasleson, I tried to compile install and use a library (head: a093e65ff62d13e66b77b754002b770c5c7b94d7) on my setup but I reached some issues:
[gklab-localhost ~/mtkaczyk_work/gerrit-ledmon]# ledctl --get-slot -c npem -d/dev/nvme9n1
slot: led state: UNKNOWN device: /dev/nvme9n1
[gklab-localhost ~/mtkaczyk_work/gerrit-ledmon]# ledctl --get-slot -c vmd -d/dev/nvme9n1
slot: led state: NORMAL device: /dev/nvme9n1
On setup where I don't have NPEM ctrl for this device:
lrwxrwxrwx 1 root root 0 Feb 20 17:33 nvme9n1 -> ../devices/pci0000:d7/0000:d7:05.5/pci10004:00/10004:00:01.0/10004:02:00.0/nvme/nvme9/nvme9n1
[gklab-localhost ~/mtkaczyk_work/gerrit-ledmon]# ledctl -L | grep ":d7:05.5"
/sys/devices/pci0000:d7/0000:d7:05.5 (VMD)
I know that you are lack of setup, so I will look into that myself but suggestions are always welcome.
I know that you are lack of setup, so I will look into that myself but suggestions are always welcome.
I'm assuming that if you simply run the HEAD of master branch that things work as expected?
I'm assuming that if you simply run the HEAD of master branch that things work as expected?
No, something is wrong too but result is quite different:
[gklab-localhost ~/mtkaczyk_work/ledmon/ledmon_gh_master]# ./src/ledctl --get-slot -c npem -d/dev/nvme9n1
slot: 0000:d7:05.5 led state: UNKNOWN device: /dev/nvme9n1
[gklab-localhost ~/mtkaczyk_work/ledmon/ledmon_gh_master]# ./src/ledctl --get-slot -c vmd -d/dev/nvme9n1
slot: 260 led state: NORMAL device: (empty)
We will fix the current master, then I will proceed with investigating library rfc. It might be connected.
Hi @tasleson The most important target is to release new ledmon version by the end of march. For that reason, I need to postpone looking into this PR for now. I will back here once new release is ready, it is not outside my plans.
Thanks, Mariusz
Hi @tasleson The most important target is to release new ledmon version by the end of march. For that reason, I need to postpone looking into this PR for now. I will back here once new release is ready, it is not outside my plans.
Hi @mtkaczyk
I'm just checking to see if you're on track to release a new ledmon version this month?
Thanks, Tony
Rebased
I'm just checking to see if you're on track to release a new ledmon version this month?
Hi @tasleson , As you probably noticed, we are delayed :( Hopefully, we will release new ledmon next week.
Major rebase
773 additions and 520 deletions.
Summary of all rebase changes made to date on this PR
2235 insertions(+), 1678 deletions(-)
Corrected many of the check patch issues, but did not include the copyright header changes. Also the message of replacing
_attribute__ ((format (printf, 3, 4)));
with
__printf(3,4)
fails to work with gcc. I believe __printf
is a linux kernel macro as I can find it in the kernel source tree as:
#define __printf(a, b) __attribute__((format(printf, a, b)))
~I'm thinking that the check patch was intended for kernel code submissions?~ The checkpatch is for kernel code submissions. Unfortunately, some of the checks are not going to make sense to this code base and cause a number of false positives.
~I'm thinking that the check patch was intended for kernel code submissions?~ The checkpatch is for kernel code submissions. Unfortunately, some of the checks are not going to make sense to this code base and cause a number of false positives.
Yes it is. However certain checks can be turned off if they don't make sense 😃. I made a PR to address the meaningless errors turning up here.
Hi @tasleson ,
As we released 0.97 ledmon I started to familiarize myself with the library and checking the implementation.
Are you able to rebase to HEAD this PR?
I checked too your "Example of using library API", is the only way to build such code is to specify the exact path of the library?
I installed ledmon by make install
and then I had to point to the lib directly gcc -o library ../test.c /usr/lib/libled.so.0.97.0
One more thing I think is that it would be worth separating the installation into two types:
After the rebase I will start checking the lib more deeply and prepare a plan to test the changes on our side.
Hi @tasleson ,
Hi @bkucman
As we released 0.97 ledmon I started to familiarize myself with the library and checking the implementation.
Are you able to rebase to HEAD this PR?
In progress, hope to complete tomorrow.
I checked too your "Example of using library API", is the only way to build such code is to specify the exact path of the library? I installed ledmon by
make install
and then I had to point to the lib directlygcc -o library ../test.c /usr/lib/libled.so.0.97.0
There is a package config file in this PR, so when installed you can compile & build leveraging that. I can include an example.
One more thing I think is that it would be worth separating the installation into two types:
* make install - should only install ledmon and ledctl as it was before, * make install_all - ledmon/ledctl + additionally install library components. We should treat the library as a development part that is not necessary for ledmon/ledct to work. There could be a third command to install only the library, but I don't know if that's necessary.
I've updated the configure to take a --without library
option which will not build the shared library and thus not install it with a make install. For packaging, eg. rpm the make install
should install everything if configured with defaults. The packaging will have multiple packages for binaries, runtime shared lib, and development files (header etc.).
The entire reason I'm doing this work is for the use case where I want just the library, but I'll take care of that with the rpm packaging.
After the rebase I will start checking the lib more deeply and prepare a plan to test the changes on our side.
Thanks
@bkucman
The developer using ledmon library should use something like the following to build their application:
gcc `pkg-config --cflags --libs ledmon` <file_name.c> -o <binary_name>
Hi @tasleson,
currently we have planned a large scope of tests focusing on
VMD, SATA
andNPEM
. As this is a large scope, it may take a while to prepare environment and execute them.I think in the meantime you can also add an initial library usage instruction to the README.
Commit added with this addition.
Rebased
@mtkaczyk @mgrzonka @bkucman Could we please get the development library check
added to CI so that the builds work?
Hi @tasleson
Basically, package named check
has to be installed.
.github/workflows/review.yml
- name: 'Install gcc'
run: sudo apt-get update && sudo apt-get install gcc-${{ matrix.gcc-version }} g++-${{ matrix.gcc-version }}
- name: 'Install dependencies'
- run: sudo apt-get install pkg-config automake autoconf make libsgutils2-dev libudev-dev libpci-dev
+ run: sudo apt-get install pkg-config automake autoconf make libsgutils2-dev libudev-dev libpci-dev check
- name: 'Generate compiling configurations'
run: ./autogen.sh && ./configure
- name: 'Make'
With this change I believe it should work.
Unfortunately, I cannot push onto your library_rfc
branch, as I get 403 error.
Hi @tasleson
Basically, package named
check
has to be installed.
.github/workflows/review.yml
Oh, I grepped around looking for where the dependencies were installed and missed this as I didn't include hidden directories.
Done, thanks!
@mtkaczyk @mgrzonka @bkucman Would it be possible to get make check
run on your test hardware?
@mtkaczyk @mgrzonka @bkucman Would it be possible to get
make check
run on your test hardware?
It fails for me, I will provide more data tommorow:
[gklab-localhost ~/mtkaczyk_work/ledmon/ledmon_lib_pr]# cat test-suite.log
=====================================
ledmon 0.97.0: ./test-suite.log
=====================================
# TOTAL: 1
# PASS: 0
# SKIP: 0
# XFAIL: 0
# FAIL: 1
# XPASS: 0
# ERROR: 0
.. contents:: :depth: 2
FAIL: tests/runtests.sh
=======================
exercising ledctl
============================= test session starts ==============================
platform linux -- Python 3.9.10, pytest-6.2.2, py-1.10.0, pluggy-0.13.1
rootdir: /root/mtkaczyk_work/ledmon/ledmon_lib_pr, configfile: pytest.ini
----------------------------- live log collection ------------------------------
INFO conftest:conftest.py:11 [CONFIG] ledctl binary: src/ledctl/ledctl
INFO conftest:conftest.py:12 [CONFIG] slot filters:
collected 2 items
tests/slot_test.py::TestSlot::test_non_slot_set_path[src/ledctl/ledctl-slot_filters0] PASSED [ 50%]
tests/slot_test.py::TestSlot::test_slot_state_walk[src/ledctl/ledctl-slot_filters0] PASSED [100%]
============================== 2 passed in 0.43s ===============================
running library unit test
Running suite(s): ledmon
80%: Checks: 5, Failures: 1, Errors: 0
lib_unit_test.c:274:F:lib_unit_test:test_led_by_path:0: Retrieved state 2 != 10 expected
FAIL tests/runtests.sh (exit status: 1)
This moment finally comes. It has enough good quality to be merged. Thanks to all contributors (especially @tasleson for driving this great feature). It is small step for me to click merge but is is a big step for ledmon project!
I totally understand that it is not yet complete, there will be a bugs. There are still some topis that require clarification. There are still many questions. We will drive them all in separate pulls according to our needs and priorities.
I'm not going to make a release soon. There is a time for corrections in the library API. The biggest step is done.
Thanks everyone for support! @bkucman @ktanska @mgrzonka @mku514k @tasleson
Tony, please give me an ack, then I will merge it.
Tony, please give me an ack, then I will merge it.
My official 'ack'
, please do merge it.
@mtkaczyk @bkucman @ktanska @mgrzonka @mku514k
Thanks everyone!
This is an initial pass at providing a library interface. This PR is based on ses_slot_support https://github.com/intel/ledmon/pull/109 , that PR should be merged first. The number of changes is fairly significant and I'm assuming it will take a bit to get this fully reviewed, tested, and integrated. The first few commits (starting with f555300da87443d2922bf92f30fbee75a9ca6aa2) are fairly invasive as it wasn't easy to constrain the changes when introducing a library context and support logging and get things to compile. The early commits disable compiling of
ledmon
in the makefile until the library was more complete, before re-enabling and updating. I tried to break out the commits to finer granularity a couple of times and gave up. As things progressed, I made smaller more contained edits to address specific issues.Ledctl
has good separation now and uses the public library interface.Ledmon
is still tightly coupled with the code base. Eventually it maybe be useful to add the needed functionality to the public library interface to supportledmon
, but I don't think it's required.I've only done limited testing with
ledctl
andledmon
on a system I have with SES. I'll need to check around to see if I happen to have access to other hardware which is supported for further testing. I also wrote a simple program which uses the library to ensure some basic functionality. As we expand testing and examples, I think it would be good to include them in the source repository.I followed the guidelines in libabc when crafting the API and previous discussions we had on the issue ref. https://github.com/intel/ledmon/issues/97.
Resolves: https://github.com/intel/ledmon/issues/97