intel / idxd-config

Accel-config / libaccel-config
Other
56 stars 35 forks source link

accel-config test fails due to invisible sysfs knob #48

Closed snits closed 7 months ago

snits commented 8 months ago

With the kernel change 62b41b656666 ("dmaengine: idxd: Expose ATS disable knob only when WQ ATS is supported") 'accel-config test' fails. I need to grab an SPR system again today, and get the details, but wanted to log this before I forget it again. I didn't get a chance to really look at the code yet, but I don't know if there needs to be more logic in check_wq to handle that possibility, or if it was something else.

snits commented 8 months ago

I thought I updated this yesterday, but apparentlly didn't actually save the comment. It is this line in config_wq() that the test trips on:

    SET_ERR(rc, accfg_wq_set_ats_disable(wq, wq_param->ats_disable)); 
ramesh-thomas commented 8 months ago

I will take a look. Thanks.

snits commented 8 months ago

It looks like check_wq() doesn't handle it as well. I added some debugging code to config_wq to get more information, and return rc 0 if it saw -ENOENT for ats_disable, while looking at it. I don't know if the check there should just be to look for both EOPNOTSUPP and ENOENT, or that might get returned by an earlier failure? Also when/if prs_disable support is added to test/ it is another of the knobs that can be invisible.

# accel-config test
run test_libaccfg

Running accfg-test0: set and get configurations for shared wqs
configuring device dsa0
configuring group group0.0
configuring group group0.1
configuring wq wq0.0
libaccfg: accfg_wq_set_ats_disable: wq0.0: ats_disable attribute write failed: No such file or directory
config_wq: ats_disable knob not visible
configuring wq wq0.2
libaccfg: accfg_wq_set_ats_disable: wq0.2: ats_disable attribute write failed: No such file or directory
config_wq: ats_disable knob not visible
configuring engine engine0.0
configuring engine engine0.1
configuring engine engine0.2
configuring engine engine0.3
check device dsa0
check group group0.0
check group group0.1
check wq wq0.0
check_wq failed on ats_disable
check wq failed
accfg-test0 *failed*: -22
test-libaccfg: FAIL
libaccfg: accfg_unref: context 0x5602142012a0 released

# ls /sys/bus/dsa/devices/wq*/ats_disable
ls: cannot access '/sys/bus/dsa/devices/wq*/ats_disable': No such file or directory

# cat /sys/bus/dsa/devices/dsa0/gen_cap 
0x40915f0107

# echo $(($(cat /sys/bus/dsa/devices/dsa0/gen_cap) & (1 << 50)))
0
snits commented 8 months ago

Oops that was the gen_cap. I guess wqcap isn't exported in sysfs. One sec, booting with dyndbg:

[  954.679742] idxd:idxd_read_caps: idxd 0000:f6:01.0: wq_cap: 0x3b000000080080

# echo $((0x3b000000080080 & (1 << 50)))
0
ramesh-thomas commented 8 months ago

I am planning to check for the presence of the attribute before attempting to write to it. @fyu1 is it possible for ats_disable sysfs attribute to be present even if it is disabled in the capability register? i.e. can it be present and readonly?

@snits if you can check if the /sys/bus/dsa/devices/wq0.0/ats_disable attribute is present in that system, it would be helpful.

Thanks

fyu1 commented 8 months ago

The ats_disable file is visible only when it's supported in cap register. This follows generic idxd API file policies that they are visible only when supported.

I think the tool needs to check if ats_disable is existing before accessing it.

snits commented 8 months ago

This kernel commit makes it invisible if it isn't supported.

62b41b656666 dmaengine: idxd: Expose ATS disable knob only when WQ ATS is supported | 2023-08-01 | (Fenghua Yu)

The same thing is done for prs as well. I can boot up a build that doesn't have the latest changes, and the ats_disable file will be there. I ran into this going to test 4.1.1 with the latest driver changes. dsa_user_test_runner.sh will run fine, but accel-config test runs into this.

ramesh-thomas commented 8 months ago

Ok, thanks. I will try to add a common check somewhere to take care of this condition.

snits commented 7 months ago

As a quick work-around I did the following, but I don't expect that this is the answer.

diff --git a/test/libaccfg.c b/test/libaccfg.c
index bfe277b996cf..4a580b2bb02e 100644
--- a/test/libaccfg.c
+++ b/test/libaccfg.c
@@ -240,7 +240,9 @@ static int config_wq(struct accfg_ctx *ctx, struct accfg_device *device,
    if (wq_param->threshold)
        SET_ERR(rc, accfg_wq_set_threshold(wq, wq_param->threshold));

-   SET_ERR(rc, accfg_wq_set_ats_disable(wq, wq_param->ats_disable));
+   if (accfg_wq_get_ats_disable(wq) >= 0) {
+       SET_ERR(rc, accfg_wq_set_ats_disable(wq, wq_param->ats_disable));
+   }
    /* Don't fail test if per wq ats disable is not supported */
    if (rc == -EOPNOTSUPP)
        rc = 0;
@@ -251,6 +253,7 @@ static int config_wq(struct accfg_ctx *ctx, struct accfg_device *device,
 static int check_wq(struct accfg_ctx *ctx, struct accfg_device *device,
        struct accfg_wq *wq, struct wq_parameters *wq_param)
 {
+   int ats_disable = 0;

    if (wq_param->wq_size != accfg_wq_get_size(wq)) {
        fprintf(stderr, "%s failed on wq_size\n", __func__);
@@ -290,7 +293,8 @@ static int check_wq(struct accfg_ctx *ctx, struct accfg_device *device,
        fprintf(stderr, "%s failed on wq name\n", __func__);
        return -EINVAL;
    }
-   if (wq_param->ats_disable != accfg_wq_get_ats_disable(wq)) {
+   ats_disable = accfg_wq_get_ats_disable(wq);
+   if (ats_disable >= 0 && (wq_param->ats_disable != ats_disable)) {
        fprintf(stderr, "%s failed on ats_disable\n", __func__);
        return -EINVAL;
    }

Is accel-config able to get the contents of wqcap from the device?

ramesh-thomas commented 7 months ago

I have added your partial change with other changes. Can you try the latest in pending branch?

ramesh-thomas commented 7 months ago

Fix added in https://github.com/intel/idxd-config/releases/tag/accel-config-v4.1.4