intel / idxd-config

Accel-config / libaccel-config

refactor *_user_test_runner.sh to run in containers #36

Open mythi opened 1 year ago

mythi commented 1 year ago

I'd like to be able to use these scripts in containers, without patching, in our end-to-end testing for Kubernetes.

Essentially for containers, the flow is that the node admin has configured the WQs, and the tests are run by passing --device /dev/dsa/wq0.0 etc.

ramesh-thomas commented 1 year ago

@xinzhanz can you please take a look?

xinzhanz commented 1 year ago

@mythi If cleanup, start_dsa and enable_wqs are removed, these scripts won't work. The purpose is to provide a script that demonstrates that everything runs well. I don't think it is a good idea to configure the DSA/WQs outside of the runner.

mythi commented 1 year ago

@xinzhanz no need to remove them, but make them run conditionally. One flow could be that the user adds ./dsa_user_test_runner.sh --device <dev>, and if <dev> is set, no provisioning is done. Alternatively, just ./dsa_user_test_runner.sh --skip-config.

xinzhanz commented 1 year ago

I prefer --skip-config to skip enabling/disabling devices in the scripts. It then needs to be guaranteed that the WQs in /dev/dsa are all enabled devices usable by the user.
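A minimal sketch of how such a --skip-config switch could gate the provisioning steps. The function names cleanup, start_dsa and enable_wqs are the ones mentioned in this thread, but their bodies here are placeholders that only log; parse_args, run_tests, main and the SKIP_CONFIG variable are hypothetical names for illustration, not taken from the actual runner.

```shell
#!/bin/bash
# Sketch only: placeholder bodies stand in for the real provisioning steps.

SKIP_CONFIG=0

# Parse the command line; only --skip-config is recognized in this sketch.
parse_args() {
    SKIP_CONFIG=0
    while [ $# -gt 0 ]; do
        case "$1" in
        --skip-config) SKIP_CONFIG=1 ;;
        *) echo "unknown option: $1" >&2; return 1 ;;
        esac
        shift
    done
}

start_dsa()  { echo "start_dsa"; }
enable_wqs() { echo "enable_wqs"; }
cleanup()    { echo "cleanup"; }
run_tests()  { echo "run_tests"; }

main() {
    parse_args "$@" || return 1
    # Provision devices and WQs only when the caller did not opt out.
    if [ "$SKIP_CONFIG" -eq 0 ]; then
        start_dsa
        enable_wqs
    fi
    run_tests
    # Tear down only what this script set up itself.
    if [ "$SKIP_CONFIG" -eq 0 ]; then
        cleanup
    fi
}

main "$@"
```

With --skip-config, only the test step runs, which matches the container flow where the node admin has already configured the WQs.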

mythi commented 1 year ago

Sounds OK. Maybe you can reuse this code from our patch:

-DSA=dsa0
+DEV=`ls /dev/dsa/ | sed -ne 's|wq\([^.]\+\)\(.*\)|dsa\1/wq\1\2|p'`
+DSA=`echo $DEV | cut -f1 -d/`
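For readers unfamiliar with the sed expression, here is a minimal sketch of what the two added lines compute. It assumes GNU sed (for \+) and uses a temporary directory as a stand-in for /dev/dsa so that it runs without DSA hardware; discover_wqs and fake_dev are names made up for illustration and are not part of the patch.

```shell
#!/bin/bash
# Map WQ device node names (wq<dev>.<idx>) to <prefix><dev>/wq<dev>.<idx>.
# The sed expression is the one from the patch above; \+ needs GNU sed.
discover_wqs() {
    local dir="$1" prefix="$2"
    ls "$dir" | sed -ne "s|wq\([^.]\+\)\(.*\)|${prefix}\1/wq\1\2|p"
}

# Stand-in for /dev/dsa: pretend only WQs of device 2 are mounted.
fake_dev=$(mktemp -d)
touch "$fake_dev/wq2.0" "$fake_dev/wq2.3"

DEV=$(discover_wqs "$fake_dev" dsa)
# Unquoted $DEV is deliberate (as in the patch): it joins the lines with
# spaces so that cut sees a single line and returns the first device name.
DSA=$(echo $DEV | cut -f1 -d/)

echo "$DEV"   # dsa2/wq2.0 and dsa2/wq2.3, one per line
echo "$DSA"   # dsa2

rm -rf "$fake_dev"
```

Note that DEV can hold more than one work queue, while DSA only ever holds the first device name.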

things to be aware of with containers:

  1. the script cannot assume dsa0 (it could be any number)
  2. only mounted devices are "visible"

xinzhanz commented 1 year ago

v1-0001-accel-config-test-provice-skip-config-to-run-dsa-.patch

Could you give it a try and check whether it works for you?

mythi commented 1 year ago

@hj-johannes-lee can you help check this patch as part of your work?

hj-johannes-lee commented 1 year ago

@mythi @xinzhanz I successfully ran the e2e test for the dsa plugin, which includes the accel-config demo. The demo is patched with what you provided. So, it works for us!

xinzhanz commented 1 year ago

Great! I will make a patch for review. Thanks @hj-johannes-lee @mythi .

mythi commented 1 year ago

> Great! I will make a patch for review. Thanks @hj-johannes-lee @mythi .

can we have the same for both IAA and DSA?

zhangl6 commented 1 year ago

I think there is a potential issue with this implementation. Using my local environment as an example, I enabled 2 work queues:

[root@emr-bkc test]# ls /dev/iax/wq1.
/dev/iax/wq1.1 /dev/iax/wq1.4

Then define the variables DEV and IAA:

DEV=`ls /dev/iax/ | sed -ne 's|wq\([^.]\+\)\(.*\)|iax\1/wq\1\2|p'`
IAA=`echo $DEV | cut -f1 -d/`

Now the content of DEV is:

[root@emr-bkc test]# echo $DEV
iax1/wq1.1 iax1/wq1.4

Then run the IAA test as:

./iaa_test -w 0 -l 1048576 -o 0x44 -f 0x1 -1 0x8000 -t 5000 -v -d "$DEV"

In this case, only device iax1/wq1.1 is run; iax1/wq1.4 is ignored. So the script needs to loop over all enabled devices and work queues.

In iaa_user_test_runner.sh, it's necessary to check whether an op is valid, and different ops take different command line parameters, so each op has its own commands. There are dozens of iaa_test command lines that need to be modified, and each of them needs to loop over all the enabled devices and work queues.

As a tool for common use, I don't think "--skip-config" and accepting already-enabled devices and work queues are good behavior for the script iaa_user_test_runner.sh. What are your opinions?

mythi commented 1 year ago

> There are dozens of iaa_test command lines that need to be modified, and each of them needs to loop over all the enabled devices and work queues.

Our need is fairly simple: run using the devices that are available (in containers, this can be a subset of what is enabled on the host), but don't do any WQ (re-)configuration and don't assume any fixed WQ id. The test scripts have proven to be useful because of their coverage, but unfortunately they require patching to be useful in containerized environments. That's the reason for this issue.

zhangl6 commented 1 year ago

> > There are dozens of iaa_test command lines that need to be modified, and each of them needs to loop over all the enabled devices and work queues.
>
> Our need is fairly simple: run using the devices that are available (in containers, this can be a subset of what is enabled on the host), but don't do any WQ (re-)configuration and don't assume any fixed WQ id. The test scripts have proven to be useful because of their coverage, but unfortunately they require patching to be useful in containerized environments. That's the reason for this issue.

Do you need a script that can enumerate and run on all enabled work queues? I mean, for example, if /dev/iax/wq1.1 and /dev/iax/wq1.4 are enabled, are the following 2 commands both necessary for the operation 0x44 in the script?

  1. iaa_test -w 0 -l 2097152 -o 0x44 -f 0x1 -1 0x8000 -t 5000 -v -d iax1/wq1.1
  2. iaa_test -w 0 -l 2097152 -o 0x44 -f 0x1 -1 0x8000 -t 5000 -v -d iax1/wq1.4

mythi commented 1 year ago

> Do you need a script that can enumerate and run on all enabled work queues? I mean, for example,

It's not important, but as far as I can tell, this would be straightforward with a loop over all detected devices.
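A rough sketch of such a loop, written as a wrapper so that each existing command line would only need a prefix instead of its own loop. Everything here is an assumption for illustration: for_each_wq, WQS and fake_iaa_test are hypothetical names, and fake_iaa_test only echoes its arguments so that the sketch runs without IAA hardware.

```shell
#!/bin/bash
# Run a command once per work queue, appending "-d <wq>" each time.
# WQS is a whitespace-separated list such as "iax1/wq1.1 iax1/wq1.4".
for_each_wq() {
    local wq rc=0
    # Unquoted $WQS is deliberate: word splitting yields one WQ per iteration.
    for wq in $WQS; do
        "$@" -d "$wq" || rc=1
    done
    # Non-zero if the command failed on any work queue.
    return "$rc"
}

# Stand-in for iaa_test: prints the command line it would have been given.
fake_iaa_test() { echo "iaa_test $*"; }

WQS="iax1/wq1.1 iax1/wq1.4"
for_each_wq fake_iaa_test -w 0 -l 2097152 -o 0x44 -f 0x1 -1 0x8000 -t 5000 -v
```

Run as written, this prints the two command lines listed in the question above, one per work queue.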

ramesh-thomas commented 11 months ago

What is the status of this issue? Is there any patch that needs to be merged?

zhangl6 commented 11 months ago

v1-0001-accel-config-test-Add-parameter-for-specifying-de.patch

Hi @mythi, the patch is for selecting the dev and wq in iaa_user_test_runner.sh. Please help check it. Thanks.

mythi commented 11 months ago

> v1-0001-accel-config-test-Add-parameter-for-specifying-de.patch
>
> Hi @mythi, the patch is for selecting the dev and wq in iaa_user_test_runner.sh. Please help check it. Thanks.

@hj-johannes-lee would you be able to help?

hj-johannes-lee commented 11 months ago

@mythi Let me do it today, or by tomorrow at the latest! :)

hj-johannes-lee commented 11 months ago

Sorry for the late message. I was struggling to find a system that has IAA, and then with kernels. I tested with kernel version 6.1.57 (stable-latest), and it works fine with the plugin and the accel-config app container after applying the patch you provided! Thanks for the work!

ramesh-thomas commented 10 months ago

Fix added in https://github.com/intel/idxd-config/releases/tag/accel-config-v4.1.4

mythi commented 9 months ago

We did not test on all platforms. This is still problematic: https://github.com/intel/idxd-config/blob/6c3261f1ce2c239a42806bc2c78d9c7fb1b2065c/test/iaa_user_test_runner.sh#L502-L514

ramesh-thomas commented 9 months ago

@zhangl6 can you please take a look?