intel / intel-device-plugins-for-kubernetes

Collection of Intel device plugins for Kubernetes
Apache License 2.0
48 stars 205 forks source link

qat,e2e: add heartbeat and auto-reset validation #1839

Closed hj-johannes-lee closed 1 month ago

hj-johannes-lee commented 2 months ago

This is draft for suggesting an idea and getting opinions.!

hj-johannes-lee commented 2 months ago

@tkatila @mythi pleaes give me your opinions on the first draft when you have time.!

mythi commented 2 months ago

@tkatila @mythi pleaes give me your opinions on the first draft when you have time.!

it's great to have a test case for this available! the implementation cannot rely on the Go code running on the same system that runs the plugin so some additional work is still needed. do you think it's useful to run the test as part of JustBeforeEach?

tkatila commented 2 months ago

I'd move the test code to a "Context" that specifically tests the autoreset functionality. With the SKIP/FOCUS feature we can then enable it when the e2e nodes are upgraded to a kernel with autoreset. At least the SPR node doesn't seem to support it now. edit: It to Context.

hj-johannes-lee commented 2 months ago

do you think it's useful to run the test as part of JustBeforeEach?

I at first thought to put in Context.It, but had no idea for putting a tag for that since others are all like [App: something].

Would it make sense to replace [App:noapp]?

tkatila commented 2 months ago

Would it make sense to replace [App:noapp]?

I'd add a new Context for this specific test and run one App inside of it (e.g. openssl). You could add a new tag to it, like "[Functionality:auto_reset]". It doesn't necessary fit with the other Contexts or Its but I'd be fine with it.

hj-johannes-lee commented 2 months ago

@tkatila I see, but i would try in already existing contexts with new It()s.. New context anyway needs the same process of creating configmap BeforeEach in existing contexts. so,, it somewhat does not make sense to me that we create a new context. But, [Functionality: auto_reset] is a good idea.! Thanks!

run one App inside of it (e.g. openssl)

Umm, is this functionality affect running an app? I am confused.

tkatila commented 2 months ago

run one App inside of it (e.g. openssl)

Umm, is this functionality affect running an app? I am confused.

Nevermind. My though was that instead of just testing for the error recovery, running a workload after the recovery would make sure that the device is actually recovered. Not just appearing to be.

hj-johannes-lee commented 2 months ago

@tkatila Aha, that actually makes sense. Let me add it also.

hj-johannes-lee commented 1 month ago

@mythi @tkatila May I ask if there is any reason why this is not getting merged...?

mythi commented 1 month ago

May I ask if there is any reason why this is not getting merged...?

it was only moved away from draft 15 hours ago and at least I haven't had the change to review it properly

hj-johannes-lee commented 1 month ago

@mythi @tkatila Thank you for your reviews.! I guess there have been quite many changes and improved a lot.! I think it's at least much clearer and cleaner now. If there are still things that should be improved, please comment!