intel / cri-resource-manager

Kubernetes Container Runtime Interface proxy service with hardware resource aware workload placement policies
Apache License 2.0
176 stars 57 forks source link

e2e: re-try verify command #1067

Closed marquiz closed 1 year ago

marquiz commented 1 year ago

Add a simple and stupid unparameterized 5 second re-try loop to the verify() command. This should mitigate test flakyness on slow (or heavily loaded) systems.

marquiz commented 1 year ago

ping @askervin @klihub

This PR sort of generalizes (and mostly replaces) #1063, as I was seeing similar issues on other tests as well on some machines.

marquiz commented 1 year ago

@marquiz, do you think we could take #1069 instead?

We can take, although I'm not sure know how much value the --retry parameter brings. I'd say we need to have retry 3 times (at least) for all tests.

I'd prefer to include explicit --retry X on verifications where we hope situation to get better soon, and we are fine trying it out many times.

It's very hard to predict where --retry 0´ would be ok and where we need--retry 3`. It just so much depends on the random factors of the execution environment at the time. It's just futile battle if we start adding --retry parameter every time an assertion randomly fails somewhere.

Re-trying shouldn't cause any harm. If situation stabilises fast then the test continues immediately (without retries)

marquiz commented 1 year ago

I take my words back. Let's do #1069 with greater thatn zero default retry count