spike,research: how to measure where the compute time is going overall when we run a virtual machine

Is your feature request related to a problem? Please describe:

As noted in today's sig-ci meeting [1], we don't have an approach inside CI to measure where the compute time goes overall when we run a virtual machine.

Describe the solution you'd like:

We want a holistic measurement approach, i.e. what the node itself takes, what k8s consumes, and what remains for KubeVirt. This could be done on a periodic basis (a monthly run was suggested), so that we can compare runs and see how we are progressing performance-wise.
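To make the comparison between periodic runs concrete, here is a minimal sketch of how such a breakdown could be computed once per-component CPU time has been collected. It assumes the CPU seconds have already been gathered by some means (how to gather them is exactly what this spike needs to find out). The function names and the component labels `node`, `k8s` and `kubevirt` are hypothetical and only serve as illustration.

```python
from typing import Dict


def cpu_share_by_component(cpu_seconds: Dict[str, float]) -> Dict[str, float]:
    """Return each component's share of the total CPU time, in percent.

    `cpu_seconds` maps a (hypothetical) component label to the CPU seconds
    attributed to it during one measurement run.
    """
    total = sum(cpu_seconds.values())
    if total <= 0:
        raise ValueError("total CPU time must be positive")
    return {name: 100.0 * secs / total for name, secs in cpu_seconds.items()}


def compare_runs(previous: Dict[str, float], current: Dict[str, float]) -> Dict[str, float]:
    """Return the change in share (percentage points) per component.

    A component missing from one of the runs is treated as having a 0% share.
    """
    prev = cpu_share_by_component(previous)
    cur = cpu_share_by_component(current)
    names = sorted(set(prev) | set(cur))
    return {name: cur.get(name, 0.0) - prev.get(name, 0.0) for name in names}


if __name__ == "__main__":
    # Made-up numbers, only to show the shape of the input and output.
    last_month = {"node": 50.0, "k8s": 30.0, "kubevirt": 20.0}
    this_month = {"node": 25.0, "k8s": 25.0, "kubevirt": 50.0}
    print(cpu_share_by_component(this_month))
    print(compare_runs(last_month, this_month))
```

Comparing shares rather than absolute seconds keeps runs of different length comparable, though absolute totals would still be needed to tell whether the overall load has grown.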

Describe alternatives you've considered:

We might be able to use metrics that OpenShift provides, but we are not sure what is available and whether it would be enough to fulfill this request.

Additional context:

Original note from the doc [1]:

edy: we might need some more profiling (about k8s?)

looks like something is taking more resources, but we can't confirm whether this is just a side effect. overall it takes more time to reconcile, and tests then fail due to timeouts

we are not sure why load has increased

proposal: run profiling on a regular basis, covering the system as a whole

q: should we ignore intermediate errors that are caused by slowness? take this to the community meeting

/sig ci /sig compute

FYI @EdDev

kubevirt/project-infra #3674