kubevirt / project-infra

Project infrastructure administrative tools
Apache License 2.0

spike,research: how to measure where the compute time is going overall when we run a virtual machine #3674

Open: dhiller opened this issue 1 month ago

dhiller commented 1 month ago

Is your feature request related to a problem? Please describe:

As mentioned in today's sig-ci meeting [1], we noted that we don't have an approach in CI to measure where the compute time goes overall when we run a virtual machine.

Describe the solution you'd like:

We want a holistic measurement approach, i.e. how much the node itself takes, how much k8s consumes, what remains for KubeVirt, etc. This could be done periodically (a monthly run was suggested) so that we can compare how we are progressing performance-wise.
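A minimal sketch of the per-layer breakdown described above, assuming CPU time has already been collected per namespace for one run, together with the node's total busy CPU seconds. The namespace-to-layer mapping (`LAYER_BY_NAMESPACE`) and the helper names are hypothetical; CPU time that is not charged to any container is simply counted as the node's own share.

```python
from typing import Dict

# Hypothetical mapping from namespace to the layer its CPU time is charged to.
# virt-launcher pods (which run QEMU) live in the VM's own namespace, so they
# land in "other" unless that namespace is mapped explicitly.
LAYER_BY_NAMESPACE = {
    "kube-system": "kubernetes",
    "kubevirt": "kubevirt",
}


def breakdown(node_busy_cpu_s: float, cpu_s_by_namespace: Dict[str, float]) -> Dict[str, float]:
    """Split a node's busy CPU seconds into node, kubernetes, kubevirt and other."""
    layers = {"node": 0.0, "kubernetes": 0.0, "kubevirt": 0.0, "other": 0.0}
    for namespace, cpu_s in cpu_s_by_namespace.items():
        layers[LAYER_BY_NAMESPACE.get(namespace, "other")] += cpu_s
    attributed = sum(layers.values())
    # Everything not charged to a container ends up here. Note this also includes
    # host-level k8s components such as the kubelet, which a real breakdown would
    # need to measure separately. Clamp at 0 in case the sources disagree slightly.
    layers["node"] = max(node_busy_cpu_s - attributed, 0.0)
    return layers


def compare(previous: Dict[str, float], current: Dict[str, float]) -> Dict[str, float]:
    """Relative change per layer between two periodic runs (e.g. month over month)."""
    return {
        layer: (current[layer] - previous[layer]) / previous[layer]
        for layer in current
        if previous.get(layer, 0.0) > 0.0
    }
```

For example, a node that was busy for 100 CPU seconds with 20 s in `kube-system`, 30 s in `kubevirt` and 10 s in a test namespace yields 40 s attributed to the node itself; running the same scenario monthly and passing two results to `compare` shows the relative change per layer.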

Describe alternatives you've considered:

We might be able to use the metrics that OpenShift provides, though we are not sure yet whether what is available is enough to fulfill this request.
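If the OpenShift monitoring stack (Prometheus) is reachable from CI, one possible input is the cAdvisor metric `container_cpu_usage_seconds_total`, summed per namespace over the duration of a run. The sketch below only builds the PromQL expression and parses the JSON returned by the Prometheus `/api/v1/query` endpoint; the 30-minute window, the `container!=""` filter and the helper names are assumptions, and whether this metric is sufficient is exactly the open question above.

```python
import json
from typing import Dict
from urllib.parse import urlencode


def cpu_by_namespace_query(window: str = "30m") -> str:
    # Increase of per-container CPU seconds over the window, summed per namespace.
    # container!="" drops the pod-level aggregate series so CPU is not counted twice.
    return (
        'sum by (namespace) (increase('
        'container_cpu_usage_seconds_total{container!=""}[' + window + ']))'
    )


def query_url(base_url: str, promql: str) -> str:
    # Prometheus HTTP API instant-query endpoint; the base URL is deployment-specific.
    return base_url.rstrip("/") + "/api/v1/query?" + urlencode({"query": promql})


def parse_vector(body: str) -> Dict[str, float]:
    """Turn an instant-vector response into {namespace: cpu_seconds}."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error')}")
    result = {}
    for sample in payload["data"]["result"]:
        # Each sample looks like {"metric": {labels}, "value": [timestamp, "value-as-string"]}.
        result[sample["metric"].get("namespace", "")] = float(sample["value"][1])
    return result
```

The parsed per-namespace totals are the kind of input a per-layer breakdown would need; fetching the URL (with whatever authentication the cluster requires) is left out on purpose.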

Additional context:

Original note from the doc [1]:

  • edy: we might need some more profiling (about k8s?)
    • it looks like something is taking more resources, but we can't confirm whether this is just a side effect: overall it takes more time to reconcile, and tests then fail due to timeouts
    • we are not sure why load has increased
    • proposal: run profiling on a regular basis, in the overall sense
    • q: should we ignore intermediate errors that are caused by slowness? Take this to the community meeting

/sig ci /sig compute

FYI @EdDev

brianmcarey commented 1 month ago

Would any performance degradation be captured by the sig-performance lanes?