sustainable-computing-io / kepler-metal-ci

Testing different CI and GitHub Actions pipelines and publishing test results
https://sustainable-computing-io.github.io/kepler-metal-ci/
Apache License 2.0

Regression Detected in Kepler or kube-apiserver CPU Utilization Performance #95

Open github-actions[bot] opened 3 months ago

github-actions[bot] commented 3 months ago

Regression detected from the following reports:

Report: https://sustainable-computing-io.github.io/kepler-metal-ci/kepler-stress-test-metrics.html

Details: Significant Regression Detected

Detailed Analysis and Conclusion: The test results in the report show a significant performance regression in the two most recent test entries, both dated 2024-07-31. The Mean Kepler CPU Utilization % and the Standard Deviation (Std Dev %) both increased substantially compared to the 2024-07-30 result.

  1. Comparison of CPU Utilization:

    • On 2024-07-30, the Mean Kepler CPU Utilization was recorded at 0.0597766338%.
    • On 2024-07-31, during the first test at 18:18:00Z, the Mean Kepler CPU Utilization surged to 0.3280331034%. This is an increase of approximately 449% from the previous day.
    • The second test on the same day at 19:50:43Z also showed a high Mean Kepler CPU Utilization of 0.3038928317%.
  2. Comparison of Standard Deviation:

    • On 2024-07-30, the Std Dev % was 0.0362022150%.
    • On 2024-07-31, the first test showed a Std Dev % of 0.2598348881%, which is an increase of approximately 618% from the previous day.
    • The second test on the same day recorded a Std Dev % of 0.2290510851%.
  3. Impact and Urgency:

    • The significant increase in both the mean CPU utilization and the standard deviation indicates a potential issue with system stability or a change in the workload that could be adversely affecting performance.
    • Such a drastic change in metrics suggests that immediate investigation is required to identify the root cause. Possible areas of focus could include recent changes in the test environment, configuration updates, or external factors impacting system performance.
  4. Next Steps:

    • It is crucial to cross-verify these results with system logs, configuration changes, or any deployment updates that occurred around the dates of the observed regression.
    • Further stress tests should be scheduled to confirm these findings and to monitor the system's performance for any further anomalies.
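
The percentage increases quoted in items 1 and 2 can be re-derived from the raw values in the report. The following is a minimal sketch, not the logic the CI pipeline actually uses; the function name `percent_increase` and the variable names are hypothetical, and the numbers are copied from the figures above.

```python
def percent_increase(baseline: float, current: float) -> float:
    """Return the relative increase of `current` over `baseline`, in percent."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (current - baseline) / baseline * 100.0


# Values copied from the report (units: percent CPU utilization).
baseline_mean = 0.0597766338   # 2024-07-30, Mean Kepler CPU Utilization %
baseline_std = 0.0362022150    # 2024-07-30, Std Dev %

# (timestamp, mean %, std dev %) for the two runs on 2024-07-31.
runs = [
    ("2024-07-31T18:18:00Z", 0.3280331034, 0.2598348881),
    ("2024-07-31T19:50:43Z", 0.3038928317, 0.2290510851),
]

for timestamp, mean, std in runs:
    mean_change = percent_increase(baseline_mean, mean)
    std_change = percent_increase(baseline_std, std)
    print(f"{timestamp}: mean +{mean_change:.1f}%, std dev +{std_change:.1f}%")
```

For the first run this gives a mean increase of about 448.8% and a Std Dev increase of about 617.7%, in line with the figures quoted above.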

In conclusion, the test results from the last two days clearly indicate a significant performance regression that needs immediate attention to maintain the reliability and efficiency of the Kepler system under test.

rootfs commented 3 months ago

The Kepler issue is tracked here: https://github.com/sustainable-computing-io/kepler/issues/1660