We plan to provide recommendations for short-, medium- and long-term usage. These terms currently correspond to 1 day, 7 days and 15 days of usage, but they may change to custom definitions (for example, long term = 90 days).
This means we will have to run loads continuously for at least 15 days, so that we can gather metrics for each of these periods and provide the corresponding recommendations.
To do this, we have a test bench that can run continuously for 15 days, generate a variety of benchmark load conditions, and gather metrics for the entire period.
We then analyze those metrics to provide recommendations.
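The relationship between the recommendation terms and the required test-bench run length can be sketched as follows. This is a minimal illustration, not Kruize's implementation: the term names, the `DEFAULT_TERMS` table and the helper functions are hypothetical, and only the 1/7/15-day durations (and the 90-day custom example) come from the text above. The longest configured term sets the minimum run length, and a term's recommendation only becomes possible once metrics cover its full window.

```python
from datetime import datetime, timedelta

# Hypothetical term table mirroring the issue text: short = 1 day,
# medium = 7 days, long = 15 days. Any entry can be overridden with a
# custom definition (e.g. long = 90 days).
DEFAULT_TERMS = {
    "short": timedelta(days=1),
    "medium": timedelta(days=7),
    "long": timedelta(days=15),
}


def required_run_length(terms=DEFAULT_TERMS):
    """The test bench must run at least as long as the longest term."""
    return max(terms.values())


def available_terms(first_sample, last_sample, terms=DEFAULT_TERMS):
    """Return the terms whose full window is covered by collected metrics."""
    collected = last_sample - first_sample
    return [name for name, window in terms.items() if collected >= window]


def samples_for_term(samples, term, now, terms=DEFAULT_TERMS):
    """Keep (timestamp, value) samples that fall inside the term's window."""
    start = now - terms[term]
    return [(ts, value) for ts, value in samples if start <= ts <= now]
```

For example, after 8 days of collection `available_terms(start, start + timedelta(days=8))` yields only the short and medium terms, and switching the long term to 90 days raises `required_run_length` to 90 days.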
Project Overview
This research project focuses on optimizing GPU infrastructure usage through Kruize, a platform that tracks GPU usage for each container. By integrating with OpenShift Observability (Prometheus) and applying cost and performance models, Kruize provides recommendations for GPU limits. These recommendations can be enforced by GPU time-slicing schedulers such as Run:ai to improve GPU utilization, with the aim of lowering costs and improving performance.
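The cost and performance models Kruize uses are not described in this issue, so the following is only a hedged sketch of the general idea of turning per-container GPU usage samples into a GPU limit recommendation. It assumes a simple percentile-plus-headroom model and assumes usage is expressed as a fraction of one GPU (0.0 to 1.0), a unit that would suit a time-slicing scheduler; `recommend_gpu_limits`, `percentile` and their parameters are hypothetical names, not Kruize or Run:ai APIs.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile (pct in (0, 100]) of a non-empty sequence."""
    ordered = sorted(values)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


def recommend_gpu_limits(usage_by_container, pct=95, headroom=0.1, max_fraction=1.0):
    """Hypothetical model: recommend a GPU share per container.

    usage_by_container maps a container name to GPU utilization samples
    expressed as a fraction of one GPU. The recommendation is the chosen
    percentile plus a relative headroom, capped at max_fraction.
    """
    recs = {}
    for container, samples in usage_by_container.items():
        if not samples:
            continue  # no metrics collected yet, so no recommendation
        target = percentile(samples, pct) * (1 + headroom)
        recs[container] = round(min(target, max_fraction), 2)
    return recs
```

A percentile rather than the peak is used here so that rare spikes do not inflate the limit; the headroom and cap are tuning knobs, and a real model would also weigh cost against performance.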
Goals:
Install Kruize with OpenShift AI to observe and model resource usage.
Provide better resource usage defaults and configuration tuning for improved performance and cost efficiency.
Follow-up from
Details for this issue
Steps:
CC
Dominika Oliver - doliver@redhat.com
Rebecca Whitworth - rsimmond@redhat.com
Dinakar Guniguntala - dgunigun@redhat.com
@ddoliver @rebeccaSimmonds19 @dinogun @shekhar316 @bharathappali @bhanvimenghani @kusumachalasani
@dystewart @schwesig @Milstein @tssala23