kruize / autotune

Autonomous Performance Tuning for Kubernetes!
Apache License 2.0

GPU MIG Right sizing recommendations by kruize #1312

Open bharathappali opened 22 hours ago

bharathappali commented 22 hours ago

Describe the feature

Kruize reads CPU and memory usage data from the provided data source and produces CPU and memory right-sizing recommendations. It would be good to have similar GPU MIG partition sizing recommendations for containers that use GPUs.

Examples or references

Most ML workloads need GPU power, and advanced NVIDIA GPUs support MIG (Multi-Instance GPU), where a single physical GPU can be partitioned into multiple virtual or logical GPU instances that can be configured and shared across multiple containers. Ampere (from the A30) and Hopper series GPUs provide this feature.
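
To make the idea concrete, here is a minimal sketch of how a MIG profile could be picked from observed usage. It is not how Kruize does or will do it: the function name `recommend_mig_profile`, the 20% headroom, and the "smallest profile that fits" rule are all assumptions for illustration. The profile table lists the MIG profiles NVIDIA documents for the A100 40GB, and the sketch assumes utilisation was measured while the container had a whole GPU.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MigProfile:
    name: str
    compute_slices: int  # compute slices out of 7 on an A100
    memory_gb: int


# MIG profiles for the A100 40GB, ordered smallest to largest.
A100_40GB_PROFILES = [
    MigProfile("1g.5gb", 1, 5),
    MigProfile("2g.10gb", 2, 10),
    MigProfile("3g.20gb", 3, 20),
    MigProfile("4g.20gb", 4, 20),
    MigProfile("7g.40gb", 7, 40),
]

TOTAL_COMPUTE_SLICES = 7


def recommend_mig_profile(
    peak_gpu_util_pct: float,
    peak_mem_gb: float,
    headroom: float = 0.2,  # hypothetical safety margin, not a Kruize setting
) -> Optional[MigProfile]:
    """Return the smallest profile covering peak usage plus headroom.

    Returns None when no single partition is large enough.
    """
    # Convert whole-GPU utilisation into an equivalent number of compute slices.
    needed_slices = peak_gpu_util_pct / 100 * TOTAL_COMPUTE_SLICES * (1 + headroom)
    needed_mem_gb = peak_mem_gb * (1 + headroom)
    for profile in A100_40GB_PROFILES:
        if profile.compute_slices >= needed_slices and profile.memory_gb >= needed_mem_gb:
            return profile
    return None


if __name__ == "__main__":
    rec = recommend_mig_profile(peak_gpu_util_pct=10, peak_mem_gb=3)
    print(rec.name if rec else "no MIG profile fits")
```

A real recommendation would likely use percentiles over a time window rather than a single peak, in the same spirit as the existing CPU and memory recommendations.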

Suggest a solution

Additional Context

None

bharathappali commented 22 hours ago

This new feature can be implemented in the following steps: