Multi-Instance GPU (MIG) allows us to partition one GPU into several independent instances of different sizes, e.g. partition one A100 into 7 instances (A100-1/7). But there are still some restrictions on how a GPU can be partitioned (see ch1).
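To make the partitioning idea concrete, here is a minimal sketch that checks two necessary conditions for a partition of one A100: every instance uses a size the A100 exposes (1, 2, 3, 4 or 7 of its 7 compute slices), and the sizes fit within 7 slices. The function name `passes_basic_checks` is made up for illustration. Passing these checks does not guarantee a partition is valid, since the hardware placement restrictions from ch1 are not modeled.

```python
# Instance sizes, in compute slices out of 7, that an A100 exposes under MIG.
A100_INSTANCE_SIZES = {1, 2, 3, 4, 7}
A100_TOTAL_SLICES = 7


def passes_basic_checks(partition):
    """Necessary (not sufficient) conditions for a MIG partition of one A100.

    `partition` is a list of instance sizes, e.g. [4, 2, 1].
    Hardware placement restrictions are NOT checked here.
    """
    sizes_ok = all(size in A100_INSTANCE_SIZES for size in partition)
    fits = sum(partition) <= A100_TOTAL_SLICES
    return sizes_ok and fits


print(passes_basic_checks([1] * 7))   # seven A100-1/7 instances
print(passes_basic_checks([4, 2, 1]))
print(passes_basic_checks([4, 4]))    # needs 8 slices, cannot fit
```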
Goal: partition GPUs into instances of different sizes for DNN serving, so that the throughput (ops) and p99 latency requirements of the different models (their service level objectives, SLOs) are satisfied.
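A rough sketch of what "satisfying an SLO" could mean for one model, assuming the SLO is a required aggregate throughput plus a p99 latency bound. The `InstanceProfile` type, the `meets_slo` helper and all numbers below are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class InstanceProfile:
    size: int          # instance size in slices, e.g. 1 for A100-1/7
    throughput: float  # requests/s this model achieves on the instance (made-up)
    p99_ms: float      # p99 latency in ms on the instance (made-up)


def meets_slo(instances, required_throughput, max_p99_ms):
    """True if instances within the latency bound jointly reach the throughput."""
    # Only instances that respect the p99 bound may serve this model.
    usable = [i for i in instances if i.p99_ms <= max_p99_ms]
    return sum(i.throughput for i in usable) >= required_throughput


# Hypothetical assignment: one 4/7 instance and two 1/7 instances.
assigned = [
    InstanceProfile(size=4, throughput=900.0, p99_ms=20.0),
    InstanceProfile(size=1, throughput=200.0, p99_ms=45.0),
    InstanceProfile(size=1, throughput=200.0, p99_ms=45.0),
]
print(meets_slo(assigned, required_throughput=1200.0, max_p99_ms=50.0))
print(meets_slo(assigned, required_throughput=1200.0, max_p99_ms=30.0))
```

With a tighter latency bound the small instances drop out, so the same assignment can stop meeting the throughput requirement.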
Problem definition: the Reconfigurable Machine Scheduling Problem (RMS); see ch3.1.
[see ch3.3] This paper focuses on a variant of RMS: serving DNNs on GPUs with MIG.
https://arxiv.org/pdf/2109.11067.pdf