pentium3 / sys_reading

system paper reading notes

Serving DNN Models with Multi-Instance GPUs: A Case of the Reconfigurable Machine Scheduling Problem #143


pentium3 commented 2 years ago

https://arxiv.org/pdf/2109.11067.pdf

pentium3 commented 2 years ago

Multi-Instance GPU (MIG) lets us partition one GPU into several independent instances of different sizes, e.g., one A100 into seven instances (each an A100-1/7). There are still some restrictions on how a GPU can be partitioned (see ch1).
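A rough sketch of the partitioning space, to make the note above concrete. It assumes an A100 is divided into 7 slices and that instances come in sizes of 1, 2, 3, 4, or 7 slices (my assumption, not stated in these notes). It only checks that the sizes sum to 7, so it overestimates: the hardware restrictions mentioned in ch1 rule out some of these combinations. The names `SIZES`, `TOTAL_SLICES`, and `candidate_partitions` are made up for illustration.

```python
from typing import List, Tuple

# Assumed instance sizes, in sevenths of one A100 (not taken from the notes).
SIZES = (1, 2, 3, 4, 7)
TOTAL_SLICES = 7


def candidate_partitions(total: int = TOTAL_SLICES,
                         max_size: int = TOTAL_SLICES) -> List[Tuple[int, ...]]:
    """Enumerate non-increasing tuples of instance sizes that sum to `total`.

    This is pure arithmetic: it ignores the extra MIG placement restrictions,
    so the result is an upper bound on the real set of valid partitions.
    """
    if total == 0:
        return [()]
    result = []
    for size in SIZES:
        # Keep sizes non-increasing so each multiset is listed only once.
        if size <= max_size and size <= total:
            for rest in candidate_partitions(total - size, size):
                result.append((size,) + rest)
    return result


if __name__ == "__main__":
    for partition in candidate_partitions():
        print(partition)
```

A real scheduler would filter this list against the partitioning rules of the actual GPU before using it.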

Goal: partition GPUs into instances of different sizes for DNN serving, while satisfying the ops/p99 requirements (service level objectives, SLOs) of the different models.
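A minimal sketch of what "satisfying the SLOs" could mean for a given partition. It is not the paper's algorithm. It assumes we already have a profiled throughput for each (model, instance size) pair, measured at a batch size that meets the p99 latency bound, so only the throughput side is checked here. The `PROFILE` numbers, model names, and the `unmet_slos` helper are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical profile: requests/s a model sustains on an instance of the
# given size (in sevenths of a GPU) while staying within its p99 bound.
PROFILE: Dict[Tuple[str, int], float] = {
    ("model_a", 1): 100.0,
    ("model_a", 2): 210.0,
    ("model_b", 3): 150.0,
    ("model_b", 4): 220.0,
}


def unmet_slos(assignment: List[Tuple[str, int]],
               required: Dict[str, float]) -> Dict[str, float]:
    """Return the missing throughput per model; an empty dict means all SLOs are met.

    `assignment` lists (model, instance_size) pairs, one per MIG instance.
    `required` maps each model to its required requests/s.
    """
    served = {model: 0.0 for model in required}
    for model, size in assignment:
        served[model] = served.get(model, 0.0) + PROFILE[(model, size)]
    return {
        model: required[model] - served[model]
        for model in required
        if served[model] < required[model]
    }


if __name__ == "__main__":
    # One GPU split as 3 + 2 + 1 (+ 1 idle slice), all values illustrative.
    plan = [("model_b", 3), ("model_a", 2), ("model_a", 1)]
    print(unmet_slos(plan, {"model_a": 300.0, "model_b": 200.0}))
```

In this toy plan `model_a` is covered and `model_b` falls short, so a scheduler would have to give `model_b` a larger instance or another GPU.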

Problem definition: the Reconfigurable Machine Scheduling Problem (RMS). See ch3.1.

[see ch3.3] This paper focuses on a variant of RMS: serving DNNs on GPUs with MIG.