msyhu / paper-logs

A space for managing papers I need to read and keeping notes on the papers I have read

KubeShare: A Framework to Manage GPUs as First-Class and Shared Resources in Container Cloud #1

Open msyhu opened 2 years ago

msyhu commented 2 years ago

What is this paper about? 👋

Abstract (summary) 🕵🏻‍♂️

Container has emerged as a new technology in clouds to replace virtual machines (VM) for distributed applications deployment and operation. With the increasing number of new cloud-focused applications, such as deep learning and high performance applications, that have started to rely on the high computing throughput of GPUs, efficiently supporting GPU in container cloud becomes essential. While GPU virtualization has been extensively studied for VM, limited work has been done for containers. One of the key challenges is the lack of support for GPU sharing between multiple concurrent containers. This limitation leads to low resource utilization when a GPU device cannot be fully utilized by a single application due to the burstiness of GPU workload and the limited memory bandwidth. To overcome this issue, we designed and implemented KubeShare, which extends Kubernetes to enable GPU sharing with fine-grained allocation. KubeShare is the first solution for Kubernetes to make GPU devices first-class resources for scheduling and allocation. Using real deep learning workloads, we demonstrated KubeShare can significantly increase GPU utilization and overall system throughput around 2x with less than 10% performance overhead during container initialization and execution.

Tell us what can be learned from reading this paper! 🤔

Please share the reference URL! 🔗

https://dl.acm.org/doi/pdf/10.1145/3369583.3392679

If there is an open-source implementation, please add its address!

https://github.com/NTHU-LSALAB/KubeShare

msyhu commented 2 years ago

Motivation

Resource Fragmentation

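If GPUs are allocated with a simple round-robin scheme, overcommit and undercommit problems occur together. To address this, the GPU has to be treated as a first-class resource (one whose resource entities are clearly distinguishable and can be selected by both the resource manager and the user?), with locality taken into account.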
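The fragmentation described above can be reproduced with a toy calculation. The sketch below is only an illustration: the request sizes, the percent unit, and the `first_fit` placement are assumptions made for this example and are not KubeShare's scheduler. It contrasts a load-blind round-robin assignment with a placement that checks the remaining capacity of each GPU.

```python
from typing import List

GPU_CAPACITY = 100  # one whole GPU, in percent (assumed unit for this sketch)


def round_robin(requests: List[int], num_gpus: int) -> List[int]:
    """Assign request i to GPU i % num_gpus without looking at current load."""
    load = [0] * num_gpus
    for i, req in enumerate(requests):
        load[i % num_gpus] += req
    return load


def first_fit(requests: List[int], num_gpus: int) -> List[int]:
    """Place each request on the first GPU that still has enough free capacity."""
    load = [0] * num_gpus
    for req in requests:
        for gpu in range(num_gpus):
            if load[gpu] + req <= GPU_CAPACITY:
                load[gpu] += req
                break
        else:
            # No GPU can hold this request without exceeding its capacity.
            raise ValueError(f"no GPU has room for a {req}% request")
    return load


# Hypothetical fractional GPU requests from four containers, in percent.
requests = [80, 20, 60, 40]
print(round_robin(requests, 2))  # [140, 60]: GPU 0 overcommitted, GPU 1 undercommitted
print(first_fit(requests, 2))    # [100, 100]: both GPUs exactly full
```

With the same four requests, round robin puts 140% of demand on one GPU while leaving 40% of the other idle, which is the combined overcommit and undercommit case from the note.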

Implicit and Late Binding

If kubelet is left to bind GPUs, nobody other than kubelet can take part in the pod-to-GPU binding. The paper therefore develops a new module that can handle GPUs and bind them.
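To make the contrast with implicit binding concrete, here is a minimal sketch of what an explicit pod-to-GPU binding record could look like. `GpuBinder`, its methods, and the GPU and pod names are all hypothetical and invented for this illustration; they are not KubeShare's actual API or data model.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class GpuBinder:
    """Hypothetical module that owns the pod-to-GPU mapping explicitly."""

    gpu_ids: List[str]
    bindings: Dict[str, str] = field(default_factory=dict)  # pod name -> GPU id

    def pods_on(self, gpu_id: str) -> List[str]:
        """Return the pods currently bound to the given GPU, sorted by name."""
        return sorted(p for p, g in self.bindings.items() if g == gpu_id)

    def bind(self, pod: str, gpu_id: Optional[str] = None) -> str:
        """Bind a pod to the GPU the caller names, or to the least-shared GPU."""
        if gpu_id is None:
            # No preference given: pick the GPU with the fewest bound pods.
            gpu_id = min(self.gpu_ids, key=lambda g: len(self.pods_on(g)))
        elif gpu_id not in self.gpu_ids:
            raise KeyError(f"unknown GPU id: {gpu_id}")
        self.bindings[pod] = gpu_id
        return gpu_id


binder = GpuBinder(gpu_ids=["gpu-a", "gpu-b"])
binder.bind("pod-1", "gpu-a")  # the user or scheduler names the device
binder.bind("pod-2", "gpu-a")  # a second pod deliberately shares the same GPU
print(binder.bind("pod-3"))    # no preference, so the binder chooses
print(binder.bindings)
```

Because the mapping is visible and writable outside kubelet, a scheduler or a user can ask for a specific device, which is the capability the note says is missing under implicit, late binding.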

Requirements

msyhu commented 2 years ago

Method


msyhu commented 2 years ago

Experiment