GaiaGPU: Sharing GPUs in Container Clouds

msyhu commented 2 years ago

What is this paper about? 👋

Abstract (Summary) 🕵🏻‍♂️

Containers are widely used in clouds due to their lightweight and scalability. GPUs have powerful parallel processing capabilities that are adopted to accelerate the execution of applications. In a cloud environment, containers may require one or more GPUs to fulfill the resource requirement of application execution, while on the other hand exclusive GPU resource of a container usually results in underutilized resource. Therefore, how to share GPUs among containers becomes an attractive problem to cloud providers. In this paper, we propose an approach, called GaiaGPU, to sharing GPU memory and computing resources among containers. GaiaGPU partitions physical GPUs into multiple virtual GPUs and assigns the virtual GPUs to containers as request. Elastic resource allocation and dynamic resource allocation are adopted to improve resource utilization. The experimental results show that GaiaGPU only causes 1.015% of overhead by average and it effectively allocates and isolates GPU resources among containers.

Please tell us what you can learn from this paper! 🤔

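This paper focuses on local GPU scheduling.
It clearly defines the role of each module needed for GPU sharing in a Kubernetes environment.
It is somewhat dated for this topic (2018), but later work such as KubeShare inherits similar roles for each of the modules defined here.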

Please share the reference URL! 🔗

https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8672318

Motivation

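Allocating one or more whole GPUs to each container severely lowers utilization.
GPU resources can only be shared safely with strong isolation, which prevents containers from interfering with each other's GPU resources.
A way to share GPU resources transparently (without modifying application code or container images) is needed.
GPU resources must also be handled elastically and dynamically at runtime, so that they can be managed at the right moment.
All of the above must be achieved with low overhead.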

Method

Architecture

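GPU Manager: creates vGPUs and communicates with the kubelet. When a request arrives, it computes how much of a physical GPU the requested vGPUs correspond to and requests that amount from the GPU Scheduler.
GPU Scheduler: allocates GPUs according to the amount requested by the GPU Manager.
vGPU Manager: passes GPU configuration to containers and receives GPU-related monitoring results.
vGPU Library: manages the allocated GPU resources inside the container. It intercepts CUDA API calls, checks whether the container may use more resources, and forwards the call only when resources remain available.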
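The vGPU Library's "check, then forward" behavior can be sketched as below. This is a minimal illustration under stated assumptions, not GaiaGPU's code: the real library intercepts CUDA calls inside the container, whereas here `VGPULibrary`, `mem_alloc`, `mem_free`, and `OutOfVGPUMemory` are hypothetical names, the real allocation call is replaced by an injected Python function, and the memory limit is assumed to be handed down by the vGPU Manager.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


class OutOfVGPUMemory(Exception):
    """Raised instead of forwarding the call when the container's quota is used up."""


@dataclass
class VGPULibrary:
    """Hypothetical per-container wrapper around a GPU memory allocator."""

    memory_limit: int                 # bytes this container may hold (assumed: set by the vGPU Manager)
    real_alloc: Callable[[int], int]  # stand-in for the intercepted CUDA allocation call
    used: int = 0                     # bytes currently allocated by this container
    allocations: Dict[int, int] = field(default_factory=dict)  # handle -> size

    def mem_alloc(self, size: int) -> int:
        # Check the container's remaining quota *before* touching the real API.
        if self.used + size > self.memory_limit:
            remaining = self.memory_limit - self.used
            raise OutOfVGPUMemory(f"requested {size} bytes, only {remaining} left")
        handle = self.real_alloc(size)  # forwarded only when capacity remains
        self.used += size
        self.allocations[handle] = size
        return handle

    def mem_free(self, handle: int) -> None:
        # Give the freed bytes back to this container's budget.
        self.used -= self.allocations.pop(handle)
```

Checking before forwarding means a container that exceeds its quota gets an error instead of consuming memory that belongs to another container, and freeing memory returns it to that container's own budget.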

Elastic, Dynamic Resource Allocation

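Because users do not know exactly how many resources a job needs when they launch the container, a technique is needed that starts the job first and then dynamically adjusts its resources to what it actually needs.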
Elastic
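If GPU resources are left over, more are allocated; if resources run short, they are taken back from containers that received extra and given to the containers that need them. GPU resources are thus adjusted adaptively to the current situation.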
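A minimal sketch of this elastic policy, assuming a single GPU with compute capacity 1.0 and that each container has a guaranteed request (requests sum to at most 1.0) plus a current demand. `elastic_shares` is a hypothetical helper, not GaiaGPU's implementation: it grants each container up to its request, then lends the leftover capacity to containers that want more. Because shares are recomputed from scratch, a newly arrived container automatically reclaims its guarantee from borrowers.

```python
from typing import Dict


def elastic_shares(
    requests: Dict[str, float],
    demands: Dict[str, float],
    capacity: float = 1.0,
) -> Dict[str, float]:
    """Hypothetical elastic split of one GPU's compute among containers.

    requests: guaranteed share per container (assumed to sum to <= capacity)
    demands:  share each container could use right now
    """
    eps = 1e-9
    # 1. Each container first receives its demand, capped at its guarantee.
    shares = {c: min(requests[c], demands[c]) for c in requests}
    # 2. Whatever is left (unrequested capacity + unused guarantees) is spare.
    spare = capacity - sum(shares.values())
    # 3. Lend spare capacity evenly to containers that want more than they got.
    hungry = {c: demands[c] - shares[c] for c in requests if demands[c] - shares[c] > eps}
    while spare > eps and hungry:
        portion = spare / len(hungry)
        for c in list(hungry):
            extra = min(portion, hungry[c])
            shares[c] += extra
            hungry[c] -= extra
            spare -= extra
            if hungry[c] <= eps:
                del hungry[c]
    return shares
```

For the Elastic Resource Allocation experiment described later, a lone 0.3 container with unbounded demand gets the whole GPU (`elastic_shares({"a": 0.3}, {"a": 1.0})`), and once a 0.7 container with the same demand arrives the split falls back to 0.3 / 0.7.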

Experiment

overhead

Almost none: the abstract reports an average overhead of 1.015%.

partitioning

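Execution time scales roughly linearly with the number of vGPUs the GPU is split into, which means the GPU is partitioned well. From the memory perspective, however, there was little change.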

Isolation

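In this paper, isolation means that running one container's job does not unintentionally affect another container's job. The chart checks whether isolation holds as the number of containers grows while the workload is fixed to MNIST. Running alone, a container used about 70% of the GPU; with two containers each used 36% on average (36 × 2 = 72), with four 17% (17 × 4 = 68), and with eight 8% (8 × 8 = 64). There is some variation, but resources are isolated fairly effectively.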
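The arithmetic behind this reading can be checked directly: if isolation holds, n containers sharing the GPU should each get about 1/n of the solo utilization, and the total should stay near 70%. The numbers are the approximate values quoted above; the variable and function names are only for illustration.

```python
SOLO_UTIL = 70.0  # approx. GPU utilization (%) of one MNIST container running alone
OBSERVED = {2: 36, 4: 17, 8: 8}  # containers -> average utilization per container (%)


def isolation_table(observed, solo):
    """Return (n, per_container, ideal_share, total) for each container count."""
    rows = []
    for n, per_container in sorted(observed.items()):
        ideal = solo / n           # fair 1/n share of the solo utilization
        total = n * per_container  # aggregate utilization across all containers
        rows.append((n, per_container, ideal, total))
    return rows


for n, per, ideal, total in isolation_table(OBSERVED, SOLO_UTIL):
    print(f"{n} containers: {per}% each (ideal {ideal:.2f}%), total {total}%")
```

Per-container shares stay within about one percentage point of the 1/n ideal, while the total drifts from 72% down to 64% as containers are added, which is the "some variation" noted above.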

Elastic Resource Allocation

When a container requesting 0.3 GPU is started first and a container requesting 0.7 GPU is started later, the 0.3 GPU container initially monopolizes the GPU and then reduces its usage once the 0.7 GPU container arrives.

Critique

Heterogeneity is not considered. One vGPU unit is said to represent 1% of GPU utilization, but what that 1% means will differ across GPU models.
Global scheduling is not described.

msyhu / paper-logs