sypark9646 / paper-logs

2022.10 ~

AI4DL: Mining Behaviors of Deep Learning Workloads for Resource Management #8

Closed sypark9646 closed 1 year ago

sypark9646 commented 1 year ago

What is this paper about? 👋

A study of a method for analyzing the resource usage of containers that train deep learning models.

Abstract (summary) 🕵🏻‍♂️

The more we know about the resource usage patterns of workloads, the better we can allocate resources. Here we present a methodology to discover resource usage behaviors for the training workloads of Deep Learning (DL) models. From monitoring, we can observe repeating patterns and similitude of resource usage among containers running the training workloads of different DL models. The repeating patterns observed can be leveraged by the scheduler or the resource autoscaler to reduce resource fragmentation and overall resource utilization in a dedicated DL cluster. Specifically, our approach combines Conditional Restricted Boltzmann Machines (CRBMs) and clustering techniques to discover common sequences of behaviors (phases) of containers running the model training workloads in clusters providing IBM Deep Learning Services. By studying the resource usage pattern at each phase and the typical sequences of phases among different containers, we can discover a reduced set of prototypical executions representing most executions. We use statistical information from each phase to refine resource provisioning by dynamically tuning the amount of resource each container requires at each phase of its execution. Evaluation of our method shows that container resource usage displays typical patterns that can help reduce CPU and Memory consumption by 30% relative to reactive policies, which is close to having a-priori knowledge of resource usage while fulfilling resource demand over 95% of the time.
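
The per-phase provisioning idea described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the phase labels are hand-assigned here (the paper derives them with CRBMs and clustering), the 95th-percentile rule is an assumption chosen only to echo the "95% of the time" figure, and the trace values and the names `percentile`, `per_phase_limits`, and `provisioned` are hypothetical.

```python
from collections import defaultdict
import math


def percentile(values, q):
    """Nearest-rank percentile: smallest sample with at least q% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]


def per_phase_limits(usage, phases, q=95):
    """Map each phase label to the q-th percentile of the usage observed in that phase."""
    by_phase = defaultdict(list)
    for value, phase in zip(usage, phases):
        by_phase[phase].append(value)
    return {phase: percentile(vals, q) for phase, vals in by_phase.items()}


def provisioned(usage, phases, limits):
    """Return total resource reserved over the trace and the fraction of samples whose demand is met."""
    total = sum(limits[p] for p in phases)
    met = sum(u <= limits[p] for u, p in zip(usage, phases))
    return total, met / len(usage)


# Hypothetical CPU trace (cores) with hand-assigned phase labels (assumption).
cpu = [0.5, 0.6, 0.5, 3.8, 4.0, 3.9, 4.1, 1.0, 1.1, 0.9]
phase = ["load", "load", "load", "train", "train", "train", "train", "eval", "eval", "eval"]

limits = per_phase_limits(cpu, phase)
dynamic_total, met_fraction = provisioned(cpu, phase, limits)

# Baseline for comparison: reserve the observed peak for the whole run.
static_total = max(cpu) * len(cpu)
```

On this toy trace the per-phase limits reserve noticeably less than holding the peak for the whole run, which is the intuition behind tuning allocations phase by phase. The numbers say nothing about the savings reported in the paper.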

What can you learn from reading this paper? 🤔

How autoscaling works

Are there any related articles or issues worth reading alongside this one?

Priority-based parameter propagation for distributed DNN training

Workload characterization and prediction in the cloud: A multiple time series approach

Please share the reference URL! 🔗

https://github.com/BSC-IBM/AI4DL

sypark9646 commented 1 year ago

https://sysgongbu.tistory.com/215