pCAMP: Performance Comparison of Machine Learning Packages on the Edges

What is this paper about? 👋

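A study that compares the inference performance of several state-of-the-art machine learning packages used on the edge (TensorFlow, Caffe2, MXNet, PyTorch, TensorFlow Lite, etc.). It evaluates latency, memory footprint, and energy on a range of edge devices (MacBook Pro, Intel FogNode, NVIDIA Jetson TX2, Raspberry Pi, Nexus 6P) using two widely used types of neural networks (AlexNet, SqueezeNet).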

Abstract (Summary) 🕵🏻‍♂️

Machine learning has changed the computing paradigm. Products today are built with machine intelligence as a central attribute, and consumers are beginning to expect near-human interaction with the appliances they use. However, much of the deep learning revolution has been limited to the cloud. Recently, several machine learning packages based on edge devices have been announced which aim to offload the computing to the edges. However, little research has been done to evaluate these packages on the edges, making it difficult for end users to select an appropriate pair of software and hardware. In this paper, we make a performance comparison of several state-of-the-art machine learning packages on the edges, including TensorFlow, Caffe2, MXNet, PyTorch, and TensorFlow Lite. We focus on evaluating the latency, memory footprint, and energy of these tools with two popular types of neural networks on different edge devices. This evaluation not only provides a reference to select appropriate combinations of hardware and software packages for end users but also points out possible future directions to optimize packages for developers.

What can you learn from reading this paper? 🤔

Edge-oriented machine learning packages: TensorFlow, Caffe2, MXNet, PyTorch, TensorFlow Lite, etc.
- For most packages, loading a model takes longer than running it (room for improvement)
Neural networks: AlexNet (large CNN, 240 MB) and SqueezeNet (small CNN, 4.8 MB)
Power/energy profiling tools
- Android: Trepn
- Linux: RAPL
- macOS: Intel Power Gadget
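This summary does not say how the RAPL readings were collected on Linux. Here is a minimal sketch of one way to do it. It assumes an Intel CPU whose RAPL counters are exposed through the Linux powercap sysfs interface, and it estimates the energy of one call from the cumulative `energy_uj` counter. The zone path `/sys/class/powercap/intel-rapl:0` and the helper names are assumptions for illustration. On recent kernels, reading `energy_uj` may require root.

```python
"""Sketch: estimate energy on Linux via the powercap (RAPL) sysfs counters.

Assumptions: Intel CPU with the intel_rapl driver loaded; the zone path below
is a guess for CPU package 0 and may differ per machine; reading energy_uj
may need root on recent kernels.
"""
import time
from pathlib import Path
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")

DEFAULT_ZONE = Path("/sys/class/powercap/intel-rapl:0")  # assumed package-0 zone


def read_uj(zone: Path, name: str) -> int:
    """Read one integer counter (microjoules) from a powercap zone directory."""
    return int((zone / name).read_text().strip())


def energy_delta_uj(start: int, end: int, max_range: int) -> int:
    """Counter difference, tolerating a single wraparound of the counter."""
    if end >= start:
        return end - start
    # Counter passed max_energy_range_uj and restarted near zero
    # (exact to within one microjoule).
    return (max_range - start) + end


def measure_energy(fn: Callable[[], T],
                   zone: Path = DEFAULT_ZONE) -> Tuple[T, float, float]:
    """Run fn() and return (result, joules, seconds) for one RAPL zone."""
    max_range = read_uj(zone, "max_energy_range_uj")
    e0 = read_uj(zone, "energy_uj")
    t0 = time.perf_counter()
    result = fn()
    elapsed = time.perf_counter() - t0
    e1 = read_uj(zone, "energy_uj")
    joules = energy_delta_uj(e0, e1, max_range) / 1e6
    return result, joules, elapsed
```

You could call `measure_energy(lambda: run_inference(model))`, where `run_inference` and `model` are hypothetical placeholders. It returns the result with joules and seconds, and joules divided by seconds gives average power. The counter covers the whole CPU package, so background activity is included in the reading.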

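Latency: for some packages, loading the model takes longer than running it. => Reducing both the loading latency and how often models are loaded could improve performance on the edge.
Memory footprint: TensorFlow uses the most memory and PyTorch the least, making PyTorch the strongest on this metric. MXNet appears to trade memory space for better time efficiency.
Energy: energy and latency are positively correlated (the longer the latency, the higher the energy consumption).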
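A small harness can collect the three metrics above. This is an illustration under stated assumptions, not the paper's measurement code:
- `load_model` and `run_inference` are hypothetical callables standing in for a real package's API.
- Peak memory comes from the Unix-only `resource` module.
- Energy is computed by integrating sampled power, as a power profiler would report it, with the trapezoidal rule.

Timing loading separately from inference is what exposes the load-versus-run gap. The integral makes the energy/latency correlation explicit: at roughly constant power, energy grows in proportion to elapsed time.

```python
"""Sketch: separate load latency from inference latency, record peak memory,
and turn sampled power into energy.

load_model / run_inference are hypothetical stand-ins for a real package's
calls. Standard library only; `resource` is available on Unix-like systems.
"""
import resource
import statistics
import sys
import time
from typing import Any, Callable, Dict, Sequence


def peak_rss_mb() -> float:
    """Peak resident set size of the current process so far, in MB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux but in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def benchmark(load_model: Callable[[], Any],
              run_inference: Callable[[Any], Any],
              runs: int = 10) -> Dict[str, float]:
    """Time loading once, time inference `runs` times, report peak memory."""
    if runs < 1:
        raise ValueError("runs must be >= 1")

    t0 = time.perf_counter()
    model = load_model()  # one-time cost: read weights, build the graph
    load_s = time.perf_counter() - t0

    run_times = []
    for _ in range(runs):  # per-request cost: one forward pass each
        t0 = time.perf_counter()
        run_inference(model)
        run_times.append(time.perf_counter() - t0)

    return {
        "load_s": load_s,
        "infer_median_s": statistics.median(run_times),
        "peak_rss_mb": peak_rss_mb(),
    }


def energy_joules(timestamps_s: Sequence[float],
                  power_w: Sequence[float]) -> float:
    """Trapezoidal integral of sampled power (W) over time (s), in joules."""
    total = 0.0
    for i in range(1, len(timestamps_s)):
        dt = timestamps_s[i] - timestamps_s[i - 1]
        total += 0.5 * (power_w[i] + power_w[i - 1]) * dt
    return total
```

Call `benchmark(my_load, my_run, runs=20)` with your own (hypothetical) callables. A result where `load_s` exceeds `infer_median_s` is the load-versus-run pattern noted above. Using the median of several runs damps outliers such as a slow first run. `energy_joules([0, 1, 2], [2, 2, 2])` returns 4.0, and at the same power, doubling the duration doubles the energy.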

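In conclusion, no single package wins on every metric across the hardware platforms; each package has its own strengths and weaknesses depending on the platform.
1) On FogNode, a CPU-based platform, TensorFlow is faster for the large model, while Caffe2 is faster for the small model. On Jetson TX2, MXNet is the fastest package.
2) PyTorch is more memory-efficient than the other packages.
3) MXNet is the most energy-efficient package on FogNode and Jetson TX2, while Caffe2 does better on the MacBook.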

Are there any related posts or issues worth reading alongside this one?

Android App Performance Optimization #6: CPU and CPU Performance Optimization

Please share the reference URL! 🔗


sypark9646 / paper-logs