jasperzhong / read-papers-and-code

My paper/code reading notes in Chinese

RecSys '22 Merlin HugeCTR: GPU-accelerated Recommender System Training and Inference #375

Closed jasperzhong closed 10 months ago

jasperzhong commented 10 months ago

https://arxiv.org/pdf/2210.08803.pdf

jasperzhong commented 10 months ago

https://www.nvidia.cn/on-demand/session/gtccn2020-cns20626/

https://github.com/NVIDIA-Merlin/HugeCTR/blob/main/gpu_cache/ReadMe.md

Training uses a static cache, which is really just a split of the embedding table. Inference uses a dynamic cache (LRU).
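To make the distinction concrete, here is a minimal Python sketch of the two policies. It is not HugeCTR's implementation: `StaticSplitTable`, `LRUCache`, and the toy `table` are hypothetical names, and plain dicts stand in for GPU and CPU memory.

```python
from collections import OrderedDict


class StaticSplitTable:
    """Static cache: a fixed set of hot ids lives in the 'GPU' part, the rest on the 'CPU'."""

    def __init__(self, table, hot_ids):
        hot = set(hot_ids)
        self.gpu = {k: v for k, v in table.items() if k in hot}
        self.cpu = {k: v for k, v in table.items() if k not in hot}

    def lookup(self, key):
        # Membership never changes, so there is no insertion or eviction.
        if key in self.gpu:
            return self.gpu[key], "gpu"
        return self.cpu[key], "cpu"


class LRUCache:
    """Dynamic cache: rows are pulled in on a miss, the least recently used row is evicted."""

    def __init__(self, backing, capacity):
        self.backing = backing      # stands in for the full table in CPU memory
        self.capacity = capacity
        self.cache = OrderedDict()  # stands in for the GPU-resident cache
        self.hits = 0
        self.misses = 0

    def lookup(self, key):
        if key in self.cache:
            self.hits += 1
            self.cache.move_to_end(key)  # mark as most recently used
            return self.cache[key]
        self.misses += 1
        value = self.backing[key]
        self.cache[key] = value
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the least recently used row
        return value


# Toy table: 10 rows of dimension 4 (illustrative data only).
table = {i: [float(i)] * 4 for i in range(10)}
static = StaticSplitTable(table, hot_ids=[0, 1, 2])
lru = LRUCache(table, capacity=2)
for key in [0, 1, 0, 2, 1]:
    lru.lookup(key)
```

In the static version the placement of each row is decided once, while the LRU version changes its contents with the access pattern.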

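During training the embeddings change constantly, so implementing cache coherence is too complicated. With one copy on the CPU and one on the GPU, every update has to be written through to the CPU, which generates a lot of traffic. In the end the cache may well not reduce traffic at all.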

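With a static cache, by contrast, there are no multiple copies: the GPU-resident embeddings always stay on the GPU and are updated on the GPU. For multi-GPU, I think there should likewise be only one copy, with the GPU embedding table shared. HugeCTR, however, implements a single-GPU cache.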
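The traffic argument in the last two paragraphs can be illustrated with a toy transfer count. This is a sketch under simplifying assumptions, not a measurement: each training step reads one row and writes one update, every CPU-GPU row transfer costs one unit, and the function names and access trace are made up for illustration.

```python
from collections import OrderedDict


def write_through_traffic(accesses, capacity):
    """Count CPU<->GPU row transfers for an LRU cache with write-through updates."""
    cache = OrderedDict()
    reads = writes = 0
    for key in accesses:
        if key in cache:
            cache.move_to_end(key)
        else:
            reads += 1  # miss: fetch the row from the CPU copy
            cache[key] = True
            if len(cache) > capacity:
                cache.popitem(last=False)
        writes += 1  # every update is written through to the CPU copy
    return reads, writes


def static_split_traffic(accesses, gpu_ids):
    """Count transfers when rows in gpu_ids live only on the GPU and are updated in place."""
    gpu_ids = set(gpu_ids)
    reads = writes = 0
    for key in accesses:
        if key not in gpu_ids:
            reads += 1   # row lives on the CPU: fetch it
            writes += 1  # and send the update back
    return reads, writes


accesses = [0, 1, 0, 2, 0, 1, 3, 0]  # hypothetical trace of embedding ids
wt = write_through_traffic(accesses, capacity=2)
st = static_split_traffic(accesses, gpu_ids=[0, 1])
print("write-through (reads, writes):", wt)  # (6, 8)
print("static split  (reads, writes):", st)  # (2, 2)
```

In this model the write-through cache never reduces write traffic, since each of the 8 updates still crosses to the CPU, and it saves reads only on hits. The static split pays nothing for rows that live on the GPU, because there is no second copy to keep coherent.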