jasperzhong / read-papers-and-code

My paper/code reading notes in Chinese

RecSys '22 Merlin HugeCTR: GPU-accelerated Recommender System Training and Inference #375

Closed jasperzhong closed 10 months ago

jasperzhong commented 10 months ago



Training uses a static cache, which is really just a split of the embedding table. Inference uses a dynamic cache (LRU).
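To make the two policies concrete, here is a minimal Python sketch. It is not HugeCTR's code or API: `static_split`, `LRUEmbeddingCache`, and the dict standing in for the CPU-side table are hypothetical names for illustration, and the static split assumes rows are already sorted so that the hottest rows have the smallest ids.

```python
from collections import OrderedDict


def static_split(gpu_rows):
    """Static cache: a fixed partition of the table.

    Assumption: rows are sorted hottest-first, so rows [0, gpu_rows)
    live on the GPU permanently and every other row lives on the CPU.
    """
    def placement(row_id):
        return "gpu" if row_id < gpu_rows else "cpu"
    return placement


class LRUEmbeddingCache:
    """Dynamic cache: keeps the most recently used rows, evicts the oldest."""

    def __init__(self, capacity, backing_store):
        self.capacity = capacity
        self.backing = backing_store  # stands in for the CPU-side table
        self.cache = OrderedDict()    # row_id -> embedding vector
        self.hits = 0
        self.misses = 0

    def lookup(self, row_id):
        if row_id in self.cache:
            self.cache.move_to_end(row_id)  # mark as most recently used
            self.hits += 1
            return self.cache[row_id]
        self.misses += 1
        vec = self.backing[row_id]          # fetch from the backing store
        self.cache[row_id] = vec
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the least recently used row
        return vec
```

In the static split the placement of a row never changes, so there is nothing to evict or invalidate. The LRU cache adapts to the access pattern, which fits inference, where embeddings are read-only.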

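During training the embeddings change constantly, so implementing cache coherence for a dynamic cache would be too complex. With one copy on the CPU and one on the GPU, every update has to be written through to the CPU, which generates a lot of traffic, and in the end the cache may well not reduce traffic at all.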

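With a static cache there are no multiple copies: the GPU's embeddings always stay on the GPU, and updates happen on the GPU too. For multi-GPU, I think there should also be only one copy, with the GPU embedding table shared across GPUs. HugeCTR, however, implements a single-GPU cache.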
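A toy traffic count shows the difference between the two designs during training. This is a simplified model of my own, not a measurement and not anything from the paper: each access is one training step that reads and updates one row, traffic is counted in rows crossing the CPU-GPU link, and the write-through cache holds a fixed set of rows for simplicity. All function names are hypothetical.

```python
def no_cache_traffic(accesses):
    """Whole table on the CPU: fetch the row, then send the update back."""
    return 2 * len(accesses)


def write_through_traffic(accesses, cached_rows):
    """GPU cache over a CPU-resident table, kept coherent by write-through."""
    traffic = 0
    for row in accesses:
        if row not in cached_rows:
            traffic += 1  # miss: fetch the row from the CPU
        traffic += 1      # every update is written through to the CPU copy
    return traffic


def static_split_traffic(accesses, gpu_rows):
    """Static split: GPU-resident rows are read and updated in place."""
    traffic = 0
    for row in accesses:
        if row not in gpu_rows:
            traffic += 2  # CPU-resident row: fetch it, then send the update back
    return traffic


accesses = [0, 0, 1, 0, 2, 0, 1, 3]  # rows 0 and 1 are hot
hot = {0, 1}
print(no_cache_traffic(accesses))            # 16
print(write_through_traffic(accesses, hot))  # 10
print(static_split_traffic(accesses, hot))   # 4
```

Under these assumptions, write-through saves only the read on a hit, while the static split removes both the read and the write for hot rows, since the GPU holds the only copy.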