jasperzhong / read-papers-and-code

My paper/code reading notes in Chinese

OSDI '23 | AdaEmbed: Adaptive Embedding for Large-Scale Recommendation Models #374

Closed jasperzhong closed 10 months ago

jasperzhong commented 10 months ago

https://www.usenix.org/system/files/osdi23-lai.pdf

jasperzhong commented 10 months ago

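AdaEmbed addresses the following problem: given a fixed node embedding table size, select the important node embeddings to keep, based on gradient magnitude and access frequency, so that model accuracy does not drop. The importance score is $EI(i) = \mathrm{freq}_t(i) \times \lVert \nabla g_t(i) \rVert$, which can be smoothed with a moving average.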
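A minimal sketch of the score above, kept as an exponential moving average per embedding row. The function names, the dict-based storage, and the `decay` value are assumptions for illustration only, not taken from the paper.

```python
import math


def grad_norm(grad):
    """L2 norm of one embedding row's gradient."""
    return math.sqrt(sum(g * g for g in grad))


def update_importance(ei, row_id, freq, grad, decay=0.9):
    """Blend this step's freq * ||grad|| into a running average for row_id.

    `decay` is a hypothetical smoothing factor, not a value from the paper.
    """
    step_score = freq * grad_norm(grad)
    prev = ei.get(row_id)
    # The first observation seeds the average; later ones are blended in.
    ei[row_id] = step_score if prev is None else decay * prev + (1 - decay) * step_score
    return ei[row_id]


ei = {}
first = update_importance(ei, row_id=7, freq=3, grad=[0.3, 0.4])   # 3 * 0.5
second = update_importance(ei, row_id=7, freq=1, grad=[0.0, 0.0])  # row went cold
```

With the moving average, a row that stops receiving gradient loses importance gradually instead of dropping to zero in one step.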

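With this score, EI values can be compared across features of the same type, but EI values of different feature types are hard to compare directly. So the comparison uses relative rather than absolute values: $EI(i) / EI_{95\text{th}}(\mathrm{feature}(i))$, i.e. each row's EI divided by the 95th-percentile EI of the feature it belongs to.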
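A small sketch of this normalization, assuming scores are stored in a dict and each row maps to a feature name. The nearest-rank percentile and all names here are illustrative choices, not the paper's implementation.

```python
import math
from collections import defaultdict


def percentile(values, q):
    """Nearest-rank percentile (q in (0, 100]) of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]


def normalize_by_feature(ei, feature_of, q=95):
    """Divide each row's EI by the q-th percentile EI of its own feature."""
    by_feature = defaultdict(list)
    for row, score in ei.items():
        by_feature[feature_of[row]].append(score)
    ref = {f: percentile(scores, q) for f, scores in by_feature.items()}
    return {
        row: (score / ref[feature_of[row]] if ref[feature_of[row]] > 0 else 0.0)
        for row, score in ei.items()
    }


# Two hypothetical features whose raw EI values live on very different scales.
ei = {"u1": 10.0, "u2": 40.0, "a1": 0.1, "a2": 0.4}
feature_of = {"u1": "user", "u2": "user", "a1": "ad", "a2": "ad"}
relative = normalize_by_feature(ei, feature_of)
```

In raw terms `u1` (10.0) dwarfs `a2` (0.4), but after normalization `a2` ranks at the top of its own feature while `u1` does not, which is the point of comparing relative values.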

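But the node embedding table is huge, so computing an importance score for every row is not feasible, and pruning also involves a lot of cleanup. So a coordinator periodically profiles the embedding importance scores to see whether their distribution has changed significantly. It looks at how many embeddings have crossed the pruning boundary, i.e. how many embedding rows have an importance ranking that fell below or rose above a given quantile.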
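The boundary-crossing check can be sketched roughly as follows, assuming two snapshots of scores over the same non-empty set of rows. The `keep_fraction` and `threshold` parameters and the rule "re-prune when the crossing fraction exceeds the threshold" are assumptions made for this sketch, not details from the paper.

```python
def kept_set(scores, keep_fraction):
    """Rows whose importance ranks inside the top keep_fraction."""
    k = int(len(scores) * keep_fraction)
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:k])


def crossing_fraction(old_scores, new_scores, keep_fraction):
    """Fraction of rows that moved across the pruning boundary, either way."""
    old_kept = kept_set(old_scores, keep_fraction)
    new_kept = kept_set(new_scores, keep_fraction)
    crossed = old_kept ^ new_kept  # symmetric difference: fell out or climbed in
    return len(crossed) / len(old_scores)


def should_reprune(old_scores, new_scores, keep_fraction=0.5, threshold=0.1):
    """Hypothetical trigger: re-prune only if enough rows crossed the boundary."""
    return crossing_fraction(old_scores, new_scores, keep_fraction) > threshold


old = {"a": 4.0, "b": 3.0, "c": 2.0, "d": 1.0}
new = {"a": 4.0, "b": 1.5, "c": 2.5, "d": 1.0}
frac = crossing_fraction(old, new, keep_fraction=0.5)  # "b" fell out, "c" climbed in
```

Counting crossings rather than re-ranking and re-pruning on every step keeps the expensive cleanup off the common path where the distribution has barely moved.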

Finally, a memory manager is responsible for carrying out the pruning.

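Really well written. It is an algorithm-system co-design paper. Thanks to the collaboration with Meta, the paper ends with online experiments verifying that accuracy holds up. It is a good paper to borrow some ideas from, but not one to imitate.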