jasperzhong / read-papers-and-code

My paper/code reading notes in Chinese

VLDB '21 | Accelerating Large Scale Real-Time GNN Inference using Channel Pruning #321

Closed jasperzhong closed 2 years ago

jasperzhong commented 2 years ago

http://vldb.org/pvldb/vol14/p1597-zhou.pdf

jasperzhong commented 2 years ago

GNN inference comes in two flavors:

One is full inference, which runs inference once over all nodes. It is slow and generally done offline, so throughput is the main concern. There are many examples of this, e.g., recommender systems that use node embeddings for recommendation and recompute the embeddings once a day on fresh data. This mode usually runs on GPUs.

The other is batch inference, which is generally online: there are only a few target nodes, and latency is the main concern. But what is the use case here? Is there actually such a demand? Either GPU or CPU works for this; CPU alone probably suffices, since there is little computation.
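A minimal numpy sketch contrasting the two modes for a single GCN-style layer. All names, shapes, and the toy adjacency are made up for illustration; the point is that batch inference only needs the target nodes' rows of the (normalized) adjacency, i.e., their local neighborhood.

```python
import numpy as np

rng = np.random.default_rng(1)
N, F, H = 6, 4, 5
A = rng.integers(0, 2, (N, N)).astype(float)  # toy (unnormalized) adjacency
X = rng.standard_normal((N, F))               # node features
W = rng.standard_normal((F, H))               # layer weight

# Full (offline) inference: one pass over every node, throughput-bound.
emb_full = A @ X @ W                 # shape (N, H)

# Batch (online) inference: only a few target nodes, latency-bound.
# For a 1-layer GNN, only the targets' rows of A (their 1-hop
# neighborhood) are needed.
targets = [2, 5]
emb_batch = A[targets] @ X @ W       # shape (len(targets), H)

assert np.allclose(emb_batch, emb_full[targets])
```

With k layers the batch path needs the k-hop neighborhood of the targets, which is why the per-query compute stays small compared to a full pass.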

Roughly, this paper accelerates inference by reducing the feature/hidden embedding dimensions.
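A toy sketch of what pruning hidden channels looks like for one GCN-style layer. This uses simple magnitude-based (L2-norm) column selection purely for illustration; it is not the paper's actual pruning criterion, and all shapes and names here are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: N nodes, F input features, H hidden channels.
N, F, H = 8, 16, 32
A = np.eye(N)                      # stand-in for a normalized adjacency
X = rng.standard_normal((N, F))    # node features
W = rng.standard_normal((F, H))    # layer weight

def gcn_layer(A, X, W):
    # One GCN-style step: aggregate neighbors, then transform.
    return A @ X @ W

H_full = gcn_layer(A, X, W)        # shape (N, H)

# Channel pruning: keep the k output channels whose weight columns
# have the largest L2 norm, shrinking the hidden dimension H -> k.
k = 8
scores = np.linalg.norm(W, axis=0)
keep = np.sort(np.argsort(scores)[-k:])
W_pruned = W[:, keep]

H_pruned = gcn_layer(A, X, W_pruned)
assert H_pruned.shape == (N, k)
# Pruned output equals the corresponding columns of the full output.
assert np.allclose(H_pruned, H_full[:, keep])
```

Shrinking the hidden dimension cuts both the matmul cost and the size of intermediate embeddings that must be materialized, which is where the inference speedup comes from.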