IEEE BigData '21 | Position-based Hash Embeddings For Scaling Graph Neural Networks

GNN训练需要node features，如果node features缺失，一般会使用learnable的node embedding. 问题是对于大图，这些learnable node embeddings特别庞大. 比如如果用one-hot embedding，其node embedding大小为 N x d. N可能会达到billion量级，因此node embedding会有上TB. 因此需要压缩其大小.

最naive的方法是hash trick，就是把node ID映射到[0, B)的范围内，B是bucket size，远小于N. 这样就可以将node embedding table size降低到B x d. 并且可以使用多个hash functions，每个node将不同hash function得到的node embedding拼接起来，得到最终的node embedding.

这个paper进一步改进了这个方法，利用图的homophily - 拓扑上相似的nodes应该有更加相近的node features，所以他们对相似节点做聚类. 具体是用METIS做graph partition，因为graph partition其实本身也是在对节点做聚类.

jasperzhong / read-papers-and-code

IEEE BigData '21 | Position-based Hash Embeddings For Scaling Graph Neural Networks #371