jasperzhong / read-papers-and-code

My paper/code reading notes in Chinese

WWW '20 | Heterogeneous Graph Transformer #336

Closed jasperzhong closed 1 year ago

jasperzhong commented 1 year ago

https://arxiv.org/pdf/2003.01332.pdf

jasperzhong commented 1 year ago

A pretty hard paper to follow; the figure makes it easier to understand. It is a lot like GAT, except that GAT computes attention with the same weight matrix for both the source node and the target node, which does not make sense for a heterogeneous graph. HGT's idea is to give every node type and every edge type its own weight matrix.

For example, in this figure there is one target node and two source nodes of different node types. Each node type is transformed by its own weight matrix, and then further transformed by the matrix of the corresponding edge type. In the end every source node yields an attention score and a message, and the subsequent aggregation follows naturally. (figure)
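The idea above can be sketched in a few lines of PyTorch. This is a minimal single-head illustration, not the paper's implementation; the names `k_lin`, `q_lin`, and `w_att` are hypothetical, and the per-type dictionaries stand in for HGT's per-node-type and per-edge-type parameters.

```python
import torch
import torch.nn as nn

d = 8
node_types = ["paper", "author"]
edge_types = ["writes", "cites"]

# Unlike GAT (one shared weight matrix), keep a separate projection
# per node type and a separate interaction matrix per edge type.
k_lin = nn.ModuleDict({t: nn.Linear(d, d) for t in node_types})
q_lin = nn.ModuleDict({t: nn.Linear(d, d) for t in node_types})
w_att = nn.ParameterDict({e: nn.Parameter(torch.eye(d)) for e in edge_types})

def attention_score(h_src, src_type, h_dst, dst_type, edge_type):
    # Key and query go through the projections of their own node types,
    # then interact through the edge-type-specific matrix.
    k = k_lin[src_type](h_src)   # (d,)
    q = q_lin[dst_type](h_dst)   # (d,)
    return (k @ w_att[edge_type] @ q) / d ** 0.5  # scalar logit

h_paper = torch.randn(d)
h_author = torch.randn(d)
score = attention_score(h_author, "author", h_paper, "paper", "writes")
```

Two source nodes of different types thus get scored through different weight matrices, which is exactly what the shared-matrix GAT formulation cannot express.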

So, given a target node, its neighbors can have different node types and different edge types. I honestly could not follow HGT's sampling algorithm; the rough idea is to keep the numbers of sampled nodes/edges roughly balanced across types.

This DGL implementation does not seem to apply that sampling: https://github.com/dmlc/dgl/blob/master/examples/pytorch/hgt/model.py

OpenHGNN does implement the sampler, and honestly it is quite involved: https://github.com/BUPT-GAMMA/OpenHGNN/blob/main/openhgnn/sampler/HGT_sampler.py The line closest to the actual sampling step is this one:

```python
sampled_idx = th.multinomial(prob[src_type], self.num_nodes_per_type, replacement=False)
```
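A toy sketch of the idea behind that line: draw the same number of nodes for every node type, weighted by a per-type probability vector (in the paper the weights come from a budget based on normalized degree; the `prob` values here are made up for illustration).

```python
import torch

num_nodes_per_type = 3
# Hypothetical sampling probabilities over candidate nodes of each type.
prob = {
    "paper":  torch.tensor([0.5, 0.2, 0.1, 0.1, 0.1]),
    "author": torch.tensor([0.25, 0.25, 0.25, 0.25]),
}

# Sample without replacement so each type contributes the same count,
# keeping the sampled subgraph balanced across node types.
sampled = {
    ntype: torch.multinomial(p, num_nodes_per_type, replacement=False)
    for ntype, p in prob.items()
}
```

The heavy lifting in the real sampler is maintaining those per-type budgets as the frontier grows, which is where most of the complexity comes from.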
jasperzhong commented 1 year ago

Almost forgot that they open-sourced the code: https://github.com/UCLA-DM/pyHGT/blob/master/pyHGT/data.py#L87

Sampling really is a pain...

jasperzhong commented 1 year ago

Damn, the version I read earlier somehow had no temporal part; the arXiv version does.

Finally a CTDG. The added part attaches a temporal encoding to the node embedding. (figure)

Unfortunately, the sampling part does not seem to consider any temporal dependency; there is only an operation that assigns timestamps to nodes that lack one. One odd thing here: it is the nodes that carry timestamps, not the edges. For instance, papers have timestamps, while authors and venues do not, and indeed cannot.
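A sketch of what such a temporal encoding can look like, assuming a Transformer-style sinusoidal basis over the relative timestamp followed by a learnable linear projection added to the node embedding (the function and variable names here are my own, not the paper's):

```python
import math
import torch
import torch.nn as nn

def rte(delta_t: torch.Tensor, d: int) -> torch.Tensor:
    """Sinusoidal encoding of relative timestamps; (n,) -> (n, d)."""
    pos = delta_t.unsqueeze(1)                                        # (n, 1)
    div = torch.exp(torch.arange(0, d, 2) * (-math.log(10000.0) / d))  # (d/2,)
    enc = torch.zeros(delta_t.size(0), d)
    enc[:, 0::2] = torch.sin(pos * div)
    enc[:, 1::2] = torch.cos(pos * div)
    return enc

d = 16
proj = nn.Linear(d, d)                    # learnable projection on top of the basis
h = torch.randn(4, d)                     # source node embeddings
delta_t = torch.tensor([0.0, 1.0, 5.0, 30.0])
h_temporal = h + proj(rte(delta_t, d))    # temporal-aware embedding
```

Since the encoding is added to the source node embedding before message passing, the rest of the layer needs no change to become time-aware.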

In the end, the experiments show that HGT with PTE achieves the best results. (figure)

jasperzhong commented 11 months ago

A very important paper. Not hard to understand at all. The key idea is to decompose a relation into node/edge types.

For example, with 4 node types (e.g., paper, author, institution, topic) and 4 edge types (write, cite, affiliated with, has a topic of), a relation $(\tau(s), \phi(e), \tau(t))$ has 4 x 4 x 4 = 64 possible combinations. RGCN would need 64 weight matrices... whereas HGT only needs 4 node-type weight matrices and 4 edge-type matrices (strictly speaking, a few more), and the different node/edge combinations automatically form the 64 weight-matrix pairs.


For each attention head, every node type has two projections, K-Linear and Q-Linear, and every edge type has one W_ATT matrix. (figure)

That covers the attention. For the message, similarly, every node type has an M-Linear and every edge type has a W_MSG.



Finally, sum up the attention-weighted messages produced by the neighbors, pass the result through a linear layer, and add the node's own embedding as a residual.
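Putting the three steps together, here is a minimal single-head sketch of the whole update for one target node. All names (`k_lin`, `q_lin`, `m_lin`, `a_lin`, `w_att`, `w_msg`) are my shorthand for the paper's K-Linear, Q-Linear, M-Linear, the target-type output linear, W_ATT, and W_MSG; it is an illustration under those assumptions, not the reference implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

d = 8
node_types = ["paper", "author"]
edge_types = ["writes", "cites"]

k_lin = nn.ModuleDict({t: nn.Linear(d, d) for t in node_types})
q_lin = nn.ModuleDict({t: nn.Linear(d, d) for t in node_types})
m_lin = nn.ModuleDict({t: nn.Linear(d, d) for t in node_types})
a_lin = nn.ModuleDict({t: nn.Linear(d, d) for t in node_types})
w_att = nn.ParameterDict({e: nn.Parameter(torch.eye(d)) for e in edge_types})
w_msg = nn.ParameterDict({e: nn.Parameter(torch.eye(d)) for e in edge_types})

def hgt_update(h_dst, dst_type, neighbors):
    """neighbors: list of (h_src, src_type, edge_type) triples."""
    q = q_lin[dst_type](h_dst)
    scores, messages = [], []
    for h_src, src_type, etype in neighbors:
        k = k_lin[src_type](h_src)
        scores.append((k @ w_att[etype] @ q) / d ** 0.5)        # attention logit
        messages.append(m_lin[src_type](h_src) @ w_msg[etype])  # message
    att = F.softmax(torch.stack(scores), dim=0)                 # over neighbors
    agg = (att.unsqueeze(1) * torch.stack(messages)).sum(0)     # weighted sum
    # Target-type linear map on the aggregate, plus residual connection.
    return a_lin[dst_type](F.gelu(agg)) + h_dst

h_paper = torch.randn(d)
nbrs = [(torch.randn(d), "author", "writes"), (torch.randn(d), "paper", "cites")]
out = hgt_update(h_paper, "paper", nbrs)
```

Each neighbor routes through the parameters of its own node and edge types, so one layer handles a mixed-type neighborhood without any per-relation weight matrix.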