Closed by jasperzhong 2 years ago
This looks like #278 plus a memory module: each node additionally keeps a memory vector that remembers the messages it has received, giving the model a long-term memory.
First, there are two kinds of events: node-wise events and interaction (edge) events.
The memory module consists of a message function, a message aggregator, and a memory updater.
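A minimal sketch of such a memory module, assuming a GRU-based memory updater and a "most recent" message aggregator; all dimensions and names here are illustrative, not taken from the repo:

```python
import torch
import torch.nn as nn

# Illustrative sizes; not taken from the paper or the repo.
NUM_NODES, MEM_DIM, MSG_DIM = 10, 4, 6

class MemoryModule(nn.Module):
    """Per-node memory vector, updated from incoming messages."""
    def __init__(self, num_nodes, mem_dim, msg_dim):
        super().__init__()
        # Memory is state, not a learned parameter, so store it as a buffer.
        self.register_buffer("memory", torch.zeros(num_nodes, mem_dim))
        # Memory updater: a recurrent cell (the paper uses GRU/LSTM variants).
        self.updater = nn.GRUCell(msg_dim, mem_dim)

    def update(self, node_ids, messages):
        # "Most recent" aggregation: keep only the last message per node.
        last = {}
        for n, m in zip(node_ids.tolist(), messages):
            last[n] = m
        nodes = torch.tensor(list(last.keys()))
        msgs = torch.stack(list(last.values()))
        # Detached here for simplicity; the training flow described later is
        # what actually lets gradients reach the updater's parameters.
        self.memory[nodes] = self.updater(msgs, self.memory[nodes]).detach()

mem = MemoryModule(NUM_NODES, MEM_DIM, MSG_DIM)
mem.update(torch.tensor([1, 3, 1]), torch.rand(3, MSG_DIM))
```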
The rest feels much like #278: embeddings are again computed from neighboring nodes, except that here the memory is taken into account, i.e., the input is node feature + memory. Everything else is the same, including the functional time encoding from #278.
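A sketch of the "node feature + memory" idea together with the functional time encoding. The module names and dimensions are my own hypothetical choices, the feature/memory combination is shown as concatenation, and a single linear layer stands in for TGN's actual temporal graph attention over neighbors:

```python
import torch
import torch.nn as nn

class TimeEncoder(nn.Module):
    """Functional time encoding: cos(w * dt + b), as in TGAT/TGN."""
    def __init__(self, dim):
        super().__init__()
        self.w = nn.Linear(1, dim)

    def forward(self, dt):  # dt: (N,) time deltas to the current time
        return torch.cos(self.w(dt.unsqueeze(-1)))

# Hypothetical dims; the real model attends over sampled neighbors.
FEAT, MEM, TIME = 5, 4, 3
time_enc = TimeEncoder(TIME)
proj = nn.Linear(FEAT + MEM + TIME, 8)

def embed(node_feat, memory, dt):
    # Combine node feature + memory (concatenated here; an elementwise
    # sum is another reading of "+") with the time encoding.
    z = torch.cat([node_feat, memory, time_enc(dt)], dim=-1)
    return torch.relu(proj(z))

h = embed(torch.rand(2, FEAT), torch.rand(2, MEM), torch.tensor([0.5, 1.0]))
print(h.shape)  # torch.Size([2, 8])
```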
A win!
During actual training, the input to each iteration is a batch of interactions. The flow above has one problem: how do the parameters of the memory module get updated? Before computing on the current batch of interactions, they first update the memory with the previous batch of interactions (kept in the raw message store), and then use the updated memory to compute node embeddings for the current batch. This way, gradients can backpropagate into the memory module and update its parameters. The reason for using the previous batch of interactions is to avoid, at prediction time (e.g. link prediction), having already seen the very edges to be predicted.
Note that when computing on the current batch of interactions, the memory used is based entirely on the previous batch of interactions; in other words, later interactions within a batch do not make use of earlier ones in the same batch — all of them build on the previous batch. This is a trade-off between speed and update granularity. In practice, they use a batch size of 200.
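The delayed-update flow above can be sketched as a training step. The model here is a deliberately tiny stand-in (a GRU memory updater plus a linear link scorer with made-up dimensions, positives only), not the repo's API:

```python
import torch
import torch.nn as nn

# Illustrative sizes and names; not the repo's actual interface.
NUM_NODES, MEM, MSG = 6, 4, 4

memory = torch.zeros(NUM_NODES, MEM)            # node memory (state)
updater = nn.GRUCell(MSG, MEM)                  # memory updater
scorer = nn.Linear(2 * MEM, 1)                  # link predictor
opt = torch.optim.Adam(list(updater.parameters()) + list(scorer.parameters()))
raw_message_store = []                          # messages from the previous batch

def train_step(batch_src, batch_dst, batch_msgs):
    global memory, raw_message_store
    opt.zero_grad()
    # 1. Apply the *previous* batch's raw messages to memory first.
    mem = memory
    for nodes, msgs in raw_message_store:
        upd = updater(msgs, mem[nodes])
        mem = mem.clone()
        mem[nodes] = upd                        # differentiable update
    # 2. Predict the current batch's links from the updated memory.
    logits = scorer(torch.cat([mem[batch_src], mem[batch_dst]], dim=-1))
    loss = nn.functional.binary_cross_entropy_with_logits(
        logits, torch.ones_like(logits))        # positives only, for brevity
    # 3. Gradients reach the memory updater through step 1.
    loss.backward()
    opt.step()
    memory = mem.detach()                       # persist state, drop the graph
    # 4. Stash the current batch's raw messages for the next iteration.
    raw_message_store = [(torch.cat([batch_src, batch_dst]),
                          torch.cat([batch_msgs, batch_msgs]))]
    return loss.item()

l1 = train_step(torch.tensor([0, 1]), torch.tensor([2, 3]), torch.rand(2, MSG))
l2 = train_step(torch.tensor([1, 4]), torch.tensor([0, 5]), torch.rand(2, MSG))
```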
Input: a batch of events (src, dst, timestamp, edge_features, label)
Graph representation: adjacency list (for each node, the events it participates in, sorted by timestamp)
Neighbor sampling: uniform or most recent
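Given a time-sorted adjacency list, both sampling strategies reduce to a binary search for events strictly before the query time; the data layout below is an assumption for illustration, not the repo's:

```python
import bisect
import random

# Time-sorted adjacency list: node -> list of (timestamp, neighbor, event_idx).
adj = {
    0: [(1.0, 2, 0), (2.5, 3, 1), (4.0, 1, 2), (5.5, 4, 3)],
}

def sample_neighbors(node, t, k, strategy="recent"):
    """Sample up to k neighbors of `node` from events strictly before time t."""
    events = adj.get(node, [])
    # Binary search for the cutoff: events with timestamp < t.
    hi = bisect.bisect_left(events, (t,))
    valid = events[:hi]
    if strategy == "recent":
        return valid[-k:]                              # k most recent events
    return random.sample(valid, min(k, len(valid)))    # uniform

print(sample_neighbors(0, 5.0, 2))  # [(2.5, 3, 1), (4.0, 1, 2)]
```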
Paper: https://arxiv.org/pdf/2006.10637v2
Code: https://github.com/twitter-research/tgn