williamleif / GraphSAGE

Representation learning on large graphs using stochastic graph convolutions.

Why did LSTM aggregator obtain the highest f1 score? #20

Open xxxzhi opened 6 years ago

xxxzhi commented 6 years ago

LSTMs are usually used for sequence tasks. Why is the LSTM able to successfully aggregate neighbor features, which have no sequence dependence?

williamleif commented 6 years ago

Thanks @xxxzhi! This is a very good question. However, I think the first thing to note is that the LSTM-based model was not (statistically) significantly better than the pooling-based model, so I think a large part of the LSTM's strong performance is just that it has a higher representational capacity.

That said, the LSTM-based model did outperform our expectations. One important note is that LSTMs are perfectly capable of learning permutation-invariant functions (e.g., LSTMs can learn to take sums). I think the main takeaway here is that the LSTMs do not seem to be hurt by their sequential nature in this task. Unfortunately, we do not have any deep insights as to why this is the case, since the learned models are quite uninterpretable, but the real (if somewhat unsatisfying) answer is that LSTMs can in principle learn to approximate any permutation-invariant function, and the training by shuffling seems to be enough to "teach" the LSTM to ignore the sequence ordering.
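To make the shuffling idea concrete, here is a minimal sketch of an LSTM aggregator that permutes the sampled neighbors on every training pass, so the ordering carries no usable signal. This is written in PyTorch rather than the repo's actual TensorFlow code, and the class name and dimensions are made up for illustration:

```python
# Sketch (not the repo's implementation): an LSTM aggregator that sees
# its neighbors in a fresh random order on each training pass, pushing
# it toward a permutation-invariant function of the neighbor set.
import torch
import torch.nn as nn

class LSTMAggregator(nn.Module):
    def __init__(self, in_dim, hidden_dim):
        super().__init__()
        self.lstm = nn.LSTM(in_dim, hidden_dim, batch_first=True)

    def forward(self, neigh_feats):
        # neigh_feats: (batch, num_neighbors, in_dim)
        if self.training:
            perm = torch.randperm(neigh_feats.size(1))
            neigh_feats = neigh_feats[:, perm, :]  # shuffle neighbor order
        _, (h_n, _) = self.lstm(neigh_feats)
        return h_n.squeeze(0)  # final hidden state as the aggregated vector

agg = LSTMAggregator(in_dim=16, hidden_dim=32)
out = agg(torch.randn(4, 10, 16))  # 4 nodes, 10 sampled neighbors each
```

Since any fixed ordering would let the LSTM latch onto positional cues, re-shuffling at every pass is what "teaches" it to treat the neighbors as an unordered set.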

Hope that is somewhat helpful!

xxxzhi commented 6 years ago

Oh, thanks for your answer. It is helpful!

xxxzhi commented 6 years ago

In fact, after reading the paper Stochastic Training of Graph Convolutional Networks, my question has largely been answered.

Result of training with more epochs:

[Figure: training curves (snipaste_2018-01-08_22-04-38); the blue line is the LSTM-based model.]

xxxzhi commented 6 years ago

@williamleif In Covariant Compositional Networks For Learning Graphs, the authors seem to argue that a node's representation should depend on the internal ordering of all its neighbor nodes. But the paper is hard to read.