Open xxxzhi opened 6 years ago
Thanks @xxxzhi! This is a very good question. However, I think the first thing to note is that the LSTM-based model was not (statistically) significantly better than the pooling-based model, so I think a large part of the LSTM's strong performance is just that it has a higher representational capacity.
That said, the LSTM-based model did outperform our expectations. One important note is that LSTMs are perfectly capable of learning permutation-invariant functions (e.g., LSTMs can learn to take sums). I think the main takeaway here is that the LSTMs do not seem to be hurt by their sequential nature in this task. Unfortunately, we do not have any deep insights as to why this is the case, since the learned models are quite uninterpretable, but the real (if somewhat unsatisfying) answer is that LSTMs can in principle learn to approximate any permutation-invariant function, and the training by shuffling seems to be enough to "teach" the LSTM to ignore the sequence ordering.
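To make the "LSTMs can learn to take sums" point concrete, here is a minimal sketch (not from the GraphSAGE code) of a recurrent aggregator whose recurrence is simply h_t = h_{t-1} + x_t. This is one fixed point an LSTM's gates can approximate, and it is exactly permutation-invariant, so shuffling the neighbor order leaves the output unchanged:

```python
import numpy as np

def rnn_sum(neighbor_feats):
    # Toy recurrent "aggregator": h_t = h_{t-1} + x_t.
    # An LSTM can approximate this regime (input/forget gates open,
    # candidate ~ input), and the result is a sum, which is
    # permutation-invariant by construction.
    h = np.zeros(neighbor_feats.shape[1])
    for x in neighbor_feats:
        h = h + x
    return h

rng = np.random.default_rng(0)
feats = rng.normal(size=(5, 8))   # 5 neighbors, 8-dim features each
perm = rng.permutation(5)         # a random neighbor ordering

out_a = rnn_sum(feats)            # original order
out_b = rnn_sum(feats[perm])      # shuffled order
assert np.allclose(out_a, out_b)  # identical: order does not matter
```

Of course, a trained LSTM only approximates such order-insensitive behavior; shuffling neighbor orderings during training is what pushes it toward this kind of solution rather than one that exploits the sequence.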
Hope that is somewhat helpful!
Oh, thanks for your answer. It is helpful!
In fact, after reading the paper Stochastic Training of Graph Convolutional Networks, my question has largely been resolved.
Results after training for more epochs:
The blue line is the LSTM-based model.
@williamleif In Covariant Compositional Networks For Learning Graphs, the authors seem to assume that a node depends on the internal ordering of all its neighbor nodes. But the paper is hard to read.
LSTMs are usually used for sequence tasks. Why can an LSTM successfully aggregate neighbor features that have no sequential dependence?