This is an interesting paper.
Two remarks:
- SAGPool improves upon TopKPool (see Self-Attention Graph Pooling) and so it is curious to see that in the table above TopKPool beats SAGPool by a large margin.
Thanks for bringing this up. According to the SAGPool paper, SAGPool is better than both DiffPool and TopKPool (Graph U-Net). However, the GMN paper shows the opposite. One thing to note is that the SAGPool results are the same in both the GMN paper and the SAGPool paper, so I'm not sure what is causing this difference. I'll try to dig deeper into their training settings, or code (if available), to find some answers.
- GMN, the one with the best performance, embeds the topology into the feature vector using random walk with restart. So it seems that it only works in the transductive setting.
For an unseen graph (inductive setting), S (the diffusion matrix from the random walk) would have to be computed again, but the learnt parameters W0 and W1 could be used as is. So it could work in the inductive setting too. Attaching the GMN equation (first layer) for reference.
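To make that concrete, here is a minimal sketch of how S could be recomputed for a new graph, assuming the standard random-walk-with-restart closed form S = c (I - (1 - c) P)^{-1} with row-normalized adjacency P; the restart probability and normalization are my assumptions for illustration, not taken from the paper:

```python
import torch

def rwr_diffusion(adj: torch.Tensor, c: float = 0.15) -> torch.Tensor:
    """Random-walk-with-restart diffusion matrix for a dense adjacency.

    Assumes the standard closed form S = c * (I - (1 - c) * P)^{-1},
    where P is the row-normalized adjacency. The GMN paper may use a
    different normalization or restart probability.
    """
    n = adj.size(0)
    deg = adj.sum(dim=1).clamp(min=1.0)           # avoid division by zero
    p = adj / deg.unsqueeze(1)                    # row-normalized transition matrix
    eye = torch.eye(n, dtype=adj.dtype, device=adj.device)
    return c * torch.linalg.inv(eye - (1.0 - c) * p)

# For an unseen graph, only S needs to be recomputed; W0 and W1 are reused.
adj = torch.tensor([[0., 1., 0.],
                    [1., 0., 1.],
                    [0., 1., 0.]])
S = rwr_diffusion(adj)
```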
To the problem of inconsistent results: I am not sure if it is worth digging into this issue: figures reported by papers are (too) often inconsistent with each other, for various reasons, e.g. different implementations and different ways of conducting experiments (notably the data split). What I wanted to say with this remark: if the figures for TopKPool are correct, then the expected performance of SAGPool is likely to be much higher than what is reported in the table, and so the performance of the GMN model is good but no longer that impressive. That said, I find their model interesting (and new).
To the second problem: If I understand the model correctly, the first layer considers the features using the feature matrix X and the topology using the random walk transition matrix S. The order of the elements of S (rows, but also the columns) depends on the labeling of the nodes. If I permute the labels, then S will change. In such a situation, how can W0 be reliably estimated?
Thanks @wsad1 for the proposal and @ldv1 for this discussion. As far as I can tell, adding this method to PyG should be quite straightforward. However, I do not have plans yet to integrate it by myself, but please feel free to contribute :)
@ldv1, to the second problem: if I understand correctly, in sections 3.2 and 3.2.1 the authors suggest that S be sorted, making the GMN permutation invariant.
Thanks @rusty1s, I think it can be implemented with some modifications to DiffPool. The GMN module will look similar to DiffPool in terms of its inputs and outputs. I'll give it a shot.
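As a rough sketch of the input/output contract I have in mind (all names here, e.g. `dense_mem_pool` and `keys`, are hypothetical and not from the paper or PyG), reusing the existing `dense_diff_pool` aggregation:

```python
import torch
from torch_geometric.nn import dense_diff_pool

def dense_mem_pool(x, adj, keys, mask=None):
    """Sketch only: `keys` [K, F] play the role of learnable cluster keys.

    Dense node features `x` [batch, N, F], a dense adjacency `adj`
    [batch, N, N] and an optional node `mask` go in; pooled features,
    a pooled adjacency and auxiliary losses come out, as in DiffPool.
    Soft assignments here come from feature similarity to the keys
    (no message passing). This illustrates the intended interface,
    not the GMN algorithm itself.
    """
    s = torch.softmax(x @ keys.t(), dim=-1)       # [batch, N, K] soft assignment
    out, out_adj, link_loss, ent_loss = dense_diff_pool(x, adj, s, mask)
    return out, out_adj, link_loss, ent_loss
```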
🚀 Feature
This paper proposes a new graph convolution/coarsening algorithm. The convolution layer doesn't use any kind of GNN, i.e. it doesn't explicitly use the graph structure; instead, the graph structure is 'embedded' into the feature vector.
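One possible reading of "embedded into the feature vector" (my assumption for illustration, not necessarily the paper's exact construction) is to concatenate each node's features with its row of a precomputed diffusion matrix S:

```python
import torch

def embed_topology(x: torch.Tensor, S: torch.Tensor) -> torch.Tensor:
    # x: [N, F] node features, S: [N, N] diffusion matrix (e.g. from
    # random walk with restart); returns [N, F + N] augmented features.
    return torch.cat([x, S], dim=-1)
```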
Motivation
The results on some graph classification datasets look significantly better than those of other graph convolution methods; refer to the last two rows in the image below.
In addition, it might be useful to check performance on the OGB datasets too.