ixxi-dante / an2vec

Bringing node2vec and word2vec together for cool stuff
GNU General Public License v3.0

How much does the trained ξ improve on the untrained ξ? #5

Closed by wehlutyk 6 years ago

wehlutyk commented 6 years ago

When the features correlate even slightly with the network structure, even random weights seem to separate the distribution of ξ (i.e. the representation layer in the VAE) into communities. As Thomas Kipf explains in his post on GCNs, this is because "a GCN model [can be interpreted] as a generalized, differentiable version of the well-known Weisfeiler-Lehman algorithm on graphs". It also makes intuitive sense: each layer essentially averages features over community neighbours, thereby recovering any local average that is consistent with the network structure.
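
This neighbourhood-averaging effect can be sketched numerically. The toy graph, feature distributions, and layer sizes below are hypothetical (they are not taken from the an2vec code); the point is only that one propagation step with *random, untrained* weights already shrinks within-community spread while keeping the communities apart, using the symmetrically normalised adjacency from Kipf & Welling:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8

# Toy graph: two 4-node cliques joined by a single bridge edge
# (hypothetical example, not an actual an2vec test graph).
A = np.zeros((n, n))
for block in (range(0, 4), range(4, 8)):
    for i in block:
        for j in block:
            if i != j:
                A[i, j] = 1.0
A[3, 4] = A[4, 3] = 1.0  # bridge between the two communities

# Symmetrically normalised adjacency with self-loops: D^{-1/2} (A + I) D^{-1/2}.
A_hat = A + np.eye(n)
d = A_hat.sum(axis=1)
A_norm = A_hat / np.sqrt(np.outer(d, d))

# Node features that correlate (noisily) with community membership.
X = np.concatenate([rng.normal(0, 1, (4, 16)) + 2,
                    rng.normal(0, 1, (4, 16)) - 2])

# One GCN layer with random, untrained weights: relu(A_norm @ X @ W).
W = rng.normal(0, 0.1, (16, 2))
H = np.maximum(A_norm @ X @ W, 0)

# Propagation averages over neighbours, so the noise within each
# community shrinks while the community means stay separated.
X_prop = A_norm @ X
spread_before = X[:4].std(axis=0).mean()
spread_after = X_prop[:4].std(axis=0).mean()
print(spread_after < spread_before)
```

Nothing here is learned: the separation comes entirely from the averaging operator `A_norm`, which is exactly why an untrained ξ can already reflect community structure when features correlate with it.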

The upshot is that even without training, in some cases the distribution of ξ seems to reflect network structure. So what's the use of training, apart from getting a generative model (i.e. training the layers that come after the ξ layer)?

Sub-questions:

wehlutyk commented 6 years ago

After fixing the batch-shuffling mistake (900fe6fee2528e6c0f1ac51852654565bf7d3f9a), it is clear that with random features the trained ξ is much better than the untrained ξ. (Before this fix we thought that training with random features did not lead to good reconstruction performance; now we see that it does.) See the before/after training ξ distribution plots in the gcn-ae-explore.ipynb notebook, which (at that commit) uses fully random node features: before training, the distribution does not separate communities; after training, it does.
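
For illustration, here is a plausible sketch of the kind of batch-shuffling mismatch such a fix addresses (the actual bug in the commit may differ): a GCN layer is permutation-equivariant, so shuffling node features is only valid if the adjacency matrix is permuted along both axes with the same permutation. The graph, sizes, and `gcn_layer` helper below are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)
n, f = 6, 5

# Random symmetric adjacency, then the usual normalisation with self-loops.
A = (rng.random((n, n)) < 0.4).astype(float)
A = np.triu(A, 1)
A = A + A.T
A_hat = A + np.eye(n)
d = A_hat.sum(axis=1)
A_norm = A_hat / np.sqrt(np.outer(d, d))

X = rng.normal(size=(n, f))
W = rng.normal(size=(f, 3))

def gcn_layer(a_norm, x, w):
    # Linear GCN propagation (activation omitted for clarity).
    return a_norm @ x @ w

# A fixed non-identity permutation of the nodes.
perm = np.roll(np.arange(n), 1)
P = np.eye(n)[perm]

# Consistent shuffle: permute the features AND both axes of the adjacency.
out_consistent = gcn_layer(P @ A_norm @ P.T, P @ X, W)

# Inconsistent shuffle: features are permuted but the adjacency is left
# in the original node order, so features get mixed with the wrong rows.
out_buggy = gcn_layer(A_norm, P @ X, W)

reference = P @ gcn_layer(A_norm, X, W)
print(np.allclose(out_consistent, reference))  # True: equivariance holds
```

With the inconsistent shuffle, each node aggregates some other node's features, so reconstruction targets no longer match inputs, and training quality degrades silently rather than crashing, which would make the bug easy to miss.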

So this answers one question: training improves ξ, and is necessary in the case of fully random features, where the initial ξ distribution does not separate network communities at all. For features that do correlate with the network structure, the two sub-questions above remain open.

wehlutyk commented 6 years ago

Sub-question 1 (above) will be answered by #6 (worked on in #12), where we train with a set of features that represent different communities than the structural communities.

Sub-question 2 (above) will be answered by #17.

I'm closing this as it will be addressed in these other issues.