ixxi-dante / an2vec

Bringing node2vec and word2vec together for cool stuff
GNU General Public License v3.0

How much does the trained ξ improve on the untrained ξ? #5

Closed by wehlutyk 6 years ago

wehlutyk commented 6 years ago

When the features correlate even slightly with the network structure, even random weights seem to separate the distribution of ξ (i.e. the representation layer in the VAE) into communities. As Thomas Kipf explains in his post on GCNs, this is because "a GCN model [can be interpreted] as a generalized, differentiable version of the well-known Weisfeiler-Lehman algorithm on graphs". It also makes intuitive sense: each layer essentially averages features over community neighbours, thereby recovering any local average that is consistent with the network structure.
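
This neighbourhood-averaging effect can be sketched numerically. The toy graph, feature distributions, and layer sizes below are hypothetical (they are not taken from the an2vec code); the point is only that one propagation step with *random, untrained* weights already shrinks within-community spread while keeping the communities apart, using the symmetrically normalised adjacency from Kipf & Welling:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8

# Toy graph: two 4-node cliques joined by a single bridge edge
# (hypothetical example, not an actual an2vec test graph).
A = np.zeros((n, n))
for block in (range(0, 4), range(4, 8)):
    for i in block:
        for j in block:
            if i != j:
                A[i, j] = 1.0
A[3, 4] = A[4, 3] = 1.0  # bridge between the two communities

# Symmetrically normalised adjacency with self-loops: D^{-1/2} (A + I) D^{-1/2}.
A_hat = A + np.eye(n)
d = A_hat.sum(axis=1)
A_norm = A_hat / np.sqrt(np.outer(d, d))

# Node features that correlate (noisily) with community membership.
X = np.concatenate([rng.normal(0, 1, (4, 16)) + 2,
                    rng.normal(0, 1, (4, 16)) - 2])

# One GCN layer with random, untrained weights: relu(A_norm @ X @ W).
W = rng.normal(0, 0.1, (16, 2))
H = np.maximum(A_norm @ X @ W, 0)

# Propagation averages over neighbours, so the noise within each
# community shrinks while the community means stay separated.
X_prop = A_norm @ X
spread_before = X[:4].std(axis=0).mean()
spread_after = X_prop[:4].std(axis=0).mean()
print(spread_after < spread_before)
```

Nothing here is learned: the separation comes entirely from the averaging operator `A_norm`, which is exactly why an untrained ξ can already reflect community structure when features correlate with it.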

The upshot is that even without training, in some cases the distribution of ξ seems to reflect network structure. So what's the use of training, apart from getting a generative model (i.e. training the layers that come after the ξ layer)?

Sub-questions:

wehlutyk commented 6 years ago

After fixing the batch-shuffling mistake (900fe6fee2528e6c0f1ac51852654565bf7d3f9a), it is clear that with random features the trained ξ is much better than the untrained ξ. (Before this fix we thought that training with random features did not lead to good reconstruction performance; now we see that it does.) See the before/after training ξ distribution plots in the gcn-ae-explore.ipynb notebook, which (at that commit) uses fully random node features: before training, the distribution does not separate communities; after training, it does.
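
For illustration, here is a plausible sketch of the kind of batch-shuffling mismatch such a fix addresses (the actual bug in the commit may differ): a GCN layer is permutation-equivariant, so shuffling node features is only valid if the adjacency matrix is permuted along both axes with the same permutation. The graph, sizes, and `gcn_layer` helper below are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)
n, f = 6, 5

# Random symmetric adjacency, then the usual normalisation with self-loops.
A = (rng.random((n, n)) < 0.4).astype(float)
A = np.triu(A, 1)
A = A + A.T
A_hat = A + np.eye(n)
d = A_hat.sum(axis=1)
A_norm = A_hat / np.sqrt(np.outer(d, d))

X = rng.normal(size=(n, f))
W = rng.normal(size=(f, 3))

def gcn_layer(a_norm, x, w):
    # Linear GCN propagation (activation omitted for clarity).
    return a_norm @ x @ w

# A fixed non-identity permutation of the nodes.
perm = np.roll(np.arange(n), 1)
P = np.eye(n)[perm]

# Consistent shuffle: permute the features AND both axes of the adjacency.
out_consistent = gcn_layer(P @ A_norm @ P.T, P @ X, W)

# Inconsistent shuffle: features are permuted but the adjacency is left
# in the original node order, so features get mixed with the wrong rows.
out_buggy = gcn_layer(A_norm, P @ X, W)

reference = P @ gcn_layer(A_norm, X, W)
print(np.allclose(out_consistent, reference))  # True: equivariance holds
```

With the inconsistent shuffle, each node aggregates some other node's features, so reconstruction targets no longer match inputs, and training quality degrades silently rather than crashing, which would make the bug easy to miss.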

So this answers one question: training improves ξ, and is necessary in the case of fully random features, where the initial ξ distribution does not separate network communities at all. For features that do correlate with the network structure, the two sub-questions above remain open.

wehlutyk commented 6 years ago

Sub-question 1 (above) will be answered by #6 (worked on in #12), where we train with a set of features that represent different communities than the structural communities.

Sub-question 2 (above) will be answered by #17.

I'm closing this as it will be addressed in these other issues.