Test BlogCatalog training in minibatch

wehlutyk commented 6 years ago

Waiting for a fullbatch run to complete on the CBP before launching this. Will use 200 epochs (equivalent to 20000 fullbatch epochs, since each minibatch epoch sees each node 100 times).
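The epoch equivalence is just a product, sketched below. The variable names are hypothetical; only the numbers 200 and 100 come from the comment, and the sketch assumes a fullbatch epoch sees each node exactly once.

```python
# Hypothetical names; only the values 200 and 100 come from the comment above.
minibatch_epochs = 200
node_visits_per_minibatch_epoch = 100  # each node is seen 100 times per minibatch epoch

# Assumption: a fullbatch epoch sees each node once, so the equivalent
# number of fullbatch epochs is the product of the two quantities.
equivalent_fullbatch_epochs = minibatch_epochs * node_visits_per_minibatch_epoch
print(equivalent_fullbatch_epochs)
```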

wehlutyk commented 6 years ago

Running both a fullbatch (on p100alpha) and a minibatch (on p100beta).

wehlutyk commented 6 years ago

Training finished. Results for the fullbatch run are in projects/scale/blogcatalog-issue_48=fixed-cbp-results.ipynb, and for the minibatch run in projects/scale/blogcatalog-issue_49-results.ipynb.

The fullbatch results have the same problem as before: the feature reconstruction is not bad, but the adjacency reconstruction is pretty poor. Now that we know we match gae, this seems to reflect the difficulty of the BlogCatalog dataset itself. See this reconstruction for the first 1000 nodes in BlogCatalog:

Fullbatch adjacency reconstruction cropped to the 1000 first nodes

The minibatch results are more surprising. The feature reconstruction is also not bad (as expected), but some structure seems to be recovered in the adjacency reconstruction. This hints that the minibatch approach might work given a good sampling strategy, and a lot of computing time, since this run took 10+ days on the CBP's P100. See this reconstruction for the first 1000 nodes in BlogCatalog:

Minibatch adjacency reconstruction cropped to the 1000 first nodes
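For reference, cropping a reconstructed adjacency matrix to its first 1000 nodes can be sketched as follows. This is a minimal illustration, not the code from the notebooks: `crop_to_first_nodes` and `adj_pred` are hypothetical names, and the matrix here is random stand-in data rather than actual results.

```python
import numpy as np


def crop_to_first_nodes(adj, n_nodes=1000):
    """Return the top-left n_nodes x n_nodes block of a square adjacency matrix."""
    adj = np.asarray(adj)
    if adj.ndim != 2 or adj.shape[0] != adj.shape[1]:
        raise ValueError("expected a square 2D adjacency matrix")
    # Slicing both axes keeps only the edges among the first n_nodes nodes.
    return adj[:n_nodes, :n_nodes]


# Stand-in for a reconstructed adjacency matrix (random values, not real results).
rng = np.random.default_rng(0)
adj_pred = rng.random((1500, 1500))

cropped = crop_to_first_nodes(adj_pred)
print(cropped.shape)
```

The cropped block can then be passed to any image-style plot to inspect the structure among those nodes.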

ixxi-dante / an2vec

Test BlogCatalog training in minibatch #49