Train BlogCatalog with higher embedding dimension

wehlutyk commented 6 years ago

To see whether the embedding dimension is the limiting factor behind the embeddings currently not looking good at all. (See here.)

wehlutyk commented 6 years ago

Currently running in my session on grunch, using https://github.com/ixxi-dante/nw2vec/blob/master/projects/scale/blogcatalog.py with dim_ξ = 10.

wehlutyk commented 6 years ago

Training is done, must look at the results now.

wehlutyk commented 6 years ago

Results are in 1f4c4106daf838033dc4e7dae8c7d0ed1f980d35, see the projects/scale/blogcatalog-dim_ξ=10-results.ipynb notebook.

Highlights (see in the figures below, extracted from the notebook):

There is some separation of nodes in the embedding, which is an improvement over the 2D embeddings. Multidimensional scaling down to 2D might show more structure there, but it's useless if we have no way to validate it.
Adjacency reconstruction is still awful.
Training seems to stop being effective after 10000 epochs, so further tests will stop at that limit.

Training history

Embedding scatter plots for all pairs of dimensions

Adjacency reconstruction

wehlutyk commented 6 years ago

Now:

Currently re-training the dimension-2 embeddings so that we have the training history (the earlier training run was killed by a reservation, so we only had the checkpoint and not the history) → #36

I don't think looking at higher dimensions will tell us anything more. Instead, there are (at least) two other points to check:

Train without node features, i.e. with an identity matrix as features. Maybe extend that to training with BlogCatalog node features to which an identity matrix is concatenated, to make sure the model has enough information to separate the nodes based on features. → #37
Train without weighting positive/negative errors in the adjacency reconstruction loss, as maybe the poor adjacency reconstruction comes from there → #38
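
For the first point, here is a minimal sketch of the two feature setups (identity only, and node features with an identity block appended). This is not the code from blogcatalog.py: the helper names `identity_features` and `concat_identity` are hypothetical, the toy values are made up, and plain Python lists are used only to keep the example dependency-free. At BlogCatalog's size a sparse matrix would probably be the better choice.

```python
def identity_features(n_nodes):
    """One-hot features: node i gets the i-th row of the identity matrix."""
    return [[1.0 if i == j else 0.0 for j in range(n_nodes)]
            for i in range(n_nodes)]


def concat_identity(features):
    """Append an identity block to existing per-node features.

    `features` is a list of n rows (one per node) of length d; the result
    has n rows of length d + n, so every node keeps a unique signature
    even when two nodes share the same original features.
    """
    n_nodes = len(features)
    eye = identity_features(n_nodes)
    return [list(row) + eye_row for row, eye_row in zip(features, eye)]


# Toy example: 3 nodes, 2 made-up feature columns (not BlogCatalog data).
# Nodes 0 and 1 have identical features and are only told apart by the
# identity block.
toy_features = [[0.5, 1.0], [0.5, 1.0], [0.0, 2.0]]
augmented = concat_identity(toy_features)
```

With the identity block appended, the model can in principle separate every node even when the original features are not informative enough on their own.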

And combinations of those two. If all that fails to explain the bad adjacency reconstruction, then the answers will be found by working on the behaviour project: #30 and #32 mainly.
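
For the second point (#38), here is a rough sketch of what switching the weighting on and off means. It assumes the reconstruction loss is a binary cross-entropy over adjacency entries with a multiplier on the positive (edge) terms, which is a common convention for graph autoencoders. The actual loss in the repository may differ, and the names `adjacency_bce` and `balanced_pos_weight` are hypothetical.

```python
import math


def balanced_pos_weight(adj):
    """Ratio of non-edges to edges (assumes a 0/1 matrix with at least one edge)."""
    n_pos = sum(1 for row in adj for a in row if a == 1)
    n_neg = sum(1 for row in adj for a in row if a == 0)
    return n_neg / n_pos


def adjacency_bce(adj, probs, pos_weight=1.0, eps=1e-7):
    """Mean binary cross-entropy between `adj` (0/1) and predicted `probs`.

    pos_weight=1.0 gives the unweighted loss; a larger value makes errors
    on existing edges count more than errors on non-edges.
    """
    total, count = 0.0, 0
    for adj_row, prob_row in zip(adj, probs):
        for a, p in zip(adj_row, prob_row):
            p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
            total -= pos_weight * a * math.log(p) + (1 - a) * math.log(1.0 - p)
            count += 1
    return total / count


# Toy graph: 3 nodes, a single undirected edge between nodes 0 and 1.
toy_adj = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
toy_probs = [[0.5] * 3 for _ in range(3)]  # uninformative predictions

unweighted = adjacency_bce(toy_adj, toy_probs)
weighted = adjacency_bce(toy_adj, toy_probs,
                         pos_weight=balanced_pos_weight(toy_adj))
```

On a sparse graph the unweighted loss is dominated by non-edges, so comparing runs with and without the weight should show whether the weighting is what hurts the reconstruction.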

Closing this issue in favour of all the above.

ixxi-dante / an2vec

Train BlogCatalog with higher embedding dimension #31