ixxi-dante / an2vec

Bringing node2vec and word2vec together for cool stuff
GNU General Public License v3.0

Test various fixes to blogcatalog training #48

Closed: wehlutyk closed this issue 6 years ago

wehlutyk commented 6 years ago

Run the training for only 2000 epochs, as most of the final quality seems to be reached by that point already.

wehlutyk commented 6 years ago

Currently running on grunch with the first two changes activated.

The third change should be implemented by creating a new Gaussian codec that doesn't take u as a parameter, then removing that layer from the model.
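
A minimal sketch of what such a codec could look like, assuming a Keras-style encoder that parameterises q(z|x) with a mean and log-variance only, and simply no longer produces a u output (layer sizes and names here are hypothetical, not the repository's actual API):

```python
import keras.backend as K
from keras.layers import Dense, Input, Lambda
from keras.models import Model

dim_in, dim_z = 128, 16  # hypothetical sizes

x = Input(shape=(dim_in,))
h = Dense(64, activation='relu')(x)
mu = Dense(dim_z)(h)       # mean of q(z|x)
logvar = Dense(dim_z)(h)   # log-variance of q(z|x); no u output any more

def sample(args):
    # Reparameterisation trick: z = mu + sigma * eps, eps ~ N(0, I)
    mu, logvar = args
    eps = K.random_normal(shape=K.shape(mu))
    return mu + K.exp(0.5 * logvar) * eps

z = Lambda(sample)([mu, logvar])
codec = Model(x, [mu, logvar, z])
```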

wehlutyk commented 6 years ago

The last run still showed the same poor performance. BUT!

It turns out there were two bugs: one in the ordering of the nodes in the adjacency matrix (thank you networkx for ordering nodes by insertion...), and one in the target feature values fed to the model. So both the adjacency and the features were being trained on buggy data. Fixing that, and removing u from the embedding parameters, seems to work beautifully with 200 nodes. A script is now running with those fixes on the full BlogCatalog dataset, and we'll see how things look tomorrow morning.
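
For the record, the ordering fix amounts to pinning one explicit node ordering and using it for both the adjacency matrix and the feature matrix. A small sketch (with a placeholder graph and label attribute, not the actual pipeline code):

```python
import networkx as nx
import numpy as np

g = nx.karate_club_graph()        # placeholder graph
nodes = sorted(g.nodes())         # fix one explicit node ordering

# Adjacency built in that ordering (networkx otherwise uses insertion order)
adj = nx.to_scipy_sparse_matrix(g, nodelist=nodes)

# Target features assembled in the *same* ordering, e.g. a one-hot label
labels = [g.nodes[n]['club'] for n in nodes]
classes = sorted(set(labels))
features = np.array([[label == c for c in classes] for label in labels],
                    dtype=float)
```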

wehlutyk commented 6 years ago

Commit 43f7d4efbf54dbfe2e21469c5124a9737fa25e9f adds the exploration notebooks with the fixes for this issue.

wehlutyk commented 6 years ago

Commit eb01a5a08cf5fa9826c096e64aa7f2dbfb305793 shows that training works reasonably well on crops of BlogCatalog, provided you train for long enough:

The main issue seems to be the 'star' nodes, which are connected to nearly everybody: the embedding can only represent them by dedicating a dimension of variance to them (look at the right half of their embedding, which holds the variances), so that they catch the other nodes in the scalar product. (This is my interpretation of what's going on.)
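
As a toy illustration of that interpretation (assuming an inner-product decoder with a sigmoid, which may not be exactly the model's decoder): a node with one dominant embedding component gets a high predicted link probability with nearly every other node.

```python
import numpy as np

rng = np.random.default_rng(0)
# 50 ordinary nodes with small positive weights in every dimension
z = np.abs(rng.normal(scale=0.3, size=(50, 8)))
z_star = np.zeros(8)
z_star[0] = 10.0                          # "star" node: one dominant dimension

probs = 1 / (1 + np.exp(-(z @ z_star)))   # sigmoid of scalar products
print(probs.mean())                       # high for most nodes: the star "catches" nearly everyone
```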

Another interesting point is how much the training improved on the 1000-node crop when the embedding dimension was increased: this is a good illustration of our idea of exploring the cost function of the compression.

The next steps now are:

wehlutyk commented 6 years ago

Results on the full dataset, with no adjacency scaling, are in 1d8467f989e4807e4b7239e9522893e450853f90: they are bad, which was expected for several reasons:

One last test is running with adjacency scaling and a 25D embedding, just to check for improvements, before closing this issue, resuming #29, and moving on to the next steps.
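
For context, a minimal sketch of one common form of adjacency scaling, the GCN-style symmetric normalisation (an assumption on my part; the run may use a different scaling):

```python
import numpy as np
import scipy.sparse as sp

def scale_adjacency(adj):
    """Return the symmetrically normalised adjacency D^-1/2 (A + I) D^-1/2."""
    adj = sp.csr_matrix(adj) + sp.eye(adj.shape[0])       # add self-loops
    degrees = np.asarray(adj.sum(axis=1)).flatten()       # row degrees (>= 1)
    d_inv_sqrt = sp.diags(1.0 / np.sqrt(degrees))
    return d_inv_sqrt @ adj @ d_inv_sqrt
```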

wehlutyk commented 6 years ago

Results for the latest run with adjacency scaling and a 25D embedding are in ac9894cca19250c382eba78e1a6501c8c465c59b. They are not good, but not in a surprising way: they indicate that for a network of this size we need to train for much longer. Closing this and moving on to the next steps.