ixxi-dante / an2vec

Bringing node2vec and word2vec together for cool stuff
GNU General Public License v3.0

Encode σ directly in the OrthogonalGaussian codec #53

Closed wehlutyk closed 6 years ago

wehlutyk commented 6 years ago

Then:

wehlutyk commented 6 years ago

Encoding σ directly improves things a little: the final loss is around 0.467, but we're still higher than gae, and a qualitative look at the loss curve shows that it goes down more slowly (especially at the beginning).
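A minimal sketch of what "encoding σ directly" could look like in Keras 2.0.x (layer sizes and variable names are placeholders, not the repo's actual code): the encoder head predicts σ through a softplus activation instead of predicting a log-variance and exponentiating it at sampling time.

```python
from keras.layers import Input, Dense, Lambda
from keras import backend as K

n_features = 1433   # placeholder input dimension
dim_z = 16          # placeholder latent dimension

x = Input(shape=(n_features,))
h = Dense(32, activation='relu')(x)

mu = Dense(dim_z)(h)
# Encode sigma directly, kept positive by softplus,
# rather than predicting log(sigma^2) and exponentiating later.
sigma = Dense(dim_z, activation='softplus')(h)

def sample_z(args):
    mu, sigma = args
    eps = K.random_normal(shape=K.shape(mu))
    return mu + sigma * eps

z = Lambda(sample_z)([mu, sigma])
```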

Also scaled the losses by 1/n_nodes, which should help Adam (and makes minibatches more comparable, but doesn't solve the proportions problem between the adjacency loss and the KL loss, which don't scale by the same factors). That doesn't close the gap either.
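To illustrate the scaling idea (a sketch with illustrative names, not the repo's code, and σ encoded directly as above): dividing both terms by n_nodes shrinks their magnitude for Adam, but the adjacency term still sums over n_nodes² node pairs while the KL term sums over n_nodes × dim_z latent dimensions, so their relative weight is unchanged.

```python
from keras import backend as K

def scaled_losses(adj_true, adj_pred, mu, sigma, n_nodes):
    # Adjacency reconstruction: binary cross-entropy summed over all node pairs.
    eps = K.epsilon()
    adj_pred = K.clip(adj_pred, eps, 1 - eps)
    bce = -(adj_true * K.log(adj_pred) + (1 - adj_true) * K.log(1 - adj_pred))
    adj_loss = K.sum(bce) / n_nodes
    # KL divergence of N(mu, sigma^2) from N(0, 1), with sigma encoded directly.
    kl_loss = 0.5 * K.sum(K.square(mu) + K.square(sigma)
                          - 2 * K.log(sigma) - 1) / n_nodes
    return adj_loss, kl_loss
```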

The implementations seem to match perfectly (checked), and the initial loss values also match perfectly. So the next step is to look at the actual values of the gradients, first without stochasticity, then with it.
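One possible way to pull out raw gradient values for that comparison, assuming Keras 2.x internals on a compiled model (the helper name is mine, not the repo's); the "without stochasticity" case would use μ in place of the sampled z.

```python
import numpy as np
from keras import backend as K

def gradient_values(model, inputs, targets):
    # Gradients of the compiled loss w.r.t. every trainable weight.
    grads = K.gradients(model.total_loss, model.trainable_weights)
    get_grads = K.function(
        model.inputs + model.targets + model.sample_weights, grads)
    # Sample weights of 1 for every example, matching Keras's default.
    sample_weights = [np.ones(targets[0].shape[0]) for _ in model.sample_weights]
    return get_grads(inputs + targets + sample_weights)
```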

wehlutyk commented 6 years ago

(Note that our final loss is only about .05 higher than the gae loss, but I'm puzzled as to where that difference comes from.)

wehlutyk commented 6 years ago

Holy shit, bloody default values.

Keras's default learning rate for Adam is .001 (https://github.com/keras-team/keras/blob/2.0.1/keras/optimizers.py#L369), whereas the vanilla gae uses .01 (https://github.com/tkipf/gae/blob/master/gae/train.py#L25). Changing it fixes our gap with gae: I get .4619 in final training loss vs. .4632 for gae with the exact same parameters.
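Concretely, matching gae's setting in Keras 2.0.x just means overriding Adam's default learning rate at compile time (the model below is a placeholder, only the optimizer line matters):

```python
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import Adam

model = Sequential([Dense(16, input_dim=32, activation='relu'),
                    Dense(1, activation='sigmoid')])
# gae's learning rate; Keras's Adam defaults to lr=0.001.
model.compile(optimizer=Adam(lr=0.01), loss='binary_crossentropy')
```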

wehlutyk commented 6 years ago

96f6fd529c651d20a86b4f4dab5259924e21fdf5 and 0af7911ec3b8b38a67aa52c1b261632556a23bdb are the two commits that allow closing this.