Encoding σ directly improves things a little: final loss around 0.467, but we're still higher than gae, and a qualitative look at the loss curve shows that it goes down more slowly (esp. at the beginning).
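For reference, a quick sketch of the two parameterisations (hypothetical layer names, not our actual code):

```python
# Hypothetical illustration (not the repo's code): two ways of parameterising
# the posterior scale in a VAE-style encoder.
from keras import layers
import keras.backend as K

hidden = layers.Input(shape=(32,))  # encoder hidden representation

# (a) encode sigma directly, kept positive through a softplus activation
sigma = layers.Dense(16, activation='softplus', name='sigma_direct')(hidden)

# (b) encode log(sigma) and exponentiate it, as gae does with its z_log_std
log_sigma = layers.Dense(16, name='log_sigma')(hidden)
sigma_from_log = layers.Lambda(lambda t: K.exp(t), name='exp_log_sigma')(log_sigma)
```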
Also scaled the losses by 1/n_nodes, which should help Adam (and can make minibatches more comparable, but doesn't solve the proportions problem between the adjacency loss and the KL loss, which don't scale by the same factors). That still doesn't close the gap.
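Roughly what I mean by the scaling (sketch only, placeholder names, not the repo's actual functions):

```python
# Both terms get the same 1 / n_nodes factor, so this keeps gradient
# magnitudes comparable across graph sizes but leaves the relative weight of
# the adjacency loss vs. the KL loss untouched.
import keras.backend as K

def scale_losses(adj_loss, kl_loss, n_nodes):
    scale = 1.0 / K.cast(n_nodes, K.floatx())
    return scale * adj_loss, scale * kl_loss
```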
The implementations seem to match perfectly (checked). Initial values for loss also match perfectly. So the next step is to look at the actual values of the gradients, first without stochasticity, then with it.
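Something like this should do for pulling out the gradient values on our side (toy stand-in model, assuming the Keras 2.x / TF 1.x API we're on; not the actual nw2vec model):

```python
# The point is just the gradient-extraction pattern.
import numpy as np
from keras import layers, models
import keras.backend as K

inp = layers.Input(shape=(8,))
out = layers.Dense(1)(inp)
model = models.Model(inp, out)
model.compile(optimizer='adam', loss='mse')

# Symbolic gradients of the compiled loss w.r.t. the trainable weights.
grads = K.gradients(model.total_loss, model.trainable_weights)
get_grads = K.function(model.inputs + model.targets + model.sample_weights,
                       grads)

x = np.random.rand(4, 8).astype('float32')
y = np.random.rand(4, 1).astype('float32')
w = np.ones(4, dtype='float32')
# Feed the same data to both implementations; with the sampling noise removed
# (e.g. z = mu in the encoder), the arrays should match gae's gradients
# value for value.
print(get_grads([x, y, w]))
```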
(Note that the final loss we have is only about .05 higher than the gae loss, but I'm puzzled as to where it comes from.)
Holy shit, bloody default values.
Keras's default learning rate for Adam is .001 (https://github.com/keras-team/keras/blob/2.0.1/keras/optimizers.py#L369), whereas the vanilla gae uses .01 (https://github.com/tkipf/gae/blob/master/gae/train.py#L25). Setting our learning rate to .01 closes the gap with gae: I get .4619 final training loss vs. .4632 for gae with the exact same parameters.
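For the record, the fix boils down to this (sketch, not the actual commit):

```python
# Keras 2.0.1 defaults Adam to lr=0.001; pass the learning rate explicitly
# instead of relying on the default.
from keras.optimizers import Adam

optimizer = Adam(lr=0.01)  # matches gae's default in gae/train.py
# model.compile(optimizer=optimizer, loss=...) as before
```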
96f6fd529c651d20a86b4f4dab5259924e21fdf5 and 0af7911ec3b8b38a67aa52c1b261632556a23bdb are the two commits that close this.
Then:
- close the gap with gae (cf #44, we left at .4618 for gae vs. .4701 for nw2vec)
- check that gae's losses and what we have (esp. weighing) match
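For the weighing check, here's roughly what gae computes (from memory of gae/train.py and gae/optimizer.py, to be re-checked against the source):

```python
# Rough sketch of gae's reconstruction-loss weighting, reproduced from memory,
# to compare against ours.
import numpy as np

def gae_weights(adj):
    """adj: dense binary adjacency matrix of shape (n, n)."""
    n = adj.shape[0]
    n_edges = adj.sum()
    # Weight applied to positive (edge) entries in the reconstruction loss.
    pos_weight = float(n * n - n_edges) / n_edges
    # Global normalisation factor for the reconstruction loss.
    norm = n * n / float((n * n - n_edges) * 2)
    return pos_weight, norm

pos_weight, norm = gae_weights(np.array([[0, 1], [1, 0]], dtype=float))
```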