ixxi-dante / an2vec

Bringing node2vec and word2vec together for cool stuff
GNU General Public License v3.0

Minibatch sensitivity analysis #29

Closed: wehlutyk closed this issue 6 years ago

wehlutyk commented 6 years ago

Work branched out from #19.

wehlutyk commented 6 years ago

This is currently running in my session on grunch, using the projects/behaviour/minibatch-parameters.py script.

wehlutyk commented 6 years ago

https://github.com/ixxi-dante/nw2vec/blob/74a748c8df91e68410cf60004e7793722bb022cb/projects/behaviour/minibatch-parameters.py#L129 is a mistake in what is currently running on grunch: it cancels out the noise on the node labels. Not a huge problem, as we will still learn from the run nonetheless, but we should:
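
For illustration, here is the kind of cancellation meant, as a purely hypothetical numpy sketch (the variable names and the exact operation are mine, not those of minibatch-parameters.py):

```python
import numpy as np

rng = np.random.RandomState(0)

# One-hot community labels for 6 nodes in 3 communities.
labels = np.eye(3)[rng.randint(3, size=6)]

# Intended behaviour: perturb the labels with Gaussian noise.
noisy_labels = labels + .1 * rng.randn(*labels.shape)

# Bug of the kind described: re-deriving hard labels from the noisy
# ones (e.g. by one-hot-encoding them again) silently removes the noise.
features = np.eye(3)[noisy_labels.argmax(axis=1)]

# At this noise scale, argmax virtually always recovers the clean
# labels, so the model trains on effectively noiseless features.
print((features == labels).all())  # True
```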

wehlutyk commented 6 years ago

Paused to look at results in #43.

wehlutyk commented 6 years ago

Resumed.

wehlutyk commented 6 years ago

Changes and fixes that have happened in the BlogCatalog exploration since this computation started:

In short: u was removed, a Bernoulli decoder is used, and the adjacency loss scaling is now optional. So we could do better in terms of performance, but the results should still be true-ish.
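
For reference, a Bernoulli decoder models each adjacency entry as an independent coin flip, so the reconstruction loss becomes a binary cross-entropy on edge logits. A minimal numpy sketch (function and variable names are illustrative, not taken from the codebase):

```python
import numpy as np

def bernoulli_adj_loss(adj_true, adj_logits):
    """Mean negative log-likelihood of the adjacency matrix under a
    Bernoulli decoder: p(edge) = sigmoid(logit) for each node pair."""
    # Numerically stable binary cross-entropy computed from logits.
    return np.mean(
        np.maximum(adj_logits, 0)
        - adj_logits * adj_true
        + np.log1p(np.exp(-np.abs(adj_logits)))
    )

# Example: a 4-node graph and random decoder logits.
adj_true = np.array([[0, 1, 1, 0],
                     [1, 0, 0, 1],
                     [1, 0, 0, 0],
                     [0, 1, 0, 0]], dtype=float)
adj_logits = np.random.RandomState(0).randn(4, 4)
print(bernoulli_adj_loss(adj_true, adj_logits))
```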

TODO: once this finishes, re-run one short training with the above fixes to see whether the results change much.

wehlutyk commented 6 years ago

Paused for explorations in #48.

wehlutyk commented 6 years ago

Resumed.

wehlutyk commented 6 years ago

Finished running.

wehlutyk commented 6 years ago

Results notebook in b5fbb1d5d2cd26e9e785d5df8ffbe78146591af3, see projects/behaviour/minibatch-parameters-results.ipynb. Here is the final grid:

[Figure: final grid of the minibatch sensitivity analysis]

wehlutyk commented 6 years ago

Launched a second run with the changes mentioned above, with 500 training epochs and a network of size 1000 (20 communities of size 50), instead of 2000 (20 x 100).
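
For context, one standard way to generate this kind of community-structured synthetic network is a stochastic block model; a sketch with illustrative connection probabilities (not necessarily the generator or parameters the script uses):

```python
import networkx as nx

# 20 communities of 50 nodes each, i.e. 1000 nodes total, with dense
# intra-community and sparse inter-community connections.
sizes = [50] * 20
p_in, p_out = .1, .001  # illustrative values
probs = [[p_in if i == j else p_out for j in range(20)]
         for i in range(20)]

g = nx.stochastic_block_model(sizes, probs, seed=0)
print(g.number_of_nodes(), g.number_of_edges())
```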

wehlutyk commented 6 years ago

Relaunched with dims = (20, 25, 25) to see if we get proper training results. If not, check what worked in #48.
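
My reading of the dims tuple, assuming it means (feature dimension, intermediate layer dimension, embedding dimension); a minimal dense Keras sketch of the corresponding encoder shape, leaving out the graph-convolutional and variational parts of the real model:

```python
from keras.layers import Input, Dense
from keras.models import Model

dim_data, dim_l1, dim_emb = 20, 25, 25  # dims = (20, 25, 25)

x = Input(shape=(dim_data,))
h = Dense(dim_l1, activation='relu')(x)
z = Dense(dim_emb)(h)  # embedding layer

encoder = Model(inputs=x, outputs=z)
encoder.summary()
```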

wehlutyk commented 6 years ago

(Still waiting for the above run to finish, 11h left.)

It looks like the current minibatch strategy is not good: I'm not seeing good results in trainings on synthetic graphs, and it also doesn't make much sense with the "sample the dataset uniformly according to variables of interest" strategy (except by chance).
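
For comparison, the simplest version of the "sample the dataset uniformly" idea is plain uniform node minibatching, sketched below (this is not the random-walk-based sampling currently implemented):

```python
import numpy as np

def uniform_minibatches(n_nodes, batch_size, rng):
    """Yield minibatches of node indices drawn uniformly without
    replacement, so every node is seen exactly once per epoch."""
    order = rng.permutation(n_nodes)
    for start in range(0, n_nodes, batch_size):
        yield order[start:start + batch_size]

rng = np.random.RandomState(0)
for batch in uniform_minibatches(1000, 100, rng):
    pass  # train on `batch` here
```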

I feel that:

So:

wehlutyk commented 6 years ago

Results for the latest tests (500 training epochs, 1000 = 20 x 50 nodes, fixes from above) are in 39fee0fc55ad4f586ea30961f8cf4166a84a32da, in the projects/behaviour/minibatch-parameters-issue_48=fixed-results.ipynb notebook.

The bottom line is that in 500 minibatch epochs we get abysmal prediction performance, even though for an RW length of 100 that is the equivalent of 50,000 fullbatch epochs (500 × 100).

See pictures here with dims = (20, 10, 2):

[Figure: minibatch analysis with dims = (20, 10, 2)]

and with dims = (20, 25, 25):

[Figure: minibatch analysis with dims = (20, 25, 25)]

So I'm moving ahead with the roadmap from the previous comment.