ixxi-dante / an2vec

Bringing node2vec and word2vec together for cool stuff

GNU General Public License v3.0

Full batch sensitivity analysis (adjacency reconstruction, feature reconstruction, and both) #6

Open wehlutyk opened 6 years ago

wehlutyk commented 6 years ago

Test the following parameters:

node features: exact network communities, noisy network communities, noisy network communities with redundancy, random
dim_ξ: [2, ..., 10]
dim_l1: [dim_ξ, ..., 16]
n_ξ_samples: [1, 5, 10]
bias in all layers: True/False
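
As a rough illustration of how large this sweep is, the list above could be enumerated as follows. This is only a sketch under assumptions: it treats the ranges as inclusive, takes the full Cartesian product of all parameters, and uses made-up names (`parameter_grid`, `dim_xi`, the feature-type labels) that do not come from the an2vec code base.

```python
from itertools import product

# Hypothetical labels for the four node-feature settings listed above.
FEATURE_TYPES = [
    "exact_communities",
    "noisy_communities",
    "noisy_communities_redundant",
    "random",
]
DIM_XI = range(2, 11)        # dim_ξ: 2..10, assumed inclusive
MAX_DIM_L1 = 16              # dim_l1 runs from dim_ξ up to 16, assumed inclusive
N_XI_SAMPLES = [1, 5, 10]
USE_BIAS = [True, False]


def parameter_grid():
    """Yield one dict per run, keeping only combinations with dim_l1 >= dim_ξ."""
    for features, dim_xi, n_samples, bias in product(
        FEATURE_TYPES, DIM_XI, N_XI_SAMPLES, USE_BIAS
    ):
        for dim_l1 in range(dim_xi, MAX_DIM_L1 + 1):
            yield {
                "features": features,
                "dim_xi": dim_xi,
                "dim_l1": dim_l1,
                "n_xi_samples": n_samples,
                "bias": bias,
            }


grid = list(parameter_grid())
print(len(grid))  # 2376 runs under the assumptions above
```

Under these assumptions there are 99 valid (dim_ξ, dim_l1) pairs, times 4 feature types, 3 sample counts and 2 bias settings.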

For each run, save:

the original and final-predicted adjacency and/or features
the evolution of the training losses
ξ distribution plots before and after training (for dim_ξ > 2, use a dimension reduction technique such as t-SNE)
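
One possible way to keep these per-run outputs together is a single JSON record per run. The sketch below is an assumption about layout, not how the notebook actually stores results: `save_run`, `load_run`, and the file naming are hypothetical, matrices are passed as plain nested lists, and the ξ plots are left out (they would be saved as separate image files).

```python
import json
from pathlib import Path


def save_run(out_dir, run_id, config, original, predicted, losses):
    """Write one run's parameters, inputs, predictions and loss history to JSON.

    `original` and `predicted` are dicts that may hold an "adjacency" and/or a
    "features" entry, each given as nested lists (hypothetical layout).
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    record = {
        "config": config,        # the parameter combination for this run
        "original": original,    # ground-truth adjacency and/or features
        "predicted": predicted,  # final reconstruction
        "losses": losses,        # training loss at each epoch
    }
    path = out_dir / f"run-{run_id:05d}.json"
    path.write_text(json.dumps(record, ensure_ascii=False), encoding="utf-8")
    return path


def load_run(path):
    """Read back a record written by save_run."""
    return json.loads(Path(path).read_text(encoding="utf-8"))
```

Keeping the parameter combination inside each record means a run can be interpreted without relying on the file name.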

jaklevab commented 6 years ago

Finished parameter exploration for adjacency reconstruction without feature reconstruction: See gcn-ae-explore.ipynb in branch issue-6-sensitivity-analysis

wehlutyk commented 6 years ago

#18 adds the following parameters of interest: mini-batch size, length of random walks in the mini-batch, and both of those quantities relative to the network size and average community size. (I might take over for that if you don't want to explore them, @jaklevab, as I keep adding parameters to this :p)

jaklevab commented 6 years ago

Ok so the minibatch sensitivity analysis gives the same result as the full batch when taking the full batch as minibatch size. Remains to be seen how the reconstruction loss behaves with the different parameters affecting the RW and the minibatch. @wehlutyk are you done with the minibatch parametrization?

wehlutyk commented 6 years ago

> Ok so the minibatch sensitivity analysis gives the same result as the full batch when taking the full batch as minibatch size.

Great!

> Remains to be seen how the reconstruction loss behaves with the different parameters affecting the RW and the minibatch. @wehlutyk are you done with the minibatch parametrization?

Well, I started, then realised that what I wanted to test would have taken 6 months to run, and that I wasn't sure which parameter ranges were the right ones to choose. So I moved on to real data sets (#8) to find out when it would become necessary (memory-wise) to use a mini-batch size smaller than the full batch, which then led me to #21 because it's currently so slow on a ~10,000-node network.

So once I'm done with #21 (today or next week), doing #8 should show us the relevant parameter ranges we need to test for the mini-batch and RW (and should give more material for NetSci).

wehlutyk commented 6 years ago

(I started the minibatch parametrisation in #19)

wehlutyk commented 6 years ago

Ok, so we have a bit of a mess with the validation of minibatching and the sensitivity analyses. I'm reorganising:

this issue is for full-batch sensitivity analysis, and progress is tracked in @jaklevab's PR #12 (I renamed both this issue and the PR accordingly)
minibatch validation (i.e., checking it works as expected when we don't have convolutions) and sensitivity analysis (i.e., seeing how it behaves when we change the RW and seed sizes) were mixed together in #19. I'm closing that in favour of #28 and #29.