Replace `tensordot` call by `while_loop` in `Bilinear` layer, to only compute what we need

ixxi-dante / an2vec

Bringing node2vec and word2vec together for cool stuff

GNU General Public License v3.0


Replace `tensordot` call by `while_loop` in `Bilinear` layer, to only compute what we need #23

Status: Closed (wehlutyk closed this 6 years ago)

wehlutyk commented 6 years ago

Implementation done in 48b378fcc552b98b8c943238ced352986a7e2484 and 3591b279b7fac8f93cdab563a416412bfc60f4f4.

In small tests there doesn't seem to be any speedup, but memory consumption should be lower: we just need to check that we can now run BlogCatalog on grunch with n_ξ_samples = 5 without blowing up the memory.

(tbd when I recover my home folder on grunch...)
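For readers who want the idea in code, here is a minimal NumPy sketch of one plausible reading of "only compute what we need". It is an assumption, not the code from the commits above: it supposes that the `tensordot` version builds the bilinear product for every pair of ξ samples and then keeps only the matching ones, while the loop version builds the matching ones directly. The function names, the shapes `(S, N, D)` and the plain Python `for` (standing in for a TensorFlow `while_loop`) are all hypothetical.

```python
import numpy as np


def bilinear_all_pairs(xi, w):
    """Contract everything, then keep the matching-sample blocks.

    xi: (S, N, D) array, S samples of N node embeddings of dimension D.
    w:  (D, D) bilinear kernel.
    Assumed behaviour of a tensordot-style implementation (hypothetical).
    """
    left = np.tensordot(xi, w, axes=([2], [0]))      # (S, N, D)
    # full[s, i, t, j] = xi[s, i] @ w @ xi[t, j], for ALL sample pairs (s, t)
    full = np.tensordot(left, xi, axes=([2], [2]))   # (S, N, S, N)
    s = np.arange(xi.shape[0])
    return full[s, :, s, :]                          # (S, N, N), the s == t blocks


def bilinear_matching_only(xi, w):
    """Loop over samples and compute only the s == t blocks."""
    n_samples, n_nodes, _ = xi.shape
    out = np.empty((n_samples, n_nodes, n_nodes))
    for s in range(n_samples):
        out[s] = xi[s] @ w @ xi[s].T                 # (N, N)
    return out


rng = np.random.default_rng(0)
xi = rng.normal(size=(5, 20, 8))   # 5 samples, 20 nodes, dimension 8
w = rng.normal(size=(8, 8))

same = np.allclose(bilinear_all_pairs(xi, w), bilinear_matching_only(xi, w))
print("same result:", same)
```

Under this assumption the all-pairs intermediate has S·N·S·N entries against S·N·N for the loop, so any saving in memory or time only shows up once S (here standing in for n_ξ_samples) is greater than 1.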

wehlutyk commented 6 years ago

The previous tensordot implementation works with n_ξ_samples = 1 and OOMs with n_ξ_samples = 2. It turns out the same happens with the new `while_loop` implementation, so it might not be that much better after all. Still closing this as done.

wehlutyk commented 6 years ago

Correction about the time performance. On my laptop (no GPU) with a super simple 50-node network we get:

- whileloop → 113.64 it/s
- tensordot → 90.91 it/s

wehlutyk commented 6 years ago

Second correction. Still on my laptop, with a 20 x 20 planted partition network:

- with n_ξ_samples = 1
  - with k = l = 20:
    - whileloop → 28.57 it/s
    - tensordot → 28.57 it/s
- with n_ξ_samples = 5
  - with k = l = 20:
    - whileloop → 13.3 it/s
    - tensordot → 4.35 it/s

On grunch:

- with n_ξ_samples = 1
  - with k = l = 20:
    - whileloop → 50 it/s
    - tensordot → 48.78 it/s
- with n_ξ_samples = 5
  - with k = l = 20:
    - whileloop → 42.55 it/s
    - tensordot → 39.22 it/s

So it is indeed a great optimisation gain when using larger n_ξ_samples, at least on my laptop; on grunch the gap is much smaller :)
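To put numbers on the gain, the it/s figures from the comments above can be turned into speedup ratios. This is only a small sketch: the rates are copied from this thread, while the dictionary layout and the labels are made up for illustration.

```python
# (whileloop it/s, tensordot it/s), copied from the measurements in this thread
rates = {
    "laptop, 50 nodes": (113.64, 90.91),
    "laptop, 20 x 20, n_ξ_samples = 1": (28.57, 28.57),
    "laptop, 20 x 20, n_ξ_samples = 5": (13.3, 4.35),
    "grunch, 20 x 20, n_ξ_samples = 1": (50.0, 48.78),
    "grunch, 20 x 20, n_ξ_samples = 5": (42.55, 39.22),
}

# Speedup of whileloop over tensordot: above 1 means whileloop is faster
speedups = {label: wl / td for label, (wl, td) in rates.items()}

for label, ratio in speedups.items():
    print(f"{label}: x{ratio:.2f}")
```

The laptop run with n_ξ_samples = 5 stands out at roughly 3x, while both grunch runs stay within about 10% of each other.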