Profile TF to see which ops are slow

Profiling TensorFlow is complicated. Here are the links I have so far:

https://towardsdatascience.com/howto-profile-tensorflow-1a49fb18073d and https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/profiler/README.md show how to generate profiles.
profiler-ui can then format the output of the above.
It seems TensorBoard can also show some stats, but I haven't tried it.

So extracting the info from profiler-ui gives the following timing logs:

GAE:
nw2vec:

In both cases most of the time is spent in Adam. I saw no obvious bottleneck, but it looks like the time spent in the Adam part grows with the complexity of the model.
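To double-check the "most of the time is in Adam" reading without a dedicated viewer, one option is to sum op durations per top-level name scope. This is only a rough sketch: it assumes a Chrome trace JSON file (the kind the timeline approach in the first link produces) rather than profiler-ui output, and it assumes each complete event stores its node name under `args["name"]`. The `example_trace` data, `load_trace`, and `time_by_scope` are made up for illustration.

```python
import json
from collections import defaultdict


def load_trace(path):
    """Read a Chrome trace JSON file from disk."""
    with open(path) as f:
        return json.load(f)


def time_by_scope(trace, depth=1):
    """Sum event durations (microseconds) per name scope, largest first.

    `trace` is a dict in Chrome trace format: {"traceEvents": [...]}.
    Only complete events (ph == "X") carry a duration, so others are skipped.
    """
    totals = defaultdict(int)
    for event in trace.get("traceEvents", []):
        if event.get("ph") != "X":
            continue
        # Assumption: the node name lives in args["name"]; fall back to the
        # event name when it does not.
        node = event.get("args", {}).get("name", event.get("name", "?"))
        scope = "/".join(node.split("/")[:depth])
        totals[scope] += event.get("dur", 0)
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical trace, not real measurements from GAE or nw2vec.
example_trace = {
    "traceEvents": [
        {"ph": "M", "name": "process_name", "args": {"name": "cpu"}},
        {"ph": "X", "name": "MatMul", "dur": 120,
         "args": {"name": "encoder/dense/MatMul"}},
        {"ph": "X", "name": "ApplyAdam", "dur": 300,
         "args": {"name": "Adam/update_w/ApplyAdam"}},
        {"ph": "X", "name": "ApplyAdam", "dur": 250,
         "args": {"name": "Adam/update_b/ApplyAdam"}},
    ]
}

ranking = time_by_scope(example_trace)
for scope, micros in ranking:
    print(f"{scope}: {micros} us")
```

On a real file this would be `time_by_scope(load_trace(path), depth=2)`, with `depth` controlling how far down the scope hierarchy the totals are split.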

Option 1: I need something like snakeviz to go through this (it's unwieldy otherwise), so I could do a quick prototype in Elm.

Option 2: since we don't use minibatching, disable it as much as possible and get performance gains from making the code less general.

Option 3: put this on hold since we won't use it for an immediate paper, and move on to feature-network dependencies.

ixxi-dante / an2vec #25