tensorflow / fold

Deep learning with dynamic computation graphs in TensorFlow
Apache License 2.0

Fold & Eager #87

Closed MaksymDel closed 5 years ago

MaksymDel commented 6 years ago

Hi @delesley, how does TF Fold compare to the recent Eager module (https://research.googleblog.com/2017/10/eager-execution-imperative-define-by.html)?

delesley commented 6 years ago

TL;DR: if you're implementing a new model, use Eager.

Eager is the new Python API for building dynamic computation graphs going forward, and it's one of the reasons why there haven't been many updates to TensorFlow Fold. The Eager API is much cleaner than Fold's. I haven't run proper benchmarks myself yet, so I'm not sure how performance compares. TensorFlow Fold builds a static graph and implements dynamic batching, which can yield 100x speedups. Eager doesn't do many graph optimizations, and I'm not sure whether it does dynamic batching yet, so its performance may not yet be competitive with Fold's. However, one of the big bottlenecks with the Fold API is that it often spends most of its time building schedules rather than actually executing tensor operations, so Eager may be faster on certain workloads. Eager also has a lot more development resources behind it, so its performance will continue to improve.
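For concreteness, here is a minimal sketch of the Eager style, using the TF 1.x contrib-era API that was current at the time (the tiny model and sizes are made up):

```python
import tensorflow as tf
import tensorflow.contrib.eager as tfe  # contrib-era Eager API (TF 1.x)

tfe.enable_eager_execution()

w = tfe.Variable(tf.random_normal([4, 4]))

def score(x):
  # Ops run immediately; x can have a different leading dimension on
  # every call, with no padding and no graph rebuilding.
  return tf.reduce_sum(tf.matmul(x, w))

for n in (2, 5, 3):  # variable-sized examples
  print(score(tf.random_normal([n, 4])))
```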

-DeLesley


Jabberwockyll commented 6 years ago

Hi @delesley,

I'm working on a problem where each of my training examples has a different size (? x n): specifically, each example is a protein composed of many amino acid residues with n features each. My network learns a latent feature representation for each residue in an example (graph convolution), then downsamples each example to a common size before the final dense layer(s).

In vanilla TF, I can either (both options are sketched below):

  1. Stack all examples along the common dimension. In this case, I have to separate them again before the downsampling step. I can do this with tf.split(), but it requires me to always use the same minibatch size. This is annoying when testing, as the size of my test set isn't divisible by my minibatch size.
  2. Pad all examples with zeros to the size of the largest example, then construct a 3-D tensor. My examples vary widely in size, so this option is very memory-hungry and considerably slower.
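
For reference, here is a rough sketch of both strategies in graph-mode TF 1.x; all names and sizes are illustrative:

```python
import tensorflow as tf

n = 8  # features per residue

# Strategy 1: concatenate examples along the residue axis and split back out.
# tf.split needs a statically known number of pieces, which is why the
# minibatch size gets baked into the graph.
stacked = tf.placeholder(tf.float32, [None, n])  # shape: sum(len_i) x n
lengths = tf.placeholder(tf.int32, [3])          # fixed minibatch of 3
examples = tf.split(stacked, lengths, axis=0)    # list of 3 tensors, each (?, n)

# Strategy 2: zero-pad every example to the longest one and feed a single
# 3-D tensor; memory scales with batch_size * max_len even for short proteins.
padded = tf.placeholder(tf.float32, [None, None, n])  # batch x max_len x n
```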

It looks like I could use Map blocks in TF Fold to operate on each example, or use TF Eager to deconstruct the stacked minibatch (strategy 1) into a variable-length list. Is this correct?
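
For example, a hypothetical sketch of the Map-block route with the td blocks API (the layer size and the max-pooling choice are made up; see the Fold blocks tutorial for the real details):

```python
import tensorflow as tf
import tensorflow_fold as td

n = 8  # features per residue

# One example is a variable-length list of length-n residue vectors.
residue_net = td.Vector(n) >> td.FC(64)  # per-residue representation
# td.Map applies residue_net to every residue; td.Reduce max-pools the
# variable-length sequence down to a single fixed-size vector.
example_net = td.Map(residue_net) >> td.Reduce(td.Function(tf.maximum))

compiler = td.Compiler.create(example_net)
# compiler.build_feed_dict(batch) then accepts a batch of any size.
```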

So, my question is: would it still be better for me to go with Eager? It seems like Fold could be more efficient in this case, right?

delesley commented 6 years ago

You might be able to use the blocks library, but it's not very easy to use for graph convolutions. I would recommend using the Loom library instead, which is also part of TF Fold. The API for Loom is very similar to TF Eager's -- you simply traverse the graph in Python and invoke tensor operations imperatively. Unlike TF Eager, however, Loom implements dynamic batching.

What you do is implement your graph convolution by traversing the graph in the obvious way, as if you were using a batch size of 1. If you have a batch of 100 graphs, just traverse all of the graphs in the batch, one by one. TF Fold is lazy -- it doesn't actually evaluate any tensors until the traversal is complete. At that point, TF Fold schedules all of the tensor operations and dynamically batches operations together wherever it can. So basically, you don't have to worry about the batching.
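
As a rough sketch of that workflow (op and method names paraphrased from the Loom docs in this repo; check tensorflow_fold/public/loom.py for the exact signatures):

```python
import numpy as np
import tensorflow as tf
import tensorflow_fold.public.loom as loom

# Each node's state is a float32 vector of length 3.
state = loom.TypeShape('float32', (3,))

class BinaryLoomOp(loom.LoomOp):
  """Wraps a two-argument TF op so Loom can batch its invocations."""

  def __init__(self, type_shape, op):
    self._op = op
    super(BinaryLoomOp, self).__init__([type_shape, type_shape], [type_shape])

  def instantiate_batch(self, inputs):
    return [self._op(inputs[0], inputs[1])]

the_loom = loom.Loom(named_ops={'combine': BinaryLoomOp(state, tf.add)})

# Traverse each graph in the batch as if the batch size were 1. Nothing is
# evaluated here -- Loom just records which ops to run on which inputs.
weaver = the_loom.make_weaver()
a = weaver.constant(np.array([1, 2, 3], dtype='float32'))
b = weaver.constant(np.array([4, 5, 6], dtype='float32'))
c = weaver.constant(np.array([7, 8, 9], dtype='float32'))
root = weaver.combine(weaver.combine(a, b), c)

# Only now does Loom schedule the recorded ops, dynamically batching
# 'combine' calls from the whole batch wherever it can.
with tf.Session() as sess:
  print(sess.run(the_loom.output_tensor(state),
                 weaver.build_feed_dict([root])))
```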

-DeLesley
