google / seq2seq

A general-purpose encoder-decoder framework for Tensorflow
https://google.github.io/seq2seq/
Apache License 2.0

Runtime bottleneck of Seq2Seq #319

Open ArmageddonKnight opened 6 years ago

ArmageddonKnight commented 6 years ago

Hi,

I just read your paper "Massive Exploration of Neural Machine Translation Architectures" and noticed that you made the following claim in Section 4.2:

> In our experiments, LSTM cells consistently outperformed GRU cells. **Since the computational bottleneck in our architecture is the softmax operation** we did not observe large difference in training speed between LSTM and GRU cells.

Could you please elaborate on the bold part? Sorry, but it seems to me that the softmax is usually not the bottleneck in most network architectures, since its computation can be parallelized.
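To make the question concrete, here is the rough per-step arithmetic I had in mind. This is only a back-of-the-envelope sketch in Python; the hidden size and vocabulary size are illustrative values I picked, not numbers from your experiments:

```python
# Rough per-step multiply-add comparison between an LSTM cell and the
# output projection that feeds the softmax. Sizes are assumptions.
h = 1024    # assumed decoder hidden size
V = 32000   # assumed (sub)word vocabulary size

# An LSTM cell computes 4 gates, each roughly an h x 2h matmul against
# the concatenated [input; state] vector: about 8*h^2 multiply-adds.
lstm_cost = 8 * h * h

# The output projection maps h -> V logits before the softmax: about
# h*V multiply-adds. The exp/normalize of the softmax itself is cheap,
# but this projection grows linearly with the vocabulary size.
projection_cost = h * V

print(f"LSTM cell per step:         {lstm_cost / 1e6:.1f}M mult-adds")
print(f"Output projection per step: {projection_cost / 1e6:.1f}M mult-adds")
print(f"ratio: {projection_cost / lstm_cost:.1f}x")
```

Even under these assumed sizes the projection feeding the softmax is a few times the cell cost, so is the h -> V projection what the paper counts as "the softmax operation"?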

Thank you so much.

rpryzant commented 6 years ago

Pretty sure the softmax would be hard to parallelize across GPUs in practice. By default, TensorFlow places the softmax layer on the same GPU as the last RNN layer. This minimizes data transfer between GPUs, which would otherwise be slower than the matrix multiplication itself.
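If it helps, here is a minimal TF 1.x-style sketch of the colocation being described; the device string, sizes, and variable names are my own assumptions, not this repo's actual code. The idea is to keep the h -> V projection and the softmax on the same device as the last RNN layer, so the large logits tensor never has to cross GPUs:

```python
import tensorflow as tf  # TF 1.x API, matching this repo's vintage

batch, time, h, V = 32, 10, 1024, 32000  # illustrative sizes

with tf.device("/gpu:1"):  # hypothetical: whichever GPU holds the last decoder layer
    # Last decoder RNN layer.
    cell = tf.nn.rnn_cell.LSTMCell(h)
    inputs = tf.random_normal([batch, time, h])
    outputs, _ = tf.nn.dynamic_rnn(cell, inputs, dtype=tf.float32)

    # Colocate the output projection and softmax with the RNN output:
    # the [batch, time, V] logits tensor is produced and consumed on
    # the same GPU instead of being shipped to another device.
    proj = tf.get_variable("proj", [h, V])
    logits = tf.einsum("bth,hv->btv", outputs, proj)
    probs = tf.nn.softmax(logits)
```

Splitting the softmax onto a different GPU would mean moving that batch x time x V logits tensor across devices at every step, which tends to cost more than just doing the matmul where the RNN output already lives.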