Runtime bottleneck of Seq2Seq

google / seq2seq

A general-purpose encoder-decoder framework for Tensorflow

Apache License 2.0

5.6k stars 1.3k forks source link

Hi,

I just read your paper "Massive Exploration of Neural Machine Translation Architectures" and noticed that you made the following claim in Section 4.2: In our experiments, LSTM cells consistently outperformed GRU cells. Since the computational bottleneck in our architecture is the softmax operation we did not observe large difference in training speed between LSTM and GRU cells.

Could you please elaborate more on the bold part? Sorry but it seems to me that softmax is usually not the bottleneck in most network architecture since its computation can be parallelized.

Thank you so much.

google / seq2seq

Runtime bottleneck of Seq2Seq #319