keras-team / keras

Deep Learning for humans
http://keras.io/
Apache License 2.0

LSTM slower on TITAN V vs TITAN Xp #8885

Closed: laucheukhim closed this issue 3 years ago

laucheukhim commented 6 years ago

LSTM training in Keras is significantly slower on a TITAN V than on a TITAN Xp. I ran the IMDB Bidirectional LSTM example on multiple backends and found that the TITAN V is at best 4% faster and at worst 147% slower than the TITAN Xp.
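(For reference, a minimal sketch of how the backend can be switched per run without editing `keras.json`; the environment variable must be set before Keras is imported.)

```python
import os

# Select the Keras backend before the first `import keras`; otherwise the
# value in ~/.keras/keras.json is used. Valid values here include
# "tensorflow", "theano" and "cntk".
os.environ["KERAS_BACKEND"] = "tensorflow"

import keras  # prints e.g. "Using TensorFlow backend."
```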

OS: Ubuntu 16.04.3 LTS
Drivers: NVIDIA 387.34, CUDA 9.1, cuDNN 7.0.5
Python: 3.6 (Anaconda)

I ran the example on multiple backends, and I also ran a modified version that uses CuDNNLSTM on TensorFlow. The following results are in seconds per epoch:

| Backend | TITAN V (E5-1650 v4 @ 3.6GHz) | TITAN Xp (E5-1650 v4) | CPU (E5-1650 v4) | TITAN V (E5-1680 v3 @ 4.4GHz) | TITAN Xp (E5-1680 v3) | CPU (E5-1680 v3) |
|---|---|---|---|---|---|---|
| **Keras 2.1.2 LSTM** | | | | | | |
| TensorFlow 1.5.0rc0 | 161s | 140s | 65s | 268s | 265s | 45s |
| Theano 1.0.1 | 176s | 103s | 125s | 220s | 89s | 102s |
| CNTK 2.3.1 | 72s | 74s | 196s | 71s | 74s | 188s |
| **Keras 2.1.2 CuDNNLSTM** | | | | | | |
| TensorFlow 1.5.0rc0 | 23s | 8s | — | 24s | 11s | — |
| **Keras 1.2.2 LSTM** | | | | | | |
| MXNet 0.12.1 | 86s | 66s | 588s | 105s | 66s | 620s |

This looks like a continuation of the problem in #6640, where LSTM performance degrades with each newer GPU generation.

hiandersson commented 6 years ago

That example is not using the Keras CuDNNLSTM layer (introduced in 2.0.9), is it? https://keras.io/layers/recurrent/#cudnnlstm https://github.com/keras-team/keras/releases

laucheukhim commented 6 years ago

@hiandersson I modified it just for testing; please read carefully. I ran two separate tests: one with the original script, and one with LSTM replaced by CuDNNLSTM (sketched below). Both tests indicate that the TITAN V runs slower.
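The modification is essentially a one-layer swap in the stock imdb_bidirectional_lstm.py example; a minimal sketch (hyperparameters approximate the original script, exact details may differ):

```python
from keras.models import Sequential
from keras.layers import Embedding, Bidirectional, Dense, Dropout, CuDNNLSTM
from keras.preprocessing import sequence
from keras.datasets import imdb

max_features = 20000
maxlen = 100
batch_size = 32

(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=max_features)
x_train = sequence.pad_sequences(x_train, maxlen=maxlen)
x_test = sequence.pad_sequences(x_test, maxlen=maxlen)

model = Sequential()
model.add(Embedding(max_features, 128, input_length=maxlen))
# The only change from the original example: CuDNNLSTM instead of LSTM(64)
model.add(Bidirectional(CuDNNLSTM(64)))
model.add(Dropout(0.5))
model.add(Dense(1, activation='sigmoid'))

model.compile('adam', 'binary_crossentropy', metrics=['accuracy'])
model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=4,
          validation_data=(x_test, y_test))
```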

hiandersson commented 6 years ago

@laucheukhim I read too fast (just looked at the example), my bad! It's interesting that CuDNNLSTM on the TITAN Xp is 26x faster than the 'normal' LSTM. Is this the best overall performance you can get (compared with Theano and CNTK)?

laucheukhim commented 6 years ago

@hiandersson Yes. Unfortunately, CuDNNLSTM does not support dropout or recurrent_dropout, so it cannot be used as a drop-in replacement.
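To illustrate the gap (a minimal sketch, not taken from the benchmark script): the regular layer accepts the dropout arguments, the cuDNN one does not.

```python
from keras.layers import LSTM, CuDNNLSTM

# The plain LSTM layer accepts dropout on inputs and recurrent connections.
regular = LSTM(64, dropout=0.2, recurrent_dropout=0.2)

# CuDNNLSTM exposes no such arguments, so models that rely on them
# cannot simply swap the layer class.
fast = CuDNNLSTM(64)
```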

hiandersson commented 6 years ago

@laucheukhim Is this a problem in Keras, TensorFlow, or with NVIDIA? Is there any prognosis for a fix?

laucheukhim commented 6 years ago

@hiandersson I want a fix too. The TITAN V is too expensive to perform this badly.

I don't know what the cause is. I see similar degradation in PyTorch too. This could be due to the lack of cuDNN autotuning.
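(What I mean by autotuning, sketched with PyTorch where it is exposed directly; whether this actually explains the LSTM gap is just my speculation.)

```python
import torch

# cuDNN autotuning: benchmark candidate kernels for the current input shapes
# and cache the fastest one. In PyTorch this is an explicit switch.
torch.backends.cudnn.benchmark = True
```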

The TITAN V runs 15% faster in games and 75% faster in lyra2rev2 crypto mining, but 60% slower in my LSTM model. This is a big disappointment for what is meant to be a machine learning card.

grafael commented 6 years ago

Just out of curiosity, have you tried CNTK with CuDNNLSTM? How did it go?

laucheukhim commented 6 years ago

@grafael CuDNNLSTM is only available on the TensorFlow backend.
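(A minimal guard for scripts that might be launched under other backends; just a sketch, not part of the benchmark above.)

```python
from keras import backend as K

# CuDNNLSTM wraps TensorFlow's cuDNN RNN ops, so fail early on other backends.
if K.backend() != 'tensorflow':
    raise RuntimeError('CuDNNLSTM requires the TensorFlow backend, '
                       'got %s' % K.backend())
```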

ghost commented 6 years ago

Thanks @laucheukhim