keras-team / keras

Deep Learning for humans
http://keras.io/
Apache License 2.0

LSTM slower on TITAN V vs TITAN Xp #8885

Closed: laucheukhim closed this issue 3 years ago

laucheukhim commented 6 years ago

LSTM training in Keras is significantly slower on a TITAN V than on a TITAN Xp. I ran the IMDB Bidirectional LSTM example on multiple backends and found that the TITAN V is at best 4% faster and at worst 147% slower than the TITAN Xp.
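(For reference, a minimal sketch of how the backend can be switched per run without editing `keras.json`; the environment variable must be set before Keras is imported.)

```python
import os

# Select the Keras backend before the first `import keras`; otherwise the
# value in ~/.keras/keras.json is used. Valid values here include
# "tensorflow", "theano" and "cntk".
os.environ["KERAS_BACKEND"] = "tensorflow"

import keras  # prints e.g. "Using TensorFlow backend."
```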

OS: Ubuntu 16.04.3 LTS
Drivers: NVIDIA 387.34, CUDA 9.1, cuDNN 7.0.5
Python: 3.6 (Anaconda)

I ran the example on multiple backends, and I also ran a modified version that uses CuDNNLSTM on TensorFlow. The following results are in seconds per epoch:

| Backend | TITAN V (E5-1650 v4 @ 3.6GHz) | TITAN Xp (E5-1650 v4) | CPU (E5-1650 v4) | TITAN V (E5-1680 v3 @ 4.4GHz) | TITAN Xp (E5-1680 v3) | CPU (E5-1680 v3) |
|---|---|---|---|---|---|---|
| **Keras 2.1.2 LSTM** | | | | | | |
| TensorFlow 1.5.0rc0 | 161s | 140s | 65s | 268s | 265s | 45s |
| Theano 1.0.1 | 176s | 103s | 125s | 220s | 89s | 102s |
| CNTK 2.3.1 | 72s | 74s | 196s | 71s | 74s | 188s |
| **Keras 2.1.2 CuDNNLSTM** | | | | | | |
| TensorFlow 1.5.0rc0 | 23s | 8s | — | 24s | 11s | — |
| **Keras 1.2.2 LSTM** | | | | | | |
| MXNet 0.12.1 | 86s | 66s | 588s | 105s | 66s | 620s |

This looks like a continuation of the problem in #6640, where LSTM performance degrades with each newer GPU generation.

hiandersson commented 6 years ago

That example is not using the Keras CuDNNLSTM layer (introduced in 2.0.9), is it? https://keras.io/layers/recurrent/#cudnnlstm https://github.com/keras-team/keras/releases

laucheukhim commented 6 years ago

@hiandersson I modified it just for testing; please read carefully. I ran two separate tests: one with the original script, and one with LSTM replaced by CuDNNLSTM (sketched below). Both tests indicate that the TITAN V runs slower.
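The modification is essentially a one-layer swap in the stock imdb_bidirectional_lstm.py example; a minimal sketch (hyperparameters approximate the original script, exact details may differ):

```python
from keras.models import Sequential
from keras.layers import Embedding, Bidirectional, Dense, Dropout, CuDNNLSTM
from keras.preprocessing import sequence
from keras.datasets import imdb

max_features = 20000
maxlen = 100
batch_size = 32

(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=max_features)
x_train = sequence.pad_sequences(x_train, maxlen=maxlen)
x_test = sequence.pad_sequences(x_test, maxlen=maxlen)

model = Sequential()
model.add(Embedding(max_features, 128, input_length=maxlen))
# The only change from the original example: CuDNNLSTM instead of LSTM(64)
model.add(Bidirectional(CuDNNLSTM(64)))
model.add(Dropout(0.5))
model.add(Dense(1, activation='sigmoid'))

model.compile('adam', 'binary_crossentropy', metrics=['accuracy'])
model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=4,
          validation_data=(x_test, y_test))
```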

hiandersson commented 6 years ago

@laucheukhim I read too fast (just looked at the example), my bad! It's interesting that CuDNNLSTM on the TITAN Xp is 26x faster than the 'normal' LSTM. Is this the best overall performance you can get (compared with Theano and CNTK)?

laucheukhim commented 6 years ago

@hiandersson Yes. Unfortunately, CuDNNLSTM does not support dropout or recurrent_dropout, so it cannot be used as a drop-in replacement.
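To illustrate the gap (a minimal sketch, not taken from the benchmark script): the regular layer accepts the dropout arguments, the cuDNN one does not.

```python
from keras.layers import LSTM, CuDNNLSTM

# The plain LSTM layer accepts dropout on inputs and recurrent connections.
regular = LSTM(64, dropout=0.2, recurrent_dropout=0.2)

# CuDNNLSTM exposes no such arguments, so models that rely on them
# cannot simply swap the layer class.
fast = CuDNNLSTM(64)
```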

hiandersson commented 6 years ago

@laucheukhim Is this a problem in Keras, TensorFlow, or with NVIDIA? Is there any prognosis for a fix?

laucheukhim commented 6 years ago

@hiandersson I want a fix too. The TITAN V is too expensive to perform this badly.

I don't know what the cause is. I see similar degradation in PyTorch too. This could be due to the lack of cuDNN autotuning.
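(What I mean by autotuning, sketched with PyTorch where it is exposed directly; whether this actually explains the LSTM gap is just my speculation.)

```python
import torch

# cuDNN autotuning: benchmark candidate kernels for the current input shapes
# and cache the fastest one. In PyTorch this is an explicit switch.
torch.backends.cudnn.benchmark = True
```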

The TITAN V runs 15% faster in games and 75% faster in lyra2rev2 crypto mining, but 60% slower in my LSTM model. This is a big disappointment for what is meant to be a machine learning card.

grafael commented 6 years ago

Just out of curiosity, have you tried CNTK with CuDNNLSTM? How did it go?

laucheukhim commented 6 years ago

@grafael CuDNNLSTM is only available on the TensorFlow backend.
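(A minimal guard for scripts that might be launched under other backends; just a sketch, not part of the benchmark above.)

```python
from keras import backend as K

# CuDNNLSTM wraps TensorFlow's cuDNN RNN ops, so fail early on other backends.
if K.backend() != 'tensorflow':
    raise RuntimeError('CuDNNLSTM requires the TensorFlow backend, '
                       'got %s' % K.backend())
```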

ghost commented 6 years ago

Thanks @laucheukhim