tensorflow / benchmarks

A benchmark framework for TensorFlow

Keras+TensorFlow Benchmark on Synthetic LSTM Dataset #157

Open karan6181 opened 6 years ago

karan6181 commented 6 years ago

Hi,

I am running the lstm_benchmark.py test on CPU and on a multi-GPU device (Amazon EC2), and I am not getting the scaling I expected. The details are below:

Instance: p3.8xlarge (Amazon AWS), which has 4 GPUs

Virtual Env: TensorFlow (+Keras 2) with Python 2 (CUDA 9.0, V9.0.176), activated via source activate tensorflow_p27

Python version: 2.7.14

Tensorflow version: 1.5.0

Keras version: 2.1.4

Deep Learning AMI: Amazon Linux

Modifications:

run_tf_backend.sh: changed models='resnet50_eager' to models='lstm'

models/lstm_benchmark.py: changed self.num_samples = 1000 to self.num_samples = 50000
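For reference, the second change amounts to the following (a sketch only; the actual class and attribute layout in models/lstm_benchmark.py may differ slightly):

```python
# models/lstm_benchmark.py (sketch; the class name here is illustrative)
class LstmBenchmark:
    def __init__(self):
        # was: self.num_samples = 1000
        # raised so each epoch runs long enough to time reliably
        self.num_samples = 50000
```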

Command ran:

$ sh run_tf_backend.sh cpu_config
$ sh run_tf_backend.sh gpu_config
$ sh run_tf_backend.sh multi_gpu_config

Results:

| Instance | GPUs | Backend | Batch size | Data set | Training method | Speed/epoch (lower is better) | Unroll type | No. of samples | Memory (MiB) |
|---|---|---|---|---|---|---|---|---|---|
| p3.8xlarge | 0 | TensorFlow | 128 | Synthetic | fit() | 18 s (363 µs/step) | unroll=False | 50000 | 0 |
| p3.8xlarge | 1 | TensorFlow | 128 | Synthetic | fit() | 18 s (362 µs/step) | unroll=False | 50000 | 15360 |
| p3.8xlarge | 4 | TensorFlow | 128 | Synthetic | fit() | 33 s (651 µs/step) | unroll=False | 50000 | 15410 |

The test does not scale when using multiple GPUs: the time per epoch should drop roughly by a factor of n, where n is the number of GPUs. (Keras is reporting per-sample time here: 50000 samples × 363 µs ≈ 18 s.) Instead, the 4-GPU run is slower (33 s/epoch) than both the single-GPU and CPU runs (18 s/epoch).
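For context, my understanding is that the multi_gpu_config path does Keras-style data parallelism, along the lines of keras.utils.multi_gpu_model (available in Keras 2.1.4). Below is a minimal sketch of the setup I expected to scale; the shapes and layer sizes are illustrative, not the benchmark's exact ones:

```python
# Not the repo's script: a minimal sketch of Keras data parallelism via
# keras.utils.multi_gpu_model. Shapes and layer sizes are illustrative only.
import numpy as np
from keras.layers import LSTM, Dense, Embedding
from keras.models import Sequential
from keras.utils import multi_gpu_model

num_samples, maxlen, vocab = 50000, 80, 20000
x = np.random.randint(1, vocab, size=(num_samples, maxlen))
y = np.random.randint(0, 2, size=(num_samples, 1))

model = Sequential([
    Embedding(vocab, 128, input_length=maxlen),
    LSTM(128, unroll=False),
    Dense(1, activation='sigmoid'),
])

# Replicates the model on 4 GPUs: each batch of 128 is split into four
# sub-batches of 32, run in parallel, and the results are merged on the CPU.
parallel_model = multi_gpu_model(model, gpus=4)
parallel_model.compile(loss='binary_crossentropy', optimizer='adam')

# Ideally this epoch takes ~1/4 of the single-GPU time.
parallel_model.fit(x, y, batch_size=128, epochs=1)
```

With a global batch of 128, each GPU only sees 32 samples per step, so per-step launch and CPU-side merge overhead may be eating the speedup; that would be consistent with the 4-GPU run coming out slower rather than faster.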

Is this expected behavior, or am I missing something here?

Thank you!

reedwm commented 6 years ago

/CC @anj-s