karan6181 opened 6 years ago
Hi,
I am running the lstm_benchmark.py test on CPU and on a multi-GPU instance (Amazon EC2), and I am not getting the scaling I expected. Here is the relevant information:
Instance: p3.8xlarge (Amazon AWS), 4 GPUs
Virtual Env: TensorFlow (+ Keras 2) with Python 2 (CUDA 9.0, V9.0.176) (source activate tensorflow_p27)
Python version: 2.7.14
TensorFlow version: 1.5.0
Keras version: 2.1.4
Deep Learning AMI: Amazon Linux
Modifications:
run_tf_backend.sh: changed models='resnet50_eager' to models='lstm'
models/lstm_benchmark.py: changed self.num_samples = 1000 to self.num_samples = 50000
Commands ran:
$ sh run_tf_backend.sh cpu_config
$ sh run_tf_backend.sh gpu_config
$ sh run_tf_backend.sh multi_gpu_config
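As a sanity check before looking at the benchmark numbers, it is worth confirming that the TF 1.5 runtime actually sees all four GPUs. This snippet uses TensorFlow's standard device enumeration and is not part of the benchmark scripts:

# List the GPUs visible to the TensorFlow runtime.
from tensorflow.python.client import device_lib

gpus = [d.name for d in device_lib.list_local_devices()
        if d.device_type == 'GPU']
print(gpus)  # expect 4 entries on a p3.8xlarge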
Results:
The test does not scale when using multiple GPUs: the time per epoch should drop by approximately a factor of n, where n is the number of GPUs, but it does not.
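For reference, the data-parallel pattern I assume multi_gpu_config relies on is Keras 2.1.x's multi_gpu_model; a minimal sketch follows (the layer sizes and data shapes here are illustrative, not the benchmark's actual values):

import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense
from keras.utils import multi_gpu_model

# Illustrative model; the real benchmark's architecture may differ.
model = Sequential([
    LSTM(256, input_shape=(50, 100)),  # (timesteps, features): assumed
    Dense(10, activation='softmax'),
])

# Replicates the model on 4 GPUs: each training step splits the global
# batch into 4 sub-batches, runs them in parallel, and merges the
# results on the CPU, so time per epoch should ideally drop by ~4x.
parallel = multi_gpu_model(model, gpus=4)
parallel.compile(optimizer='rmsprop', loss='categorical_crossentropy')

x = np.random.random((50000, 50, 100)).astype('float32')
y = np.random.random((50000, 10)).astype('float32')
parallel.fit(x, y, batch_size=512, epochs=1)

Note that with this pattern each replica only sees batch_size / gpus samples per step, so if the global batch size is not scaled up, each GPU's per-step workload shrinks and the CPU-side merging can eat into the speedup.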
Is this expected behavior, or am I missing something here?
Thank you!
/CC @anj-s