Sorry, did I miss a point as to why Theano benchmarks are irrelevant compared to Torch and TensorFlow?
Hi Konstantin, no you did not, it is plainly my fault for missing the notification -- terribly sorry for the delay.
What are the numbers like on your machine?
I didn't run them on GPU, only on CPU, to make sure everything works. The hardware setup I have access to is too different from yours to produce comparable numbers.
Okay, are there best practices w.r.t. Theano flags and command-line arguments that I should use to make sure I get maximum performance? Does Theano enable cuDNN by default, or do I need any flags?
It should detect cuDNN automatically and be reasonably fast by default, though I suggest the following ~/.theanorc config:
[global]
floatX = float32
mode=FAST_RUN
device=gpu0
force_device=True
[dnn]
enabled=True
[cuda]
root=/path/to/cuda
[lib]
cnmem=0.9
[nvcc]
fastmath=True
This config makes Theano fail if it can't attach to the GPU or find cuDNN.
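If you want to check the detection explicitly, here is a quick sketch (assuming the Theano 0.8-era theano.sandbox.cuda backend used here; the import path differs on the newer gpuarray backend):

import theano
from theano.sandbox.cuda import dnn

print(theano.config.device)   # should print gpu0 with the config above
print(dnn.dnn_available())    # True if Theano found a usable cuDNN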
Which cuDNN algorithms are used by the other frameworks? They can be changed via a few Theano flags. If you want Theano to time each algorithm (via cuDNN) and pick the fastest, you can use these Theano flags:
dnn.conv.algo_fwd=time_once dnn.conv.algo_bwd_filter=time_once dnn.conv.algo_bwd_data=time_once
or, equivalently, add this to your .theanorc:
[dnn.conv]
algo_fwd=time_once
algo_bwd_filter=time_once
algo_bwd_data=time_once
These flags also let you swap the default algorithms used by cuDNN for other values.
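For a one-off run without editing .theanorc, the same flags can be passed through the THEANO_FLAGS environment variable (shown here with the benchmark script from this PR):

$ THEANO_FLAGS='dnn.conv.algo_fwd=time_once,dnn.conv.algo_bwd_filter=time_once,dnn.conv.algo_bwd_data=time_once' python benchmark_imagenet.py -a alexnet -B 128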
I managed to get a Titan X today. Here are the numbers I got with the suggested time_once option:
$ python benchmark_imagenet.py -a alexnet -B 128
Using gpu device 0: GeForce GTX TITAN X (CNMeM is enabled with initial size: 90.0% of memory, cuDNN 4007)
Forward across 100 steps, 0.054 +/- 0.001 sec / batch
Forward-Backward across 100 steps, 0.115 +/- 0.001 sec / batch
$ python benchmark_imagenet.py -a overfeat -B 128
Using gpu device 0: GeForce GTX TITAN X (CNMeM is enabled with initial size: 90.0% of memory, cuDNN 4007)
Forward across 100 steps, 0.137 +/- 0.001 sec / batch
Forward-Backward across 100 steps, 0.400 +/- 0.003 sec / batch
$ python benchmark_imagenet.py -a googlenet -B 128
Using gpu device 0: GeForce GTX TITAN X (CNMeM is enabled with initial size: 90.0% of memory, cuDNN 4007)
Forward across 100 steps, 0.170 +/- 0.001 sec / batch
Forward-Backward across 100 steps, 0.536 +/- 0.003 sec / batch
$ python benchmark_imagenet.py -a vgg -B 64
Using gpu device 0: GeForce GTX TITAN X (CNMeM is enabled with initial size: 90.0% of memory, cuDNN 4007)
Forward across 100 steps, 0.212 +/- 0.001 sec / batch
Forward-Backward across 100 steps, 0.684 +/- 0.003 sec / batch
Hello,
I am interested in comparing the performance of Theano against TensorFlow and Torch. I have ported the TensorFlow implementation to Theano/Lasagne; mostly it is a line-to-line correspondence. I was slightly confused by the statistics computed for the time measurements, so I replaced them with numpy stats.
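For reference, this is a minimal sketch of the kind of numpy-based timing I mean (time_batches, step_fn, and n_warmup are hypothetical names for illustration, not the actual code in this PR):

import time
import numpy as np

def time_batches(step_fn, n_steps=100, n_warmup=10):
    # Time a compiled Theano function over n_steps batches and
    # return mean and std in seconds per batch, as printed above.
    for _ in range(n_warmup):
        step_fn()  # warm-up so compilation/autotuning cost is excluded
    durations = np.empty(n_steps)
    for i in range(n_steps):
        start = time.time()
        step_fn()
        durations[i] = time.time() - start
    return durations.mean(), durations.std()

# e.g. mean, std = time_batches(forward_fn)
# print('Forward across 100 steps, %.3f +/- %.3f sec / batch' % (mean, std))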