Sorry, did I miss a point as to why Theano benchmarks are irrelevant compared to Torch and TensorFlow?
Hi Konstantin, no you did not, it is plainly my fault for missing the notification -- terribly sorry for the delay.
What are the numbers like on your machine?
I didn't run them on GPU, only on CPU, to make sure everything works. The hardware setup I have access to is too different from yours to produce comparable numbers.
Okay, are there best practices w.r.t. Theano flags and command-line arguments that I should use to make sure I get maximum performance? Does Theano enable cuDNN by default, or do I need any flags?
It should detect cuDNN automatically and be reasonably fast by default, though I suggest the following ~/.theanorc config:
[global]
floatX = float32
mode=FAST_RUN
device=gpu0
force_device=True
[dnn]
enabled=True
[cuda]
root=/path/to/cuda
[lib]
cnmem=0.9
[nvcc]
fastmath=True
This config makes Theano fail if it can't attach to the GPU or find cuDNN.
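If you want to check the detection explicitly, here is a quick sketch (assuming the Theano 0.8-era theano.sandbox.cuda backend used here; the import path differs on the newer gpuarray backend):

import theano
from theano.sandbox.cuda import dnn

print(theano.config.device)   # should print gpu0 with the config above
print(dnn.dnn_available())    # True if Theano found a usable cuDNN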
Which cuDNN algorithms are used by the other frameworks? They can be changed via a few Theano flags. If you want Theano to time each algorithm (via cuDNN) and pick the fastest, you can use these Theano flags:
dnn.conv.algo_fwd=time_once dnn.conv.algo_bwd_filter=time_once dnn.conv.algo_bwd_data=time_once
or, equivalently, add this to your .theanorc:
[dnn.conv]
algo_fwd=time_once
algo_bwd_filter=time_once
algo_bwd_data=time_once
These flags also let you swap the default algorithms used by cuDNN for other values.
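For a one-off run without editing .theanorc, the same flags can be passed through the THEANO_FLAGS environment variable (shown here with the benchmark script from this PR):

$ THEANO_FLAGS='dnn.conv.algo_fwd=time_once,dnn.conv.algo_bwd_filter=time_once,dnn.conv.algo_bwd_data=time_once' python benchmark_imagenet.py -a alexnet -B 128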
I managed to get a Titan X today. Here are the numbers I got with the suggested time_once option:
$ python benchmark_imagenet.py -a alexnet -B 128
Using gpu device 0: GeForce GTX TITAN X (CNMeM is enabled with initial size: 90.0% of memory, cuDNN 4007)
Forward across 100 steps, 0.054 +/- 0.001 sec / batch
Forward-Backward across 100 steps, 0.115 +/- 0.001 sec / batch
$ python benchmark_imagenet.py -a overfeat -B 128
Using gpu device 0: GeForce GTX TITAN X (CNMeM is enabled with initial size: 90.0% of memory, cuDNN 4007)
Forward across 100 steps, 0.137 +/- 0.001 sec / batch
Forward-Backward across 100 steps, 0.400 +/- 0.003 sec / batch
$ python benchmark_imagenet.py -a googlenet -B 128
Using gpu device 0: GeForce GTX TITAN X (CNMeM is enabled with initial size: 90.0% of memory, cuDNN 4007)
Forward across 100 steps, 0.170 +/- 0.001 sec / batch
Forward-Backward across 100 steps, 0.536 +/- 0.003 sec / batch
$ python benchmark_imagenet.py -a vgg -B 64
Using gpu device 0: GeForce GTX TITAN X (CNMeM is enabled with initial size: 90.0% of memory, cuDNN 4007)
Forward across 100 steps, 0.212 +/- 0.001 sec / batch
Forward-Backward across 100 steps, 0.684 +/- 0.003 sec / batch
Hello,
I am interested in comparing the performance of Theano against TensorFlow and Torch. I have ported the TensorFlow implementation to Theano/Lasagne; mostly it is a line-to-line correspondence. I was slightly confused by the statistics computed for the time measurements, so I replaced them with numpy stats.
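For reference, this is a minimal sketch of the kind of numpy-based timing I mean (time_batches, step_fn, and n_warmup are hypothetical names for illustration, not the actual code in this PR):

import time
import numpy as np

def time_batches(step_fn, n_steps=100, n_warmup=10):
    # Time a compiled Theano function over n_steps batches and
    # return mean and std in seconds per batch, as printed above.
    for _ in range(n_warmup):
        step_fn()  # warm-up so compilation/autotuning cost is excluded
    durations = np.empty(n_steps)
    for i in range(n_steps):
        start = time.time()
        step_fn()
        durations[i] = time.time() - start
    return durations.mean(), durations.std()

# e.g. mean, std = time_batches(forward_fn)
# print('Forward across 100 steps, %.3f +/- %.3f sec / batch' % (mean, std))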