milakov / nnForge

Convolutional neural networks C++ framework with CPU and GPU (CUDA) backends
http://nnforge.org

get_forward_flops batch-mode #10

Closed soumith closed 10 years ago

soumith commented 10 years ago

Hey Maxim, I have two small questions.

(1) Is it possible to use the function .get_forward_flops in batch mode? (2) To use the CUDA backend, is it enough to just call nnforge::cuda::cuda::init();

I wrote a 30 line code snippet called benchmark (which you can place in the examples folder), https://github.com/soumith/convnet-benchmarks/blob/master/nnforge/benchmark/benchmark.cpp

Please let me know if the code looks right wrt CUDA being enabled and how I could benchmark it in batch-mode.

Something doesn't look right, as it prints out very poor numbers:

convolution_layer 3->96 11x11
:forward gflop/s: 0.96911
:backward gflop/s: 0.970447
:hessian  gflop/s: 0.970447
milakov commented 10 years ago

Hi Soumith,

get_forward_flops just returns the number of flops required to run a single sample through the layer. It doesn't do an actual run, and it doesn't measure anything.
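Since get_forward_flops only *counts* flops per sample, a benchmark would have to time a forward pass itself and divide. A minimal sketch of that arithmetic follows; `flops_per_sample` and `run_forward_batch` are placeholder names for illustration, not nnForge API:

```cpp
#include <chrono>

// Hedged sketch: time a batch of forward passes and convert to gflop/s.
// flops_per_sample would come from something like get_forward_flops();
// run_forward_batch is a stand-in for whatever actually runs the layer.
double measure_gflops(double flops_per_sample, int num_samples,
                      void (*run_forward_batch)(int))
{
    auto start = std::chrono::steady_clock::now();
    run_forward_batch(num_samples);  // the work being timed
    auto stop = std::chrono::steady_clock::now();
    double seconds = std::chrono::duration<double>(stop - start).count();
    // gflop/s = total flops / elapsed seconds / 1e9
    return flops_per_sample * num_samples / seconds / 1e9;
}
```

Note that for a GPU backend the timed call would also need to synchronize the stream before `stop` is taken, or the measurement only covers kernel launch overhead.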

soumith commented 10 years ago

oh i see, okay thanks, looks like it's going to be harder to do this than I thought :)

soumith commented 10 years ago

if you could donate a code snippet to do a quick benchmarking of a single module, that would be very helpful as there isn't much documentation.

if not I shall figure it out myself.

let me know, thanks.

milakov commented 10 years ago

I am sorry to say that it is not an easy task at all. The actual convolution is done in https://github.com/milakov/nnForge/blob/master/nnforge/cuda/convolution_layer_tester_cuda_kepler.cuh, but you cannot just call enqueue_test and sync on the stream; you would need to do a lot of preparation calls first.

soumith commented 10 years ago

okay, can I call it like in the galaxyzoo example's testing, but change the network to just be a single convolution layer?

milakov commented 10 years ago

Right, you could try that. See how the "validate" command works for it. You will probably need to fake the training data with the supervised_data_mem_reader class. You should also run hundreds or, better, thousands of samples so that the various overheads are amortized.
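For benchmarking, the faked data only needs to be plausible float buffers of the right size, not real labeled samples. A hedged sketch of generating such a buffer (the helper below is illustrative, not nnForge API; supervised_data_mem_reader itself is the class named above):

```cpp
#include <vector>
#include <random>

// Hedged sketch: build a flat buffer of num_samples fake inputs, each of
// sample_size floats, suitable as raw benchmark data. The values just need
// to be in a reasonable range; the network's output is irrelevant here.
std::vector<float> make_fake_samples(int num_samples, int sample_size)
{
    std::mt19937 rng(42);  // fixed seed keeps runs repeatable
    std::uniform_real_distribution<float> dist(-1.0f, 1.0f);
    std::vector<float> data(static_cast<size_t>(num_samples) * sample_size);
    for (float& v : data)
        v = dist(rng);
    return data;
}
```

Running thousands of such samples through the single-layer network, as suggested above, keeps per-batch setup costs from dominating the measured gflop/s.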

soumith commented 10 years ago

thank you, I shall do that!