Closed: soumith closed this issue 10 years ago
Hi Soumith,
get_forward_flops just returns the number of flops required to run a single sample through the layer. It doesn't do an actual run, and it doesn't measure anything.
oh i see, okay thanks, looks like it's going to be harder to do this than I thought :)
if you could share a code snippet for quick benchmarking of a single module, that would be very helpful, as there isn't much documentation.
if not I shall figure it out myself.
let me know, thanks.
I am sorry to say that it is not an easy task at all. The actual convolution is done in https://github.com/milakov/nnForge/blob/master/nnforge/cuda/convolution_layer_tester_cuda_kepler.cuh, but you cannot just call enqueue_test and sync on the stream; you would need to make a lot of preparation calls first.
okay, can I call it like in the galaxyzoo example's testing, but change the network to just be a single convolution layer?
Right, you could try doing that. See how the "validate" command works for it. You will probably need to fake training data with the supervised_data_mem_reader class. You should also run hundreds or, better, thousands of samples to cover various overheads.
thank you, I shall do that!
Hey Maxim, I have two small questions.
(1) Is it possible to use the function .get_forward_flops in batch mode? (2) To use the CUDA backend, is it enough to just call nnforge::cuda::cuda::init();
I wrote a 30-line code snippet called benchmark (which you can place in the examples folder): https://github.com/soumith/convnet-benchmarks/blob/master/nnforge/benchmark/benchmark.cpp
Please let me know if the code looks right wrt CUDA being enabled, and how I could benchmark it in batch mode.
Something doesn't look right, as it prints out very poor numbers: