vlfeat / matconvnet

MatConvNet: CNNs for MATLAB

The speed comparison #30

Closed llyydd007 closed 9 years ago

llyydd007 commented 9 years ago

Has anybody made a speed comparison with cuda-convnet2 or Caffe? I am trying to reimplement a paper, and using MatConvNet is really wonderful. However, I have run 5000 epochs, about 1 million backpropagations, on a GTX 550 GPU, and it took me roughly 2 days. The author of the paper ran 8*10^8 backpropagations in roughly 3 days on a GTX 770 GPU with cuda-convnet2. Is there anything wrong? Maybe there is something wrong with my code, but I would still like to know how the speed compares with cuda-convnet2 or Caffe. Thanks!

vedaldi commented 9 years ago

You can expect MatConvNet to be a bit slower than Caffe (say 70-90% of its speed). Internally it uses very similar algorithms, but there are still occasional bottlenecks in MATLAB and its GPU support (I believe this will be considerably accelerated in the next releases).

I am not sure how MatConvNet or Caffe would stack up against cuda-convnet2. As far as I remember, in the benchmarks I have seen most implementations were within a factor of 2 of each other in terms of speed. According to your estimate, the speed difference would be about 500 times, which definitely sounds incorrect.

There are several possible reasons that could explain the difference.

My suggestion is to try profiling your code (profile on; run the code; break after a while with Ctrl-C; profile viewer) and see if you can identify an obvious bottleneck. If all goes well, most of the computation should be spent in vl_simplenn.
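
For reference, the suggested profiling workflow looks roughly like this at the MATLAB prompt (the training script name is a placeholder for whatever driver calls vl_simplenn / vl_nnconv):

    % Turn on the MATLAB profiler before launching training.
    profile on

    % Run the training code (placeholder name; substitute your own driver
    % script), then interrupt it with Ctrl-C after a representative number
    % of iterations.
    my_training_script

    % Open the interactive report and look at where the self time goes;
    % ideally most of it is inside vl_simplenn and the vl_nn* layers.
    profile viewer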

Of course, it is also possible that you may have hit a bug.

llyydd007 commented 9 years ago

Thanks for your earnest reply. I have not used vl_simplenn, but I do use vl_nnconv and vl_nnrelu, and I tried profiling the code. I found that two 'wait(gpuDevice)' lines take up 79.8% and 14.6% of the time respectively, while the two vl_nnconv lines take up 2.3% (1.256 s) for the gradient computation and 1.7% (0.966 s) for the forward computation. By the way, the batch size is 100 and the input is 33 * 33 * 1 * 100. So I wonder what is happening inside wait(gpuDevice). Again, thanks for your reply! Best wishes!
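
For context, MATLAB returns from GPU calls as soon as the kernels are queued, so the profiler tends to charge the queued GPU work to whichever call forces synchronisation, typically wait(gpuDevice), rather than to vl_nnconv itself. A minimal sketch of the effect (the filter bank here is made up purely for illustration):

    % A naive measurement makes vl_nnconv look almost free, while the queued
    % work is later charged to the synchronisation point, wait(gpuDevice).
    g = gpuDevice;                                 % currently selected GPU
    x = gpuArray(randn(33, 33, 1, 100, 'single')); % input like the 33x33x1x100 batch above
    f = gpuArray(randn(5, 5, 1, 16, 'single'));    % hypothetical filter bank for illustration

    tic; y = vl_nnconv(x, f, []); t_launch = toc;          % measures little more than the kernel launch
    tic; y = vl_nnconv(x, f, []); wait(g); t_total = toc;  % includes the actual GPU execution

    fprintf('launch only: %.4f s, with sync: %.4f s\n', t_launch, t_total);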

llyydd007 commented 9 years ago

I have run the example code, and it shows roughly the same split of time between 'wait(gpuDevice)' and vl_nnconv. I also wrote to the paper's author, who explained that the backpropagations mentioned in the paper count the samples seen during training. That means the author ran 8*10^8 backpropagations in roughly 3 days on a GTX 770 GPU with cuda-convnet2, while I ran 5000 epochs (1 million backpropagations) on a GTX 550 GPU in roughly 2 days (probably less, since I stopped the program several times to watch the results). Considering the difference between the GPUs, I think MatConvNet is comparable in speed with cuda-convnet2. So it is a great tool; thank you again for this good job. It helps me a lot!
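
As a rough sanity check, assuming "1 million backpropagations" above means one million minibatches of 100 samples each, and taking the reported run times at face value:

    % Paper (GTX 770 + cuda-convnet2): 8e8 samples seen in roughly 3 days.
    paper_rate = 8e8 / (3 * 86400);         % ~3.1e3 samples per second

    % This run (GTX 550 + MatConvNet), under the assumption that "1 million
    % backpropagations" means 1e6 minibatches of 100 samples, in at most 2 days.
    my_rate    = (1e6 * 100) / (2 * 86400); % ~5.8e2 samples per second

    paper_rate / my_rate                    % ~5x, roughly in line with the gap
                                            % in raw compute between the two GPUs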

vedaldi commented 9 years ago

Glad to hear that! We recently observed nice speedups using cuDNN in beta9.
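
For reference, cuDNN support is enabled when compiling MatConvNet; a sketch, assuming the standard vl_compilenn options and with a placeholder path to the cuDNN distribution:

    % Recompile MatConvNet with GPU and cuDNN support enabled; 'local/cudnn'
    % is a placeholder for the path to an unpacked cuDNN distribution.
    vl_compilenn('enableGpu', true, ...
                 'enableCudnn', true, ...
                 'cudnnRoot', 'local/cudnn') ;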