soumith / convnet-benchmarks

Easy benchmarking of all publicly accessible implementations of convnets
MIT License
2.68k stars 577 forks

Is your setup of Caffe-Greentea optimal? #70

Open NH89 opened 8 years ago

NH89 commented 8 years ago

I see you are getting markedly slow results with Caffe-Greentea. Which backends are you using, and do you know if they are the best available?

In @naibaf7 Fabian Tschopp's tech report (http://arxiv.org/pdf/1509.03371.pdf), Table 6.10 shows a 20x variation in performance depending on which manufacturer's libraries are used.

naibaf7 commented 8 years ago

@NH89 Greentea/OpenCL is really slow for CNNs with batched data because of overhead and inefficiency in the matrix-matrix multiplications used for convolutions, especially when the matrices are small. This benchmark also uses the ViennaCL library; AMD's clBLAS library (selectable at compile time) could be a bit faster.
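To make the overhead concrete, here is a minimal NumPy sketch of the standard im2col lowering that Caffe-style frameworks use: each image's patches are unrolled into a matrix, and the convolution becomes one GEMM per image. With a single small image the GEMM operands are small, which is exactly the regime where a generic OpenCL BLAS loses to tuned vendor kernels. All names and shapes here are illustrative, not from Greentea's source.

```python
import numpy as np

def im2col(x, kh, kw):
    """Unroll all kh x kw patches of one image (C, H, W) into columns."""
    c, h, w = x.shape
    out_h, out_w = h - kh + 1, w - kw + 1
    cols = np.empty((c * kh * kw, out_h * out_w))
    for i in range(out_h):
        for j in range(out_w):
            cols[:, i * out_w + j] = x[:, i:i + kh, j:j + kw].ravel()
    return cols

def conv_via_gemm(x, weights):
    """Convolution as one GEMM per image: (K, C*kh*kw) @ (C*kh*kw, OH*OW)."""
    k, c, kh, kw = weights.shape
    cols = im2col(x, kh, kw)                # lower patches to a matrix
    out = weights.reshape(k, -1) @ cols     # the actual matrix-matrix multiply
    oh, ow = x.shape[1] - kh + 1, x.shape[2] - kw + 1
    return out.reshape(k, oh, ow)

x = np.random.rand(3, 32, 32)    # one small image -> a small GEMM:
w = np.random.rand(16, 3, 5, 5)  # (16, 75) @ (75, 784)
y = conv_via_gemm(x, w)
print(y.shape)  # (16, 28, 28)
```

Processing a batch this way means many small GEMMs, so per-call launch and BLAS overhead dominates.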

However, to be really up to speed, there need to be vendor- and hardware-specific convolution libraries such as cuDNN.

AMD has an OpenCL branch (https://github.com/amd/OpenCL-caffe) where they instead unroll the whole batch into one large matrix-matrix multiplication. This is very memory inefficient compared to cuDNN, but almost as fast.
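A quick back-of-the-envelope calculation shows the tradeoff: concatenating the im2col matrices of all N images yields a single GEMM whose right operand is roughly kh*kw times larger than the input batch itself. The shapes and sizes below are illustrative numbers I picked, not measurements from this benchmark or from AMD's branch.

```python
# Rough size arithmetic for batch-unrolled im2col (illustrative numbers).
n, c, h, w = 128, 3, 224, 224    # input batch
k, kh, kw = 64, 7, 7             # conv filters
oh, ow = h - kh + 1, w - kw + 1  # valid-padding output size

per_image_cols = (c * kh * kw, oh * ow)        # one modest GEMM per image
batched_cols = (c * kh * kw, n * oh * ow)      # one large GEMM for the batch

bytes_per_float = 4
unrolled_mb = batched_cols[0] * batched_cols[1] * bytes_per_float / 2**20
input_mb = n * c * h * w * bytes_per_float / 2**20
print(per_image_cols, batched_cols)
print(f"unrolled buffer: {unrolled_mb:.0f} MiB vs input batch: {input_mb:.0f} MiB")
```

The unrolled buffer is tens of times larger than the input batch (close to the kh*kw = 49x patch-overlap factor), but the single large GEMM runs at much higher efficiency than many small ones.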

The same technical report shows that interleaved, pixelwise classification workloads (which produce large matrix-matrix multiplications, and thus higher efficiency, without batching) are comparably fast to CUDA.

NH89 commented 8 years ago

@naibaf7 Thanks, you saved me from making an expensive error :-) Thank you also for creating Greentea.

naibaf7 commented 8 years ago

@NH89

No problem. The OpenCL approaches will probably catch up with the CUDA solutions during Q2/Q3 next year, as major development is going on at both AMD and Intel.

For my projects in biomedical image segmentation, though, the OpenCL solution is already speed-competitive.