luoyetx / mini-caffe

Minimal runtime core of Caffe, Forward only, GPU support and Memory efficiency.
BSD 3-Clause "New" or "Revised" License

The performance of the GPU version is much worse than the CPU version #40

Closed ligongzheng closed 7 years ago

ligongzheng commented 7 years ago

Hi, I ran a program linked against mini-caffe, which I compiled in both CPU and GPU modes. The result is weird: the task takes 47 ms with the CPU version but 170 ms with the GPU version. Besides, some log messages were printed to the screen, like:

[14:05:15] /home/lgz/mini-caffe/src/syncedmem.cpp:275: [CPU] Requested 36.8 K, Get 49 K
[14:05:15] /home/lgz/mini-caffe/src/syncedmem.cpp:275: [CPU] Requested 73.5 K, Get 98 K

Although both versions produce the same result, the performance of the GPU version puzzles me. Is this normal? I know that a GPU needs a lot of parallelism to be used well. Is the scale of my problem too small? How can this be explained?

ligongzheng commented 7 years ago

I found that I hadn't installed cuDNN, yet mini-caffe runs fine on the GPU; I also checked it with nvidia-smi. Very embarrassing! How did you achieve that? Do you just use cuBLAS for GEMM when cuDNN is not found?

luoyetx commented 7 years ago

Please build in Release mode; the log messages are only printed in Debug mode. For a more detailed performance profile, please refer to this doc.

GPU performance should be the same in Debug and Release modes; maybe the large number of log messages is slowing down the program.
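To separate one-time startup cost and logging overhead from the steady-state speed, the CPU and GPU builds can be timed with the same small harness. The following is only a sketch: `AverageMillis` is a hypothetical helper, not part of mini-caffe, and it assumes the callable passed in returns only once the result is available on the host, since GPU work is often launched asynchronously.

```cpp
#include <chrono>
#include <functional>

// Hypothetical helper (not a mini-caffe API): returns the average wall-clock
// time of one call to `fn` in milliseconds. The first `warmup` calls are
// executed but not timed, so one-time initialization is excluded.
// Assumption: `fn` blocks until its result is ready on the host.
double AverageMillis(const std::function<void()>& fn, int warmup, int iters) {
  if (iters <= 0) return 0.0;

  // Warm-up runs: executed, not measured.
  for (int i = 0; i < warmup; ++i) fn();

  // Timed runs, measured with a monotonic clock.
  const auto start = std::chrono::steady_clock::now();
  for (int i = 0; i < iters; ++i) fn();
  const auto stop = std::chrono::steady_clock::now();

  const std::chrono::duration<double, std::milli> elapsed = stop - start;
  return elapsed.count() / iters;
}
```

Wrapping the forward pass in a lambda and calling something like `AverageMillis(run_forward, 10, 100)` in both builds, where `run_forward` stands for whatever code executes the network, gives numbers that are easier to compare than a single timed run.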

ligongzheng commented 7 years ago

But I didn't install cuDNN. Does that affect the performance?

luoyetx commented 7 years ago

With cuDNN it should run faster.