naibaf7 / caffe

Caffe: a fast open framework for deep learning. With OpenCL and CUDA support.
http://caffe.berkeleyvision.org/

fix caffe time command. #1

Closed jyegerlehner closed 9 years ago

jyegerlehner commented 9 years ago

When running the caffe time command, force kernels to finish executing before measuring the elapsed time.

Before making this change, I was seeing incorrect per-layer benchmark times.

An easy way to see this is to run: build/tools/caffe time -model models/bvlc_alexnet/deploy.prototxt -gpu=0

Without this change, one sees times such as this:

I0618 03:19:15.426265 17120 caffe.cpp:273]      conv5   forward: 2.62356 ms.
I0618 03:19:15.426286 17120 caffe.cpp:276]      conv5   backward: 16.4404 ms.
I0618 03:19:15.426303 17120 caffe.cpp:273]      relu5   forward: 18.7961 ms.
I0618 03:19:15.426319 17120 caffe.cpp:276]      relu5   backward: 0.00024 ms.

There is no way a ReLU layer takes longer than a convolution layer, and the forward and backward times for a given layer should be reasonably close to each other.

After this change, I see:

I0618 07:13:19.637118 23689 caffe.cpp:275]      conv5   forward: 21.761 ms.
I0618 07:13:19.637135 23689 caffe.cpp:278]      conv5   backward: 27.8604 ms.
I0618 07:13:19.637151 23689 caffe.cpp:275]      relu5   forward: 0.13252 ms.
I0618 07:13:19.637167 23689 caffe.cpp:278]      relu5   backward: 0.01498 ms.

These numbers are much more sensible.
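
For anyone curious about the mechanism, here is a minimal standalone CUDA sketch of the effect. It is not the actual tools/caffe.cpp benchmark code, and heavy_kernel/light_kernel are made-up stand-ins for conv5/relu5: kernel launches return asynchronously, so a host timer stopped right after the launch mostly captures launch overhead, and the heavy kernel's real cost surfaces at whatever later call happens to block, getting blamed on the wrong layer. Synchronizing the device before stopping the timer charges each launch its full execution time.

```cpp
// Minimal sketch (not the actual tools/caffe.cpp code) of why unsynchronized
// per-layer timings go wrong. Build (assumed toolchain):
//   nvcc -O2 timing_sketch.cu -o timing_sketch
#include <chrono>
#include <cstdio>
#include <cuda_runtime.h>

// Stand-in for an expensive layer such as conv5.
__global__ void heavy_kernel(float* x, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) {
    float v = x[i];
    for (int k = 0; k < 20000; ++k) v = v * 1.0000001f + 1e-7f;
    x[i] = v;
  }
}

// Stand-in for a cheap layer such as relu5.
__global__ void light_kernel(float* x, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n && x[i] < 0.0f) x[i] = 0.0f;
}

// Time one launch; optionally wait for the kernel to finish before stopping.
template <typename LaunchFn>
double time_ms(LaunchFn launch, bool sync_before_stop) {
  auto t0 = std::chrono::high_resolution_clock::now();
  launch();
  if (sync_before_stop) cudaDeviceSynchronize();  // the fix, in spirit
  auto t1 = std::chrono::high_resolution_clock::now();
  return std::chrono::duration<double, std::milli>(t1 - t0).count();
}

int main() {
  const int n = 1 << 22;
  float* d = nullptr;
  cudaMalloc(&d, n * sizeof(float));
  cudaMemset(d, 0, n * sizeof(float));
  const int blocks = (n + 255) / 256;

  auto heavy = [=] { heavy_kernel<<<blocks, 256>>>(d, n); };
  auto light = [=] { light_kernel<<<blocks, 256>>>(d, n); };

  // Without synchronization the measurements are essentially meaningless:
  // they record little more than launch overhead.
  double h0 = time_ms(heavy, false);
  double l0 = time_ms(light, false);
  cudaDeviceSynchronize();  // drain the queue before the next experiment
  printf("no sync : heavy %.3f ms  light %.3f ms\n", h0, l0);

  // With a sync before stopping the timer, each kernel is charged its real
  // execution time.
  double h1 = time_ms(heavy, true);
  double l1 = time_ms(light, true);
  printf("   sync : heavy %.3f ms  light %.3f ms\n", h1, l1);

  cudaFree(d);
  return 0;
}
```

The same idea applies on the OpenCL path, where the counterpart of cudaDeviceSynchronize() is finishing the command queue (clFinish).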

naibaf7 commented 9 years ago

Thanks. I originally had the finish() on the queue inside the layers, but then I realized this might hurt asynchronous and multi-GPU execution in the future, so I removed it. Your synchronization approach seems sensible and can be triggered only when needed, so I merged it.
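
To illustrate that design point with a hypothetical sketch (the class and helper names below are made up and are not this branch's actual API): layers only enqueue work and never block, so asynchronous and multi-GPU execution is not serialized, while the device-wide barrier lives in a benchmark-only helper that is triggered just when timing is requested.

```cpp
// Hypothetical sketch of keeping synchronization out of the layers and in the
// benchmarking path only. Not this repository's actual classes.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void relu_kernel(float* x, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n && x[i] < 0.0f) x[i] = 0.0f;
}

// Hypothetical layer: Forward() launches asynchronously and returns at once;
// there is no queue finish() inside the layer itself.
struct ReluLayer {
  void Forward(float* x, int n) {
    relu_kernel<<<(n + 255) / 256, 256>>>(x, n);
  }
};

// Benchmark-only barrier; the OpenCL path would finish its command queue(s)
// here instead. Normal training/inference never calls this.
inline void FinishDeviceForTiming() { cudaDeviceSynchronize(); }

int main() {
  const int n = 1 << 20;
  float* d = nullptr;
  cudaMalloc(&d, n * sizeof(float));
  cudaMemset(d, 0, n * sizeof(float));
  ReluLayer relu;
  relu.Forward(d, n);        // enqueue only; no sync inside the layer
  FinishDeviceForTiming();   // the timing tool decides when to block
  printf("done: %s\n", cudaGetErrorString(cudaGetLastError()));
  cudaFree(d);
  return 0;
}
```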