Closed jyegerlehner closed 9 years ago
Thanks. I originally had the finish() call on the queue inside the layers, but I removed it because I figured it might hurt asynchronous and multi-GPU execution in the future. Your synchronization approach seems sensible and can be triggered only when needed, so I merged it. Thanks!
When running the caffe time command, force kernels to finish executing before measuring the elapsed time.
Before making this change, I was seeing incorrect per-layer benchmark times.
An easy way to see this is to run:
`build/tools/caffe time -model models/bvlc_alexnet/deploy.prototxt -gpu=0`
Without this change, one sees times such as this:
There is no way a ReLU layer takes longer than a convolution layer, and the forward and backward times for a given layer should be roughly equal.
After this change, I see:
Which is much more sensible.