Closed jyegerlehner closed 9 years ago
Thanks. I originally had the finish() call on the queue inside the layers, but I removed it because I figured it might hurt asynchronous and multi-GPU execution in the future. Your synchronization approach seems sensible and can be triggered only when needed, so I merged it. Thanks!
When running the caffe time command, force kernels to finish executing before measuring the elapsed time.
Before making this change, I was seeing incorrect per-layer benchmark times.
An easy way to see this is to run:
`build/tools/caffe time -model models/bvlc_alexnet/deploy.prototxt -gpu=0`
Without this change, one sees times such as this:
There is no way a ReLU layer takes longer than a convolution layer, and the forward and backward times for a given layer should be roughly equal.
After this change, I see:
Which is much more sensible.