justusschock / shapenet

PyTorch implementation of "Super-Realtime Facial Landmark Detection and Shape Fitting by Deep Regression of Shape Model Parameters" predicting facial landmarks with up to 400 FPS
https://shapenet.rtfd.io
GNU Affero General Public License v3.0
342 stars, 59 forks

Prediction time compared with dlib #26

Closed: An-Shank closed this issue 5 years ago

An-Shank commented 5 years ago

Hi,

You mentioned previously that your method is faster than dlib. However, when I downloaded the pretrained model and ran it on videos, I found it to be much slower than dlib. I tested shapenet and dlib with the same code (my own test app) and the same face detector (MobileNet SSD).

justusschock commented 5 years ago

What time per frame did you measure for dlib and for my model?

An-Shank commented 5 years ago

For dlib, it was 0.002 to 0.004 sec per frame. For your model, it was 0.06 to 0.08 sec per frame.

justusschock commented 5 years ago

Have you given it a bit of warmup? The problem is that when the model starts, a whole CUDA context is created. That is a relatively large overhead, but it only has to happen once, before the first prediction. For a fair comparison, you should therefore exclude this part from the benchmark. You should also set the cuDNN benchmark flags when benchmarking CUDA code.
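A minimal sketch of a benchmark that leaves the warm-up out of the measurement. It is not taken from the shapenet repository: `time_per_frame`, its parameters, and `FakeModel` are hypothetical names used only for illustration, and the fake model merely simulates a one-time setup cost on its first call.

```python
import time
from statistics import mean


def time_per_frame(predict, frames, warmup=10, sync=None):
    """Return the mean seconds per predict(frame) call, excluding warm-up calls.

    sync is an optional zero-argument callable that blocks until pending
    asynchronous work has finished (assumption: the caller supplies it).
    """
    frames = list(frames)
    if len(frames) <= warmup:
        raise ValueError("need more frames than warm-up iterations")

    # Warm-up: one-time costs (context creation, lazy initialization) land here
    # and are deliberately not timed.
    for frame in frames[:warmup]:
        predict(frame)
    if sync is not None:
        sync()

    timings = []
    for frame in frames[warmup:]:
        start = time.perf_counter()
        predict(frame)
        if sync is not None:
            sync()  # wait for asynchronous work before stopping the clock
        timings.append(time.perf_counter() - start)
    return mean(timings)


class FakeModel:
    """Hypothetical stand-in for a model whose first call pays a setup cost."""

    def __init__(self):
        self.initialized = False
        self.calls = 0

    def __call__(self, frame):
        self.calls += 1
        if not self.initialized:
            time.sleep(0.05)  # simulated one-time setup cost
            self.initialized = True
        return frame


model = FakeModel()
seconds = time_per_frame(model, range(20), warmup=5)
print(f"{seconds * 1000:.3f} ms per frame after warm-up")
```

For a PyTorch model running on a GPU, one would pass `sync=torch.cuda.synchronize`, because CUDA kernels are launched asynchronously, and set `torch.backends.cudnn.benchmark = True` before timing. How many warm-up iterations are enough depends on the setup, so the default of 10 is only a guess.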

An-Shank commented 5 years ago

I ran it on a CPU, not a GPU.

justusschock commented 5 years ago

Then this is the problem! dlib's code is optimized for CPU; mine is optimized for GPU.