rykov8 / ssd_keras

Port of Single Shot MultiBox Detector to Keras
MIT License
1.1k stars 553 forks

Slow detection #1

Open MrXu opened 7 years ago

MrXu commented 7 years ago

Great work! Thanks a lot!

Detection takes around 2 seconds per image on a Mac using only the CPU, which is quite different from the test performance reported in the paper. Apart from hardware, could this be caused by Keras overhead? Also, is it possible to shrink the network somehow? Thank you.
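When comparing numbers like these, it helps to time the forward pass separately from post-processing and to exclude the first call, which often includes one-off setup cost. Below is a minimal sketch of such a measurement; `time_per_item` and its parameters are hypothetical names introduced here for illustration, not part of this repository.

```python
import time


def time_per_item(fn, batch, warmup=1, repeats=5):
    """Return the best wall-clock time per item (in seconds) for fn(batch).

    fn      -- any callable that processes a whole batch
    batch   -- a sequence or array; len(batch) is the number of items
    warmup  -- untimed calls made first, to skip one-off setup cost
    repeats -- timed calls; the fastest one is reported
    """
    for _ in range(warmup):
        fn(batch)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(batch)
        best = min(best, time.perf_counter() - start)
    return best / len(batch)
```

Assuming `model` is a Keras model and `inputs` is a preprocessed image array, `time_per_item(model.predict, inputs)` would give the forward-pass time per image; timing the box decoding and NMS step the same way shows where the rest of the time goes.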

xindongzhang commented 7 years ago

The inference performance reported in the paper was measured on NVIDIA K40 GPUs, with a batch of images as input. You can replace the VGG module with AlexNet, which is smaller than VGG.

rykov8 commented 7 years ago

@xindongzhang thanks for your comment, but I believe the authors state the following: "We measure the speed with batch size 8 using Titan X and cuDNN v4 with Intel Xeon E5-2667v3@3.20GHz." Either way, it doesn't matter: they report performance on a GPU.

@MrXu, I measured the forward pass on my PC with a Titan, and for 5 pictures (like in SSD.ipynb) I got the results shown in the screenshot. [Screenshot 2016-11-12 at 12.10.18 PM] This means it takes around 50 ms per image to get the prediction. I haven't measured the original Caffe code, but I'm sure that my NMS implementation is slower than the original one. Moreover, some custom layers may also be inefficient. This is something to improve in the future, because I also need real-time performance on GPU for my problem. Any ideas on how to speed up the code are welcome! I've also heard that Keras is sometimes slower than other frameworks, but I can't bear Caffe, so for me Keras is the best choice.
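For reference, greedy non-maximum suppression can be written with vectorized NumPy operations so that each iteration compares one box against all remaining boxes at once. This is only a sketch of the general technique, not the implementation in this repository or in the original Caffe code; the function name, the `[xmin, ymin, xmax, ymax]` box layout, and the default `iou_threshold` and `top_k` values are assumptions chosen for illustration.

```python
import numpy as np


def nms(boxes, scores, iou_threshold=0.45, top_k=200):
    """Greedy NMS. Returns indices of kept boxes, highest score first.

    boxes  -- array-like of shape (N, 4): [xmin, ymin, xmax, ymax] (assumed layout)
    scores -- array-like of shape (N,)
    """
    boxes = np.asarray(boxes, dtype=np.float64)
    scores = np.asarray(scores, dtype=np.float64)
    if boxes.size == 0:
        return []

    x1, y1, x2, y2 = boxes.T
    areas = (x2 - x1) * (y2 - y1)

    # Candidate indices sorted by descending score, limited to top_k.
    order = np.argsort(scores)[::-1][:top_k]

    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]

        # Intersection of the current box with all remaining boxes.
        xx1 = np.maximum(x1[i], x1[rest])
        yy1 = np.maximum(y1[i], y1[rest])
        xx2 = np.minimum(x2[i], x2[rest])
        yy2 = np.minimum(y2[i], y2[rest])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)

        union = areas[i] + areas[rest] - inter
        iou = inter / np.maximum(union, 1e-12)  # guard against zero-area boxes

        # Drop every remaining box that overlaps the kept one too much.
        order = rest[iou <= iou_threshold]
    return keep
```

The loop still runs once per kept box, so the cost depends on how many boxes survive; filtering out low-confidence boxes before calling it reduces the work further.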

As for shrinking the network: apart from replacing VGG with AlexNet (after which you will have to retrain the net), you can think about the scales of your detections. For example, if you know that you won't have big objects in your images, you probably don't need the final layers and can delete them.

MrXu commented 7 years ago

@xindongzhang thanks for the suggestion. I would prefer to avoid retraining the model. @rykov8, thanks for the clarification. I have read that Keras is slower than other frameworks like TensorLayer or TFLearn. I am trying to run the prediction on a Raspberry Pi; it seems that achieving real-time detection with only a CPU is really hard...

rykov8 commented 7 years ago

@MrXu as for training, I'm working on this part and hope to release the code this week. I also had to change some things in the architecture in order to be able to train the net. I will test it only on my problem, but I am trying to make the training implementation as universal as possible. I hope it will be useful. As for real-time detection, I'm quite sure that, unfortunately, it is nearly impossible nowadays to run deep nets on a CPU with real-time performance. If you need real-time on a CPU, you might consider simpler methods, at some loss of quality.

ManjeeraJagiri commented 6 years ago

@rykov8, thanks for the code! It works perfectly. I wanted to know if you have tried anything to improve the fps for real-time detection. I have been trying to implement multithreading, but no luck so far.
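One common way to apply multithreading here is to overlap frame reading with detection: a background thread fills a small bounded queue while the main thread runs the model. The sketch below shows that pattern using only the standard library; `run_pipeline`, `read_frame`, `detect`, and `handle` are hypothetical names, and the callables are assumed to be supplied by the caller (for example, `detect` could wrap the model's prediction and decoding step).

```python
import queue
import threading


def run_pipeline(read_frame, detect, handle, max_pending=2):
    """Read frames in a background thread while detecting in this thread.

    read_frame -- callable returning the next frame, or None when finished
    detect     -- callable mapping a frame to its detections
    handle     -- callable receiving (frame, detections)
    """
    frames = queue.Queue(maxsize=max_pending)  # bounded, so the reader cannot run far ahead
    done = object()  # sentinel marking the end of the stream

    def producer():
        try:
            while True:
                frame = read_frame()
                if frame is None:
                    break
                frames.put(frame)
        finally:
            frames.put(done)  # always unblock the consumer, even on error

    reader = threading.Thread(target=producer, daemon=True)
    reader.start()

    # The model is only ever called from this one thread.
    while True:
        frame = frames.get()
        if frame is done:
            break
        handle(frame, detect(frame))
    reader.join()
```

Because of Python's global interpreter lock, this helps mainly when reading frames waits on I/O or when the heavy work happens in native code that releases the lock; it does not make the forward pass itself faster.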