opendatacam / opendatacam-mobile

OpenDataCam mobile app for Android
https://play.google.com/store/apps/details?id=com.opendatacam
MIT License

Implementing Object detection on Android with NCNN #2

Open tdurand opened 3 years ago

tdurand commented 3 years ago

Starting to gather some research / progress notes in GitHub issues.. For future reference + to help me organize the craziness in my head.

After benchmarking plenty of example apps / frameworks etc etc ... I came to the conclusion, at this time of writing / state of my knowledge, that the fastest and most portable way to run YOLO on Android (and on iOS if the Android app is successful) is to use the NCNN framework from Tencent: https://github.com/Tencent/ncnn . Tencent being a company of the scale of Google in China.. The license is MIT.

There is a very nice example app showcasing several neural networks : https://github.com/cmdbug/YOLOv5_NCNN

Think of NCNN as a framework to run a neural network (like Darknet or TensorFlow or PyTorch).. but super optimized for CPU inference on mobile phones..

It is very very very optimized for Android and iPhone CPUs ... I get 17 FPS for YOLOv4-tiny on a Xiaomi Mi 8 (a 200€ phone).. with TensorFlow Lite, for example, I think I get 2 FPS, so this is crazy magic... but the interesting thing is that it also aims to support lots of platforms https://github.com/Tencent/ncnn#supported-platform-matrix , maybe one to watch for the future also to run on Raspberry Pis, Jetsons, the web... Right now it is not very performant on GPUs.. but they are working on it.

Another very impressive demo is that you can ship NCNN on the web via WebAssembly: https://github.com/nihui/ncnn-webassembly-nanodet (here it is running NanoDet, a new lightweight YOLO-like neural network that is almost as accurate but faster: https://github.com/RangiLyu/nanodet ).

The other super great news about being able to run YOLOv4-tiny on mobile is that you can train custom weights the same way and then convert them to NCNN-compatible weights.. + we already know that YOLOv4-tiny is accurate enough.

The code from https://github.com/cmdbug/YOLOv5_NCNN is licensed GPLv3, but I think the actual YOLOv4-tiny C++ code, which is the only part we need, is available as MIT code here: https://github.com/Tencent/ncnn/blob/master/examples/yolov4.cpp .. So this is to be investigated to determine the future license of OpenDataCam mobile.

Here is how to integrate it on Android:
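A rough sketch of what the Kotlin ↔ native bridge could look like, assuming the C++ detection code from the NCNN example is compiled with the NDK into a library called `yolov4ncnn` (the names and signatures here are mine, just to illustrate, not the actual app code):

```kotlin
import android.content.res.AssetManager

// Hypothetical Kotlin-side wrapper around the NCNN YOLOv4-tiny example code,
// compiled as a native library with the Android NDK.
object YoloV4Ncnn {
    init {
        // Load libyolov4ncnn.so (the NDK build of the C++ detection code)
        System.loadLibrary("yolov4ncnn")
    }

    // Load the converted .param/.bin model files from the APK assets
    external fun init(assetManager: AssetManager): Boolean

    // Run inference on an RGBA frame; returns flat tuples of
    // [x, y, width, height, label, confidence] per detection
    external fun detect(pixels: ByteArray, width: Int, height: Int): FloatArray
}
```

The analyzer side then just converts each camera frame to a pixel buffer, calls `detect`, and hands the boxes to whatever draws the overlay.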

This sounds super easy 😁, but I spent the last 2 weeks really understanding how to do this in practice.. by studying the code of https://github.com/cmdbug/YOLOv5_NCNN , Android app development and the camera APIs..

The good news is that I have it mostly figured out 🎉.. I ended up writing my own "glue" code using the latest version of the CameraX API, which simplifies things a bit.. and also supports things that are not supported in the example app.. like camera / device orientation (it was only working in portrait).. It is still a bit buggy though, the coordinates of the boxes are a bit off.. there is still some aspect ratio magic I need to figure out (see the sketch below).
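Roughly, that "aspect ratio magic" should come down to a scale + offset from the analysis frame to the preview view. A sketch of what I think the mapping is, assuming the preview behaves like CameraX's FILL_CENTER (center-crop) mode and that `rotationDegrees` from the frame has already been applied to the frame size; names are illustrative:

```kotlin
import android.graphics.RectF

// Maps a detection box from analysis-frame coordinates back onto the
// on-screen preview, assuming the preview scales the frame uniformly
// until it covers the view and crops the overflow (FILL_CENTER).
fun mapBoxToView(
    box: RectF,                          // box in frame coordinates
    frameWidth: Int, frameHeight: Int,   // analysis frame size (after rotation)
    viewWidth: Int, viewHeight: Int      // preview view size in pixels
): RectF {
    // Uniform scale factor: the larger ratio covers the whole view
    val scale = maxOf(viewWidth.toFloat() / frameWidth,
                      viewHeight.toFloat() / frameHeight)
    // Offsets center the scaled frame; half the overflow is cropped per side
    val dx = (viewWidth - frameWidth * scale) / 2f
    val dy = (viewHeight - frameHeight * scale) / 2f
    return RectF(
        box.left * scale + dx,
        box.top * scale + dy,
        box.right * scale + dx,
        box.bottom * scale + dy
    )
}
```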

I also put together a working webview bridge using Capacitor (https://capacitorjs.com/), and I'm able to render an HTML canvas on top of the camera preview which draws the boxes...
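For reference, the native → webview direction could look roughly like this, as a hypothetical Capacitor plugin (Capacitor 3 annotation style; older versions used `@NativePlugin`; the plugin name and payload shape are mine):

```kotlin
import com.getcapacitor.JSObject
import com.getcapacitor.Plugin
import com.getcapacitor.annotation.CapacitorPlugin

// Hypothetical plugin that pushes each frame's detections to the webview,
// where a JS listener draws them on a canvas over the camera preview.
@CapacitorPlugin(name = "ObjectDetection")
class ObjectDetectionPlugin : Plugin() {

    // Called from the camera analyzer after each inference pass
    fun publishDetections(boxes: FloatArray) {
        val data = JSObject()
        // Serialize the flat [x, y, w, h, label, confidence] tuples
        data.put("detections", boxes.joinToString(","))
        // Fires a "detections" event the web layer subscribes to
        notifyListeners("detections", data)
    }
}
```

On the web side, a listener subscribed to the "detections" event would clear and redraw the canvas each time it fires.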

The whole thing seems as performant as the native Android demo app.. so I guess the "core" proof of concept is mostly under control now... Demo to try soon!

tdurand commented 3 years ago

Leaving this issue as documentation, and also dropping this link here, which discusses some of the existing neural network inference frameworks out there for mobile devices; maybe good to add to the future documentation: https://qengineering.eu/deep-learning-software-for-raspberry-pi-and-alternatives.html

tdurand commented 3 years ago

In order to avoid licensing the mobile app as GPLv3 (by using https://github.com/cmdbug/YOLOv5_NCNN ), I need to rework the YOLO implementation by using:

which are BSD licensed.. compatible with MIT

tdurand commented 3 years ago

Some more notes on this: