Real-time demos that use deep convolutional neural networks to classify and caption what they see in a webcam stream.
All demos run on the CPU, but it's trivial to adapt them to work with CUDA or OpenCL.
There's a Docker image to make installation & experiments easier.
Otherwise...
Quick install on OS X:
brew install opencv3 --with-contrib
OpenCV_DIR=/usr/local/Cellar/opencv3/3.1.0/share/OpenCV luarocks install cv
brew install protobuf
luarocks install loadcaffe
In Linux you have to build OpenCV 3 manually. Follow the instructions in
The demo simply takes a central crop from a webcam frame and uses a small network pretrained on ImageNet classification to classify what it sees in the crop. The top-5 predicted classes are shown at the top of the frame, with the most probable class listed first.
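The crop-and-rank step described above can be sketched in a few lines. This is a minimal illustration in plain Python, not the demo's own Torch code: the function names `central_crop` and `top_k`, the nested-list image representation, and the sample scores are all assumptions made for the example.

```python
def central_crop(image, size):
    """Return the size x size square at the center of a row-major image (list of rows)."""
    height, width = len(image), len(image[0])
    if size > min(height, width):
        raise ValueError("crop size exceeds image dimensions")
    top = (height - size) // 2
    left = (width - size) // 2
    return [row[left:left + size] for row in image[top:top + size]]


def top_k(scores, labels, k=5):
    """Return the k (label, score) pairs with the highest scores, best first."""
    ranked = sorted(zip(labels, scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Hypothetical 4x6 "frame" whose pixel values are just their column index.
frame = [[col for col in range(6)] for _ in range(4)]
crop = central_crop(frame, 2)

# Hypothetical class scores standing in for the network's output.
labels = ["cat", "dog", "cup", "keyboard", "mouse", "lamp"]
scores = [0.05, 0.10, 0.40, 0.25, 0.15, 0.05]
best = top_k(scores, labels)
```

In a real pipeline the crop would be resized to the network's input size and the scores would come from the network's final layer; the ranking logic stays the same.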
Run as th demo.lua
Example:
This demo uses two networks described here http://www.openu.ac.il/home/hassner/projects/cnn_agegender/ to predict age and gender of the faces that it finds with a simple cascade detector.
Run as
th demo.lua video_source [path-to-'haarcascade_frontalface_default.xml']
Where video_source is camera or a path to a video file, and the second argument is optional.
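One way to read the two arguments described above is sketched below. It is written in plain Python for illustration only and is not taken from demo.lua: the function name `parse_demo_args` is hypothetical, and falling back to a bare `haarcascade_frontalface_default.xml` filename when the second argument is omitted is an assumption, since the actual default is not stated here.

```python
def parse_demo_args(argv, default_cascade="haarcascade_frontalface_default.xml"):
    """Interpret [video_source, optional cascade_path] from a command line.

    default_cascade is an assumed fallback, not the demo's documented default.
    """
    if not 1 <= len(argv) <= 2:
        raise ValueError("usage: video_source [path-to-cascade-xml]")
    source = argv[0]
    return {
        # The literal word "camera" selects the webcam instead of a file.
        "use_camera": source == "camera",
        "video_path": None if source == "camera" else source,
        "cascade_path": argv[1] if len(argv) == 2 else default_cascade,
    }


webcam_run = parse_demo_args(["camera"])
file_run = parse_demo_args(["clip.mp4", "/tmp/cascade.xml"])
```

Here `clip.mp4` and `/tmp/cascade.xml` are placeholder paths chosen for the example.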
IMAGINE Lab gives an example:
This demo uses NeuralTalk2 captioning code from Andrej Karpathy: https://github.com/karpathy/neuraltalk2
The code captions a live webcam stream. Follow the installation instructions at https://github.com/karpathy/neuraltalk2 first and then run the demo as:
th videocaptioning.lua -gpuid -1 -model model_id1-501-1448236541_cpu.t7
Caption is displayed on top:
Check https://github.com/DmitryUlyanov/texture_nets
2016 Sergey Zagoruyko and Egor Burkov
Thanks to VisionLabs for putting up https://github.com/VisionLabs/torch-opencv bindings!