Dynamic 3D human hand gesture recognition on RGB-D videos, with state-of-the-art results on public data sets. The method learns human actions by aggregating spatio-temporal descriptions from different representations. If this code helps your research, please consider citing the following paper:
R. Azad, M. Asadi, S. Kasaei, S. Escalera, "Dynamic 3D Hand Gesture Recognition by Learning Weighted Depth Motion Maps", IEEE Transactions on Circuits and Systems for Video Technology, 2018, download link.
Updates
- September 2, 2017: First release (complete implementation for the MSR Action 3D data set).
- May 5, 2018: Complete implementation for the NTU RGB+D data set added. Accuracy rates of 75.16 and 68.66 were achieved with deep and non-deep features, respectively. It is worth mentioning that our method achieved the highest performance on depth data (75.16).
- July 14, 2018: Paper link in IEEE Transactions on Circuits and Systems for Video Technology added.
Prerequisites and Run
This code was implemented in Matlab 2016a and tested on both Linux (Ubuntu) and Windows 10; it should be compatible with any OS running Matlab. The following environment and library are needed to run the code:
- Matlab 2016
- VLFeat 0.9.20
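
VLFeat has to be on the Matlab path before the code is run. A minimal setup sketch, assuming the library was unpacked into a folder named `vlfeat-0.9.20` inside the current directory (this location is an assumption; adjust it to your installation):

```matlab
% Assumed install location; change this to wherever VLFeat was unpacked.
vlfeat_root = fullfile(pwd, 'vlfeat-0.9.20');

% vl_setup ships with VLFeat and adds its toolbox folders to the Matlab path.
run(fullfile(vlfeat_root, 'toolbox', 'vl_setup'));

% Print the loaded VLFeat version to confirm that the setup worked.
disp(vl_version);
```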
Run Demo
Run `Main_MSRAction3D()` for both feature extraction and classification of dynamic 3D actions. `Main_MSRAction3D` uses `Step1_Extract_Featues` to extract spatio-temporal features from different representations of the 3D video, and `Step2_Description_Classification` for the aggregation of descriptions and the classification phase. These two functions can also be used separately. Functions such as `Video Summarization()`, `Forward Backward Motion()`, `Difference Forward Energy()`, `Temporal Sequence Generating()`, `Binary Weighted Mapping()`, and the extraction of `Regional LBP and HOG features()` are implemented in the `Video_Analyser` class. The `Description_Classification` class contains the functions related to the VLAD representation and the dimension reduction phase.
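
The cited paper is built around weighted depth motion maps. As a rough illustration of the underlying idea only, the sketch below computes a plain, unweighted depth motion map by accumulating absolute frame differences over time. It is not the `Video_Analyser` implementation: the weighting scheme is omitted, and the synthetic video and all variable names are assumptions made for this example.

```matlab
% Synthetic depth video of size H x W x T (placeholder data, not from any data set).
rng(0);
video = rand(48, 64, 10);

% Absolute difference between consecutive frames along the time axis.
motion = abs(diff(video, 1, 3));   % size 48 x 64 x 9

% Accumulate the motion energy over time into a single 2D map.
dmm = sum(motion, 3);              % size 48 x 64
```

Pixels that change a lot over the sequence get large values in `dmm`, which is what makes such a map a compact summary of where motion occurred.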
Quick Overview
Results
To evaluate the performance of the proposed method, three public data sets were considered. The results of three different strategies for 3D action recognition are shown below.
- Strategy 1: VLAD representation of spatio-temporal HOG features from different representations
- Strategy 2: VLAD representation of spatio-temporal LBP features from different representations
- Strategy 3: VLAD representation of spatio-temporal HOG+LBP features from different representations
Data Set | Strategy 1 | Strategy 2 | Strategy 3 |
---|---|---|---|
MSR Gesture 3D | 96.22 | 96.52 | 98.05 |
SKIG | 95.0 | 95.60 | 97.31 |
MSR Action 3D | 91.94 | 91.57 | 95.24 |
NTU RGB+D | - | - | 75.16 (deep features) |
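
All three strategies above share a VLAD (vector of locally aggregated descriptors) encoding step. The sketch below shows the standard form of that encoding in plain Matlab: each local descriptor is assigned to its nearest visual word, the residuals are summed per word, and the result is L2-normalized. It is an illustration under assumptions, not the code in `Description_Classification`; the function name `vlad_encode` and its arguments are hypothetical. `bsxfun` is used so that the code runs on Matlab 2016a, and the function should be saved as `vlad_encode.m` because that release does not allow local functions in scripts.

```matlab
function v = vlad_encode(X, C)
% VLAD_ENCODE  Illustrative VLAD encoding (hypothetical helper).
%   X : D x N matrix of local descriptors (one descriptor per column)
%   C : D x K matrix of visual words, e.g. k-means cluster centers
%   v : (D*K) x 1 L2-normalized VLAD vector
[D, K] = size(C);

% Squared Euclidean distance between every visual word and every descriptor (K x N).
d2 = bsxfun(@plus, sum(C.^2, 1)', sum(X.^2, 1)) - 2 * (C' * X);

% Index of the nearest visual word for each descriptor.
[~, nearest] = min(d2, [], 1);

% Sum the residuals (descriptor minus assigned word) for each visual word.
V = zeros(D, K);
for k = 1:K
    members = X(:, nearest == k);
    if ~isempty(members)
        V(:, k) = sum(bsxfun(@minus, members, C(:, k)), 2);
    end
end

% Concatenate the per-word residual sums and L2-normalize.
v = V(:);
n = norm(v);
if n > 0
    v = v / n;
end
end
```

For example, `v = vlad_encode(rand(8, 200), rand(8, 25))` encodes 200 descriptors of dimension 8 against 25 visual words and returns a 200 x 1 vector.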
The number of visual words selected for each data set is related to the number of classes in that data set. The following table reports the accuracy obtained with different numbers of visual words.
Number of Visual Words | 25 | 30 | 40 | 50 | 70 | 100 | 128 |
---|---|---|---|---|---|---|---|
MSR Gesture 3D | 98.05 | 97.50 | 97.50 | 96.94 | 96.66 | 96.38 | 96.38 |
SKIG | 97.13 | 97.22 | 96.67 | 96.48 | 96.76 | 96.30 | 96.02 |
MSR Action 3D | 92.31 | 93.04 | 93.04 | 94.14 | 95.24 | 93.77 | 93.77 |
The following table shows the accuracy obtained with different numbers of PCA components.
PCA Components | 70 | 100 | 130 | 160 | 190 | 220 | 250 |
---|---|---|---|---|---|---|---|
MSR Gesture 3D | 97.50 | 97.77 | 98.05 | 97.50 | 98.05 | 97.50 | 97.50 |
SKIG | 96.57 | 97.13 | 97.22 | 97.31 | 97.31 | 96.94 | 97.31 |
MSR Action 3D | 94.54 | 94.87 | 95.24 | 95.25 | 94.87 | 94.87 | 94.87 |
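
The dimension reduction step behind the table above can be sketched with a PCA projection. The example below uses an SVD so that no extra toolbox is required. It is a minimal illustration under assumptions: the function name `pca_reduce` is hypothetical, and where exactly the repository applies PCA in its pipeline is not stated here. Save it as `pca_reduce.m` when using Matlab 2016a.

```matlab
function Z = pca_reduce(F, p)
% PCA_REDUCE  Illustrative PCA projection (hypothetical helper).
%   F : N x M feature matrix (one sample per row)
%   p : number of principal components to keep
%   Z : N x p matrix of projected features
mu = mean(F, 1);
Fc = bsxfun(@minus, F, mu);       % center every feature dimension

% The right singular vectors of the centered data are the principal directions.
[~, ~, V] = svd(Fc, 'econ');

p = min(p, size(V, 2));           % cannot keep more components than are available
Z = Fc * V(:, 1:p);
end
```

For example, `Z = pca_reduce(rand(20, 50), 5)` reduces 20 samples of dimension 50 to 5 components each.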
The entire implementation was done by Reza Azad. For any queries, please contact us:
rezazad68@gmail.com