rezazad68 / Dynamic-3D-Action-Recognition-on-RGB-D-Videos

Dynamic 3D Hand Gesture Recognition with Learning Spatio-Temporal Aggregation from Different Representation
MIT License

Dynamic 3D Hand Gesture Recognition by Learning Weighted Depth Motion Maps

Dynamic 3D human hand gesture recognition on RGB-D videos, with state-of-the-art results on public data sets. The method learns human actions by aggregating spatio-temporal descriptions from different representations. If this code helps with your research, please consider citing the following paper:

R. Azad, M. Asadi, S. Kasaei, S. Escalera, "Dynamic 3D Hand Gesture Recognition by Learning Weighted Depth Motion Maps", IEEE Transactions on Circuits and Systems for Video Technology, 2018.

Updates

  • September 2, 2017: First release (complete implementation for the MSR Action 3D data set).
  • May 5, 2018: Complete implementation for the NTU RGB+D data set added. Accuracy rates of 75.16 and 68.66 were achieved with deep and non-deep features, respectively. Notably, our method achieved the highest performance on depth data (75.16).
  • July 14, 2018: Paper link added (IEEE Transactions on Circuits and Systems for Video Technology).

    Prerequisites and Run

    This code has been implemented in Matlab 2016a and tested on both Linux (Ubuntu) and Windows 10, though it should be compatible with any OS running Matlab. The following environment and library are needed to run the code:

  • Matlab 2016
  • VLFeat 0.9.20

    Run Demo

    Run Main_MSRAction3D() for both feature extraction and classification of dynamic 3D actions. Main_MSRAction3D uses Step1_Extract_Featues to extract spatio-temporal features from different representations of the 3D video, and Step2_Description_Classification to aggregate the descriptions and perform classification. These two functions can also be used separately. Functions such as Video Summarization(), Forward Backward Motion(), Difference Forward Energy(), Temporal Sequence Generating(), Binary Weighted Mapping(), and the extraction of regional LBP and HOG features are implemented in the 'Video_Analyser' class. The Description_Classification class contains the functions related to the VLAD representation and the dimension reduction phase.
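To make the idea of a depth motion map concrete, here is a minimal Python sketch (not the repository's Matlab code). It accumulates absolute differences between consecutive depth frames, which is the generic depth motion map idea. The function name `depth_motion_map` and the optional per-transition `weights` are assumptions made for illustration; the actual weighting learned by the method is described in the paper and implemented in 'Video_Analyser'.

```python
def depth_motion_map(frames, weights=None):
    """Accumulate weighted absolute differences between consecutive depth frames.

    frames  : list of 2D lists (rows x cols) of depth values, all the same size
    weights : optional list with one weight per consecutive frame pair
              (hypothetical; uniform weights give a plain depth motion map)
    """
    if len(frames) < 2:
        raise ValueError("need at least two frames")
    n_steps = len(frames) - 1
    if weights is None:
        weights = [1.0] * n_steps
    if len(weights) != n_steps:
        raise ValueError("need one weight per consecutive frame pair")

    rows, cols = len(frames[0]), len(frames[0][0])
    dmm = [[0.0] * cols for _ in range(rows)]
    for t in range(n_steps):
        prev, curr = frames[t], frames[t + 1]
        for r in range(rows):
            for c in range(cols):
                # Pixels whose depth changes between frames accumulate energy.
                dmm[r][c] += weights[t] * abs(curr[r][c] - prev[r][c])
    return dmm


# Toy 2x2 "video" with three frames; only the top-left pixel changes.
frames = [
    [[0, 5], [5, 5]],
    [[2, 5], [5, 5]],
    [[3, 5], [5, 5]],
]
plain = depth_motion_map(frames)
weighted = depth_motion_map(frames, weights=[0.5, 1.0])
```

In this toy example only the moving pixel receives a non-zero value, and changing the weights shifts how much each transition contributes to the final map.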

    Quick Overview

    Action and Hand Gesture Recognition

    Results

    To evaluate the performance of the proposed method, three public data sets were considered. The results of three different strategies for 3D action recognition are shown below.

  • Strategy 1: VLAD representation of spatio-temporal HOG features from different representations
  • Strategy 2: VLAD representation of spatio-temporal LBP features from different representations
  • Strategy 3: VLAD representation of spatio-temporal HOG+LBP features from different representations
| Data Set | Strategy 1 | Strategy 2 | Strategy 3 |
|---|---|---|---|
| MSR Gesture 3D | 96.22 | 96.52 | 98.05 |
| SKIG | 95.0 | 95.60 | 97.31 |
| MSR Action 3D | 91.94 | 91.57 | 95.24 |
| NTU RGB+D | - | - | 75.16 (deep features) |
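All three strategies rely on a VLAD representation. As a rough illustration, the following Python sketch (not the repository's Matlab code) encodes a set of local descriptors as a VLAD vector: each descriptor is assigned to its nearest visual word, the residuals are summed per word, and the concatenated result is L2-normalized. The function name `vlad_encode` and the hard-coded centroids are assumptions; in practice the visual words are typically learned with k-means, and implementations often add further normalization steps.

```python
import math


def vlad_encode(descriptors, centroids):
    """Encode local descriptors as a VLAD vector of length K * D.

    descriptors : list of D-dimensional feature vectors (lists of floats)
    centroids   : list of K visual words, each a D-dimensional vector
    """
    k, d = len(centroids), len(centroids[0])
    residuals = [[0.0] * d for _ in range(k)]
    for x in descriptors:
        # Hard-assign the descriptor to its nearest visual word.
        nearest = min(
            range(k),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(x, centroids[i])),
        )
        # Accumulate the residual between descriptor and visual word.
        for j in range(d):
            residuals[nearest][j] += x[j] - centroids[nearest][j]

    # Concatenate the per-word residual sums and L2-normalize.
    vec = [v for row in residuals for v in row]
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else vec


# Two illustrative visual words and three 2-D descriptors.
centroids = [[0.0, 0.0], [10.0, 10.0]]
descriptors = [[1.0, 0.0], [0.0, 1.0], [10.0, 12.0]]
vlad = vlad_encode(descriptors, centroids)
```

The output length grows with the number of visual words times the descriptor dimension, which is why a dimension reduction step such as PCA usually follows.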

The number of visual words selected for each data set is related to the number of classes in that data set. The following table shows the effect of the number of visual words on the accuracy for each data set:

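| Number of Visual Words | 25 | 30 | 40 | 50 | 70 | 100 | 128 |
|---|---|---|---|---|---|---|---|
| MSR Gesture 3D | 98.05 | 97.50 | 97.50 | 96.94 | 96.66 | 96.38 | 96.38 |
| SKIG | 97.13 | 97.22 | 96.67 | 96.48 | 96.76 | 96.30 | 96.02 |
| MSR Action 3D | 92.31 | 93.04 | 93.04 | 94.14 | 95.24 | 93.77 | 93.77 |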
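The best setting per data set can be read off the table above. The short Python sketch below does this programmatically using the reported accuracies; the helper name `best_setting` is an assumption made for illustration and is not part of the repository.

```python
# Accuracies copied from the visual-words table above.
visual_words = [25, 30, 40, 50, 70, 100, 128]
accuracy = {
    "MSR Gesture 3D": [98.05, 97.50, 97.50, 96.94, 96.66, 96.38, 96.38],
    "SKIG": [97.13, 97.22, 96.67, 96.48, 96.76, 96.30, 96.02],
    "MSR Action 3D": [92.31, 93.04, 93.04, 94.14, 95.24, 93.77, 93.77],
}


def best_setting(settings, scores):
    """Return the (setting, score) pair with the highest score.

    Ties are resolved in favor of the first (smallest) setting.
    """
    best_index = max(range(len(scores)), key=scores.__getitem__)
    return settings[best_index], scores[best_index]


best = {name: best_setting(visual_words, scores) for name, scores in accuracy.items()}
for name, (words, acc) in best.items():
    print(f"{name}: {words} visual words, accuracy {acc:.2f}")
```

On these numbers the best accuracy is reached with 25 visual words for MSR Gesture 3D, 30 for SKIG, and 70 for MSR Action 3D.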

Choosing an appropriate number of PCA components

The following table shows the accuracy rate for different numbers of PCA components.

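| PCA Components | 70 | 100 | 130 | 160 | 190 | 220 | 250 |
|---|---|---|---|---|---|---|---|
| MSR Gesture 3D | 97.50 | 97.77 | 98.05 | 97.50 | 98.05 | 97.50 | 97.50 |
| SKIG | 96.57 | 97.13 | 97.22 | 97.31 | 97.31 | 96.94 | 97.31 |
| MSR Action 3D | 94.54 | 94.87 | 95.24 | 95.25 | 94.87 | 94.87 | 94.87 |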

Query

All implementation was done by Reza Azad. For any queries, please contact us.

rezazad68@gmail.com