
deep-soli

Gesture Recognition Using Neural Networks with Google's Project Soli Sensor

Update

Dataset and trained model are now available.

Introduction

This is the open-source evaluation code base for our paper:

Interacting with Soli: Exploring Fine-Grained Dynamic Gesture Recognition in the Radio-Frequency Spectrum
Saiwen Wang, Jie Song, Jaime Lien, Ivan Poupyrev, Otmar Hilliges
(link to the paper)

Please cite the paper if you find the code useful. Thank you. (BibTeX)

This project uses Google's Project Soli sensor.

Soli is a new sensing technology that uses miniature radar to detect touchless gesture interactions.

Soli sensor image

Soli sensor technology works by emitting electromagnetic waves in a broad beam. Objects within the beam scatter this energy, reflecting some portion back towards the radar antenna. Properties of the reflected signal, such as energy, time delay, and frequency shift, capture rich information about the object’s characteristics and dynamics, including size, shape, orientation, material, distance, and velocity.
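
To make the distance and velocity part concrete: in a generic FMCW radar pipeline, time delay and frequency shift are recovered with a 2-D FFT over the received chirps, yielding a Range-Doppler map. The sketch below is a minimal, generic illustration under that assumption, not Soli's actual signal processing; the array sizes and the random input burst are placeholders.

# Generic FMCW Range-Doppler sketch (illustrative assumption, not Soli's DSP).
# The FFT along fast time maps time delay to range bins; the FFT along slow
# time (across chirps) maps frequency (Doppler) shift to velocity bins.
import numpy as np

num_chirps, samples_per_chirp = 64, 64
burst = np.random.randn(num_chirps, samples_per_chirp)  # placeholder samples

range_bins = np.fft.fft(burst, axis=1)                                    # fast time -> range
range_doppler = np.fft.fftshift(np.fft.fft(range_bins, axis=0), axes=0)   # slow time -> Doppler
magnitude = np.abs(range_doppler)                                         # one Range-Doppler image
print(magnitude.shape)  # (64, 64)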

Our paper uses a lightweight, end-to-end trained architecture combining Convolutional Neural Networks and Recurrent Neural Networks. It recognizes 11 in-air gestures with 87% per-frame accuracy and performs real-time predictions at 140 Hz on commodity hardware. (link to the paper video)
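
For a concrete picture of such an architecture, below is a minimal PyTorch sketch of a per-frame CNN + RNN classifier. This is not the paper's Torch7 model; the layer sizes, the GRU choice, and the 13-class output (mirroring the --label 13 flag used in the quick start below) are illustrative assumptions.

# Minimal CNN + RNN per-frame classifier sketch (PyTorch, illustrative only).
import torch
import torch.nn as nn

class CnnRnnClassifier(nn.Module):
    def __init__(self, in_channels=4, num_classes=13, hidden=256):
        super().__init__()
        # Per-frame CNN over 32x32 input frames.
        self.cnn = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                                   # 32 -> 16
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                                   # 16 -> 8
            nn.Flatten(),
            nn.Linear(64 * 8 * 8, hidden), nn.ReLU(),
        )
        # RNN over the sequence of per-frame CNN features.
        self.rnn = nn.GRU(hidden, hidden, batch_first=True)
        self.fc = nn.Linear(hidden, num_classes)

    def forward(self, x):
        # x: (batch, time, channels, 32, 32) -> logits: (batch, time, num_classes)
        b, t = x.shape[:2]
        feats = self.cnn(x.flatten(0, 1)).view(b, t, -1)
        out, _ = self.rnn(feats)
        return self.fc(out)

# Example: 2 sequences of 40 frames, 4 channels per frame.
logits = CnnRnnClassifier()(torch.randn(2, 40, 4, 32, 32))
print(logits.shape)  # torch.Size([2, 40, 13])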

Prerequisites

Quick start

# Build the image preprocessing code
cd image
mkdir bin
make

# Convert the raw sequences into per-frame images
python pre/main.py --op image --file [dataset folder] \
    --target [target image folder] --channel 4 --originsize 32 --outsize 32

# Compute the mean over the generated images
python pre/main.py --op mean --file [image folder] \
    --target [mean file name] --channel 4 --outsize 32

# Run evaluation with the trained model
th net/main.lua --file [image folder] --list [train/test sequence split file] \
    --load [model file] --inputsize 32 --inputch 4 --label 13 --datasize 32 \
    --datach 4 --batch 16 --maxseq 40 --cuda --cudnn

Dataset

# Demo code to extract data in Python
import h5py

use_channel = 0
# file_name is the path to one gesture sequence HDF5 file from the dataset
with h5py.File(file_name, 'r') as f:
    # Data and label are numpy arrays
    data = f['ch{}'.format(use_channel)][()]
    label = f['label'][()]
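
Continuing the demo above, and assuming each channel dataset stores one flattened 32x32 frame per row (shape (num_frames, 1024)) with one gesture label per frame, the frames can be reshaped for inspection; these shapes are assumptions about the released files, not guaranteed by this README.

# Shape assumptions: data is (num_frames, 1024), label holds one id per frame.
frames = data.reshape(data.shape[0], 32, 32)   # one 32x32 frame per row
print(frames.shape, label.shape)
print(frames[0].min(), frames[0].max())        # quick sanity check on one frame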

Gesture set image

Pre-trained model

-- Load the pre-trained Torch model (Lua/Torch) and print its structure
loadFile = 'uni_image_np_50.t7'
net = torch.load(loadFile)
print(net)

Comments on the code base

This is a simplified version of the original code base we used for all the experiments in the paper. The complex Torch-based class hierarchies in the net code reflect the various model architectures we tried during the experiments. For simplicity, we only make the evaluation part public. The model details can be found both in the paper and in the model file.

License

This project is licensed under the terms of the MIT license.