We introduce HaGRIDv2 (HAnd Gesture Recognition Image Dataset), a large image dataset for hand gesture recognition (HGR) systems. You can use it for image classification or image detection tasks. The proposed dataset allows building HGR systems that can be used in video conferencing services (Zoom, Skype, Discord, Jazz, etc.), home automation systems, the automotive sector, and more.
HaGRIDv2 is 1.5 TB in size and contains 1,086,158 FullHD RGB images divided into 33 gesture classes and a new separate "no_gesture" class containing domain-specific natural hand postures. In addition, an image is annotated with the no_gesture class if a second, gesture-free hand appears in the frame; this extra class contains 2,164 samples. The data were split into training (76%), validation (9%), and testing (15%) sets by subject user_id, with 821,458 images for train, 99,200 images for validation, and 165,500 for test.
The dataset contains 65,977 unique persons and at least as many unique scenes. All subjects are over 18 years old. The dataset was collected mainly indoors with considerable variation in lighting, including artificial and natural light, and it includes images taken in extreme conditions such as facing toward and away from a window. The subjects showed gestures at a distance of 0.5 to 4 meters from the camera.
An example of a sample and its annotation:
For more information see our arxiv paper [TBA]().
- **2024/09/24**: We release HaGRIDv2. 🙏 The no_gesture class contains 200,390 bounding boxes.
- **2023/09/21**: We release HaGRID 2.0. ✌️ The no_gesture class contains 120,105 samples.
- **2022/06/16**: HaGRID (Initial Dataset) 💪 The no_gesture class contains 123,589 samples.
Clone the repository and install the required Python packages:
git clone https://github.com/hukenovs/hagrid.git
# or mirror link:
cd hagrid
# Create virtual env by conda or venv
conda create -n gestures python=3.11 -y
conda activate gestures
# Install requirements
pip install -r requirements.txt
Because of the large data size, we split the train dataset into 34 archives by gesture. Download and unzip them from the following links:
Gesture | Size | Gesture | Size | Gesture | Size |
---|---|---|---|---|---|
call | 37.2 GB | peace | 41.4 GB | grabbing | 48.7 GB |
dislike | 40.9 GB | peace_inverted | 40.5 GB | grip | 48.6 GB |
fist | 42.3 GB | rock | 41.7 GB | hand_heart | 39.6 GB |
four | 43.1 GB | stop | 41.8 GB | hand_heart2 | 42.6 GB |
like | 42.2 GB | stop_inverted | 41.4 GB | holy | 52.7 GB |
mute | 43.2 GB | three | 42.2 GB | little_finger | 48.6 GB |
ok | 42.5 GB | three2 | 40.2 GB | middle_finger | 50.5 GB |
one | 42.7 GB | two_up | 41.8 GB | point | 50.4 GB |
palm | 43.0 GB | two_up_inverted | 40.9 GB | take_picture | 37.3 GB |
three3 | 54 GB | three_gun | 50.1 GB | thumb_index | 62.8 GB |
thumb_index2 | 24.8 GB | timeout | 39.5 GB | xsign | 51.3 GB |
no_gesture | 493.9 MB | | | | |
- `dataset`
- `annotations`: annotations
- `HaGRIDv2 512px`: a lightweight version of the full dataset with min_side = 512px (119.4 GB)
Alternatively, you can download the data using the Python script:
python download.py --save_path <PATH_TO_SAVE> \
--annotations \
--dataset
Run the script with the --dataset key to download the dataset images, and with the --annotations key to download the annotations for the selected stage.
usage: download.py [-h] [-a] [-d] [-t TARGETS [TARGETS ...]] [-p SAVE_PATH]
Download dataset...
optional arguments:
-h, --help show this help message and exit
-a, --annotations Download annotations
-d, --dataset Download dataset
-t TARGETS [TARGETS ...], --targets TARGETS [TARGETS ...]
Target(s) for downloading train set
-p SAVE_PATH, --save_path SAVE_PATH
Save path
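For example, the following invocation downloads the annotations together with only a few train archives; it assumes the target names match the gesture archive names listed above.
python download.py --save_path <PATH_TO_SAVE> --annotations --dataset --targets call like stop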
After downloading, you can unzip the archive by running the following command:
unzip <PATH_TO_ARCHIVE> -d <PATH_TO_SAVE>
The structure of the dataset is as follows:
├── hagrid_dataset <PATH_TO_DATASET_FOLDER>
│ ├── call
│ │ ├── 00000000.jpg
│ │ ├── 00000001.jpg
│ │ ├── ...
├── hagrid_annotations
│ ├── train <PATH_TO_JSON_TRAIN>
│ │ ├── call.json
│ │ ├── ...
│ ├── val <PATH_TO_JSON_VAL>
│ │ ├── call.json
│ │ ├── ...
│ ├── test <PATH_TO_JSON_TEST>
│ │ ├── call.json
│ │ ├── ...
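As a quick sanity check, the sketch below pairs images with their per-gesture annotation files following this layout. It assumes annotation keys correspond to image file names without the extension; the chosen gesture and paths are illustrative.

```python
# Minimal sketch: iterate over one gesture's annotations and build image paths.
# Assumes annotation keys match image file names without the extension.
import json
from pathlib import Path

dataset_root = Path("hagrid_dataset")
annotations_root = Path("hagrid_annotations")

gesture = "call"  # illustrative choice
with open(annotations_root / "train" / f"{gesture}.json") as f:
    annotations = json.load(f)

for image_id, ann in list(annotations.items())[:3]:
    image_path = dataset_root / gesture / f"{image_id}.jpg"
    print(image_path, ann["labels"], ann["bboxes"])
```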
We provide models pre-trained on HaGRIDv2 as baselines, built on classic backbone architectures for gesture classification, gesture detection, and hand detection.
Gesture Detectors | mAP |
---|---|
YOLOv10x | 89.4 |
YOLOv10n | 88.2 |
SSDLiteMobileNetV3Large | 72.7 |
In addition, if you need to detect hands, you can use YOLO detection models pre-trained on HaGRIDv2:
Hand Detectors | mAP |
---|---|
YOLOv10x | 88.8 |
YOLOv10n | 87.9 |
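If the released YOLOv10 checkpoints are compatible with the Ultralytics API (an assumption, not something stated here), inference can be sketched as follows; the checkpoint and image paths are hypothetical.

```python
# Hedged inference sketch; "YOLOv10x_hagridv2.pt" is a hypothetical local path
# to one of the pre-trained detector checkpoints listed above.
from ultralytics import YOLO

model = YOLO("YOLOv10x_hagridv2.pt")
results = model("sample.jpg", conf=0.5)  # detect gestures/hands in one image

for result in results:
    for box, cls_id, score in zip(result.boxes.xyxy, result.boxes.cls, result.boxes.conf):
        # Print class name, confidence, and absolute [x1, y1, x2, y2] box
        print(result.names[int(cls_id)], float(score), [round(v, 1) for v in box.tolist()])
```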
However, if you only need a single gesture per frame, you can use pre-trained full-frame classifiers instead of detectors. To use the full-frame models, remove the no_gesture class.
Full Frame Classifiers | F1 Gestures |
---|---|
MobileNetV3_small | 86.7 |
MobileNetV3_large | 93.4 |
VitB16 | 91.7 |
ResNet18 | 98.3 |
ResNet152 | 98.6 |
ConvNeXt base | 96.4 |
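A minimal full-frame classification sketch with a torchvision ResNet18 is shown below. It assumes the released weights load as a plain state_dict for the matching torchvision backbone and that 33 gesture classes remain after removing no_gesture; the checkpoint path and preprocessing are illustrative.

```python
# Hedged full-frame classification sketch; checkpoint path, class count, and
# preprocessing are assumptions, not guaranteed to match the released weights.
import torch
from PIL import Image
from torchvision import models, transforms

NUM_CLASSES = 33  # assumed gesture classes; no_gesture removed for full-frame models

model = models.resnet18(weights=None)
model.fc = torch.nn.Linear(model.fc.in_features, NUM_CLASSES)
model.load_state_dict(torch.load("resnet18_full_frame.pth", map_location="cpu"))
model.eval()

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

image = preprocess(Image.open("sample.jpg").convert("RGB")).unsqueeze(0)
with torch.no_grad():
    probs = model(image).softmax(dim=1)
print(int(probs.argmax()), float(probs.max()))  # predicted class index and confidence
```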
Run the detection demo (with hand landmarks) or the full-frame classification demo as follows:
python demo.py -p <PATH_TO_CONFIG> --landmarks
python demo_ff.py -p <PATH_TO_CONFIG>
The annotations consist of bounding boxes of hands and gestures in COCO format [top left X position, top left Y position, width, height] with gesture labels. We provide a user_id field that allows you to split the train / val / test sets yourself, as well as meta-information containing automatically annotated age, gender, and race.
"04c49801-1101-4b4e-82d0-d4607cd01df0": {
"bboxes": [
[0.0694444444, 0.3104166667, 0.2666666667, 0.2640625],
[0.5993055556, 0.2875, 0.2569444444, 0.2760416667]
],
"labels": [
"thumb_index2",
"thumb_index2"
],
"united_bbox": [
[0.0694444444, 0.2875, 0.7868055556, 0.2869791667]
],
"united_label": [
"thumb_index2"
],
"user_id": "2fe6a9156ff8ca27fbce8ada318c592b",
"hand_landmarks": [
[
[0.37233507701702123, 0.5935673528948108],
[0.3997604810145188, 0.5925499847441514],
...
],
[
[0.37388438145820907, 0.47547576284667353],
[0.39460467775730607, 0.4698847093520443],
...
]
]
"meta": {
"age": [24.41],
"gender": ["female"],
"race": ["White"]
}
Each bounding box is stored in normalized [top left X pos, top left Y pos, width, height] format, and each label is a gesture class name such as like, stop, or no_gesture.
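The sketch below converts a normalized box to pixel coordinates and groups samples by user_id (e.g. for a custom subject-level split). It assumes boxes are normalized by image width and height; the annotation path is illustrative. Per-split bounding box and hand landmark counts are listed in the tables below.

```python
# Hedged sketch: convert normalized [x, y, w, h] boxes to pixels and group
# samples by user_id; the annotation file path is illustrative.
import json
from collections import defaultdict

with open("hagrid_annotations/train/thumb_index2.json") as f:
    annotations = json.load(f)

def to_pixels(bbox, width, height):
    """Scale a normalized [top-left x, top-left y, w, h] box to pixel units."""
    x, y, w, h = bbox
    return [x * width, y * height, w * width, h * height]

# Group samples by user_id, e.g. to build a custom subject-level split.
by_user = defaultdict(list)
for image_id, ann in annotations.items():
    by_user[ann["user_id"]].append(image_id)

sample = next(iter(annotations.values()))
print(to_pixels(sample["bboxes"][0], width=1920, height=1080))
print("unique users:", len(by_user))
```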
Object | Train | Val | Test | Total |
---|---|---|---|---|
gesture | 980 924 | 120 003 | 200 006 | 1 300 933 |
no gesture | 154 403 | 19 411 | 29 386 | 203 200 |
total boxes | 1 135 327 | 139 414 | 229 392 | 1 504 133 |
Object | Train | Val | Test | Total |
---|---|---|---|---|
Total hands with landmarks | 983 991 | 123 230 | 201 131 | 1 308 352 |
This work is licensed under a variant of the Creative Commons Attribution-ShareAlike 4.0 International License. Please see the specific license.
You can cite the paper using the following BibTeX entry:
@InProceedings{Kapitanov_2024_WACV,
author = {Kapitanov, Alexander and Kvanchiani, Karina and Nagaev, Alexander and Kraynov, Roman and Makhliarchuk, Andrei},
title = {HaGRID -- HAnd Gesture Recognition Image Dataset},
booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
month = {January},
year = {2024},
pages = {4572-4581}
}