
HaGRID - HAnd Gesture Recognition Image Dataset


We introduce HaGRIDv2 (HAnd Gesture Recognition Image Dataset), a large image dataset for hand gesture recognition (HGR) systems. You can use it for image classification or detection tasks. The proposed dataset enables building HGR systems for video conferencing services (Zoom, Skype, Discord, Jazz, etc.), home automation systems, the automotive sector, and more.

HaGRIDv2 is 1.5 TB in size and contains 1,086,158 FullHD RGB images divided into 33 gesture classes plus a new separate "no_gesture" class with domain-specific natural hand postures. In addition, some images are annotated with the no_gesture class when a second, gesture-free hand appears in the frame; this extra class contains 2,164 samples. The data were split by subject user_id into training (76%), validation (9%), and test (15%) sets, with 821,458 images for training, 99,200 for validation, and 165,500 for testing.


The dataset contains 65,977 unique persons and at least as many unique scenes. All subjects are over 18 years old. The dataset was collected mainly indoors with considerable variation in lighting, both artificial and natural. It also includes images taken in extreme conditions, such as subjects facing toward or away from a window. The subjects showed gestures at a distance of 0.5 to 4 meters from the camera.

An example of a sample and its annotation:


For more information, see our arXiv paper [TBA]() (the original HaGRID paper is available at https://arxiv.org/abs/2206.08219).

🔥 Changelog

Installation

Clone the repository and install the required Python packages:

git clone https://github.com/hukenovs/hagrid.git
# or mirror link:
cd hagrid
# Create virtual env by conda or venv
conda create -n gestures python=3.11 -y
conda activate gestures
# Install requirements
pip install -r requirements.txt

Downloads

Because of the large data size, we split the train set into 34 archives by gesture. Download and unzip them from the following links:

Dataset

| Gesture | Size | Gesture | Size | Gesture | Size |
| --- | --- | --- | --- | --- | --- |
| call | 37.2 GB | peace | 41.4 GB | grabbing | 48.7 GB |
| dislike | 40.9 GB | peace_inverted | 40.5 GB | grip | 48.6 GB |
| fist | 42.3 GB | rock | 41.7 GB | hand_heart | 39.6 GB |
| four | 43.1 GB | stop | 41.8 GB | hand_heart2 | 42.6 GB |
| like | 42.2 GB | stop_inverted | 41.4 GB | holy | 52.7 GB |
| mute | 43.2 GB | three | 42.2 GB | little_finger | 48.6 GB |
| ok | 42.5 GB | three2 | 40.2 GB | middle_finger | 50.5 GB |
| one | 42.7 GB | two_up | 41.8 GB | point | 50.4 GB |
| palm | 43.0 GB | two_up_inverted | 40.9 GB | take_picture | 37.3 GB |
| three3 | 54 GB | three_gun | 50.1 GB | thumb_index | 62.8 GB |
| thumb_index2 | 24.8 GB | timeout | 39.5 GB | xsign | 51.3 GB |
| no_gesture | 493.9 MB | | | | |

dataset annotations: annotations

HaGRIDv2 512px - a lightweight version of the full dataset with min_side = 512px (119.4 GB)

Alternatively, use the provided Python script:

python download.py --save_path <PATH_TO_SAVE> \
                   --annotations \
                   --dataset

Run the script with the --dataset flag to download the dataset images, and with the --annotations flag to download the annotations.

usage: download.py [-h] [-a] [-d] [-t TARGETS [TARGETS ...]] [-p SAVE_PATH]

Download dataset...

optional arguments:
  -h, --help            show this help message and exit
  -a, --annotations     Download annotations
  -d, --dataset         Download dataset
  -t TARGETS [TARGETS ...], --targets TARGETS [TARGETS ...]
                        Target(s) for downloading train set
  -p SAVE_PATH, --save_path SAVE_PATH
                        Save path
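
For example, to fetch the annotations together with only a few selected gesture archives (target names match the gesture classes in the table above), a call might look like the following; the save path is a placeholder:

```bash
# Download annotations plus only the "call" and "like" train archives (hypothetical save path)
python download.py --annotations --dataset --targets call like --save_path ./hagrid_data
```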

After downloading, you can unzip the archive by running the following command:

unzip <PATH_TO_ARCHIVE> -d <PATH_TO_SAVE>
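
If you downloaded several gesture archives, a small loop can unpack them all at once; the paths below are placeholders:

```bash
# Unzip every downloaded archive into a single dataset folder (adjust paths to your setup)
for archive in ./hagrid_data/*.zip; do
    unzip -q "$archive" -d ./hagrid_dataset
done
```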

The structure of the dataset is as follows:

├── hagrid_dataset <PATH_TO_DATASET_FOLDER>
│   ├── call
│   │   ├── 00000000.jpg
│   │   ├── 00000001.jpg
│   │   ├── ...
├── hagrid_annotations
│   ├── train <PATH_TO_JSON_TRAIN>
│   │   ├── call.json
│   │   ├── ...
│   ├── val <PATH_TO_JSON_VAL>
│   │   ├── call.json
│   │   ├── ...
│   ├── test <PATH_TO_JSON_TEST>
│   │   ├── call.json
│   │   ├── ...
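
As a quick sanity check after unpacking, a short script along these lines (the root path is a placeholder) counts the images in each gesture folder:

```python
from pathlib import Path

# Root of the unpacked dataset; adjust to your own layout (hypothetical path).
dataset_root = Path("hagrid_dataset")

# Count JPEG images in every gesture subfolder.
for gesture_dir in sorted(p for p in dataset_root.iterdir() if p.is_dir()):
    n_images = sum(1 for _ in gesture_dir.glob("*.jpg"))
    print(f"{gesture_dir.name}: {n_images} images")
```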

Models

We provide several models pre-trained on HaGRIDv2 as baselines, built on classic backbone architectures for gesture classification, gesture detection, and hand detection.

| Gesture Detectors | mAP |
| --- | --- |
| YOLOv10x | 89.4 |
| YOLOv10n | 88.2 |
| SSDLiteMobileNetV3Large | 72.7 |
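
The repository's own configs, run.py, and demo.py are the supported way to use these checkpoints. Purely as an illustration, the sketch below shows how an SSDLite-MobileNetV3 detector of this kind could be instantiated in plain torchvision; the checkpoint file name and the class count are assumptions, not official artifacts:

```python
import torch
from torchvision.models.detection import ssdlite320_mobilenet_v3_large

# Class count is an assumption (33 gestures + no_gesture + background for the SSD head);
# the real value is defined by the repository's training config.
NUM_CLASSES = 35

# Build the architecture without pre-trained weights and load a HaGRIDv2 checkpoint.
# "ssdlite_hagridv2.pth" is a hypothetical file name.
model = ssdlite320_mobilenet_v3_large(weights=None, weights_backbone=None, num_classes=NUM_CLASSES)
model.load_state_dict(torch.load("ssdlite_hagridv2.pth", map_location="cpu"))
model.eval()

# Run detection on one image tensor scaled to [0, 1], shape (3, H, W).
image = torch.rand(3, 720, 1280)
with torch.no_grad():
    prediction = model([image])[0]  # dict with "boxes", "labels", "scores"
print(prediction["boxes"].shape, prediction["scores"][:5])
```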

In addition, if you need to detect hands, you can use YOLO detection models pre-trained on HaGRIDv2:

| Hand Detectors | mAP |
| --- | --- |
| YOLOv10x | 88.8 |
| YOLOv10n | 87.9 |

However, if you only need to recognize a single gesture per frame, you can use pre-trained full-frame classifiers instead of detectors. To use the full-frame models, remove the no_gesture class.

| Full Frame Classifiers | F1 Gestures |
| --- | --- |
| MobileNetV3_small | 86.7 |
| MobileNetV3_large | 93.4 |
| VitB16 | 91.7 |
| ResNet18 | 98.3 |
| ResNet152 | 98.6 |
| ConvNeXt base | 96.4 |
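
Again purely as an illustration (the repository's demo_ff.py is the supported entry point), a full-frame classifier of this family could be applied roughly as follows; the checkpoint name, the class count of 33, and the preprocessing are assumptions:

```python
import torch
from PIL import Image
from torchvision import models, transforms

# 33 gesture classes for the full-frame setting (no_gesture removed) -- assumption.
NUM_CLASSES = 33

# Build a ResNet18 with a gesture head and load a HaGRIDv2 checkpoint
# ("resnet18_hagridv2.pth" is a hypothetical file name).
model = models.resnet18(weights=None, num_classes=NUM_CLASSES)
model.load_state_dict(torch.load("resnet18_hagridv2.pth", map_location="cpu"))
model.eval()

# ImageNet-style preprocessing; the actual resolution and normalization used for
# training are defined by the repository's configs.
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

frame = Image.open("frame.jpg").convert("RGB")
with torch.no_grad():
    logits = model(preprocess(frame).unsqueeze(0))
print("predicted class index:", logits.argmax(dim=1).item())
```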

Train

You can use the downloaded pre-trained models, or select parameters for training from the `configs` folder. To train a model, execute the following command:

Single GPU:

```bash
python run.py -c train -p configs/<CONFIG_NAME>
```

Multi GPU:

```bash
bash ddp_run.sh -g 0,1,2,3 -c train -p configs/<CONFIG_NAME>
```

where `-g` is a list of GPU ids.

At every step, the current loss, learning rate, and other values are logged to **Tensorboard**. View all saved metrics and parameters by running the following command (this will open a webpage at `localhost:6006`):

```bash
tensorboard --logdir=<PATH_TO_LOGS>
```

Test

Test your model by running the following command:

Single GPU:

```bash
python run.py -c test -p configs/<CONFIG_NAME>
```

Multi GPU:

```bash
bash ddp_run.sh -g 0,1,2,3 -c test -p configs/<CONFIG_NAME>
```

where `-g` is a list of GPU ids.

Demo

python demo.py -p <PATH_TO_CONFIG> --landmarks


Demo Full Frame Classifiers

python demo_ff.py -p <PATH_TO_CONFIG>

Annotations

The annotations consist of bounding boxes of hands and gestures in COCO format [top-left X, top-left Y, width, height] with gesture labels. We provide a user_id field that allows you to split the train / val / test sets yourself, as well as meta-information containing automatically annotated age, gender, and race.

"04c49801-1101-4b4e-82d0-d4607cd01df0": {
    "bboxes": [
        [0.0694444444, 0.3104166667, 0.2666666667, 0.2640625],
        [0.5993055556, 0.2875, 0.2569444444, 0.2760416667]
    ],
    "labels": [
        "thumb_index2",
        "thumb_index2"
    ],
    "united_bbox": [
        [0.0694444444, 0.2875, 0.7868055556, 0.2869791667]
    ],
    "united_label": [
        "thumb_index2"
    ],
    "user_id": "2fe6a9156ff8ca27fbce8ada318c592b",
    "hand_landmarks": [
            [
                [0.37233507701702123, 0.5935673528948108],
                [0.3997604810145188, 0.5925499847441514],
                ...
            ],
            [
                [0.37388438145820907, 0.47547576284667353],
                [0.39460467775730607, 0.4698847093520443],
                ...
            ]
        ],
    "meta": {
        "age": [24.41],
        "gender": ["female"],
        "race": ["White"]
    }
}

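The bounding boxes in the example above are normalized by image width and height; a sketch like the following converts them to absolute pixel coordinates (the file paths are placeholders, and it assumes the annotation key matches the image file name):

```python
import json
from PIL import Image

# Hypothetical paths: one annotation file and the matching gesture image folder.
with open("hagrid_annotations/train/thumb_index2.json") as f:
    annotations = json.load(f)

image_id = "04c49801-1101-4b4e-82d0-d4607cd01df0"
record = annotations[image_id]

# Assumes the image is stored as <image_id>.jpg in its gesture folder.
image = Image.open(f"hagrid_dataset/thumb_index2/{image_id}.jpg")
width, height = image.size

# Boxes are stored as normalized [top-left x, top-left y, box width, box height].
for (x, y, w, h), label in zip(record["bboxes"], record["labels"]):
    x1, y1 = x * width, y * height
    x2, y2 = (x + w) * width, (y + h) * height
    print(label, [round(v, 1) for v in (x1, y1, x2, y2)])
```
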
Bounding boxes

| Object | Train | Val | Test | Total |
| --- | --- | --- | --- | --- |
| gesture | 980,924 | 120,003 | 200,006 | 1,300,933 |
| no gesture | 154,403 | 19,411 | 29,386 | 203,200 |
| total boxes | 1,135,327 | 139,414 | 229,392 | 1,504,133 |

Landmarks

| Object | Train | Val | Test | Total |
| --- | --- | --- | --- | --- |
| Total hands with landmarks | 983,991 | 123,230 | 201,131 | 1,308,352 |

Converters

Yolo

We provide a script to convert annotations to [YOLO](https://pjreddie.com/darknet/yolo/) format. To convert annotations, run the following command:

```bash
python -m converters.hagrid_to_yolo --cfg <PATH_TO_CONFIG> --mode <'hands' or 'gestures'>
```

After conversion, you need to change the original [img2labels](https://github.com/WongKinYiu/yolov7/blob/2fdc7f14395f6532ad05fb3e6970150a6a83d290/utils/datasets.py#L347-L350) definition to:

```python
def img2label_paths(img_paths):
    img_paths = list(img_paths)
    # Define label paths as a function of image paths
    if "train" in img_paths[0]:
        return [x.replace("train", "train_labels").replace(".jpg", ".txt") for x in img_paths]
    elif "test" in img_paths[0]:
        return [x.replace("test", "test_labels").replace(".jpg", ".txt") for x in img_paths]
    elif "val" in img_paths[0]:
        return [x.replace("val", "val_labels").replace(".jpg", ".txt") for x in img_paths]
```
Coco

We also provide a script to convert annotations to [COCO](https://cocodataset.org/#home) format. To convert annotations, run the following command:

```bash
python -m converters.hagrid_to_coco --cfg <PATH_TO_CONFIG> --mode <'hands' or 'gestures'>
```

License

Creative Commons License
This work is licensed under a variant of Creative Commons Attribution-ShareAlike 4.0 International License.

Please see the specific license.

Authors and Credits

Links

Citation

You can cite the paper using the following BibTeX entry:

@InProceedings{Kapitanov_2024_WACV,
    author    = {Kapitanov, Alexander and Kvanchiani, Karina and Nagaev, Alexander and Kraynov, Roman and Makhliarchuk, Andrei},
    title     = {HaGRID -- HAnd Gesture Recognition Image Dataset},
    booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
    month     = {January},
    year      = {2024},
    pages     = {4572-4581}
}