
LandmarkGait

Official code for "LandmarkGait: Intrinsic Human Parsing for Gait Recognition" (ACM MM 2023).

Motivation

Several challenges arise when applying human parsing to gait recognition.

Introduction

We propose LandmarkGait, an unsupervised parsing-based solution that recovers complete, part-specific body representations from the original binary silhouettes for gait recognition. The pipeline consists of three stages: "Silhouette-to-Landmarks", "Landmarks-to-Parsing", and "Recognition".

Getting Started

  1. Clone this repo:

    git clone git@github.com:wzb-bupt/LandmarkGait.git
  2. Install dependencies:

    • pytorch >= 1.10
    • torchvision
    • pyyaml
    • tensorboard
    • opencv-python
    • tqdm
    • py7zr
    • kornia
    • einops
    • six

    Install dependencies with Anaconda:

    conda install tqdm pyyaml tensorboard opencv kornia einops six -c conda-forge
    conda install pytorch==1.10 torchvision -c pytorch

    Or install dependencies with pip:

    pip install tqdm pyyaml tensorboard opencv-python kornia einops six
    pip install torch==1.10 torchvision==0.11
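
    To quickly check that the core dependencies are importable (a minimal sanity check, not part of the repo):

    python -c "import torch, torchvision, cv2, kornia, einops, yaml; print(torch.__version__, torchvision.__version__, torch.cuda.is_available())"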
  3. Prepare the dataset:

    • In most cases, we strictly follow the settings of OpenGait.
    • However, since the original CASIA-B silhouettes were extracted with an outdated background subtraction algorithm, we use a re-segmented version from Ren et al., denoted CASIA-BN (a loading sketch under OpenGait's layout conventions follows this list).
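
After pretreatment, each sequence is stored as a pickled array of silhouette frames. A minimal loading sketch (the path is illustrative and follows OpenGait's ID/type/view/view.pkl layout; see the OpenGait docs for the authoritative structure):

    # Illustrative only; the layout comes from OpenGait's pretreatment script.
    import pickle

    with open("CASIA-BN-pkl/001/nm-01/000/000.pkl", "rb") as f:
        seq = pickle.load(f)  # typically a numpy array of shape (num_frames, H, W)
    print(seq.shape)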

Training Details

To achieve better convergence, we first train LandmarkNet and ParsingNet sequentially to obtain spatio-temporally consistent landmarks and parsing parts. We then fix the encoder of LandmarkNet, use the pre-trained weights to initialize both networks, and jointly train the subsequent recognition network end-to-end for final gait recognition.
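
A minimal PyTorch sketch of this staging, with hypothetical module and checkpoint names (the actual logic is driven by the OpenGait configs referenced below):

    # Illustrative sketch; module and checkpoint names are hypothetical.
    import torch
    import torch.nn as nn

    # Stand-ins for LandmarkNet's encoder and the recognition network.
    landmark_encoder = nn.Sequential(nn.Conv2d(1, 32, 3, padding=1), nn.ReLU())
    recognition_net = nn.Linear(32, 128)

    # 1) Initialize from the Step-1/Step-2 checkpoints (path is illustrative).
    # landmark_encoder.load_state_dict(torch.load("checkpoints/landmarknet_encoder.pt"))

    # 2) Fix the encoder: no gradient updates, frozen normalization statistics.
    for p in landmark_encoder.parameters():
        p.requires_grad = False
    landmark_encoder.eval()

    # 3) Optimize only the remaining parameters end-to-end.
    optimizer = torch.optim.SGD(
        [p for p in recognition_net.parameters() if p.requires_grad], lr=0.1)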

Step 1: LandmarkNet (Silhouette-to-Landmarks, config)

CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch --nproc_per_node=4 opengait/main.py --cfgs ./configs/landmarkgait/LandmarkGait_Silh_to_Landmark.yaml --phase train --log_to_file

Step 2: ParsingNet (Landmarks-to-Parsing, config)

CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch --nproc_per_node=4 opengait/main.py --cfgs ./configs/landmarkgait/LandmarkGait_Landmark_to_Parsing.yaml --phase train --log_to_file

Step 3: Multi-scale Feature Extraction Network (PMBC-Net, Recognition and Evaluation, config)

CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch --nproc_per_node=4 opengait/main.py --cfgs ./configs/landmarkgait/LandmarkGait_Recognition.yaml --phase train --log_to_file
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch --nproc_per_node=4 opengait/main.py --cfgs ./configs/landmarkgait/LandmarkGait_Recognition.yaml --phase test --log_to_file
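
Note that torch.distributed.launch is deprecated in newer PyTorch releases. With the pinned torch==1.10 the commands above work as-is; on newer versions, an equivalent torchrun invocation (untested here, and assuming the entry script reads the local rank from the LOCAL_RANK environment variable rather than a --local_rank argument) would be:

CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --nproc_per_node=4 opengait/main.py --cfgs ./configs/landmarkgait/LandmarkGait_Recognition.yaml --phase train --log_to_file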

Visual Results

(Figure: qualitative visual results.)

Citing Our Paper

If you find this codebase useful in your research, please consider citing:

@inproceedings{wang2023landmarkgait,
    title={LandmarkGait: Intrinsic Human Parsing for Gait Recognition},
    author={Wang, Zengbin and Hou, Saihui and Zhang, Man and Liu, Xu and Cao, Chunshui and Huang, Yongzhen and Xu, Shibiao},
    booktitle={Proceedings of the 31st ACM International Conference on Multimedia (ACM MM)},
    pages={2305--2314},
    year={2023}
}

Acknowledgement

Our code is built upon the great open-source project OpenGait.