Sheng-Wei Li, Zi-Xiang Wei, Wei-Jie Chen, Yi-Hsin Yu, Chih-Yuan Yang†, Jane Yung-jen Hsu† (†corresponding authors)
SA-DVAE stands for Semantic Alignment via Disentangled Variational Autoencoders.
SA-DVAE improves zero-shot skeleton-based action recognition by aligning modality-specific VAEs and disentangling skeleton features into semantic and non-semantic parts, achieving better performance on NTU RGB+D, NTU RGB+D 120, and PKU-MMD datasets.
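The core idea above — encoding a skeleton feature into a semantic part (aligned with the class-description text) and a non-semantic part (instance-specific variation) — can be sketched as follows. This is a minimal illustration with placeholder dimensions and random linear maps, not the paper's actual network:

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder sizes; the real model's dimensions differ.
FEAT_DIM, SEM_DIM, NONSEM_DIM = 256, 64, 64

# Stand-ins for the two learned encoder heads.
W_sem = rng.standard_normal((SEM_DIM, FEAT_DIM)) * 0.01
W_nonsem = rng.standard_normal((NONSEM_DIM, FEAT_DIM)) * 0.01

def disentangle(skeleton_feat):
    """Split a skeleton feature into a semantic part (to be aligned
    with text features) and a non-semantic part (pose style,
    viewpoint, etc.). Illustrative only."""
    z_sem = W_sem @ skeleton_feat
    z_nonsem = W_nonsem @ skeleton_feat
    return z_sem, z_nonsem
```

During training, only the semantic part is aligned with the text modality, so class-irrelevant variation does not pollute the alignment.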
The codebase has been tested with the following setup:
### Clone the Repository

```shell
git clone https://github.com/pha123661/SA-DVAE.git
cd SA-DVAE
```

### Install Dependencies

```shell
pip install -r requirements.txt
```
### Download Pre-extracted Features

Download resources.zip and extract it to `./resources`.

### Optional: Generate Features Yourself

We provide the class lists at `./class_lists`; skeleton features can be generated from the NTU RGB+D and PKU-MMD datasets. Ensure that the directory structure is as follows:
```
SA-DVAE
├── resources
│   ├── label_splits
│   ├── sk_feats
│   └── text_feats
...
```
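As a quick sanity check before training, a small helper (not part of the repository; the function name is ours) can confirm that the expected folders are in place:

```python
from pathlib import Path

def check_resources(root="resources"):
    """Return the expected resource sub-folders missing under `root`.

    Folder names follow the directory tree above; this helper is
    illustrative and not part of the SA-DVAE codebase.
    """
    required = ("label_splits", "sk_feats", "text_feats")
    root = Path(root)
    return [name for name in required if not (root / name).is_dir()]

if __name__ == "__main__":
    missing = check_resources()
    if missing:
        print("Missing folders:", ", ".join(missing))
    else:
        print("Resource layout looks complete.")
```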
We provide three training scripts in `./scripts`, each corresponding to one of the three main experiments in our paper:

1. Comparison with SOTA Methods

   ```shell
   ./scripts/train_eval_synse_split.sh {dataset}
   ```

2. Random Class Splits

   ```shell
   ./scripts/train_eval_average_random_split.sh {dataset}
   ```

3. Enhanced Class Descriptions by a Large Language Model (LLM)

   ```shell
   ./scripts/train_eval_llm_descriptions.sh {dataset}
   ```

where `{dataset}` should be one of `ntu60`, `ntu120`, or `pku51`.
Each training script runs the following four stages, covering both Zero-Shot Learning (ZSL) and Generalized Zero-Shot Learning (GZSL) training and evaluation:

1. Train and evaluate SA-DVAE for ZSL.
2. Prepare $\mathbf{p}_s$ and $\mathbf{p}_u$ for domain classifier training.
3. Train the domain classifier.
4. Evaluate SA-DVAE under GZSL.
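The role of the domain classifier in the GZSL stage can be sketched as follows. This assumes the domain classifier outputs the probability that a test sample belongs to a seen class, which then gates between the seen-class scores $\mathbf{p}_s$ and the unseen-class scores $\mathbf{p}_u$; the function names and the fixed threshold are our illustrative assumptions, not the repository's actual interface:

```python
import numpy as np

def gzsl_predict(p_s, p_u, prob_seen, threshold=0.5):
    """Illustrative GZSL gating (not the repository's code).

    p_s: scores over seen classes for one sample
    p_u: scores over unseen classes for the same sample
    prob_seen: domain classifier's probability that the sample
        comes from a seen class
    Returns (class_index, is_seen); class_index indexes into the
    chosen class set.
    """
    if prob_seen >= threshold:
        return int(np.argmax(p_s)), True
    return int(np.argmax(p_u)), False

# Hypothetical scores for a single test sample.
p_s = np.array([0.1, 0.7, 0.2])  # seen-class scores
p_u = np.array([0.6, 0.4])       # unseen-class scores
```

Gating this way lets the seen-class and unseen-class classifiers each operate on their own label space, which is the usual motivation for a separate domain classifier in GZSL.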
Our codebase is mainly built upon skelemoa/synse-zsl. We thank the authors for their excellent work.
```bibtex
@inproceedings{li2024sadvae,
  title={SA-DVAE: Improving Zero-Shot Skeleton-Based Action Recognition by Disentangled Variational Autoencoders},
  author={Li, Sheng-Wei and Wei, Zi-Xiang and Chen, Wei-Jie and Yu, Yi-Hsin and Yang, Chih-Yuan and Hsu, Jane Yung-jen},
  booktitle={European Conference on Computer Vision (ECCV)},
  year={2024}
}
```