
SA-DVAE: Improving Zero-Shot Skeleton-Based Action Recognition by Disentangled Variational Autoencoders (ECCV 2024)

[Figure: comparison]

Sheng-Wei Li, Zi-Xiang Wei, Wei-Jie Chen, Yi-Hsin Yu, Chih-Yuan Yang†, Jane Yung-jen Hsu† (†corresponding authors)

arXiv

What does "SA-DVAE" stand for?

SA-DVAE stands for Semantic Alignment via Disentangled Variational Autoencoders.

TL;DR

SA-DVAE improves zero-shot skeleton-based action recognition by aligning modality-specific VAEs and disentangling skeleton features into semantic and non-semantic parts, achieving better performance on NTU RGB+D, NTU RGB+D 120, and PKU-MMD datasets.
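
To make the TL;DR concrete: SA-DVAE uses modality-specific VAEs, and only the semantic half of the skeleton latent is aligned with the text latent, while a second, non-semantic half absorbs instance-specific variation. The PyTorch sketch below is a minimal illustration of that latent split; the module names, dimensions, and architecture are assumptions for exposition, not the repository's actual implementation.

    import torch
    import torch.nn as nn

    class DisentangledSkeletonVAE(nn.Module):
        """Toy skeleton VAE whose latent splits into a semantic part (z_s)
        and a non-semantic part (z_n); dimensions are hypothetical."""
        def __init__(self, feat_dim=256, sem_dim=64, nonsem_dim=64):
            super().__init__()
            self.encoder = nn.Sequential(nn.Linear(feat_dim, 128), nn.ReLU())
            # Separate heads parameterize the two latent factors.
            self.mu_s, self.logvar_s = nn.Linear(128, sem_dim), nn.Linear(128, sem_dim)
            self.mu_n, self.logvar_n = nn.Linear(128, nonsem_dim), nn.Linear(128, nonsem_dim)
            self.decoder = nn.Linear(sem_dim + nonsem_dim, feat_dim)

        @staticmethod
        def reparameterize(mu, logvar):
            return mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)

        def forward(self, x):
            h = self.encoder(x)
            z_s = self.reparameterize(self.mu_s(h), self.logvar_s(h))
            z_n = self.reparameterize(self.mu_n(h), self.logvar_n(h))
            recon = self.decoder(torch.cat([z_s, z_n], dim=-1))
            return recon, z_s, z_n

    # Only z_s would be aligned with the latent of a text VAE (not shown);
    # z_n keeps instance-specific variation out of the shared semantic space.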

[Figure: system architecture]

Setting Up the Environment

Follow the steps below to set up the environment:

  1. Clone the Repository

    git clone https://github.com/pha123661/SA-DVAE.git
    cd SA-DVAE
  2. Install Dependencies

    pip install -r requirements.txt
  3. Download Pre-extracted Features

    • Download the pre-extracted features for the NTU-60, NTU-120, and PKU-MMD datasets here.
    • Extract the resources.zip file.
    • Place all subdirectories under ./resources.

    Optional: Generate Features Yourself

  4. Ensure that the directory structure is as follows (a quick sanity check is sketched after the tree):

    SA-DVAE
    ├── resources
    │   ├── label_splits
    │   ├── sk_feats
    │   └── text_feats
    ...
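
Before launching training, you can double-check the layout from step 4 with a few lines of Python; this snippet is only a convenience and is not part of the repository.

    from pathlib import Path

    # Subdirectories expected under ./resources (see step 4 above).
    required = ["label_splits", "sk_feats", "text_feats"]
    missing = [d for d in required if not (Path("resources") / d).is_dir()]
    if missing:
        raise SystemExit(f"Missing resource folders: {missing}")
    print("resources/ layout looks good.")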

Training

We provide three training scripts in ./scripts, each corresponding to one of the three main experiments in our paper:

  1. Comparison with SOTA Methods

    ./scripts/train_eval_synse_split.sh {dataset}
  2. Random Class Splits

    ./scripts/train_eval_average_random_split.sh {dataset}
    • This script runs experiments on three different seen/unseen class splits.
  3. Enhanced Class Descriptions by a Large Language Model (LLM)

    ./scripts/train_eval_llm_descriptions.sh {dataset}
    • This script runs experiments on three different seen/unseen class splits.

where {dataset} should be one of ntu60, ntu120, or pku51.
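
For example, ./scripts/train_eval_synse_split.sh ntu60 trains and evaluates SA-DVAE on NTU RGB+D 60 with the SynSE seen/unseen split.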

Training Steps

Each training script follows these four stages, covering both Zero-Shot Learning (ZSL) and Generalized Zero-Shot Learning (GZSL) training and evaluation (a sketch of the GZSL gating step follows the list):

  1. Train and evaluate SA-DVAE for ZSL.

  2. Prepare $\mathbf{p}_s$ and $\mathbf{p}_u$ for Domain Classifier Training.

  3. Train the Domain Classifier.

  4. Evaluate SA-DVAE under GZSL.
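
The domain classifier in steps 2-4 decides, at GZSL test time, whether a sample looks like a seen or an unseen class, so that the corresponding classifier's prediction is used. The sketch below only illustrates that gating idea, assuming $\mathbf{p}_s$ and $\mathbf{p}_u$ are the probability vectors over seen and unseen classes from step 2; the architecture and function names are assumptions, not the code in this repository.

    import torch
    import torch.nn as nn

    class DomainClassifier(nn.Module):
        """Predicts P(sample belongs to a seen class) from the concatenated
        seen/unseen probability vectors p_s and p_u (hypothetical setup)."""
        def __init__(self, num_seen, num_unseen):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(num_seen + num_unseen, 64), nn.ReLU(),
                nn.Linear(64, 1), nn.Sigmoid())

        def forward(self, p_s, p_u):
            return self.net(torch.cat([p_s, p_u], dim=-1))

    def gzsl_predict(dc, p_s, p_u, threshold=0.5):
        # Route each sample to the seen or unseen label space; the returned
        # indices live in their respective (seen vs. unseen) label sets.
        is_seen = dc(p_s, p_u).squeeze(-1) > threshold
        return torch.where(is_seen, p_s.argmax(dim=-1), p_u.argmax(dim=-1))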

Acknowledgements

Our codebase is mainly built upon skelemoa/synse-zsl. We thank the authors for their excellent work.

Citation

@inproceedings{li2024sadvae,
  title={SA-DVAE: Improving Zero-Shot Skeleton-Based Action Recognition by Disentangled Variational Autoencoders},
  author={Sheng-Wei Li and Zi-Xiang Wei and Wei-Jie Chen and Yi-Hsin Yu and Chih-Yuan Yang and Jane Yung-jen Hsu},
  booktitle={European Conference on Computer Vision (ECCV)},
  year={2024}
}