Sheng-Wei Li, Zi-Xiang Wei, Wei-Jie Chen, Yi-Hsin Yu, Chih-Yuan Yang†, Jane Yung-jen Hsu† (†corresponding authors)
SA-DVAE stands for Semantic Alignment via Disentangled Variational Autoencoders.
SA-DVAE improves zero-shot skeleton-based action recognition by aligning modality-specific VAEs and disentangling skeleton features into semantic and non-semantic parts, achieving better performance on NTU RGB+D, NTU RGB+D 120, and PKU-MMD datasets.
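The core idea above — encoding a skeleton feature into a semantic part (aligned with the class-description text) and a non-semantic part (instance-specific variation) — can be sketched as follows. This is a minimal illustration with placeholder dimensions and random linear maps, not the paper's actual network:

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder sizes; the real model's dimensions differ.
FEAT_DIM, SEM_DIM, NONSEM_DIM = 256, 64, 64

# Stand-ins for the two learned encoder heads.
W_sem = rng.standard_normal((SEM_DIM, FEAT_DIM)) * 0.01
W_nonsem = rng.standard_normal((NONSEM_DIM, FEAT_DIM)) * 0.01

def disentangle(skeleton_feat):
    """Split a skeleton feature into a semantic part (to be aligned
    with text features) and a non-semantic part (pose style,
    viewpoint, etc.). Illustrative only."""
    z_sem = W_sem @ skeleton_feat
    z_nonsem = W_nonsem @ skeleton_feat
    return z_sem, z_nonsem
```

During training, only the semantic part is aligned with the text modality, so class-irrelevant variation does not pollute the alignment.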
The codebase has been tested with the following setup:
### Clone the Repository

```shell
git clone https://github.com/pha123661/SA-DVAE.git
cd SA-DVAE
```

### Install Dependencies

```shell
pip install -r requirements.txt
```
### Download Pre-extracted Features

Download resources.zip and extract it to `./resources`.

### Optional: Generate Features Yourself

We provide the class lists at `./class_lists`; skeleton features can be generated from the NTU RGB+D and PKU-MMD datasets. Ensure that the directory structure is as follows:
```
SA-DVAE
├── resources
│   ├── label_splits
│   ├── sk_feats
│   └── text_feats
...
```
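As a quick sanity check before training, a small helper (not part of the repository; the function name is ours) can confirm that the expected folders are in place:

```python
from pathlib import Path

def check_resources(root="resources"):
    """Return the expected resource sub-folders missing under `root`.

    Folder names follow the directory tree above; this helper is
    illustrative and not part of the SA-DVAE codebase.
    """
    required = ("label_splits", "sk_feats", "text_feats")
    root = Path(root)
    return [name for name in required if not (root / name).is_dir()]

if __name__ == "__main__":
    missing = check_resources()
    if missing:
        print("Missing folders:", ", ".join(missing))
    else:
        print("Resource layout looks complete.")
```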
We provide three training scripts in `./scripts`, each corresponding to one of the three main experiments in our paper:

1. Comparison with SOTA Methods

   ```shell
   ./scripts/train_eval_synse_split.sh {dataset}
   ```

2. Random Class Splits

   ```shell
   ./scripts/train_eval_average_random_split.sh {dataset}
   ```

3. Enhanced Class Descriptions by a Large Language Model (LLM)

   ```shell
   ./scripts/train_eval_llm_descriptions.sh {dataset}
   ```

where `{dataset}` should be one of `ntu60`, `ntu120`, or `pku51`.
Each training script runs the following four stages, covering both Zero-Shot Learning (ZSL) and Generalized Zero-Shot Learning (GZSL) training and evaluation:

1. Train and evaluate SA-DVAE for ZSL.
2. Prepare $\mathbf{p}_s$ and $\mathbf{p}_u$ for domain classifier training.
3. Train the domain classifier.
4. Evaluate SA-DVAE under GZSL.
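The role of the domain classifier in the GZSL stage can be sketched as follows. This assumes the domain classifier outputs the probability that a test sample belongs to a seen class, which then gates between the seen-class scores $\mathbf{p}_s$ and the unseen-class scores $\mathbf{p}_u$; the function names and the fixed threshold are our illustrative assumptions, not the repository's actual interface:

```python
import numpy as np

def gzsl_predict(p_s, p_u, prob_seen, threshold=0.5):
    """Illustrative GZSL gating (not the repository's code).

    p_s: scores over seen classes for one sample
    p_u: scores over unseen classes for the same sample
    prob_seen: domain classifier's probability that the sample
        comes from a seen class
    Returns (class_index, is_seen); class_index indexes into the
    chosen class set.
    """
    if prob_seen >= threshold:
        return int(np.argmax(p_s)), True
    return int(np.argmax(p_u)), False

# Hypothetical scores for a single test sample.
p_s = np.array([0.1, 0.7, 0.2])  # seen-class scores
p_u = np.array([0.6, 0.4])       # unseen-class scores
```

Gating this way lets the seen-class and unseen-class classifiers each operate on their own label space, which is the usual motivation for a separate domain classifier in GZSL.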
Our codebase is mainly built upon skelemoa/synse-zsl. We thank the authors for their excellent work.
```bibtex
@inproceedings{li2024sadvae,
  title={SA-DVAE: Improving Zero-Shot Skeleton-Based Action Recognition by Disentangled Variational Autoencoders},
  author={Li, Sheng-Wei and Wei, Zi-Xiang and Chen, Wei-Jie and Yu, Yi-Hsin and Yang, Chih-Yuan and Hsu, Jane Yung-jen},
  booktitle={European Conference on Computer Vision (ECCV)},
  year={2024}
}
```