
Egocentric 3D Hand Trajectory Forecasting

Project | ArXiv | Demo

Wentao Bao, Lele Chen, Libing Zeng, Zhong Li, Yi Xu, Junsong Yuan, Yu Kong

This is the official PyTorch implementation of USST, published at ICCV 2023. We release the dataset annotations (H2O-PT and EgoPAT3D-DT), PyTorch code (training, inference, and demo), and pretrained model weights.

Table of Contents

  1. Task Overview
  2. Installation
  3. Datasets
  4. Demo & Testing
  5. Training
  6. Citation

Task Overview

Egocentric 3D Hand Trajectory Forecasting (Ego3D-HTF) aims to predict the future 3D hand trajectory (shown in red) given past observations from an egocentric RGB video and the historical trajectory (shown in blue). Compared to predicting the trajectory in 2D space, predicting it in global 3D space is of greater practical value for understanding human intention in AR/VR applications.

Brief Intro. [YouTube]

Installation
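
A minimal setup sketch only (the environment name, Python version, and the presence of a requirements.txt are assumptions; please follow the repository's official instructions):

# hypothetical setup; verify against the repository's actual instructions
conda create -n usst python=3.8 -y
conda activate usst
pip install torch torchvision
pip install -r requirements.txt  # assumed dependency list at the repo root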

Datasets

EgoPAT3D-DT

H2O-PT

Training

a. Train the proposed ViT-based USST model on the EgoPAT3D-DT dataset using GPU_ID=0 and 8 workers:

cd exp
nohup bash train.sh 0 8 usst_vit_3d >train_egopat3d.log 2>&1 &
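
Here 0 is the GPU id and 8 the number of data-loading workers; nohup keeps the job running after logout and redirects all output to train_egopat3d.log, which can be followed with:

# follow the training log in real time
tail -f train_egopat3d.log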

b. Train the proposed ViT-based USST model on the H2O-PT dataset using GPU_ID=0 and 8 workers:

cd exp
nohup bash trainval_h2o.sh 0 8 h2o/usst_vit_3d train >train_h2o.log 2>&1 &

c. This repo includes TensorBoardX support to monitor the training status:

# open a new terminal
cd output/EgoPAT3D/usst_vit_3d
tensorboard --logdir=./logs
# open the prompted localhost URL in your browser

d. Check out other model variants in the config/ folder, including ResNet-18 backbones (usst_res18_xxx.yml), 3D/2D trajectory targets (usst_xxx_3d/2d.yml), and a 3D target in the local camera reference frame (usst_xxx_local3d.yml).
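
For example, to train a ResNet-18 variant with a 2D trajectory target, the same launcher can be pointed at the corresponding config tag (assuming a config/usst_res18_2d.yml exists under the naming pattern above):

cd exp
bash train.sh 0 8 usst_res18_2d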

Demo & Testing

a. Test and evaluate a trained model, e.g., usst_vit_3d, on the EgoPAT3D-DT testing set:

cd exp
bash test.sh 0 8 usst_vit_3d

b. Test and evaluate a trained model, e.g., usst_res18_3d, on the H2O-PT testing set:

cd exp
bash trainval_h2o.sh 0 8 usst_res18_3d eval

Evaluation results are cached under output/[EgoPAT3D|H2O]/ in a folder named after the config tag (e.g., output/EgoPAT3D/usst_vit_3d) and printed to the terminal.

c. To evaluate the 2D trajectory forecasting performance of a pretrained 3D-target model, modify the config file usst_xxx_3d.yml to set TEST.eval_space: norm2d, then run test.sh (or trainval_h2o.sh) again.
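
A sketch of the relevant setting (the surrounding YAML structure is assumed):

# in config/usst_xxx_3d.yml
TEST:
  eval_space: norm2d  # evaluate in normalized 2D space instead of 3D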

d. To run testing without training, download our pretrained model from here: OneDrive. After downloading the zip file, place it under the output/ folder, e.g., output/EgoPAT3D/usst_res18_3d.zip, and extract it:

cd output/EgoPAT3D && unzip usst_res18_3d.zip

Then run test.sh (or trainval_h2o.sh) as above.

e. [Optional] Show the demo of the testing examples in our paper:

python demo_paper.py --config config/usst_vit_3d.yml --tag usst_vit_3d

Citation

If you find the code useful in your research, please cite:

@InProceedings{BaoUSST_ICCV23,
  author = {Bao, Wentao and Chen, Lele and Zeng, Libing and Li, Zhong and Xu, Yi and Yuan, Junsong and Kong, Yu},
  title = {Uncertainty-aware State Space Transformer for Egocentric 3D Hand Trajectory Forecasting},
  booktitle = {International Conference on Computer Vision (ICCV)},
  year = {2023}
}

Please also cite the EgoPAT3D paper if you use our EgoPAT3D-DT annotations:

@InProceedings{Li_2022_CVPR,
  title = {Egocentric Prediction of Action Target in 3D},
  author = {Li, Yiming and Cao, Ziang and Liang, Andrew and Liang, Benjamin and Chen, Luoyao and Zhao, Hang and Feng, Chen},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  month = {June},
  year = {2022}
}

and the H2O paper if you use our H2O-PT annotations:

@InProceedings{Kwon_2021_ICCV,
  author = {Kwon, Taein and Tekin, Bugra and St\"uhmer, Jan and Bogo, Federica and Pollefeys, Marc},
  title = {H2O: Two Hands Manipulating Objects for First Person Interaction Recognition},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
  month = {October},
  year = {2021},
  pages = {10138-10148}
}

License

This project is released under the Apache License 2.0.

Acknowledgement

We sincerely thank the owners of the following source code repos, which are referenced by our released code: EgoPAT3D, hoi_forecast, pyk4a, RAFT, and NewCRFs.