6D-ViT

Overview

This is the official repository of our recent work, 6D-ViT: Category-Level 6D Object Pose Estimation via Transformer-Based Instance Representation Learning. We provide the pose estimation results on the REAL275 test set so that the performance of our method can be evaluated.

More information will be released soon.

Dependencies

Python 3.6
PyTorch 1.7.1+cu110
CUDA 11.2
OpenCV-python 4.4.0
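
Before running the evaluation, it can help to compare the local environment with the versions listed above. The following is a minimal sketch, not part of the repository: the names `EXPECTED`, `installed_versions`, and `report` are hypothetical, and the check only compares version prefixes (CUDA is not checked).

```python
import importlib
import sys

# Versions copied from the Dependencies list; the dict itself is illustrative.
EXPECTED = {"python": "3.6", "torch": "1.7.1", "cv2": "4.4.0"}


def installed_versions():
    """Return the detected version per component, or None if it cannot be imported."""
    found = {"python": "{}.{}".format(*sys.version_info[:2])}
    for module_name in ("torch", "cv2"):
        try:
            module = importlib.import_module(module_name)
            found[module_name] = getattr(module, "__version__", None)
        except (ImportError, OSError):
            # Not installed, or installed but its native libraries failed to load.
            found[module_name] = None
    return found


def report(found, expected=EXPECTED):
    """Build one status line per component: ok, differs, or missing."""
    lines = []
    for name, want in expected.items():
        have = found.get(name)
        if have is None:
            status = "missing"
        elif have.startswith(want):
            status = "ok"
        else:
            status = "differs"
        lines.append("{:<7} expected {:<6} found {} ({})".format(name, want, have, status))
    return lines


if __name__ == "__main__":
    print("\n".join(report(installed_versions())))
```

A "differs" line does not necessarily mean the evaluation will fail; it only flags a version other than the one listed here.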

Evaluation on the REAL275 test set

Download the Mask R-CNN results and the pose predictions of NOCS, NOF, SPD, and our 6D-ViT from here.
The model pretrained on the NOCS-REAL dataset is available here.

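# Unpack the downloaded archive
unzip -q real_test.zip
# Move its contents into the results folder of the repository
ROOT=/path/to/6D-ViT
mkdir -p "$ROOT/results"
mv real_test/* "$ROOT/results"
rmdir real_test
# Run the evaluation from the repository root
cd "$ROOT"
python evaluate_mean_real.py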

The evaluation results will be written to the folder $ROOT/results/6D-ViT_results/realtest/.
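
On systems without `unzip` and `mv`, the staging steps above can be approximated in Python. This is a sketch under assumptions rather than a script shipped with the repository: `stage_results` is a hypothetical helper, and it assumes the archive unpacks into a top-level `real_test/` folder, as the shell commands imply.

```python
import shutil
import zipfile
from pathlib import Path


def stage_results(archive, root):
    """Unpack `archive` and move the contents of real_test/ into <root>/results."""
    archive, root = Path(archive), Path(root)
    results = root / "results"
    results.mkdir(parents=True, exist_ok=True)  # like `mkdir -p`

    # Extract next to the archive, mirroring `unzip -q real_test.zip`.
    extract_dir = archive.parent
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(extract_dir)

    # Move every entry, then remove the now-empty folder (like `mv` + `rmdir`).
    unpacked = extract_dir / "real_test"
    for item in list(unpacked.iterdir()):
        shutil.move(str(item), str(results / item.name))
    unpacked.rmdir()
    return results


if __name__ == "__main__":
    # /path/to/6D-ViT is the same placeholder used in the shell commands.
    stage_results("real_test.zip", "/path/to/6D-ViT")
```

After staging, `python evaluate_mean_real.py` is run from the repository root as before.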

Dataset	Category	3D₅₀	3D₇₅	5°2cm	5°5cm	10°2cm	10°5cm	10°10cm
REAL275	Bottle	0.5766	0.5005	0.5799	0.6318	0.7969	0.8703	0.9752
REAL275	Bowl	0.9999	0.9992	0.7874	0.8186	0.9548	0.9914	0.9914
REAL275	Camera	0.8709	0.1917	0.0000	0.0000	0.0014	0.0019	0.0019
REAL275	Can	0.7146	0.6996	0.5350	0.5624	0.8573	0.9551	0.9555
REAL275	Laptop	0.8334	0.6170	0.3383	0.4461	0.6163	0.9217	0.9361
REAL275	Mug	0.9878	0.8577	0.0490	0.0524	0.3166	0.3333	0.3333
REAL275	Average	0.8306	0.6443	0.3816	0.4186	0.5906	0.6789	0.6989
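
The column names follow the usual convention in category-level pose estimation: 3D₅₀ and 3D₇₅ are 3D IoU thresholds of 50% and 75%, and n°m cm counts a pose as correct when its rotation error is below n degrees and its translation error is below m centimetres. The sketch below illustrates only that threshold criterion. It is not the code in `evaluate_mean_real.py`: the function names are hypothetical, translations are assumed to be in centimetres, the strict `<` comparison is an assumption, symmetry handling for categories such as bottle, bowl, and can is omitted, and the repository's script may aggregate results differently (for example as average precision).

```python
import math


def rotation_error_deg(R_pred, R_gt):
    """Angle in degrees between two 3x3 rotation matrices given as nested lists."""
    # trace(R_gt^T R_pred) equals the sum of element-wise products.
    trace = sum(R_gt[i][j] * R_pred[i][j] for i in range(3) for j in range(3))
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))  # clamp rounding noise
    return math.degrees(math.acos(cos_angle))


def translation_error_cm(t_pred, t_gt):
    """Euclidean distance between two translations, both assumed to be in cm."""
    return math.sqrt(sum((p - g) ** 2 for p, g in zip(t_pred, t_gt)))


def pose_accuracy(errors, max_deg, max_cm):
    """Fraction of (rotation_deg, translation_cm) pairs within both thresholds."""
    if not errors:
        return 0.0
    hits = sum(1 for r, t in errors if r < max_deg and t < max_cm)
    return hits / len(errors)


def rot_z(deg):
    """Rotation about the z-axis, used here only to build toy inputs."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


if __name__ == "__main__":
    identity = rot_z(0.0)
    # Toy predictions: (rotation about z in degrees, translation in cm).
    toy = [(4.0, [0, 0, 1.5]), (4.0, [0, 0, 3.0]), (8.0, [1, 0, 0]), (12.0, [1, 0, 0])]
    errors = [
        (rotation_error_deg(rot_z(d), identity), translation_error_cm(t, [0, 0, 0]))
        for d, t in toy
    ]
    for max_deg, max_cm in [(5, 2), (5, 5), (10, 2), (10, 5), (10, 10)]:
        print("{}deg {}cm: {:.2f}".format(max_deg, max_cm, pose_accuracy(errors, max_deg, max_cm)))
```

Under this criterion a looser threshold can never score lower than a tighter one, which matches the left-to-right increase within each group of columns in the table.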

Citation

If you find this work helpful, please consider citing:

@article{zou20226d,
  title={{6D-ViT}: Category-Level {6D} Object Pose Estimation via Transformer-Based Instance Representation Learning},
  author={Zou, Lu and Huang, Zhangjin and Gu, Naijie and Wang, Guoping},
  journal={IEEE Transactions on Image Processing},
  volume={31},
  pages={6907--6921},
  year={2022},
  publisher={IEEE}
}

Acknowledgement

Our work is built upon object-deformnet. We thank the authors for releasing their code.