This is the official source code for the IROS 2023 oral work: Depth-based 6DoF Object Pose Estimation using Swin Transformer (https://arxiv.org/abs/2303.02133).
SwinDePose is a general framework for representation learning from a depth image; we apply it to the 6D pose estimation task by cascading downstream prediction heads for instance semantic segmentation and 3D keypoint voting from FFB6D.
Before the representation learning stage of SwinDePose, we build a normal vector angles image generation module that converts depth images into normal vector angles images. In addition, depth images are lifted to point clouds using the camera intrinsic matrix K. The normal vector angles images and point clouds are then fed into image and point cloud feature extraction networks to learn representations. The learned embeddings from both branches are passed to the 3D keypoint localization and instance segmentation modules. Finally, least-squares fitting is applied to estimate the 6D poses.
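For intuition, two of the geometric steps above are standard and can be sketched compactly. The snippet below is a minimal illustration under our own naming, not the repo's implementation: lifting a depth image to an organized point cloud with the intrinsics K, and least-squares pose fitting from 3D keypoint correspondences in the Kabsch/Umeyama style.

```python
import numpy as np

def depth_to_pointcloud(depth, K):
    """Lift a depth image (in meters) to an organized point cloud
    using the pinhole camera intrinsics K (3x3)."""
    H, W = depth.shape
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1)  # (H, W, 3)

def fit_pose_least_squares(kps_model, kps_cam):
    """Recover rotation R and translation t mapping model-frame keypoints
    (N, 3) onto predicted camera-frame keypoints (N, 3), Kabsch-style."""
    mu_m, mu_c = kps_model.mean(axis=0), kps_cam.mean(axis=0)
    H = (kps_model - mu_m).T @ (kps_cam - mu_c)
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_c - R @ mu_m
    return R, t
```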
If you find SwinDePose useful in your research, please consider citing:
@inproceedings{Li2023Depthbased6O,
  title={Depth-based 6DoF Object Pose Estimation using Swin Transformer},
  author={Zhujun Li and Ioannis Stamos},
  booktitle={IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
  year={2023}
}
conda env create -f swin_de_pose/environment.yml
conda activate lab-swin
pip install -r swin_de_pose/mmseg_install.txt
pip3 install "pybind11[global]"
git clone https://github.com/hfutcgncas/normalSpeed.git
cd normalSpeed
python3 setup.py install --user
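To verify the installation, normals can be computed from a depth image roughly as follows. The depth_normal signature shown here is the one used in FFB6D-style pipelines; treat the exact arguments as an assumption and check the normalSpeed README.

```python
import numpy as np
import normalSpeed  # module name assumed from the install above

# Hypothetical input: a uint16 depth image in millimeters.
depth_mm = np.zeros((480, 640), dtype=np.uint16)
fx, fy = 572.4114, 573.5704  # example LineMOD-style intrinsics

# Assumed args: fx, fy, kernel size, distance threshold (mm),
# depth difference threshold (mm), and whether normals point into the surface.
normals = normalSpeed.depth_normal(depth_mm, fx, fy, 5, 2000, 20, False)
print(normals.shape)  # (480, 640, 3)
```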
cd models/RandLA
sh compile_op.sh
If apex is not installed, you may have to remove all apex-related modules and function calls from the code.
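Alternatively, rather than deleting code, a guarded import lets the training code fall back to plain PyTorch when apex is missing. This is a minimal sketch; the backward() helper and its placement are hypothetical.

```python
# Minimal sketch: guard the apex import so training still runs without apex.
try:
    from apex import amp
    HAS_APEX = True
except ImportError:
    amp, HAS_APEX = None, False

def backward(loss, optimizer):
    if HAS_APEX:
        # apex's scaled loss for mixed-precision training
        with amp.scale_loss(loss, optimizer) as scaled_loss:
            scaled_loss.backward()
    else:
        loss.backward()
```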
sh scripts/test_single_lab.sh
docker pull zhujunli/swin-pose:latest
sudo nvidia-docker run --gpus all --ipc=host --shm-size 50G --ulimit memlock=-1 --name your_docker_environment_name -it --rm -v your_workspace_directory:/workspace zhujunli/swin-pose:latest
pip install -r swin_de_pose/mmseg_install.txt
cd models/RandLA
sh compile_op.sh
Link the unzipped Linemod_preprocessed/ directory to ffb6d/datasets/linemod/Linemod_preprocessed:
ln -s path_to_unzipped_Linemod_preprocessed ffb6d/datasets/linemod/
or, from within ffb6d/datasets/linemod/:
ln -s path_to_Linemod_preprocessed ./Linemod_preprocessed
Generate rendered and fused data (this only needs to be done once):
python3 rgbd_renderer.py --cls phone --render_num 10000
python3 fuse.py --cls phone --fuse_num 10000
Generate normal vector angles images from the depth images:
python create_angle_npy.py --cls_num your_cls_num --train_list 'train.txt' --test_list 'test.txt'
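The exact angle encoding is defined by create_angle_npy.py in this repo; the sketch below is only a rough, hypothetical illustration of the idea, assuming normals approximated from an organized point cloud and encoded as two angle channels.

```python
import numpy as np

def normals_from_organized_cloud(xyz):
    """Approximate per-pixel surface normals from an organized (H, W, 3)
    point cloud via cross products of image-space gradients."""
    dx = np.gradient(xyz, axis=1)
    dy = np.gradient(xyz, axis=0)
    n = np.cross(dx, dy)
    return n / (np.linalg.norm(n, axis=-1, keepdims=True) + 1e-8)

def normals_to_angles(n):
    """Encode unit normals as two angle channels (azimuth, elevation)."""
    azimuth = np.arctan2(n[..., 1], n[..., 0])        # in [-pi, pi]
    elevation = np.arcsin(np.clip(n[..., 2], -1, 1))  # in [-pi/2, pi/2]
    return np.stack([azimuth, elevation], axis=-1)
```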
Train the model for the target object.
sh scripts/train_lm.sh
The trained checkpoints are stored in experiment_name/train_log/linemod/checkpoints/{cls}/
Start evaluation by:
sh scripts/test_lm.sh
You can evaluate a different checkpoint by revising tst_mdl to the path of your target model. For example, move ape_best.pth.tar to train_log/linemod/checkpoints/ape/, then set tst_mdl=train_log/linemod/checkpoints/ape/ape_best.pth.tar for testing.
To visualize the results:
sh scripts/test_lm_vis.sh
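As an optional sanity check before wiring a downloaded checkpoint into tst_mdl, you can confirm it loads; this is an illustrative snippet using the example path above.

```python
import torch

# Example checkpoint path from the instructions above.
ckpt = torch.load("train_log/linemod/checkpoints/ape/ape_best.pth.tar",
                  map_location="cpu")
# Such checkpoints typically bundle model weights with training metadata.
print(list(ckpt.keys()))
```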
Train the model for the target object.
sh scripts/train_occlm.sh
The trained checkpoints are stored in experiment_name/train_log/occ_linemod/checkpoints/{cls}/.
Start evaluation by:
sh scripts/test_occlm.sh
You can evaluate a different checkpoint by revising tst_mdl to the path of your target model.
To visualize the results:
sh scripts/test_occlm_vis.sh
Train the model for the target object.
sh scripts/train_ycb.sh
The trained checkpoints are stored in experiment_name/train_log/ycb/checkpoints/ycb.pth.tar.
Start evaluation by:
sh scripts/test_ycb.sh
You can evaluate a different checkpoint by revising tst_mdl to the path of your target model.
Pretrained models: we provide our pre-trained models on OneDrive (link). Download them and move them to their corresponding folders.
Evaluation on the LineMod Dataset
Qualitative Results on the LineMod Dataset
Evaluation on the Occlusion LineMod Dataset
Qualitative Results on the Occlusion LineMod Dataset
Evaluation on the YCBV Dataset
See Fetch Robot for the robot we integrated with.
See the Robot Grasping Video for a video of our Fetch robot, running SwinDePose, grasping texture-less objects.
SwinDePose is released under the MIT License (refer to the LICENSE file for details).