
[IROS 2023] SwinDePose: Depth-based Object 6DoF Pose Estimation using Swin Transformers

This is the official source code for the IROS 2023 oral paper: Depth-based Object 6DoF Pose Estimation using Swin Transformers (https://arxiv.org/abs/2303.02133).

Table of Contents

- Introduction & Citation
- Installation - From conda
- Code Structure
- Datasets
- Training and evaluating
- Fetch Robot Information
- Fetch Robot Grasping Video
- License

Update!!! Uploaded information about the robot we integrated with our model for real-world object-grasping tests.

Update!!! Uploaded the pretrained models for the LineMOD, Occ-LineMOD, and YCBV datasets.

Update!!! Uploaded the conda installation environment file.

Introduction & Citation

SwinDePose is a general framework for representation learning from a depth image, and we applied it to the 6D pose estimation task by cascading the downstream prediction heads for instance semantic segmentation and 3D keypoint voting from FFB6D.
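For orientation, the snippet below is a heavily simplified, hypothetical skeleton of this two-branch design. It is not the actual SwinDePose code (see swin_de_pose/models/SwinDePose.py): the real model uses a Swin Transformer image branch and a RandLA-Net point branch, whereas here both backbones are replaced by tiny stand-ins and all module names and sizes are illustrative.

```python
# Hypothetical skeleton of the two-branch design described above; the real
# backbones (Swin Transformer, RandLA-Net) are replaced by tiny stand-ins.
import torch
import torch.nn as nn


class TwoBranchPoseNet(nn.Module):
    def __init__(self, n_classes=2, n_keypoints=8):
        super().__init__()
        # Image branch: consumes the 2-channel normal vector angles image.
        self.img_backbone = nn.Sequential(
            nn.Conv2d(2, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
        )
        # Point branch: consumes per-point xyz coordinates.
        self.pcd_backbone = nn.Sequential(
            nn.Linear(3, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
        )
        # Prediction heads applied to the fused per-point embeddings.
        self.seg_head = nn.Linear(128, n_classes)        # instance semantic segmentation
        self.kp_head = nn.Linear(128, 3 * n_keypoints)   # per-point 3D keypoint offsets

    def forward(self, angle_img, points, pix_idx):
        """angle_img: (B, 2, H, W); points: (B, N, 3);
        pix_idx: (B, N) long tensor, flattened pixel index of each sampled point."""
        img_feat = self.img_backbone(angle_img).flatten(2)             # (B, 64, H*W)
        img_feat = torch.gather(
            img_feat, 2, pix_idx.unsqueeze(1).expand(-1, 64, -1))      # (B, 64, N)
        pcd_feat = self.pcd_backbone(points).transpose(1, 2)           # (B, 64, N)
        fused = torch.cat([img_feat, pcd_feat], dim=1).transpose(1, 2) # (B, N, 128)
        return self.seg_head(fused), self.kp_head(fused)
```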

Before the representation learning stage of SwinDePose, a normal vector angles image generation module converts the depth image into a normal vector angles image. In addition, the depth image is lifted to a point cloud using the camera intrinsic matrix K. The normal vector angles image and the point cloud are then fed into the image and point cloud feature extraction networks to learn representations. The learned embeddings from the two branches are passed to the 3D keypoint localization module and the instance segmentation module. Finally, a least-squares fitting algorithm estimates the 6D pose.
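The geometric steps above can be summarized with the following minimal NumPy sketch. It is not the repository's implementation (the corresponding code lives in files such as datasets/linemod/create_angle_npy.py and utils/pvn3d_eval_utils_kpls.py), the exact angle encoding used by SwinDePose may differ, and the function names are illustrative.

```python
# A minimal sketch (not the repository's exact implementation) of the geometric
# steps described above, assuming a depth map in metres and camera intrinsics
# K = [[fx, 0, cx], [0, fy, cy], [0, 0, 1]].
import numpy as np


def depth_to_point_cloud(depth, K):
    """Lift an HxW depth image to an HxWx3 point cloud in camera coordinates."""
    H, W = depth.shape
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1)


def normal_vector_angles(points):
    """Encode per-pixel surface normals as a 2-channel angle image.

    Normals are approximated from point-cloud gradients; each normal is then
    represented by its azimuth and elevation angles, giving an HxWx2 image
    that a 2D backbone can consume. (The repository's encoding may differ.)
    """
    dpdx = np.gradient(points, axis=1)   # derivative along image columns
    dpdy = np.gradient(points, axis=0)   # derivative along image rows
    normals = np.cross(dpdx, dpdy)
    normals /= (np.linalg.norm(normals, axis=-1, keepdims=True) + 1e-8)
    azimuth = np.arctan2(normals[..., 1], normals[..., 0])
    elevation = np.arcsin(np.clip(normals[..., 2], -1.0, 1.0))
    return np.stack([azimuth, elevation], axis=-1)


def fit_pose_least_squares(model_kps, pred_kps):
    """Least-squares (Kabsch/Umeyama) fit of R, t that maps the object's
    canonical 3D keypoints onto the keypoints voted in camera coordinates."""
    mu_m, mu_p = model_kps.mean(0), pred_kps.mean(0)
    H = (model_kps - mu_m).T @ (pred_kps - mu_p)
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_p - R @ mu_m
    return R, t
```

Here fit_pose_least_squares is the standard Kabsch/Umeyama alignment between canonical and predicted keypoints; in the repository the pose-fitting and evaluation logic is part of the utilities such as utils/pvn3d_eval_utils_kpls.py.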

If you find SwinDePose useful in your research, please consider citing:

```
@inproceedings{Li2023Depthbased6O,
  title={Depth-based 6DoF Object Pose Estimation using Swin Transformer},
  author={Zhujun Li and Ioannis Stamos},
  booktitle={IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
  year={2023}
}
```

Installation - From conda

Code Structure

- **swin_de_pose**
  - **swin_de_pose/apps**
    - **swin_de_pose/apps/train_lm.py**: Training & evaluating code of SwinDePose models for the LineMOD dataset.
    - **swin_de_pose/apps/train_occlm.py**: Training & evaluating code of SwinDePose models for the Occ-LineMOD dataset.
  - **swin_de_pose/config**
    - **swin_de_pose/config/common.py**: Some network and dataset settings for experiments.
    - **swin_de_pose/config/options.py**: Training and evaluating parameter settings for experiments.
  - **swin_de_pose/scripts**
    - **swin_de_pose/scripts/train_lm.sh**: Bash script to start training on the LineMOD dataset.
    - **swin_de_pose/scripts/test_lm.sh**: Bash script to start testing on the LineMOD dataset.
    - **swin_de_pose/scripts/train_occlm.sh**: Bash script to start training on the Occ-LineMOD dataset.
    - **swin_de_pose/scripts/test_occlm.sh**: Bash script to start testing on the Occ-LineMOD dataset.
  - **swin_de_pose/datasets**
    - **swin_de_pose/datasets/linemod/**
      - **swin_de_pose/datasets/linemod/linemod_dataset.py**: Data loader for the LineMOD dataset.
      - **swin_de_pose/datasets/linemod/create_angle_npy.py**: Generates normal vector angles images for the real-scene LineMOD dataset.
    - **swin_de_pose/datasets/occ_linemod**
      - **swin_de_pose/datasets/occ_linemod/occ_dataset.py**: Data loader for the Occ-LineMOD dataset.
      - **swin_de_pose/datasets/occ_linemod/create_angle_npy.py**: Generates normal vector angles images for the Occ-LineMOD dataset.
  - **swin_de_pose/mmsegmentation**: Packages of the Swin Transformer.
  - **swin_de_pose/models**
    - **swin_de_pose/models/SwinDePose.py**: Network architecture of the proposed SwinDePose.
    - **swin_de_pose/models/cnn**
      - **swin_de_pose/models/cnn/extractors.py**: ResNet backbones.
      - **swin_de_pose/models/cnn/pspnet.py**: PSPNet decoder.
      - **swin_de_pose/models/cnn/ResNet_pretrained_mdl**: ResNet pretrained model weights.
    - **swin_de_pose/models/loss.py**: Loss calculation for training the FFB6D model.
    - **swin_de_pose/models/pytorch_utils.py**: PyTorch basic network modules.
    - **swin_de_pose/models/RandLA/**: PyTorch version of RandLA-Net from [RandLA-Net-pytorch](https://github.com/qiqihaer/RandLA-Net-pytorch).
  - **swin_de_pose/utils**
    - **swin_de_pose/utils/basic_utils.py**: Basic functions for data processing, visualization, and so on.
    - **swin_de_pose/utils/meanshift_pytorch.py**: PyTorch version of the mean-shift algorithm for 3D center point and keypoint voting.
    - **swin_de_pose/utils/pvn3d_eval_utils_kpls.py**: Object pose estimation from predicted center/keypoint offsets, plus evaluation metrics.
    - **swin_de_pose/utils/ip_basic**: Image processing for basic depth completion from [ip_basic](https://github.com/kujason/ip_basic).
    - **swin_de_pose/utils/dataset_tools**
      - **swin_de_pose/utils/dataset_tools/DSTOOL_README.md**: README for the dataset tools.
      - **swin_de_pose/utils/dataset_tools/requirement.txt**: Python 3 requirements for the dataset tools.
      - **swin_de_pose/utils/dataset_tools/gen_obj_info.py**: Generates object info, including SIFT-FPS 3D keypoints, radius, etc.
      - **swin_de_pose/utils/dataset_tools/rgbd_rnder_sift_kp3ds.py**: Renders RGB-D images from a mesh and extracts textured 3D keypoints (SIFT/ORB).
      - **swin_de_pose/utils/dataset_tools/utils.py**: Basic utils for mesh, pose, image, and system processing.
      - **swin_de_pose/utils/dataset_tools/fps**: Farthest point sampling algorithm.
      - **swin_de_pose/utils/dataset_tools/example_mesh**: Example mesh models.
  - **swin_de_pose/train_log**
    - **swin_de_pose/train_log/{your experiment name}/checkpoints/**: Trained checkpoints for your experiment.
    - **swin_de_pose/train_log/{your experiment name}/eval_results/**: Evaluation results for your experiment.
    - **swin_de_pose/train_log/{your experiment name}/train_info/**: Training logs for your experiment.
- **figs/**: Images shown in the README.

Datasets

Training and evaluating

Training on the LineMOD Dataset

Evaluating on the LineMOD Dataset

Visualization on the LineMOD Dataset

Training on the Occ-LineMOD Dataset

Evaluating on the Occ-LineMOD Dataset

Visualization on the Occ-LineMOD Dataset

Training on the YCBV Dataset

Evaluating on the YCBV Dataset

Fetch Robot Information

See Fetch Robot for details about the robot we integrated.

Fetch Robot Grasping Video

See Robot Grasping Video to watch our Fetch robot, running the SwinDePose network, grasp texture-less objects.

License

SwinDePose is released under the MIT License (refer to the LICENSE file for details).