zju3dv / OnePose_Plus_Plus

Code for "OnePose++: Keypoint-Free One-Shot Object Pose Estimation without CAD Models" NeurIPS 2022
Apache License 2.0

OnePose++: Keypoint-Free One-Shot Object Pose Estimation without CAD Models

Project Page | Paper


Xingyi He*, Jiaming Sun*, Yu'ang Wang, Di Huang, Hujun Bao, Xiaowei Zhou
NeurIPS 2022



Installation

conda env create -f environment.yaml
conda activate oneposeplus

This project uses LoFTR and DeepLM. We thank their authors for their contributions to the community. Please follow their installation instructions and licenses:

git submodule update --init --recursive

# Install DeepLM (REPO_ROOT must point to the root of this repository)
cd submodules/DeepLM
sh example.sh
cp ${REPO_ROOT}/backup/deeplm_init_backup.py ${REPO_ROOT}/submodules/DeepLM/__init__.py

DeepLM is the efficient optimizer used in our SfM refinement phase. If its installation fails, you can still run the code with our first-order optimizer, which is a little slower.

COLMAP is also used in this project for Structure-from-Motion. Please refer to the official instructions for the installation.

Download the pretrained models, including our 2D-3D matching and LoFTR models. Then move them to ${REPO_ROOT}/weights.

[Optional] You can try our web-based 3D visualization tool Wis3D for convenient, interactive visualization of feature matches and point clouds. Wis3D also provides many other visualization features.

# Work in progress and should be ready very soon; currently available only on test-pypi.
pip install -i https://test.pypi.org/simple/ wis3d

Demo

After the installation, you can refer to this page to run the demo with your custom data.

Training and Evaluation

Dataset setup

  1. Download the OnePose dataset from here and the OnePose_LowTexture dataset from here, and extract them into /your/path/to/datasets. To evaluate on the LINEMOD dataset, also download the real training data, test data and 3D object models from CDPN, and the YOLOv5 detection results from here, then extract them into /your/path/to/datasets/LINEMOD. The directory should be organized in the following structure:

    |--- /your/path/to/datasets
    |       |--- train_data
    |       |--- val_data
    |       |--- test_data
    |       |--- lowtexture_test_data
    |       |--- LINEMOD
    |       |      |--- real_train
    |       |      |--- real_test
    |       |      |--- models
    |       |      |--- yolo_detection

    You can refer to the dataset document for more information about the OnePose_LowTexture dataset.

  2. Build the dataset symlinks

    REPO_ROOT=/path/to/OnePose_Plus_Plus
    ln -s /your/path/to/datasets $REPO_ROOT/data/datasets
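Before running reconstruction it can help to confirm that the extracted data matches the tree above. The following is a minimal sketch, not part of the repository: the helper names `missing_dirs` and `link_datasets` are hypothetical, and LINEMOD is treated as optional because it is only needed for the LINEMOD evaluation.

```python
from pathlib import Path

# Sub-directories listed in the directory tree above, relative to the dataset root.
EXPECTED_DIRS = [
    "train_data",
    "val_data",
    "test_data",
    "lowtexture_test_data",
    "LINEMOD/real_train",
    "LINEMOD/real_test",
    "LINEMOD/models",
    "LINEMOD/yolo_detection",
]


def missing_dirs(dataset_root, require_linemod=True):
    """Return the expected sub-directories that do not exist under dataset_root."""
    root = Path(dataset_root)
    expected = [
        d for d in EXPECTED_DIRS
        if require_linemod or not d.startswith("LINEMOD")
    ]
    return [d for d in expected if not (root / d).is_dir()]


def link_datasets(dataset_root, repo_root):
    """Create repo_root/data/datasets as a symlink to dataset_root, like `ln -s`.

    Unlike plain `ln -s`, this also creates repo_root/data if it is absent
    (an assumption made for convenience).
    """
    link = Path(repo_root) / "data" / "datasets"
    link.parent.mkdir(parents=True, exist_ok=True)
    # Leave an existing link or directory untouched.
    if not link.is_symlink() and not link.exists():
        link.symlink_to(Path(dataset_root).resolve(), target_is_directory=True)
    return link
```

Calling `missing_dirs("/your/path/to/datasets", require_linemod=False)` returns an empty list when the OnePose and OnePose_LowTexture data are in place.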

Reconstruction

The reconstructed semi-dense object point clouds and 2D-3D correspondences are needed for both training and test objects:

python run.py +preprocess=sfm_train_data.yaml use_local_ray=True  # for train data
python run.py +preprocess=sfm_inference_onepose_val.yaml use_local_ray=True # for val data
python run.py +preprocess=sfm_inference_onepose.yaml use_local_ray=True # for test data
python run.py +preprocess=sfm_inference_lowtexture.yaml use_local_ray=True # for lowtexture test data
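The four reconstruction commands can also be driven from one script that stops at the first failure. This is a sketch under assumptions: `build_command` and `run_all` are hypothetical helpers, the script is assumed to be started from the repository root, and only the config names and flags shown above are used.

```python
import subprocess
import sys

# Preprocessing configs taken from the commands above, in the same order.
PREPROCESS_CONFIGS = [
    "sfm_train_data.yaml",             # train data
    "sfm_inference_onepose_val.yaml",  # val data
    "sfm_inference_onepose.yaml",      # test data
    "sfm_inference_lowtexture.yaml",   # lowtexture test data
]


def build_command(config, use_local_ray=True):
    """Build the argument list for one reconstruction run."""
    return [
        sys.executable,
        "run.py",
        f"+preprocess={config}",
        f"use_local_ray={use_local_ray}",
    ]


def run_all(dry_run=True):
    """Print each command and, unless dry_run is set, execute it.

    check=True raises CalledProcessError on a non-zero exit code, so later
    configs are not run after a failed reconstruction.
    """
    for config in PREPROCESS_CONFIGS:
        cmd = build_command(config)
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
```

`run_all()` only prints the commands; `run_all(dry_run=False)` executes them one after another.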

Inference

# Eval OnePose dataset:
python inference.py +experiment=inference_onepose.yaml use_local_ray=True verbose=True

# Eval OnePose_LowTexture dataset:
python inference.py +experiment=inference_onepose_lowtexture.yaml use_local_ray=True verbose=True

By default, the evaluation runs in parallel on a single GPU with two workers. If your GPU has less than 6GB of memory, set `use_local_ray=False` to turn off the parallelization.
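The choice of the `use_local_ray` value can be automated by querying the GPU memory. The sketch below is an illustration, not part of the repository: the helper names are hypothetical, it relies on `nvidia-smi` being on the PATH, and it falls back to `use_local_ray=False` when the memory cannot be determined, which is a conservative assumption.

```python
import subprocess

# The 6GB threshold from the note above, in MiB (the unit nvidia-smi reports).
MIN_MEMORY_MIB = 6 * 1024


def gpu_memory_mib(gpu_index=0):
    """Return the total memory of one GPU in MiB, or None if it cannot be read."""
    try:
        out = subprocess.run(
            [
                "nvidia-smi",
                "--query-gpu=memory.total",
                "--format=csv,noheader,nounits",
                f"--id={gpu_index}",
            ],
            capture_output=True,
            text=True,
            check=True,
        ).stdout
        return int(out.strip().splitlines()[0])
    except (OSError, subprocess.CalledProcessError, ValueError, IndexError):
        return None


def use_local_ray_override(memory_mib):
    """Return the command-line override matching the available GPU memory."""
    if memory_mib is None or memory_mib < MIN_MEMORY_MIB:
        return "use_local_ray=False"
    return "use_local_ray=True"
```

The returned string can replace the `use_local_ray=True` argument in the inference commands, for example `use_local_ray_override(gpu_memory_mib())`.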

Evaluation on LINEMOD Dataset

# Parse LINEMOD Dataset to OnePose Dataset format:
sh scripts/parse_linemod_objs.sh

# Reconstruct SfM model on real training data:
python run.py +preprocess=sfm_inference_LINEMOD.yaml use_local_ray=True

# Eval LINEMOD dataset:
python inference.py +experiment=inference_LINEMOD.yaml use_local_ray=True verbose=True

Training

  1. Prepare ground-truth annotations. Merge annotations of training/val data:

    python merge.py +preprocess=merge_annotation_train.yaml
    python merge.py +preprocess=merge_annotation_val.yaml

  2. Begin training

    python train_onepose_plus.py +experiment=train.yaml exp_name=onepose_plus_train

    The default training config uses 8 GPUs with around 23GB of VRAM each. You can set the number of GPUs or their IDs in trainer.gpus, and lower datamodule.batch_size to reduce the VRAM footprint.
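On a smaller machine, the two settings named above might be passed as command-line overrides instead of editing the config. This sketch assumes that `trainer.gpus` and `datamodule.batch_size` can be overridden with the same `key=value` syntax as `exp_name`; `training_command` is a hypothetical helper, and the values 2 and 4 in the usage note are purely illustrative.

```python
import sys


def training_command(exp_name, gpus=None, batch_size=None):
    """Build the training command, adding overrides only when they are given.

    Assumption: trainer.gpus and datamodule.batch_size accept key=value
    overrides on the command line, like exp_name does.
    """
    cmd = [
        sys.executable,
        "train_onepose_plus.py",
        "+experiment=train.yaml",
        f"exp_name={exp_name}",
    ]
    if gpus is not None:
        cmd.append(f"trainer.gpus={gpus}")
    if batch_size is not None:
        cmd.append(f"datamodule.batch_size={batch_size}")
    return cmd
```

For example, `training_command("onepose_plus_train", gpus=2, batch_size=4)` yields an argument list that can be passed to `subprocess.run(..., check=True)` from the repository root.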

All model weights will be saved under ${REPO_ROOT}/models/checkpoints/${exp_name} and logs will be saved under ${REPO_ROOT}/logs/${exp_name}. You can visualize the training process with TensorBoard:

tensorboard --logdir logs --bind_all --port your_port_number

Citation

If you find this code useful for your research, please use the following BibTeX entry.

@inproceedings{
    he2022oneposeplusplus,
    title={OnePose++: Keypoint-Free One-Shot Object Pose Estimation without {CAD} Models},
    author={Xingyi He and Jiaming Sun and Yuang Wang and Di Huang and Hujun Bao and Xiaowei Zhou},
    booktitle={Advances in Neural Information Processing Systems},
    year={2022}
}

Acknowledgement

Part of our code is borrowed from hloc and LoFTR. Thanks to their authors for their great work.