Original implementation of the paper: Kiru Park, Timothy Patten and Markus Vincze, "Pix2Pose: Pixel-Wise Coordinate Regression of Objects for 6D Pose Estimation", ICCV 2019, https://arxiv.org/abs/1908.07433
The code used to produce the results for the BOP Challenge 2020 has been updated. Thanks to the PBR training images provided by the challenge, the results on LM-O, HB, and ITODD are significantly improved.
The modifications from the original implementation of the paper are as follows:
1) Replaced the encoder with the first three blocks of ResNet-50, initialized with weights pre-trained on ImageNet.
2) Increased the inlier-pixel threshold used during PnP-RANSAC (3 -> 5).
3) Detection results from Mask-RCNN are reused if predictions for each detection are not successful. In this case, Pix2Pose is performed for other objects that do not have good results yet.
6) Fixed a minor bug that caused poor detection results on the T-Less dataset (different image resolutions were used during training and inference).
7) Increased the number of RPN proposals and the NMS threshold in Mask-RCNN (1000/0.7 to 2000/0.9), which produces more detection proposals.
(with ICP)
1) Parameters for the ICP refinement are optimized.
2) Adjusted inlier and outlier thresholds for Pix2Pose (inlier: 0.15 -> 0.2, outlier: [0.15,0.25,0.35] -> [0.2,0.3,0.35]).
3) The score of each hypothesis is now computed as max(0, 0.2 - [depth difference per pixel]) / 0.2, instead of counting the number of pixels whose depth difference is less than 0.2.
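As an illustration of the scoring change in item 3, the sketch below contrasts the earlier hard count with the soft score. It assumes that the per-pixel terms are summed over the compared pixels and that the absolute depth difference is used; neither detail is stated above, and both function names are hypothetical.

```python
def inlier_count(depth_diffs, tau=0.2):
    """Earlier scheme: count pixels whose depth difference is below tau."""
    return sum(1 for d in depth_diffs if abs(d) < tau)


def soft_score(depth_diffs, tau=0.2):
    """Newer scheme: each pixel contributes max(0, tau - |d|) / tau.

    Summing the per-pixel terms is an assumption of this sketch.
    """
    if tau <= 0:
        raise ValueError("tau must be positive")
    return sum(max(0.0, tau - abs(d)) / tau for d in depth_diffs)


# Hypothetical per-pixel depth differences for one pose hypothesis.
diffs = [0.0, 0.1, 0.2, 0.5]
print(inlier_count(diffs), soft_score(diffs))
```

With the soft score, a pixel that matches the depth exactly contributes 1, and the contribution falls off linearly to 0 at the threshold, so hypotheses with tighter depth agreement rank higher than those that only just pass the threshold.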
The official results are:
BOP Score'20 | AVG | LM-O | T-Less | TUD-L | IC-BIN | ITODD | HB | YCB-V |
---|---|---|---|---|---|---|---|---|
Pix2Pose(RGB + Depth ICP) | 0.591 | 0.588 | 0.512 | 0.820 | 0.390 | 0.351 | 0.695 | 0.780 |
Pix2Pose(RGB only) | 0.342 | 0.363 | 0.344 | 0.420 | 0.226 | 0.134 | 0.446 | 0.457 |
Vidal-Sensors18 (the best in '18,'19) | 0.569 | 0.582 | 0.538 | 0.876 | 0.393 | 0.435 | 0.706 | 0.450 |
CosyPose-ICP (ECCV'20, the best in '20) | 0.698 | 0.714 | 0.701 | 0.939 | 0.647 | 0.313 | 0.712 | 0.861 |
PBR training images are used for LM-O, IC-BIN, ITODD, and HB without additional images, and real training images are used for T-Less, TUD-L, and YCB-V. To reproduce the same results, cfg/cfg_bop_2020.json or cfg/cfg_bop_2020_rgb.json (for RGB-only results) has to be used with our up-to-date code.
Keras implementation of Mask-RCNN: used for LineMOD in the paper and for all datasets in the BOP Challenge
git clone https://github.com/matterport/Mask_RCNN.git
Keras implementation of Retinanet: used for evaluation of the T-Less dataset in the paper
git clone https://github.com/fizyr/keras-retinanet.git
If you use this code, please cite the following:
@InProceedings{Park_2019_ICCV,
author = {Park, Kiru and Patten, Timothy and Vincze, Markus},
title = {Pix2Pose: Pixel-Wise Coordinate Regression of Objects for 6D Pose Estimation},
booktitle = {The IEEE International Conference on Computer Vision (ICCV)},
month = {Oct},
year = {2019}
}
The original code has been updated to support the format of the most recent 6D pose benchmark, BOP: Benchmark for 6D Object Pose Estimation.
python3 tools/5_evaluation_bop_basic.py <gpu_id> <cfg_path> <dataset_name>
To run with the 3D-ICP refinement:
python3 tools/5_evaluation_bop_icp3d.py <gpu_id> <path_cfg_json> <dataset_name>
Important note: Unlike in the paper, we used multiple outlier thresholds in the second stage for the BOP challenge, since the challenge does not allow different parameters for each object or each dataset. This is done by setting "outlier_th" to a 1D array (refer to cfg_bop2019.json). In this setup, all values are applied in the second stage and the result with the largest number of inlier points is selected. To reproduce the results in the paper with fixed outlier threshold values, a 2D array should be given, as in "cfg_tless_paper.json".
docker build -t <container_name> .
nvidia-docker run -it -v <datasetdir>:/bop -v <detection_repo>:<detection_dir> -v <other_dir>:<other_dir> <container_name> bash
pip3 install ros_numpy
export PYTHONPATH=/usr/local/lib/python3.5/dist-packages:$PYTHONPATH
Also add any other ROS-related paths to PYTHONPATH.
We assume the dataset is organized in the BOP 2019 format. For a new dataset (not in BOP), modify bop_io.py to provide the correct directories for training. This training code was used to prepare and train the networks for BOP 2019.
python3 tools/2_1_ply_file_to_3d_coord_model.py <cfg_path> <dataset_name>
This script converts the 3D models and saves them to the target folder, together with their dimension information in the file "norm_factor.json".
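To give a rough idea of what such a conversion involves, the sketch below maps model vertices into a normalized coordinate range and records the values needed to undo the mapping. This is not the repository's code: the [0, 1] target range, the bounding-box based normalization, and the `center` and `scale` keys are all assumptions made for illustration, and the real layout of "norm_factor.json" may differ.

```python
import json


def normalize_vertices(vertices):
    """Map (x, y, z) vertices into [0, 1] per axis around the bounding-box center.

    Returns the normalized coordinates and a dict that allows the mapping
    to be inverted. The dict keys are hypothetical.
    """
    if not vertices:
        raise ValueError("vertices must not be empty")
    mins = [min(v[i] for v in vertices) for i in range(3)]
    maxs = [max(v[i] for v in vertices) for i in range(3)]
    centers = [(lo + hi) / 2.0 for lo, hi in zip(mins, maxs)]
    # Half extent per axis; fall back to 1.0 for a flat axis to avoid dividing by zero.
    scales = [((hi - lo) / 2.0) or 1.0 for lo, hi in zip(mins, maxs)]
    coords = [
        tuple(0.5 + 0.5 * (v[i] - centers[i]) / scales[i] for i in range(3))
        for v in vertices
    ]
    return coords, {"center": centers, "scale": scales}


def denormalize(coord, factor):
    """Invert normalize_vertices for a single normalized coordinate."""
    return tuple(
        (coord[i] - 0.5) * 2.0 * factor["scale"][i] + factor["center"][i]
        for i in range(3)
    )


coords, factor = normalize_vertices([(0.0, 0.0, 0.0), (2.0, 4.0, 8.0)])
print(json.dumps(factor))
```

Storing the center and scale alongside the converted model is what makes it possible to turn predicted normalized coordinates back into metric 3D points at inference time.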
python3 tools/2_2_render_pix2pose_training.py <cfg_path> <dataset_name>
python3 tools/3_train_pix2pose.py <cfg_path> <dataset_name> <obj_name> [background_img_folder]
python3 tools/4_convert_weights_inference.py <pix2pose_weights folder>
This program looks for the last weight file in each directory.
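A minimal sketch of this kind of lookup is shown below. It assumes that "last" means last in sorted file-name order and that weight files end in `.hdf5`; both are assumptions for illustration, and the function name is hypothetical rather than taken from the conversion script.

```python
import os


def last_weight_files(root, ext=".hdf5"):
    """Return {subdirectory name: path of its last weight file (by sorted name)}.

    Subdirectories without any matching file are skipped.
    """
    result = {}
    for name in sorted(os.listdir(root)):
        sub = os.path.join(root, name)
        if not os.path.isdir(sub):
            continue
        weights = sorted(f for f in os.listdir(sub) if f.endswith(ext))
        if weights:
            result[name] = os.path.join(sub, weights[-1])
    return result
```

Calling `last_weight_files("<pix2pose_weights folder>")` would then yield one candidate file per object directory, which is a quick way to check which checkpoint would be picked up before running the conversion.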
python3 tools/1_1_scene_gen_for_detection.py <cfg_path> <dataset_name> <mask=1(true)/0(false)>
Output files
To train Mask-RCNN, the pre-trained weights for the MS-COCO dataset should be placed at <path/to/Mask-RCNN>/mask_rcnn_coco.h5.
python3 tools/1_2_train_maskrcnn.py <cfg_path> <dataset_name>
Alternatively, train Keras-RetinaNet using the script in its repository. It is highly recommended to initialize the network with weights trained on the MS-COCO dataset.
keras_retinanet/bin/train.py csv <path_to_dataset>/gt.csv <path_to_dataset>/label.csv --freeze-backbone --weights resnet50_coco_best_v2.1.0.h5
After training, convert the weights into an inference model:
keras_retinanet/bin/convert_model.py /path/to/training/model.h5 /path/to/save/inference/model.h5
Please refer to the paper for other details regarding the training.
The trained weights provided here were used to submit the results for the core datasets of the BOP Challenge 2020.
First, the norm_factor files have to be downloaded and placed at the following path:
[path/to/bop_dataset]/[dataset_name]/models_xyz/norm_factor.json
Download link: Norm_factor files for all 7 datasets
Download the zip files and extract them to the BOP dataset folder. For example, for T-Less, the extracted files should be placed in
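Before running inference it can help to confirm that the files ended up where the code expects them. The helper below is a small sketch that builds the path given above and reports datasets for which the file is missing; the function names are hypothetical and not part of the repository.

```python
import os


def norm_factor_path(bop_root, dataset_name):
    """Build [bop_root]/[dataset_name]/models_xyz/norm_factor.json."""
    return os.path.join(bop_root, dataset_name, "models_xyz", "norm_factor.json")


def missing_norm_factors(bop_root, dataset_names):
    """Return the dataset names whose norm_factor.json is not present."""
    return [
        name for name in dataset_names
        if not os.path.isfile(norm_factor_path(bop_root, name))
    ]
```

For example, `missing_norm_factors("[path/to/bop_dataset]", ["tless", "lmo"])` returns an empty list once both datasets have been set up.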
[path to bop dataset]/tless/weight_detection/tless20190927T0827/mask_rcnn_tless_0005.h5
[path to bop dataset]/tless/pix2pose_weights/[obj_no]