
PACE: Pose Annotations in Cluttered Environments
(ECCV 2024)

Yang You, Kai Xiong, Zhening Yang, Zhengxiang Huang, Junwei Zhou, Ruoxi Shi, Zhou Fang, Adam W Harley, Leonidas Guibas, Cewu Lu

Paper PDF | Project Page

We introduce PACE (Pose Annotations in Cluttered Environments), a large-scale benchmark designed to advance the development and evaluation of pose estimation methods in cluttered scenarios. PACE provides a large-scale real-world benchmark for both instance-level and category-level settings. The benchmark consists of 55K frames with 258K annotations across 300 videos, covering 238 objects from 43 categories and featuring a mix of rigid and articulated items in cluttered scenes. To annotate the real-world data efficiently, we develop an innovative annotation system with a calibrated 3-camera setup. Additionally, we offer PACESim, which contains 100K photo-realistic simulated frames with 2.4M annotations across 931 objects. We test state-of-the-art algorithms on PACE along two tracks: pose estimation and object pose tracking, revealing the benchmark’s challenges and research opportunities.

Why a new dataset?

Update Logs

Contents

Dataset Download

Our dataset can be downloaded from HuggingFace. Please unzip all the tar.gz archives and place them under dataset/pace for evaluation. Large files are split into chunks; you can merge them with, e.g., cat test_chunk_* > test.tar.gz (a Python sketch of these steps follows).
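A minimal Python sketch of the merge-and-extract steps above; this is our illustration, not part of the toolkit, and it assumes the archives have already been downloaded from HuggingFace into dataset/pace with chunk names following the test_chunk_* example.

```python
# Merge split chunks and extract all tar.gz archives under dataset/pace.
import shutil
import tarfile
from pathlib import Path

root = Path("dataset/pace")

# Merge split archives, e.g. test_chunk_aa, test_chunk_ab, ... -> test.tar.gz
chunks = sorted(root.glob("test_chunk_*"))
if chunks:
    with open(root / "test.tar.gz", "wb") as merged:
        for chunk in chunks:
            with open(chunk, "rb") as f:
                shutil.copyfileobj(f, merged)

# Extract every archive in place
for archive in root.glob("*.tar.gz"):
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(root)
```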

Dataset Format

Our dataset mainly follows the BOP format, with the structure below (regex-style syntax marks optional and alternative parts); a short loading sketch in Python follows the tree:

camera_pbr.json
models(_eval|_nocs)?
├─ models_info.json
├─ (artic_info.json)?
├─ obj_${OBJ_ID}.ply
model_splits
├─ category
│  ├─ ${category}_(train|val|test).txt
│  ├─ (train|val|test).txt
├─ instance
│  ├─ (train|val|test).txt
(train(_pbr_cat|_pbr_inst)|val(_inst|_pbr_cat)|test)
├─ ${SCENE_ID}
│  ├─ scene_camera.json
│  ├─ scene_gt.json
│  ├─ scene_gt_info.json
│  ├─ scene_gt_coco_det_modal(_partcat|_inst)?.json
│  ├─ depth
│  ├─ mask
│  ├─ mask_visib
│  ├─ rgb
│  ├─ (rgb_nocs)?
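For reference, a minimal sketch of reading one frame's annotations, assuming the standard BOP fields (cam_K in scene_camera.json; cam_R_m2c / cam_t_m2c, in millimetres, in scene_gt.json). The scene and image ids below are placeholders.

```python
import json
import numpy as np
from pathlib import Path

scene_dir = Path("dataset/pace/test/000001")  # hypothetical scene id
im_id = "0"                                   # json keys are image ids as strings

scene_camera = json.loads((scene_dir / "scene_camera.json").read_text())
scene_gt = json.loads((scene_dir / "scene_gt.json").read_text())

K = np.array(scene_camera[im_id]["cam_K"]).reshape(3, 3)   # camera intrinsics

for ann in scene_gt[im_id]:
    R = np.array(ann["cam_R_m2c"]).reshape(3, 3)   # rotation, model -> camera
    t = np.array(ann["cam_t_m2c"]).reshape(3, 1)   # translation (mm)
    print(f"obj {ann['obj_id']}: pose =\n{np.hstack([R, t])}")
```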

Dataset Visualization

We provide a visualization script to visualize the ground-truth pose annotations together with their rendered 3D models. You can run visualizer.ipynb to obtain rgb/rendering/pose/mask visualizations.
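The notebook handles the actual rendering; for intuition, here is a minimal sketch of the pinhole projection behind such a pose overlay, using K, R, t as read from scene_camera.json / scene_gt.json above (a generic formula, not the notebook's code).

```python
import numpy as np

def project(points_mm: np.ndarray, R: np.ndarray, t: np.ndarray, K: np.ndarray) -> np.ndarray:
    """Project Nx3 model points (mm) into the image; returns Nx2 pixel coordinates."""
    pts_cam = points_mm @ R.T + t.reshape(1, 3)   # model frame -> camera frame
    uvw = pts_cam @ K.T                           # apply pinhole intrinsics
    return uvw[:, :2] / uvw[:, 2:3]               # perspective divide
```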

Benchmark Evaluation

Unzip all the tar.gz archives from HuggingFace and place them under dataset/pace before running the evaluation.

Instance-Level Pose Estimation

To evaluate instance-level pose estimation, please make sure you have cloned the submodule containing our fork of bop_toolkit. You can do this after git clone with git submodule update --init, or alternatively clone with git clone --recurse-submodules git@github.com:qq456cvb/PACE.git.

Put the prediction results under prediction/instance/${METHOD_NAME}_pace-test.csv (you can download the baseline results from Google Drive). Then run the following commands:

cd eval/instance
sh eval.sh ${METHOD_NAME}
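The exact CSV layout is determined by bop_toolkit; as a hedged sketch, assuming the standard BOP results format (scene_id,im_id,obj_id,score,R,t,time, with R as 9 row-major values and t in millimetres), one prediction row could be written like this (the method name and numbers are placeholders):

```python
import numpy as np

def bop_row(scene_id, im_id, obj_id, score, R, t, time=-1.0):
    # R: 3x3 rotation (flattened row-major), t: translation in mm
    R_str = " ".join(f"{v:.6f}" for v in np.asarray(R).reshape(-1))
    t_str = " ".join(f"{v:.6f}" for v in np.asarray(t).reshape(-1))
    return f"{scene_id},{im_id},{obj_id},{score:.4f},{R_str},{t_str},{time}"

with open("prediction/instance/mymethod_pace-test.csv", "w") as f:  # hypothetical method name
    f.write("scene_id,im_id,obj_id,score,R,t,time\n")
    f.write(bop_row(1, 0, 1, 0.9, np.eye(3), [0.0, 0.0, 500.0]) + "\n")
```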

Category-Level Pose Estimation

Put the prediction results under prediction/category/${METHOD_NAME}_pred.pkl (you can download the baseline results from Google Drive). We also convert the ground-truth labels into a compatible pkl format, which you can download from here and put under eval/category/catpose_gts_test.pkl. Then run the following commands:

cd eval/category
sh eval.sh ${METHOD_NAME}

Note: You may find more categories (55) in category_names.txt than reported in the paper. This is because some categories have no corresponding real-world test images, only a set of 3D models, so we drop them from evaluation. The actual categories (47) used for evaluation are stored in category_names_test.txt (parts are counted as separate categories). Ground-truth class ids in catpose_gts_test.pkl still use the indices from 1 to 55 corresponding to category_names.txt; a small lookup sketch follows.
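A minimal sketch of recovering category names from those 1-based class ids, assuming category_names.txt lists one name per line in order:

```python
from pathlib import Path

# Map 1-based ground-truth class ids to category names (assumes one name per line).
names = Path("category_names.txt").read_text().splitlines()
id_to_name = {i + 1: name for i, name in enumerate(names)}   # ids run from 1 to 55

print(id_to_name[1])   # first category in category_names.txt
```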

Annotation Tools

We also provide the source code of our annotation tools, organized as follows:

annotation_tool
├─ inpainting
├─ obj_align
├─ obj_sym
├─ pose_annotate
├─ postprocessing
├─ TFT_vs_Fund
├─ utils

More detailed documentation for the annotation software is coming soon. We are cleaning up the code and trying our best to make it convenient for the community to annotate 3D object poses accurately.

Licenses

MIT license for all contents, except:

Our models with IDs from 693 to 1260 are sourced from SketchFab under the CC BY license, with credit given to the model creators. You can find the original posts of these models at https://sketchfab.com/3d-models/${OBJ_IDENTIFIER}, where the identifier is the second component (separated by /) of the identifier key in models_info.json; see the sketch below.
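As an illustration, a hedged sketch of recovering the SketchFab URLs, assuming each identifier entry in models_info.json is a string whose second /-separated component is the object identifier:

```python
import json

with open("dataset/pace/models/models_info.json") as f:
    models_info = json.load(f)

for obj_id, info in models_info.items():
    parts = str(info.get("identifier", "")).split("/")
    if len(parts) > 1:
        print(f"obj {obj_id}: https://sketchfab.com/3d-models/{parts[1]}")
```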

Models with IDs 1165 and 1166 are sourced from GrabCAD (these two share the same geometry but have different colors). For these two models, please see the license from GrabCAD.

Citation

@inproceedings{you2023pace,
    title={PACE: Pose Annotations in Cluttered Environments},
    author={You, Yang and Xiong, Kai and Yang, Zhening and Huang, Zhengxiang and Zhou, Junwei and Shi, Ruoxi and Fang, Zhou and Harley, Adam W. and Guibas, Leonidas and Lu, Cewu},
    booktitle={European Conference on Computer Vision},
    year={2024},
    organization={Springer}
}