ymxlzgy / SG-Bot

[ICRA 2024] SG-Bot: Object Rearrangement via Coarse-to-Fine Robotic Imagination on Scene Graphs


This is a minimal implementation of the paper SG-Bot: Object Rearrangement via Coarse-to-Fine Robotic Imagination on Scene Graphs (ICRA 2024), arxiv.



conda env create -f environment.yml
cd extension
python setup.py install

Please also install PyTorch. We tested with PyTorch 1.12.1 and CUDA 11.6.
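To confirm that the installed PyTorch matches the tested configuration, a quick check along the following lines may help. This is a minimal sketch and not part of the repository: the helper names `version_tuple` and `report_torch_environment` are hypothetical, and the only PyTorch attributes it relies on are `torch.__version__`, `torch.version.cuda`, and `torch.cuda.is_available()`.

```python
import re

TESTED_TORCH = (1, 12, 1)  # PyTorch version this README reports as tested


def version_tuple(version):
    """Turn a string such as '1.12.1+cu116' into (1, 12, 1)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    # A missing patch component (e.g. '2.0') is treated as 0.
    return tuple(int(part or 0) for part in match.groups())


def report_torch_environment():
    """Return a short summary of the installed PyTorch (hypothetical helper)."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed"
    installed = version_tuple(torch.__version__)
    status = "matches" if installed == TESTED_TORCH else "differs from"
    return (
        f"PyTorch {torch.__version__} "
        f"(CUDA build: {torch.version.cuda}, "
        f"GPU available: {torch.cuda.is_available()}) "
        f"{status} the tested 1.12.1"
    )
```

Running `print(report_torch_environment())` inside the conda environment should show whether the versions line up. Other versions may still work; this only flags a difference from the tested setup.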


Please refer to this page for downloading the data used in the paper and more information.

Module Instruction

Shape autoencoders

We set up two shape autoencoders called AtlasNet and AtlasNet2. AtlasNet is trained with full shapes under canonical coordinates, while AtlasNet2 is trained under the camera frame and provides shape priors to the goal scene graph to guide the imagination. Trained models are available for download here: trained AtlasNet and trained AtlasNet2.

Scene Generator

We built the scene generator on Graph-to-3D, a GCN-VAE architecture. Unlike the original Graph-to-3D, we leverage a shape-aware scene graph so that the generated shapes are aligned with the observed shapes in the initial scene. The trained model is available here: trained graph_to_3d.

If you want to retrain the network, --batchSize, --nepoch, and --exp need to be set to appropriate values.

cd graphto3d
python scripts/train_vaegan.py

More details can be found in the original repository.
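As an illustration of how the three retraining flags could be passed, the sketch below assembles the command line in Python. The flag names come from this README; the helper names `build_train_command` and `run_training` are hypothetical, and the example values (batch size 8, 100 epochs, an experiment name of `my_run`) are assumptions rather than the settings used in the paper. What `--exp` expects exactly should be checked against the script's argument parser.

```python
import subprocess
import sys


def build_train_command(batch_size, n_epoch, exp):
    """Assemble the argument list for scripts/train_vaegan.py."""
    return [
        sys.executable, "scripts/train_vaegan.py",
        "--batchSize", str(batch_size),
        "--nepoch", str(n_epoch),
        "--exp", str(exp),
    ]


def run_training(batch_size, n_epoch, exp, cwd="graphto3d"):
    """Launch training from inside the graphto3d directory.

    Raises subprocess.CalledProcessError if the script exits with an error.
    """
    command = build_train_command(batch_size, n_epoch, exp)
    return subprocess.run(command, cwd=cwd, check=True)
```

For example, `run_training(8, 100, "my_run")` would be equivalent to running `python scripts/train_vaegan.py --batchSize 8 --nepoch 100 --exp my_run` from the `graphto3d` directory.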


There are two modes: robot and oracle. The robot mode supports a robot arm manipulating the objects according to the imagination. This mode needs a grasp pose prediction network, for which we use Contact-GraspNet. It requires TensorFlow to be installed.

pip install tensorflow-estimator==2.7.0 tensorflow-gpu==2.7.0

The checkpoints can be downloaded from the original repository or here. After downloading the checkpoints, move them to ./contact_graspnet.

The oracle mode does not need an agent; it directly puts objects in their relative poses. To select a mode, modify the variable mode inside the script, and then run:

python sgbot_pybullet.py

The results in the paper were obtained in oracle mode. We directly use the pre-defined scene graph as the goal.

Real-world Trial [TODO]

We provide a recorded rosbag to demonstrate the performance. To conduct this trial, the MaskRCNN checkpoint needs to be downloaded from [here](). Additional requirements need to be installed.