Wenlong Huang1, Chen Wang1, Yunzhu Li2, Ruohan Zhang1, Li Fei-Fei1 (* indicates equal contributions)
1Stanford University, 2Columbia University
This is the official demo code for ReKep implemented in OmniGibson. ReKep is a method that uses large vision models and vision-language models in a hierarchical optimization framework to generate closed-loop trajectories for manipulation tasks.
Note that this codebase is best run with a display. For running in headless mode, refer to the instructions in OmniGibson.
NOTE: If you encounter the warning `We did not find Isaac Sim under ~/.local/share/ov/pkg.` when running `./scripts/setup.sh` for OmniGibson, first ensure that you have installed Isaac Sim. Assuming Isaac Sim is installed in the default directory, provide the following path: `/home/[USERNAME]/.local/share/ov/pkg/isaac-sim-2023.1.1` (replace `[USERNAME]` with your username).
Install ReKep in the same conda environment:
```
conda activate omnigibson
cd ..
git clone https://github.com/huangwl18/ReKep.git
cd ReKep
pip install -r requirements.txt
```
Obtain an OpenAI API key and set it up as an environment variable:
```
export OPENAI_API_KEY="YOUR_OPENAI_API_KEY"
```
We provide a demo "pen-in-holder" task that illustrates the core idea in ReKep. Below we provide several options to run the demo.
Notes:
- The `--visualize` flag may be added to visualize every solution from the optimization. However, since the pipeline repeatedly solves optimization problems, the visualization is blocking and needs to be closed each time (by pressing "ESC") in order to continue.

We recommend starting with the cached VLM query:
```
python main.py --use_cached_query [--visualize]
```
A video will be saved to `./videos` by default.
Since ReKep acts as a closed-loop policy, it is robust to disturbances, with automatic failure recovery both within stages and across stages. To demonstrate this in simulation, we apply the following disturbances for the "pen-in-holder" task (see the sketch after the command below for how such a disturbance can be injected):
- Move the pen when the robot is trying to grasp the pen
- Take the pen out of the gripper when the robot is trying to reorient the pen
- Move the holder when the robot is trying to drop the pen into the holder
Note that since the disturbances are pre-defined, we recommend running with the cached query.
```
python main.py --use_cached_query --apply_disturbance [--visualize]
```
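The disturbances above are scripted as part of the demo when `--apply_disturbance` is passed. For intuition only, the sketch below shows how such a disturbance could be injected in OmniGibson; the function name, the `pen_obj` handle, and the offset are illustrative placeholders, not this repo's actual disturbance code:

```python
import numpy as np

def apply_pen_disturbance(pen_obj, offset=np.array([0.0, 0.1, 0.0])):
    """Hypothetical sketch: teleport the pen while the robot reaches for it.

    `pen_obj` is assumed to be an OmniGibson object handle (obtained, e.g.,
    from the scene's object registry); the default 10 cm offset is arbitrary.
    """
    pen_pos, pen_quat = pen_obj.get_position_orientation()
    pen_obj.set_position_orientation(position=pen_pos + offset, orientation=pen_quat)
```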
The following command queries the VLM for a new sequence of ReKep constraints and executes them on the robot:
```
python main.py [--visualize]
```
Leveraging the diverse objects and scenes provided by BEHAVIOR-1K in OmniGibson, new tasks and scenes can be easily configured. To change the objects, you may check out the available objects as part of the BEHAVIOR assets on this page (click on each object instance to view its visualization). After identifying the objects, we recommend making a copy of the JSON scene file `./configs/og_scene_file_pen.json` and editing its `state` and `objects_info` entries accordingly. Remember that the scene file needs to be supplied to the `Main` class at initialization. Additional scenes and robots provided by BEHAVIOR-1K may also work, but they are currently untested.
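For illustration, wiring a new scene file into the pipeline might look like the sketch below; the constructor arguments of `Main` shown here are an assumption, so check `main.py` for the actual interface:

```python
# Hypothetical sketch: initialize the pipeline with a copied, edited scene file.
# The constructor signature of Main is an assumption; see main.py for the
# actual arguments it expects.
from main import Main

scene_file = "./configs/og_scene_file_my_task.json"  # your edited copy
pipeline = Main(scene_file, visualize=False)
```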
To deploy ReKep in the real world, most changes should only be needed inside `environment.py`. Specifically, all of the "exposed functions" need to be re-implemented for the real-world environment. In particular, `execute_action` in `environment.py` receives a target end-effector pose; we first calculate IK to obtain the target joint positions and then send the command to the low-level controller (a sketch is shown below).

Since there are several components in the pipeline, running them sequentially in the real world may be too slow. As a result, we recommend running the following compute-intensive components in separate processes, in addition to the main process that runs `main.py`: `subgoal_solver`, `path_solver`, `keypoint_tracker`, `sdf_reconstruction`, `mask_tracker`, and `grasp_detector` (if used).
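For illustration, a real-world `execute_action` might look roughly like the sketch below; `ik_solver` and `robot_controller` (and their methods) are placeholders for your own robot stack, not interfaces provided by this repo:

```python
import numpy as np

def execute_action(target_ee_pose, ik_solver, robot_controller):
    """Hypothetical real-world execute_action.

    target_ee_pose: target end-effector pose (xyz position followed by a
    quaternion). ik_solver and robot_controller are placeholders for your
    own IK library and low-level joint controller.
    """
    target_ee_pose = np.asarray(target_ee_pose)
    position, quaternion = target_ee_pose[:3], target_ee_pose[3:]
    # 1) Solve IK for the target end-effector pose to get joint positions.
    target_joint_positions = ik_solver.solve(position, quaternion)
    # 2) Send the joint-position command to the low-level controller.
    robot_controller.move_to_joint_positions(target_joint_positions)
```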
Prompt Tuning: Since ReKep relies on VLMs to generate code-based constraints that solve for the behaviors of the robot, it is sensitive to the specific VLM used and the prompts given to the VLM. Although visual prompting is used, we typically find that the prompts do not necessarily need to contain image-text examples or code examples, and pure-text high-level instructions can go a long way with the latest VLMs such as GPT-4o. As a result, when starting with a new domain, if you observe that the default prompt is failing, we recommend the following steps: 1) pick a few representative tasks in the domain for validation purposes, 2) iteratively update the prompt with high-level text examples and instructions, and 3) test the prompt by checking the text output, returning to step 2 if needed.
Performance Tuning: For the sake of clarity, the entire pipeline is run sequentially, so the latency introduced by the simulator and the solvers gets compounded. If this is a concern, we recommend running compute-intensive components, such as the simulator, the `subgoal_solver`, and the `path_solver`, in separate processes, but concurrency needs to be handled with care (a rough sketch of the idea is given below). More discussion can be found in the "Real-World Deployment" section. To tune the solvers, note that the `objective` function typically takes the majority of the time, and among the different costs, the reachability cost computed by the IK solver is typically the most expensive. Depending on the task, you may reduce `sampling_maxfun` and `maxiter` in `configs/config.yaml` or disable the reachability cost.
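As a rough illustration of the separate-process idea, Python's `multiprocessing` module can host a solver in its own process and exchange work over queues; the `make_solver()` factory and its `solve(request)` method are placeholders, not the actual interfaces of this repo's solvers:

```python
import multiprocessing as mp

def solver_worker(request_queue, result_queue, make_solver):
    """Host a compute-intensive solver in a separate process.

    make_solver is a placeholder factory that constructs the solver inside
    the child process; its solve(request) method is likewise illustrative.
    """
    solver = make_solver()
    while True:
        request = request_queue.get()
        if request is None:  # sentinel: shut the worker down
            break
        result_queue.put(solver.solve(request))

# Usage sketch from the main process:
# request_q, result_q = mp.Queue(), mp.Queue()
# worker = mp.Process(target=solver_worker, args=(request_q, result_q, MySolver))
# worker.start()
# request_q.put(some_request)         # hand a problem to the solver process
# result = result_q.get()             # collect the result (blocking here)
# request_q.put(None); worker.join()  # clean shutdown
```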
Task-Space Planning: Since the current pipeline performs planning in the task space (i.e., solving for end-effector poses) rather than in the joint space, it may occasionally produce actions that are kinematically challenging for the robot to achieve, especially for tasks that require 6-DoF motions.
For issues related to OmniGibson, please raise an issue here. You are also welcome to join the Discord channel for timely support.
For other issues related to the code in this repo, feel free to raise an issue in this repo and we will try to address it when we are available.