
Task Decomposition

This repository is the code base for the IROS 2024 submission: Temporal and Semantic Evaluation Metrics for Foundation Models in Post-Hoc Analysis of Robotic Sub-tasks by Jonathan Salfity, Selma Wanna, Minkyu Choi, and Mitch Pryor. The corresponding author is Jonathan Salfity (j [dot] salfity [at] utexas [dot] edu).

The code base is organized into several sections. Supporting functions, including the API call, prompt building, in-context learning examples, and the random baseline implementation, are found in /utils.

Prerequisites

Install this package

pip install -e .

To run the robosuite simulations

Download the MuJoCo binaries and place them in the ~/.mujoco/mujoco<>/ folder. Then install mujoco via pip:

pip install mujoco

Install robosuite:

pip install robosuite

Running robosuite simulations and generating data

Running robosuite-based simulations with a predefined state machine

Configure which environment to run in /scripts/demo_config.yaml. Currently there are 4 environments: "Stack", "Lift", "Door", and "PickPlace". Follow the uncommented lines in /scripts/demo_config.yaml to set the correct fields (a hypothetical sketch of the config is shown after the command below). Then run the data generation script:

python scripts/run_demo.py
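
For reference, here is a hypothetical sketch of the fields you might set in /scripts/demo_config.yaml; the field names below are illustrative assumptions, not the actual schema, so follow the comments in the file itself:

env_name: "Stack"   # one of "Stack", "Lift", "Door", "PickPlace"
save_txt: true      # save a text trace of the demo
save_video: true    # save a video of the demo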

Running downloaded demo_v141.hdf5 files and generating data

(This data is not used in the IROS paper.) Go to the robomimic site to download the data: https://robomimic.github.io/docs/datasets/robomimic_v0.1.html. (Note that the download currently only seems to work with the Safari browser.) Place the downloaded hdf5 files in the respective data/robomimic folder. Run the data generation script /scripts/record_robomimic_data.py with command-line args that specify the path to the demo_v141.hdf5 file, the number of demos to run, and whether to save_txt or save_video. The script will automatically extract the specific env_name and place the text and videos in the respective data/txt or data/video folders.

Example:

python scripts/record_robomimic_data.py --dataset path/to/robomimic/demo_v141.hdf5 --num_demos 1 --save_txt 1 --save_video 1

Querying an FM

This assumes you have installed the OpenAI and generativeai Python packages and set the API keys as environment variables, i.e. OPENAI_API_KEY and GOOGLE_API_KEY.
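For example, on Linux or macOS the packages can be installed and the keys exported in the shell before running the query script (the pip package names below are the commonly used ones and are an assumption about the exact versions used here):

pip install openai google-generativeai

export OPENAI_API_KEY="..."
export GOOGLE_API_KEY="..."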

The configuration file for the LLM is config/query_LLM_config.yaml; the FM model to query is selected there.

All states in each environment are in R^3 and represent the x-y-z position of the object in the environment. All actions are in R^7, using robosuite's OSC_POSE controller. The environment is also set in the config file and can be any of the four environments listed above.

The input modalities and in-context learning examples to include in the LLM prompt query are also selected in the config file and can be used in combination with each other.
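As a rough illustration only, the fields in config/query_LLM_config.yaml might look like the sketch below; every field name and value here is an assumption made for illustration, not the actual schema, so consult the file itself:

model: "gpt-4"                  # FM to query, e.g. an OpenAI or Google model
env_name: "Stack"               # one of "Stack", "Lift", "Door", "PickPlace"
input_modalities: ["txt"]       # modalities to include in the prompt
use_in_context_examples: true   # prepend in-context learning examples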

To query the LLM, run the following command:

python analysis/query_LLM.py

Comparison between FM output and groundtruth data

See analysis/main_metrics_calculations.ipynb to generate the plots shown in the paper.