
Variational Curriculum Reinforcement Learning

This repository contains the official training and evaluation code for Variational Curriculum Reinforcement Learning for Unsupervised Discovery of Skills (ICML 2023). It provides implementations of VCRL variants on the Sawyer environment, including the proposed method, Value Uncertainty Variational Curriculum (VUVC).

Installation

All Python dependencies are listed in environment.yml. To set up the environment, follow these steps:

  1. Create the conda environment by running the following command:
    conda env create -f environment.yml
  2. Activate the vcrl environment:
    conda activate vcrl
  3. Install the codebase by running:
    pip install -e .
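
After completing these steps, you can sanity-check the installation with a short import test. This is only a sketch: it assumes the conda environment provides mujoco_py and multiworld (the packages the Sawyer environment builds on), and the file name is hypothetical.

    # sanity_check.py -- minimal installation check (hypothetical helper, not part of the repo).
    # Assumes the conda environment provides mujoco_py and multiworld, which the
    # Sawyer environment in this codebase relies on; adjust if your setup differs.
    import mujoco_py   # MuJoCo bindings required by the Sawyer tasks
    import multiworld  # environment suite the Sawyer environment is adapted from

    print("mujoco_py:", mujoco_py.__file__)
    print("multiworld:", multiworld.__file__)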

Usage

General Usage

You can use the following command to run the training and evaluation code:

python -m scripts.METHOD \
    BASE_LOGDIR \
    --gpu_id GPU_ID \
    --snapshot_gap SNAPSHOT_GAP \
    --seed SEED \
    --spec EXP_SPEC

The placeholders should be replaced with the appropriate values:

  * METHOD: script path under scripts/ identifying the environment and training method (e.g., SawyerDoorHook.vuvc).
  * BASE_LOGDIR: base directory where logs and snapshots are written (e.g., /tmp/vcrl/).
  * GPU_ID: ID of the GPU to use.
  * SNAPSHOT_GAP: interval between saved training snapshots.
  * SEED: random seed.
  * EXP_SPEC: experiment specification name (e.g., default).

The hyperparameters used in the paper are defined by default in the script file for each training method. To test different configurations, you can override them with your own values.
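
As a rough illustration of what such an override might look like, the sketch below assumes the training scripts follow rlkit's convention of collecting hyperparameters in a variant dictionary; the keys shown are placeholders, so consult the actual script under scripts/ for the real structure.

    # Hypothetical excerpt of a training script's hyperparameter block (rlkit-style).
    # The keys below are illustrative placeholders, not the repository's actual names.
    variant = dict(
        algo_kwargs=dict(
            num_epochs=500,     # total number of training epochs
            batch_size=1024,    # minibatch size for policy updates
        ),
        replay_buffer_kwargs=dict(
            max_size=int(1e6),  # replay buffer capacity
        ),
    )

    # Override a default before launching training, e.g. for a quick smoke test:
    variant['algo_kwargs']['num_epochs'] = 50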

Training with EDL proceeds in two stages: 1) training a VAE along with a density-based exploration policy, and 2) unsupervised training of skills. To select the training stage, pass the --mode flag with either train_vae or train_policy on the command line for edl.

Examples

Here are some examples of running the code on the SawyerDoorHook environment:

# VUVC
python -m scripts.SawyerDoorHook.vuvc /tmp/vcrl/ --gpu_id 0 --snapshot_gap 20 --seed 0 --spec default

# HER
python -m scripts.SawyerDoorHook.her /tmp/vcrl/ --gpu_id 0 --snapshot_gap 20 --seed 0 --spec default

# RIG
python -m scripts.SawyerDoorHook.rig /tmp/vcrl/ --gpu_id 0 --snapshot_gap 20 --seed 0 --spec default

# Skew-Fit
python -m scripts.SawyerDoorHook.skewfit /tmp/vcrl/ --gpu_id 0 --snapshot_gap 20 --seed 0 --spec default

# EDL
python -m scripts.SawyerDoorHook.edl /tmp/vcrl/ --mode train_vae --gpu_id 0 --snapshot_gap 20 --seed 0
python -m scripts.SawyerDoorHook.edl /tmp/vcrl/ --mode train_policy --gpu_id 0 --snapshot_gap 20 --seed 0 --spec default
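
Snapshots saved every SNAPSHOT_GAP epochs can be inspected offline after training. The sketch below assumes rlkit-style snapshots (a torch-pickled dictionary such as params.pkl inside the experiment directory under BASE_LOGDIR); the exact file name and dictionary keys may differ, so treat them as assumptions.

    # Hypothetical offline inspection of a saved snapshot (rlkit-style).
    # The path and dictionary keys are assumptions; check your BASE_LOGDIR for
    # the actual files produced by the training scripts.
    import torch

    snapshot_path = "/tmp/vcrl/<experiment_dir>/params.pkl"  # placeholder path
    snapshot = torch.load(snapshot_path, map_location="cpu")

    print(sorted(snapshot.keys()))              # list what the snapshot stores
    policy = snapshot.get("evaluation/policy")  # key name is an assumption
    if policy is not None:
        print(policy)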

Reference

@inproceedings{kim2023variational,
  title={Variational Curriculum Reinforcement Learning for Unsupervised Discovery of Skills},
  author={Kim, Seongun and Lee, Kyowoon and Choi, Jaesik},
  booktitle={International Conference on Machine Learning},
  year={2023},
}

License

This repository is released under the MIT license. See LICENSE for additional details.

Credits

This repository is extended from rlkit; for more details about the underlying infrastructure, please refer to rlkit. The Sawyer environment is adapted from multiworld, and the Sawyer MuJoCo models were developed by Vikash Kumar under the Apache-2.0 License.