nakamotoo / Cal-QL

official implementation for our paper Cal-QL: Calibrated Offline RL Pre-Training for Efficient Online Fine-Tuning
https://nakamotoo.github.io/Cal-QL

Cal-QL

This is the JAX and Flax implementation of our paper Cal-QL: Calibrated Offline RL Pre-Training for Efficient Online Fine-Tuning.

This codebase is built upon the JaxCQL repository.

If you find this repository useful for your research, please cite:

@article{nakamoto2023calql,
  author       = {Mitsuhiko Nakamoto and Yuexiang Zhai and Anikait Singh and Max Sobol Mark and Yi Ma and Chelsea Finn and Aviral Kumar and Sergey Levine},
  title        = {Cal-QL: Calibrated Offline RL Pre-Training for Efficient Online Fine-Tuning},
  conference   = {arXiv Pre-print},
  year         = {2023},
  url          = {https://arxiv.org/abs/2303.05479},
}

Installation

  1. Install MuJoCo

  2. Add the following environment variables to ~/.bashrc

    export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HOME/.mujoco/mujoco210/bin
    export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/nvidia
  3. Install and use the included Anaconda environment

    $ conda create -c nvidia -n Cal-QL python=3.8 cuda-nvcc=11.3
    $ conda activate Cal-QL
    $ pip install -r requirements.txt
  4. Set up W&B API keys

This codebase visualizes the logs using Weights and Biases. To enable this, you first need to set up your W&B API key by:
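
The environment from steps 2 and 4 can be sanity-checked before running anything. The following is a minimal sketch, not part of this repository: `find_environment_problems` is a hypothetical helper, and it assumes the W&B key is supplied through the `WANDB_API_KEY` environment variable, which the wandb client reads. This codebase may expect the key to be configured differently.

```python
import os
from typing import List, Mapping

# Directories that step 2 appends to LD_LIBRARY_PATH.
REQUIRED_LIBRARY_DIRS = (
    "$HOME/.mujoco/mujoco210/bin",
    "/usr/lib/nvidia",
)


def find_environment_problems(env: Mapping[str, str]) -> List[str]:
    """Return human-readable problems found in `env` (hypothetical helper)."""
    problems = []
    home = env.get("HOME", "")
    # Split LD_LIBRARY_PATH into entries, ignoring empty ones and trailing slashes.
    entries = [p.rstrip("/") for p in env.get("LD_LIBRARY_PATH", "").split(":") if p]
    for template in REQUIRED_LIBRARY_DIRS:
        wanted = template.replace("$HOME", home).rstrip("/")
        if wanted not in entries:
            problems.append(f"LD_LIBRARY_PATH is missing {wanted}")
    # Assumption: the key comes from WANDB_API_KEY; the repository may use
    # another mechanism, in which case this check does not apply.
    if not env.get("WANDB_API_KEY"):
        problems.append("WANDB_API_KEY is not set")
    return problems


if __name__ == "__main__":
    for problem in find_environment_problems(os.environ):
        print(problem)
```

The check only inspects environment variables. It does not verify that MuJoCo or the CUDA libraries are actually present in those directories.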

Run Experiments

AntMaze

You can run experiments using the following command:

$ bash scripts/run_antmaze.sh

Please check scripts/run_antmaze.sh for the details. All available command options can be seen in conservative_sac_main.py and conservative_sac.py.

Adroit Binary

  1. Download the offline dataset from here and unzip the files into <this repository>/demonstrations/offpolicy_hand_data/*.npy
  2. Install mj_envs from this fork:
    $ git clone --recursive https://github.com/nakamotoo/mj_envs.git
    $ cd mj_envs  
    $ git submodule update --remote
    $ pip install -e .
  3. Now you can run experiments using the following command:
    $ bash scripts/run_adroit.sh

    Please check scripts/run_adroit.sh for the details.
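
Before launching the Adroit script, it can help to confirm that step 1 placed the dataset where the path above says it should be. This is a minimal sketch, not part of the repository: `list_demo_files` is a hypothetical helper, it assumes it is run from the repository root, and it makes no assumption about the individual file names.

```python
from pathlib import Path
from typing import List


def list_demo_files(repo_root: str) -> List[Path]:
    """Return the .npy files under demonstrations/offpolicy_hand_data, sorted by name."""
    data_dir = Path(repo_root) / "demonstrations" / "offpolicy_hand_data"
    if not data_dir.is_dir():
        # The directory is missing, so the dataset has not been unzipped here.
        return []
    return sorted(data_dir.glob("*.npy"))


if __name__ == "__main__":
    files = list_demo_files(".")
    if not files:
        print("No .npy files found under demonstrations/offpolicy_hand_data")
    for path in files:
        print(path.name, path.stat().st_size, "bytes")
```

An empty result usually means the archive was unzipped into a different directory or into a nested subfolder.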

Other Environments

At the moment, this repository only has AntMaze and Adroit implemented. FrankaKitchen is planned to be added soon, but if you are in a hurry or would like to try other tasks (such as the visual manipulation domain in the paper), please contact me at nakamoto[at]berkeley[dot]edu.

Sample Runs and Logs

In order to enable other readers to replicate our results easily, we have conducted a sweep for Cal-QL and CQL in the AntMaze and Adroit domains and made the corresponding W&B logs publicly accessible. The logs can be found here: https://wandb.ai/mitsuhiko/Cal-QL--Examples?workspace=user-mitsuhiko

You can choose the environment to visualize by filtering on env. Cal-QL runs are indicated by enable-calql=True, and CQL runs are denoted by enable-calql=False. Each env has been run across 4 seeds.
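
If the run records are exported from W&B, the same filtering can be done offline. The following is a minimal sketch under stated assumptions: each record is a dictionary carrying the `env` and `enable-calql` fields described above with a boolean flag, and `mean_over_seeds` and the metric key passed to it are hypothetical names, not part of this repository or of the W&B API.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Mapping, Tuple


def mean_over_seeds(
    runs: Iterable[Mapping], metric: str
) -> Dict[Tuple[str, bool], float]:
    """Average `metric` over all runs sharing the same (env, enable-calql) pair."""
    grouped = defaultdict(list)
    for run in runs:
        # Assumes "enable-calql" is a real boolean, not the string "True"/"False".
        key = (run["env"], bool(run["enable-calql"]))
        grouped[key].append(float(run[metric]))
    # One averaged value per (env, algorithm) pair, i.e. across its seeds.
    return {key: mean(values) for key, values in grouped.items()}
```

Grouping on the pair keeps Cal-QL and CQL runs of the same env separate, so the two averages can be compared directly.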

Credits

This project is built upon Young Geng's JaxCQL repository. The CQL implementation is based on CQL.

In case of any questions, bugs, suggestions or improvements, please feel free to contact me at nakamoto[at]berkeley[dot]edu