sail-sg / rosmo

Code for "Efficient Offline Policy Optimization with a Learned Model", ICLR 2023
https://arxiv.org/abs/2210.05980
Apache License 2.0

ROSMO


Introduction

This repository contains the implementation of ROSMO, a Regularized One-Step Model-based algorithm for offline RL, introduced in our paper "Efficient Offline Policy Optimization with a Learned Model". We provide training code for both the Atari and BSuite experiments, and our reproduced results on Atari MsPacman are publicly available on W&B.
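To illustrate the one-step idea, the sketch below computes an improved policy target from a prior policy and a single step of look-ahead with a learned model: for each action a, Q(s,a) = r(s,a) + γ·v(s'), the advantage is A(s,a) = Q(s,a) - v(s), and the target is π̃(a|s) ∝ π(a|s)·exp(A(s,a)/τ). This is a minimal pure-Python sketch under those assumptions. The function name, the discount value, the temperature τ and the use of an unclipped advantage are illustrative choices, not the repository's JAX/Haiku implementation, and the regularization that gives ROSMO its name is not shown.

```python
import math
from typing import List


def one_step_policy_target(
    prior: List[float],        # pi(a|s) from the policy network
    rewards: List[float],      # r(s, a) predicted by the learned model, one per action
    next_values: List[float],  # v(s') of the model's next state after each action
    value: float,              # v(s) of the current state
    discount: float = 0.997,   # assumed discount factor
    temperature: float = 1.0,  # assumed temperature tau
) -> List[float]:
    """Illustrative one-step target: proportional to pi(a|s) * exp(A(s, a) / tau)."""
    # One-step Q estimate from the learned model for every action.
    q_values = [r + discount * v for r, v in zip(rewards, next_values)]
    # Advantage of each action relative to the current state value.
    advantages = [q - value for q in q_values]
    # Subtract the max before exponentiating for numerical stability.
    max_adv = max(advantages)
    weights = [p * math.exp((a - max_adv) / temperature)
               for p, a in zip(prior, advantages)]
    total = sum(weights)
    return [w / total for w in weights]


# Two equally likely actions; the model predicts reward 1 only for action 0.
target = one_step_policy_target(
    prior=[0.5, 0.5], rewards=[1.0, 0.0], next_values=[0.0, 0.0], value=0.0
)
print(target)  # probability mass shifts toward action 0, which has the higher advantage
```

Because only one model step is unrolled per action, the target can be computed exactly over a discrete action set, without a tree search.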

Installation

Please follow the installation guide.

Usage

BSuite

To run the BSuite experiments, first download the datasets and place them in the directory specified by CONFIG.data_dir in experiment/bsuite/config.py.

  1. Debug run.
    python experiment/bsuite/main.py -exp_id test -env cartpole
  2. Enable W&B logger and start training.
    python experiment/bsuite/main.py -exp_id test -env cartpole -nodebug -use_wb -user ${WB_USER}
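If you launch runs from a script, a small helper can assemble the same argument lists as the two commands above. This is a hypothetical convenience sketch: the helper name `bsuite_command` is invented, only the flags that appear in this README are used, and `experiment/bsuite/main.py` may accept further options not covered here.

```python
import os
import shlex
from typing import List, Optional


def bsuite_command(exp_id: str, env: str, debug: bool = True,
                   wb_user: Optional[str] = None) -> List[str]:
    """Build the argv for experiment/bsuite/main.py from the README's flags (hypothetical helper)."""
    cmd = ["python", "experiment/bsuite/main.py", "-exp_id", exp_id, "-env", env]
    if not debug:
        # Leave debug mode to start a real training run.
        cmd.append("-nodebug")
    if wb_user is not None:
        # Mirrors "-use_wb -user ${WB_USER}" in the training command.
        cmd += ["-use_wb", "-user", wb_user]
    return cmd


debug_cmd = bsuite_command("test", "cartpole")
train_cmd = bsuite_command("test", "cartpole", debug=False,
                           wb_user=os.environ.get("WB_USER", "my-wandb-user"))
print(shlex.join(debug_cmd))
print(shlex.join(train_cmd))
# To actually launch a run: subprocess.run(train_cmd, check=True)
```

Returning a list rather than a single string lets `subprocess.run` execute the command without going through a shell.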

Atari

The following commands show how to train 1) a ROSMO agent, 2) its sampling variant, and 3) an MZU (MuZero Unplugged) agent on the game MsPacman.

  1. Train ROSMO with exact policy target.
    python experiment/atari/main.py -exp_id rosmo -env MsPacman -nodebug -use_wb -user ${WB_USER}
  2. Train ROSMO with sampled policy target (N=4).
    python experiment/atari/main.py -exp_id rosmo-sample-4 -sampling -env MsPacman -nodebug -use_wb -user ${WB_USER}
  3. Train MuZero Unplugged as a benchmark baseline (N=20 simulations).
    python experiment/atari/main.py -exp_id mzu-sample-20 -algo mzu -num_simulations 20 -env MsPacman -nodebug -use_wb -user ${WB_USER}
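The sampled variant avoids evaluating every action. The sketch below shows one common way to build such a target, assuming N counts the actions drawn from the prior: each sampled action is weighted by exp(A/τ) and the weights are normalized over the samples only (a self-normalized estimate). All names here, including the `advantage_fn` stand-in for the learned model's one-step advantage, are hypothetical and not taken from the repository.

```python
import math
import random
from typing import Callable, Dict, List, Optional


def sampled_policy_target(
    prior: List[float],                    # pi(a|s) from the policy network
    advantage_fn: Callable[[int], float],  # hypothetical stand-in for the model's one-step A(s, a)
    num_samples: int = 4,                  # assumed meaning of N in "(N=4)"
    temperature: float = 1.0,              # assumed temperature tau
    rng: Optional[random.Random] = None,
) -> Dict[int, float]:
    """Self-normalized policy target over actions sampled from the prior."""
    if rng is None:
        rng = random.Random()
    # Draw N actions from the prior; sampling from pi already accounts for the pi(a|s) factor.
    actions = rng.choices(range(len(prior)), weights=prior, k=num_samples)
    advantages = [advantage_fn(a) for a in actions]
    # Subtract the max before exponentiating for numerical stability.
    max_adv = max(advantages)
    weights = [math.exp((adv - max_adv) / temperature) for adv in advantages]
    total = sum(weights)
    target: Dict[int, float] = {}
    for a, w in zip(actions, weights):
        # Repeated samples of the same action accumulate probability mass.
        target[a] = target.get(a, 0.0) + w / total
    return target


advantages = [1.0, 0.0, -1.0]
target = sampled_policy_target([0.2, 0.5, 0.3], lambda a: advantages[a], rng=random.Random(0))
print(target)  # maps each sampled action to its share of the target distribution
```

As N grows, this estimate approaches the exact target π(a|s)·exp(A(s,a)/τ) normalized over all actions; with a small N it costs only N model evaluations per state.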

Citation

If you find this work useful for your research, please consider citing:

@inproceedings{
  liu2023rosmo,
  title={Efficient Offline Policy Optimization with a Learned Model},
  author={Zichen Liu and Siyi Li and Wee Sun Lee and Shuicheng Yan and Zhongwen Xu},
  booktitle={International Conference on Learning Representations},
  year={2023},
  url={https://arxiv.org/abs/2210.05980}
}

License

ROSMO is distributed under the terms of the Apache 2.0 license.

Acknowledgement

We thank the following projects, which provided valuable references:

Disclaimer

This is not an official Sea Limited or Garena Online Private Limited product.