
# Dual V-Learning (DVL)

[Project Page](https://hari-sikchi.github.io/dual-rl)

[Paper](https://arxiv.org/abs/2302.08560)

Official codebase for *Dual RL: Unification and New Methods for Reinforcement and Imitation Learning* by Harshit Sikchi, Qinqing Zheng, Amy Zhang, and Scott Niekum.

This repository contains code for the Dual V-Learning (DVL) framework for reinforcement learning proposed in the paper.

Please refer to the instructions inside the `offline` folder for installation and for running the code.

## Benefits of DVL over other offline RL methods

✅ Fixes the instability of Extreme Q Learning (XQL); see the sketch below \
✅ Directly models V* in continuous action spaces \
✅ Implicit: no OOD sampling and no actor-critic formulation \
✅ Conservative with respect to the induced behavior policy distribution \
✅ Improves performance on the D4RL benchmark over similar approaches
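
On the first point: XQL's value objective comes from the KL-divergence dual, so its Gumbel-regression loss is exponential in the residual Q − V and can blow up on large residuals; DVL generalizes the dual to other f-divergences whose conjugates grow only polynomially. Below is a minimal NumPy sketch of this contrast. The χ² conjugate and the function names here are illustrative assumptions for exposition, not the exact objectives implemented in this codebase.

```python
import numpy as np

def xql_value_loss(q, v, beta=1.0):
    """Gumbel-regression loss used by XQL (KL-divergence dual).

    The exp() term grows without bound in the residual q - v,
    which is the source of the instability DVL addresses.
    """
    z = (q - v) / beta
    return np.mean(np.exp(z) - z - 1.0)

def dvl_chi2_value_loss(q, v, beta=1.0):
    """Illustrative chi^2-divergence dual loss (an assumption, not the
    repo's exact objective). The constrained conjugate of f(t) = (t - 1)^2
    is max(z/2 + 1, 0)^2 - 1, so the loss is only quadratic in the residual.
    """
    z = (q - v) / beta
    return np.mean(np.maximum(z / 2.0 + 1.0, 0.0) ** 2 - 1.0)

# A large residual makes the Gumbel loss explode while the chi^2 loss stays tame.
q = np.array([0.0, 1.0, 10.0])
v = np.zeros(3)
print(xql_value_loss(q, v))       # dominated by exp(10)
print(dvl_chi2_value_loss(q, v))  # grows only quadratically
```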

## Citation

```bibtex
@misc{sikchi2023dual,
      title={Dual RL: Unification and New Methods for Reinforcement and Imitation Learning},
      author={Harshit Sikchi and Qinqing Zheng and Amy Zhang and Scott Niekum},
      year={2023},
      eprint={2302.08560},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}
```

## Questions

Please feel free to email us if you have any questions.

Harshit Sikchi (hsikchi@utexas.edu)

## Acknowledgements

This repository builds heavily on the [XQL](https://github.com/Div99/xql) and [IQL](https://github.com/ikostrikov/implicit_q_learning) codebases. Please make sure to cite them as well when using this code.