Unstable Baselines (USB) is designed to serve as a quick-start guide for reinforcement learning beginners and as a codebase for agile algorithm development. The algorithms strictly follow their original implementations, and the performance of Unstable Baselines matches that of the originals. USB is currently maintained by researchers from lamda-rl.
Baseline RL | Continuous Action Space | Discrete Action Space | Image Input | Status |
---|---|---|---|---|
DQN | ✕ | ✔ | ✔ | Stable |
VPG | ✔ | ✕ | ✔ | Stable |
DDPG | ✔ | ✕ | ✔ | Stable |
TD3 | ✔ | ✕ | ✔ | Stable |
TRPO | ✔ | ✕ | ✔ | Stable |
PPO | ✔ | ✔ | ✔ | Stable |
SAC | ✔ | ✔ | ✔ | Stable |
REDQ | ✔ | ✔ | ✔ | Stable |
Option Critic | - | - | - | Developing |
Model Based RL | Continuous Action Space | Discrete Action Space | Image Input | Status |
---|---|---|---|---|
MBPO | ✔ | ✕ | ✔ | Updating |
Meta RL | Continuous Action Space | Discrete Action Space | Image Input | Status |
---|---|---|---|---|
PEARL | ✔ | ✕ | ✕ | Updating |
MAML | - | - | - | Developing |
*Updating: the algorithm is being adapted to the latest USB version and will be marked "Stable" soon.
*Developing: the algorithm is still being implemented and will be added to the project soon.
git clone --recurse-submodules https://github.com/x35f/unstable_baselines.git
cd unstable_baselines
conda env create -f env.yaml
conda activate usb
pip install -e .
To run an algorithm, pass one of its config files to its main.py, followed by any optional arguments:
python3 /path/to/algorithm/main.py /path/to/algorithm/configs/some-config.py args(optional)
For example:
cd unstable_baselines/baselines/sac
python3 main.py configs/Ant-v3.py --gpu 0
or, to make aggregating logs easier, run from the repository root:
python3 unstable_baselines/baselines/sac/main.py unstable_baselines/baselines/sac/configs/Ant-v3.py --gpu 0
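When launching several runs from the repository root, it can help to build the command line programmatically. The following is a minimal sketch, not part of USB: `build_command` and its parameters are hypothetical names introduced here for illustration. Only the directory layout (`unstable_baselines/baselines/<algorithm>/main.py` and `configs/`) and the `--gpu` flag are taken from the commands above.

```python
import shlex


def build_command(algo, config, gpu=None, root="unstable_baselines/baselines"):
    """Build the argument list for one run, relative to the repository root.

    Hypothetical helper: it only mirrors the path layout of the example
    commands above and does not call any USB API.
    """
    algo_dir = f"{root}/{algo}"
    cmd = ["python3", f"{algo_dir}/main.py", f"{algo_dir}/configs/{config}"]
    if gpu is not None:
        # --gpu is the only flag shown in the commands above.
        cmd += ["--gpu", str(gpu)]
    return cmd


# Print the command for each config instead of running it.
for config in ["Ant-v3.py"]:
    print(shlex.join(build_command("sac", config, gpu=0)))
```

To actually start a run, the returned list could be passed to `subprocess.run(cmd, check=True)` from the repository root. Keeping the command as a list rather than a single string avoids shell quoting problems.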
# Install metaworld for the meta_rl benchmark
cd envs/metaworld
pip install -e .