toshikwa / soft-actor-critic.pytorch

PyTorch implementation of Soft Actor-Critic (SAC).
MIT License

Soft Actor-Critic in PyTorch

A PyTorch implementation of Soft Actor-Critic[1,2] with n-step rewards and prioritized experience replay[3].
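For context, SAC optimizes a maximum-entropy objective [1]: the expected return is augmented with the policy's entropy, weighted by a temperature coefficient α,

```latex
J(\pi) = \sum_{t=0}^{T} \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}
         \bigl[\, r(s_t, a_t) + \alpha \, \mathcal{H}\!\left(\pi(\cdot \mid s_t)\right) \bigr]
```

where $\rho_\pi$ is the state-action distribution induced by the policy $\pi$ and $\mathcal{H}$ denotes entropy.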

NOTE

I re-implemented Soft Actor-Critic in the discor.pytorch repository, which is better organized, faster, and also includes the DisCor algorithm. Please check it out!

Requirements

You can install the required libraries with `pip install -r requirements.txt`, except for `mujoco_py`.

Note that you need a license to install `mujoco_py`. For installation, please follow the instructions here.

Examples

You can train a Soft Actor-Critic agent as in the example here.

```
python code/main.py \
    [--env_id str (default: HalfCheetah-v2)] \
    [--cuda (optional)] \
    [--seed int (default: 0)]
```

If you want to use n-step rewards and prioritized experience replay, set `multi_step=5` and `per=True` in the configs.
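To illustrate what these two options compute, here is a minimal sketch (hypothetical helper functions, not the repository's actual code): an n-step return as used when `multi_step=5`, and the proportional sampling probabilities of prioritized experience replay [3], where each transition's priority is its absolute TD error raised to an exponent `alpha`.

```python
def n_step_return(rewards, gamma=0.99):
    """Discounted sum of the next n rewards: G = sum_k gamma^k * r_{t+k}.

    In the full algorithm, the bootstrapped value of the state n steps
    ahead is added on top of this partial return.
    """
    g = 0.0
    for k, r in enumerate(rewards):
        g += (gamma ** k) * r
    return g


def per_probabilities(td_errors, alpha=0.6, eps=1e-6):
    """Proportional PER: P(i) = p_i^alpha / sum_j p_j^alpha, p_i = |delta_i| + eps."""
    priorities = [(abs(d) + eps) ** alpha for d in td_errors]
    total = sum(priorities)
    return [p / total for p in priorities]
```

For example, `n_step_return([1.0, 1.0, 1.0], gamma=0.5)` gives `1.75`, and `per_probabilities` always returns a distribution that sums to 1, with larger TD errors sampled more often.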

Results

Results of the above example (without n-step rewards or prioritized experience replay) should look like those below, which are comparable to (or better than) the results reported in the paper.

References

[1] Haarnoja, Tuomas, et al. "Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor." arXiv preprint arXiv:1801.01290 (2018).

[2] Haarnoja, Tuomas, et al. "Soft actor-critic algorithms and applications." arXiv preprint arXiv:1812.05905 (2018).

[3] Schaul, Tom, et al. "Prioritized experience replay." arXiv preprint arXiv:1511.05952 (2015).