olixu / blog-comment

PolicyGradient Formula Derivation | Oliver xu's Blog #53

Open olixu opened 4 years ago

olixu commented 4 years ago

https://blog.oliverxu.cn/2020/08/01/PolicyGradient%E5%85%AC%E5%BC%8F%E6%8E%A8%E5%AF%BC/

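This post summarizes Chapter 13, Policy Gradient Methods, of Sutton's "Reinforcement Learning: An Introduction". It mainly covers the derivation of the Policy Gradient formulas in the discrete-time case, the REINFORCE algorithm, the REINFORCE with Baseline algorithm, and simulation in the Short Corridor with switched actions environment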