lrfreeman opened this issue 3 years ago (status: Open)
Hi, you are right. This implementation is actor-critic, or REINFORCE with a baseline, rather than vanilla REINFORCE, and you will find that other implementations on GitHub do the same. This is because a parametrized value function is usually added to reduce the variance of the gradient estimate, which makes training more stable. REINFORCE is covered in Section 13.3 of the RL book, REINFORCE with a baseline in Section 13.4, and actor-critic in Section 13.5.
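To make the distinction concrete, here is a rough sketch of the signal that multiplies the log-probability gradient in each of the three variants. It is not taken from the repo: the function names are made up for illustration, `values` stands for whatever V(s) estimates a critic network would produce along the episode, and the value of the terminal state is assumed to be 0.

```python
from typing import List


def discounted_returns(rewards: List[float], gamma: float) -> List[float]:
    """Monte Carlo return G_t for every step of one finished episode."""
    returns = []
    g = 0.0
    # Walk the episode backwards so each return reuses the next one.
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    return returns


def reinforce_weights(rewards: List[float], gamma: float) -> List[float]:
    """Vanilla REINFORCE: weight is the full return, no value function."""
    return discounted_returns(rewards, gamma)


def baseline_weights(rewards: List[float], values: List[float],
                     gamma: float) -> List[float]:
    """REINFORCE with a baseline: weight is G_t - V(s_t)."""
    returns = discounted_returns(rewards, gamma)
    return [g - v for g, v in zip(returns, values)]


def actor_critic_weights(rewards: List[float], values: List[float],
                         gamma: float) -> List[float]:
    """One-step actor-critic: weight is the TD error
    r_t + gamma * V(s_{t+1}) - V(s_t), with V(terminal) assumed to be 0."""
    next_values = values[1:] + [0.0]
    return [r + gamma * nv - v
            for r, v, nv in zip(rewards, values, next_values)]


if __name__ == "__main__":
    rewards = [1.0, 1.0, 1.0]
    values = [1.0, 1.0, 0.5]  # hypothetical critic outputs V(s_t)
    print(reinforce_weights(rewards, 0.5))
    print(baseline_weights(rewards, values, 0.5))
    print(actor_critic_weights(rewards, values, 0.5))
```

Only the first function needs no value network. The second uses V(s) purely as a baseline that is subtracted from the Monte Carlo return, while the third also bootstraps from V(s_{t+1}), which is what the book treats as actor-critic.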
Hi there,
Thanks for sharing your repo, it's helping me greatly as I explore the field. I have a question that I'm not sure of the answer to. In this implementation I believe you have implemented the V(s) function and therefore have a parameterised value function. I have read that a vanilla implementation of the REINFORCE algorithm does not parameterise any value function, and instead has only a single parameterised policy network that maps states to actions. Am I wrong, or is this more of an implementation of an actor-critic algorithm, considering the dual networks?
Cheers,
Laurence