hagerrady13 / Reinforce-PyTorch


Is this a REINFORCE implementation or actually A2C given the second network? #1

Open lrfreeman opened 3 years ago

lrfreeman commented 3 years ago

Hi there,

Thanks for sharing your repo; it's been a great help as I explore the field. I have a question I'm not sure how to answer. I believe this implementation includes a V(s) function, i.e. a parameterised value function. I have read that vanilla REINFORCE does not parameterise any value function and instead has a single parameterised policy network that maps states to actions. Am I wrong, or is this closer to an actor-critic algorithm, given the two networks?

Cheers,

Laurence

hagerrady13 commented 3 years ago

Hi, you are right. This implementation is actor-critic, or REINFORCE with a baseline, rather than vanilla REINFORCE, and you will find that other implementations on GitHub do this too. A parametrized value function is commonly added because subtracting a learned baseline reduces the variance of the gradient estimate, which gives more stable performance. REINFORCE is covered in Section 13.3 of the RL book (Sutton and Barto), REINFORCE with a baseline in Section 13.4, and actor-critic in Section 13.5.

http://incompleteideas.net/book/RLbook2020.pdf
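
To make the distinction concrete, here is a minimal sketch in plain Python of the per-step weight that multiplies the gradient of log pi(a_t | s_t) in each variant. It is not taken from this repo: the function names, the `values` argument (standing in for the output of a value network), and the assumption that the episode ends in a terminal state with value 0 are all illustrative. The one-step TD error is shown as the actor-critic weight, since in the book's terminology the value function becomes a critic once it is used for bootstrapping rather than only as a baseline.

```python
def discounted_returns(rewards, gamma=0.99):
    """Monte Carlo return G_t = r_t + gamma * r_{t+1} + ... for every step t."""
    returns = []
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    return returns


def reinforce_weights(rewards, values=None, gamma=0.99):
    """Per-step weights for the policy-gradient term.

    values=None  -> vanilla REINFORCE: the weight is G_t (no value function).
    values given -> REINFORCE with a baseline: the weight is G_t - V(s_t).
    `values` is a hypothetical list of value-network outputs, one per step.
    """
    returns = discounted_returns(rewards, gamma)
    if values is None:
        return returns
    return [g - v for g, v in zip(returns, values)]


def td_errors(rewards, values, gamma=0.99):
    """One-step actor-critic weights: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t).

    Assumes the episode ends in a terminal state, whose value is taken as 0.
    """
    next_values = list(values[1:]) + [0.0]
    return [r + gamma * nv - v for r, v, nv in zip(rewards, values, next_values)]


rewards = [1.0, 1.0, 1.0]
values = [1.0, 1.0, 1.0]  # made-up value estimates for illustration

print(reinforce_weights(rewards, gamma=0.5))          # [1.75, 1.5, 1.0]
print(reinforce_weights(rewards, values, gamma=0.5))  # [0.75, 0.5, 0.0]
print(td_errors(rewards, values, gamma=0.5))          # [0.5, 0.5, 0.0]
```

In the first call no value function is involved at all, which matches the vanilla algorithm described in the question. The second and third calls both need a second network, but only the third bootstraps from V(s_{t+1}).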