Is this a REINFORCE implementation or actually A2C, given the second network?

Hi there,

Thanks for sharing your repo; it's helping me greatly as I explore the field. I have a question I'm not sure of the answer to. In this implementation, I believe you have implemented V(s) as a second network, and therefore have a parameterised value function. I have read that a vanilla implementation of the REINFORCE algorithm does not parameterise any value function, and instead has only a single parameterised policy network that maps states to actions. Am I wrong, or is this more of an actor-critic implementation, given the two networks?
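To make the distinction I have in mind concrete, here is a minimal sketch of my own (not taken from the repo) of the per-step weight that multiplies the log-probability gradient in each variant. The function names are hypothetical, `values` stands in for the outputs of a V(s) network, and I am assuming the episode terminates after the last reward. As I understand it, a value network used only as a baseline is often still called REINFORCE (with baseline), and the actor-critic label is usually reserved for the case where V(s) is also used to bootstrap the target.

```python
def discounted_returns(rewards, gamma):
    """Monte Carlo return G_t for every step of one finished episode."""
    returns = []
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    return returns


def reinforce_weights(rewards, gamma):
    # Vanilla REINFORCE: policy network only, weight is the full return G_t.
    return discounted_returns(rewards, gamma)


def reinforce_baseline_weights(rewards, values, gamma):
    # REINFORCE with baseline: V(s_t) is subtracted from G_t to reduce
    # variance, but the target is still the Monte Carlo return.
    returns = discounted_returns(rewards, gamma)
    return [g - v for g, v in zip(returns, values)]


def actor_critic_weights(rewards, values, gamma):
    # One-step actor-critic: the target bootstraps from V(s_{t+1}).
    # Assumption: the state after the final step is terminal, so its value is 0.
    next_values = values[1:] + [0.0]
    return [r + gamma * nv - v
            for r, v, nv in zip(rewards, values, next_values)]


rewards = [1.0, 1.0, 1.0]
values = [1.0, 1.0, 1.0]  # hypothetical critic outputs for the visited states
print(reinforce_weights(rewards, 0.5))
print(reinforce_baseline_weights(rewards, values, 0.5))
print(actor_critic_weights(rewards, values, 0.5))
```

If the repo's second network is only ever subtracted from the Monte Carlo return, I would guess it corresponds to the middle function rather than the last one, but I may be misreading the code.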

Cheers,

Laurence

hagerrady13 / Reinforce-PyTorch
