pytorch / examples

A set of examples around pytorch in Vision, Text, Reinforcement Learning, etc.
https://pytorch.org/examples

A3C instead of actor-critic in reinforcement_learning/reinforce.py #151

Open susht3 opened 7 years ago

susht3 commented 7 years ago

Here is the code from reinforce.py:

    for action, r in zip(self.saved_actions, rewards):
        action.reinforce(r)

And here is the code from actor-critic.py:

    for (action, value), r in zip(saved_actions, rewards):
        reward = r - value.data[0,0]
        action.reinforce(reward)
        value_loss += F.smooth_l1_loss(value, Variable(torch.Tensor([r])))

So I think this is Asynchronous Advantage Actor-Critic (A3C), not plain actor-critic.

jeasinema commented 7 years ago

Yes, I partly agree with you, with one small correction: the algorithm implemented is better described as an offline version of A2C (Advantage Actor-Critic).
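To make the distinction in this thread concrete, here is a minimal sketch in plain Python (floats instead of tensors, so it runs without PyTorch). It assumes, as the quoted snippets appear to, that each `r` is the discounted return for a step rather than the raw reward. The function names, the episode rewards, and the critic values are all hypothetical and chosen only for illustration. The REINFORCE weight is the return itself, while the advantage version subtracts the critic's estimate as a baseline and trains the critic with a smooth L1 loss. Nothing here is asynchronous: the "A" for asynchronous in A3C refers to several workers updating shared parameters in parallel, which neither snippet does.

```python
def discounted_returns(rewards, gamma=0.99):
    """Return the discounted return G_t for every step of one episode."""
    returns = []
    running = 0.0
    # Walk backwards so each step accumulates the discounted future rewards.
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def reinforce_weights(returns):
    """REINFORCE: each action's log-probability is weighted by the raw return."""
    return list(returns)


def advantage_weights(returns, values):
    """Advantage actor-critic: subtract the critic estimate V(s_t) as a baseline."""
    return [g - v for g, v in zip(returns, values)]


def smooth_l1(prediction, target, beta=1.0):
    """Scalar smooth L1 loss: quadratic near zero, linear for large errors."""
    diff = abs(prediction - target)
    if diff < beta:
        return 0.5 * diff * diff / beta
    return diff - 0.5 * beta


rewards = [1.0, 1.0, 1.0]   # hypothetical per-step rewards for one episode
values = [2.5, 2.0, 0.5]    # hypothetical critic outputs V(s_t)

returns = discounted_returns(rewards, gamma=1.0)
value_loss = sum(smooth_l1(v, g) for v, g in zip(values, returns))

print("REINFORCE weights:", reinforce_weights(returns))
print("advantage weights:", advantage_weights(returns, values))
print("critic loss:", value_loss)
```

The two weight lists differ only by the baseline term, which matches the single line `reward = r - value.data[0,0]` that separates the two quoted loops.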