ikostrikov / pytorch-a3c

PyTorch implementation of Asynchronous Advantage Actor Critic (A3C) from "Asynchronous Methods for Deep Reinforcement Learning".
MIT License
1.23k stars, 279 forks

Why do we reverse rewards? #72

Open npitsillos opened 4 years ago

npitsillos commented 4 years ago

I apologise if this is not the correct place, but I couldn't find an answer elsewhere. Why are the rewards iterated in reverse, and why do we append R to the values at the end?
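For anyone landing here with the same question, below is a minimal sketch of the usual reasoning, not the repository's actual code. It assumes the standard n-step return recursion R_t = r_t + gamma * R_{t+1}. Since each return depends on the one after it, walking the rewards backwards lets a single running variable accumulate every return in one pass. Appending the bootstrap value R to the value list gives the last step a "next value" to refer to when forming one-step TD residuals. The function names and the plain-float inputs are hypothetical and chosen only for illustration.

```python
def discounted_returns(rewards, bootstrap_value, gamma):
    """Compute n-step returns by walking the rollout backwards.

    bootstrap_value is the critic's estimate for the state after the last
    step (0.0 if the episode terminated). Hypothetical helper, for
    illustration only.
    """
    R = bootstrap_value
    returns = [0.0] * len(rewards)
    for i in reversed(range(len(rewards))):
        # R_t = r_t + gamma * R_{t+1}; R already holds R_{t+1}
        R = rewards[i] + gamma * R
        returns[i] = R
    return returns


def td_residuals(rewards, values, bootstrap_value, gamma):
    """One-step TD residuals: delta_t = r_t + gamma * V_{t+1} - V_t.

    Appending the bootstrap value means padded[i + 1] exists even for the
    final step, which is the assumed reason for extending the value list.
    """
    padded = list(values) + [bootstrap_value]
    return [
        rewards[i] + gamma * padded[i + 1] - padded[i]
        for i in range(len(rewards))
    ]


rewards = [1.0, 0.0, 2.0]
values = [0.5, 0.5, 0.5]
returns = discounted_returns(rewards, bootstrap_value=4.0, gamma=0.5)
deltas = td_residuals(rewards, values, bootstrap_value=4.0, gamma=0.5)
print(returns)  # [2.0, 2.0, 4.0]
print(deltas)   # [0.75, -0.25, 3.5]
```

Going forwards instead would mean recomputing a discounted sum for every timestep, so the reverse loop is mostly a convenience that keeps the computation linear in the rollout length.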