joschu / modular_rl

Implementation of TRPO and related algorithms
MIT License
617 stars 155 forks

Why use timesteps in the evaluation of the value function? #25

Open afansi opened 7 years ago

afansi commented 7 years ago

Hello John, after reading your paper on TRPO and looking at your code on GitHub, I am a little confused about the steps for predicting the value function. Here, you concatenate the time step to the observation. Why are you doing this? Is it mandatory? Hoping to receive feedback from you. Regards.
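
For anyone reading along, here is a minimal sketch of the pattern the question refers to: appending the time-step index to each observation before it is passed to the value function. This is not the repository's code. The function name `add_time_feature` and the `time_divisor` parameter are hypothetical and chosen only for illustration. A common motivation, though not confirmed in this thread, is that with a fixed episode length the expected remaining return depends on how many steps are left.

```python
def add_time_feature(observations, time_divisor=10.0):
    """Append a scaled time-step index to every observation of one trajectory.

    observations: list of per-step feature lists, in time order.
    time_divisor: hypothetical scaling constant that keeps the time feature
                  in a range similar to the other inputs.
    """
    return [list(obs) + [t / time_divisor] for t, obs in enumerate(observations)]


# Three steps of a toy trajectory with 2-dimensional observations.
path = [[0.5, -1.0], [0.4, -0.8], [0.3, -0.6]]
features = add_time_feature(path)

for row in features:
    print(row)
```

Each row keeps the original observation and gains one extra entry that grows with the step index, so a value function fitted on `features` can distinguish early states from late ones even when the raw observations look alike.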