ikostrikov / pytorch-a2c-ppo-acktr-gail

PyTorch implementation of Advantage Actor Critic (A2C), Proximal Policy Optimization (PPO), Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation (ACKTR) and Generative Adversarial Imitation Learning (GAIL).
MIT License

Stale hidden states #278

Open aklein1995 opened 3 years ago

aklein1995 commented 3 years ago

Hi!

I was taking a look at your code and wondering whether you tackle the stale hidden states after each rollout. As far as I can see, the code is stateful at the episode level: when a done is encountered, the hidden states are reset. However, from one rollout to the next, the output hidden state of the previous rollout is copied over as the input hidden state of the current rollout, even though the actor-critic network parameters (including the GRU) have already been updated.

Is there any reason why you do not recompute the last rollout's final hidden state using the new network weights? Thank you in advance!
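For readers unfamiliar with the issue, the staleness can be sketched with a toy recurrent cell. Everything below is illustrative: the one-parameter cell is a hypothetical stand-in for the repo's GRU, and none of the names come from the repository itself.

```python
# Toy illustration of the "stale hidden state" issue described above.
# A minimal sketch, assuming a linear recurrence h' = w*h + x as a
# stand-in for the GRU; Cell and rollout are hypothetical names.

class Cell:
    """Minimal recurrent cell: h' = w * h + x."""
    def __init__(self, w):
        self.w = w

    def step(self, h, x):
        return self.w * h + x


def rollout(cell, h0, xs):
    """Run the cell over a sequence of inputs; return the final hidden state."""
    h = h0
    for x in xs:
        h = cell.step(h, x)
    return h


cell = Cell(w=0.5)
obs_rollout_1 = [1.0, 2.0]

# Rollout 1 with the old parameters.
h_stale = rollout(cell, h0=0.0, xs=obs_rollout_1)

# ... policy update happens here; the recurrent weights change ...
cell.w = 0.9

# The carried-over state (the repo's behavior, per the question) keeps
# h_stale, which was produced by the *old* weights.
# Recomputing it with the new weights would require replaying
# rollout 1's observations:
h_fresh = rollout(cell, h0=0.0, xs=obs_rollout_1)

print(h_stale, h_fresh)  # the two hidden states diverge after the update
```

In practice, recomputing would mean re-running the GRU over the stored observations of the previous rollout after every optimizer step, which trades extra forward passes for a hidden state consistent with the current weights; carrying the stale state forward is cheaper and the mismatch is often assumed to be small.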