ikostrikov / pytorch-a2c-ppo-acktr-gail

PyTorch implementation of Advantage Actor Critic (A2C), Proximal Policy Optimization (PPO), Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation (ACKTR) and Generative Adversarial Imitation Learning (GAIL).
MIT License

Recurrent states not reset between episode boundaries #206

Open bamos opened 4 years ago

bamos commented 4 years ago

The policy is given the last recurrent state from the replay buffer, and it isn't reset at episode boundaries. In my case the number of updates is set to the episode length, so I've added `rollouts.recurrent_hidden_states.zero_()` here: https://github.com/ikostrikov/pytorch-a2c-ppo-acktr-gail/blob/master/main.py#L111

But in the more general case, I think `done` should be inspected to re-initialize the right states.
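A sketch of the kind of reset I mean, assuming `hxs` holds the per-env recurrent states and `done` comes back from a vectorized env step (names are illustrative, not from the repo):

```python
import torch


def reset_hidden_on_done(hxs, done):
    """Zero the recurrent hidden state for any env whose episode just ended.

    hxs:  (num_envs, hidden_size) recurrent states
    done: length-num_envs sequence of booleans from the vectorized env
    """
    # Build a (num_envs, 1) mask: 0.0 where an episode ended, 1.0 otherwise.
    masks = torch.tensor(
        [0.0 if d else 1.0 for d in done], dtype=hxs.dtype
    ).unsqueeze(-1)
    return hxs * masks
```

After each `env.step()`, `hxs = reset_hidden_on_done(hxs, done)` would re-initialize exactly the states belonging to finished episodes while leaving the rest untouched.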

bamos commented 4 years ago

(Sorry accidentally pressed a hotkey and sent that in early, edited)

erikwijmans commented 4 years ago

This is what this line here does: https://github.com/ikostrikov/pytorch-a2c-ppo-acktr-gail/blob/master/a2c_ppo_acktr/model.py#L113
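For readers following along, that line multiplies the hidden state by the done-masks before the GRU step, so finished episodes effectively start from a zero state. A simplified sketch in the spirit of the repo's `_forward_gru` (toy sizes, single-step case only):

```python
import torch
import torch.nn as nn

gru = nn.GRU(8, 16)  # toy input/hidden sizes for illustration


def gru_step(x, hxs, masks):
    """One recurrent step with episode-boundary masking.

    x:     (num_envs, input_size) current observations' features
    hxs:   (num_envs, hidden_size) recurrent states from the rollout buffer
    masks: (num_envs, 1) with 0.0 where a new episode starts, 1.0 otherwise
    """
    # Multiplying by masks zeroes the hidden state of any env whose
    # previous step was terminal, before it is fed to the GRU.
    x, hxs = gru(x.unsqueeze(0), (hxs * masks).unsqueeze(0))
    return x.squeeze(0), hxs.squeeze(0)
```

With `masks == 0.0` for an env, the step is equivalent to running the GRU from a freshly initialized zero hidden state, which is why no explicit reset appears elsewhere in the training loop.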

bamos commented 4 years ago

Ah, thanks -- that's pretty well hidden, and it also doesn't hold in my case, where I replaced that model with my own recurrent policy class. Maybe there should be a well-placed comment or doc somewhere noting that custom recurrent policies need to reset their hidden states manually?