ikostrikov / pytorch-a2c-ppo-acktr-gail

PyTorch implementation of Advantage Actor Critic (A2C), Proximal Policy Optimization (PPO), Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation (ACKTR) and Generative Adversarial Imitation Learning (GAIL).
MIT License
3.57k stars 829 forks

Why recurrent_hidden_states_batch is used in feed_forward_generator #195

Closed yangysc closed 5 years ago

yangysc commented 5 years ago

Could you kindly suggest some default parameters for PPO with a recurrent policy? The README seems to provide parameters only for the non-recurrent policy.

Thanks in advance.