ikostrikov / pytorch-a2c-ppo-acktr-gail

PyTorch implementation of Advantage Actor Critic (A2C), Proximal Policy Optimization (PPO), Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation (ACKTR) and Generative Adversarial Imitation Learning (GAIL).
MIT License
3.57k stars, 829 forks

question about the recurrent #289

Closed rainbow979 closed 2 years ago

rainbow979 commented 2 years ago

I have a question about the recurrent policy. In the recurrent setting, there is no computation graph between step t and step t-1, so how is the gradient backpropagated through time? https://github.com/ikostrikov/pytorch-a2c-ppo-acktr-gail/blob/master/a2c_ppo_acktr/storage.py#L145

rainbow979 commented 2 years ago

I got a reply on Reddit that resolved my question: https://www.reddit.com/r/reinforcementlearning/comments/t2oi2g/comment/hyngt09/?utm_source=share&utm_medium=web2x&context=3
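
For anyone landing here with the same question, below is a minimal sketch of the usual pattern. It is not copied from this repository or from the linked reply. During the rollout the hidden states are computed without a graph and only stored as tensors. During the update the network is re-run over the whole stored sequence, starting from the stored initial hidden state, and that second pass is what builds the graph across time steps. The sizes, the variable names (`stored_h`, `obs_replay`), the reset at step 3 and the toy loss are all assumptions made for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical sizes: T steps, N parallel envs.
T, N, obs_dim, hid = 5, 2, 3, 4
gru = nn.GRU(obs_dim, hid)

obs = torch.randn(T, N, obs_dim)
masks = torch.ones(T, N, 1)
masks[3, 0] = 0.0  # assumed: env 0 starts a new episode at step 3

# Rollout phase: step-by-step forward passes with no graph.
# Only the resulting hidden-state tensors are stored.
stored_h = torch.zeros(T + 1, N, hid)
with torch.no_grad():
    for t in range(T):
        h_in = (stored_h[t] * masks[t]).unsqueeze(0)
        _, h_out = gru(obs[t].unsqueeze(0), h_in)
        stored_h[t + 1] = h_out.squeeze(0)

# Update phase: replay the sequence from the first stored hidden state.
# This pass is tracked by autograd, so the graph now links step t to t-1.
obs_replay = obs.clone().requires_grad_(True)  # leaf used only to inspect gradients
h = stored_h[0]
outs = []
for t in range(T):
    out, h_next = gru(obs_replay[t].unsqueeze(0), (h * masks[t]).unsqueeze(0))
    h = h_next.squeeze(0)
    outs.append(out.squeeze(0))
outs = torch.stack(outs)

# Toy loss on the last step only; its gradient has to travel back through time.
loss = outs[-1].pow(2).sum()
loss.backward()

print(obs_replay.grad[0, 1].abs().sum())  # env 1: gradient reaches step 0
print(obs_replay.grad[0, 0].abs().sum())  # env 0: cut off by the reset at step 3
```

The mask multiplication zeroes the hidden state at episode boundaries, so gradients do not leak across episodes, while within an episode the replayed sequence supports ordinary backpropagation through time.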