ikostrikov / pytorch-a2c-ppo-acktr-gail

PyTorch implementation of Advantage Actor Critic (A2C), Proximal Policy Optimization (PPO), Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation (ACKTR) and Generative Adversarial Imitation Learning (GAIL).
MIT License
3.57k stars, 829 forks

Insert obs, action in storage (PPO) #235

Closed: mynsng closed this issue 4 years ago

mynsng commented 4 years ago

In storage.py, obs is inserted at index (self.step + 1), while action is inserted at index (self.step). So when we build a batch, it looks like we get data (s_{t-1}, a_t, r_t, ...), i.e. the observation and the action come from different time steps. Can I get some intuition for this?
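
To make the indexing in the question concrete, here is a minimal sketch in plain Python. It is not the repository's code: `TinyRolloutStorage`, `collect`, and the toy environment are hypothetical. It assumes the obs buffer has one more slot than the action buffer and that slot 0 is filled with the initial observation before the first insert. Only the two index choices (`self.step + 1` for obs, `self.step` for action) come from the question.

```python
class TinyRolloutStorage:
    """Hypothetical, simplified rollout buffer (plain lists, no tensors)."""

    def __init__(self, num_steps, initial_obs):
        # Assumption: obs has one more slot than actions/rewards,
        # and slot 0 holds the observation returned by the initial reset.
        self.num_steps = num_steps
        self.obs = [None] * (num_steps + 1)
        self.actions = [None] * num_steps
        self.rewards = [None] * num_steps
        self.obs[0] = initial_obs
        self.step = 0

    def insert(self, next_obs, action, reward):
        # The action chosen from obs[step] is stored at index step;
        # the observation that action produced is stored at index step + 1.
        self.obs[self.step + 1] = next_obs
        self.actions[self.step] = action
        self.rewards[self.step] = reward
        self.step = (self.step + 1) % self.num_steps

    def batch(self):
        # Dropping the last obs pairs index t with (obs[t], actions[t], rewards[t]).
        return list(zip(self.obs[:-1], self.actions, self.rewards))


def collect(num_steps=3):
    # Toy deterministic environment: the state is an integer that
    # increments by one each step, starting from 0.
    storage = TinyRolloutStorage(num_steps, initial_obs=0)
    for _ in range(num_steps):
        obs = storage.obs[storage.step]  # observation the toy policy acts on
        action = obs + 10                # toy "policy": a function of the current obs
        next_obs = obs + 1               # toy transition
        reward = float(action)
        storage.insert(next_obs, action, reward)
    return storage


if __name__ == "__main__":
    for row in collect().batch():
        print(row)
```

Under these assumptions, each batch row pairs an action with the observation it was computed from, because the initial observation already occupies index 0 and every later observation lands one slot ahead of the action that produced it. The extra final obs slot then holds the observation that follows the last stored action.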