PPO1 ob_rms affects the oldpi

openai / baselines

OpenAI Baselines: high-quality implementations of reinforcement learning algorithms

MIT License


PPO1 ob_rms affects the oldpi #918

Open jonathanbrady88 opened 5 years ago

jonathanbrady88 commented 5 years ago

Hello, thank you for this wonderful repository. I have a question about this line in the PPO1 code: https://github.com/openai/baselines/blob/9b68103b737ac46bc201dfb3121cfa5df2127e53/baselines/ppo1/pposgd_simple.py#L173

The observation filter (ob_rms) is updated before pi is assigned to oldpi. Because of that, oldpi no longer outputs the same mean vector that was used when the actions were sampled. Shouldn't oldpi reproduce the mean vector from sampling so that the importance sampling ratio is correct?
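To make the concern concrete, here is a minimal NumPy sketch of the effect. It is not the baselines code: the `RunningMeanStd` class, the linear Gaussian policy with weights `W`, and the batch sizes are all hypothetical stand-ins chosen for illustration. It assumes the policy normalizes observations with a shared running filter, samples actions, and then updates the filter before the old-policy log-probabilities are evaluated.

```python
import numpy as np

class RunningMeanStd:
    """Toy running mean/variance filter (hypothetical, not the baselines class)."""
    def __init__(self, dim):
        self.mean = np.zeros(dim)
        self.var = np.ones(dim)
        self.count = 1e-4

    def update(self, x):
        # Merge batch statistics into the running statistics.
        b_mean, b_var, b_count = x.mean(axis=0), x.var(axis=0), x.shape[0]
        delta = b_mean - self.mean
        total = self.count + b_count
        m2 = (self.var * self.count + b_var * b_count
              + delta ** 2 * self.count * b_count / total)
        self.mean = self.mean + delta * b_count / total
        self.var = m2 / total
        self.count = total

    def normalize(self, x):
        return (x - self.mean) / np.sqrt(self.var + 1e-8)

def gaussian_logp(a, mean, std):
    # Log-density of a diagonal Gaussian, summed over action dimensions.
    return -0.5 * np.sum(((a - mean) / std) ** 2
                         + 2 * np.log(std) + np.log(2 * np.pi), axis=-1)

rng = np.random.default_rng(0)
obs_dim, act_dim, n = 3, 2, 256
W = rng.normal(size=(obs_dim, act_dim))   # assumed linear policy weights
std = np.ones(act_dim)                    # assumed fixed action std

rms = RunningMeanStd(obs_dim)
rms.update(rng.normal(size=(n, obs_dim)))  # warm-up from an earlier batch

# 1) Rollout: actions are sampled with the filter as it is right now.
obs = rng.normal(loc=0.5, scale=1.5, size=(n, obs_dim))
mean_sampling = rms.normalize(obs) @ W
actions = mean_sampling + std * rng.normal(size=(n, act_dim))
logp_sampling = gaussian_logp(actions, mean_sampling, std)

# 2) The filter is updated with the new batch before oldpi is evaluated.
rms.update(obs)

# 3) oldpi has the same weights but now sees differently normalized inputs.
mean_oldpi = rms.normalize(obs) @ W
logp_oldpi = gaussian_logp(actions, mean_oldpi, std)

max_mean_shift = np.abs(mean_oldpi - mean_sampling).max()
ratio_vs_sampling = np.exp(logp_oldpi - logp_sampling)

print("max shift in mean vector:", max_mean_shift)
print("oldpi / sampling ratio range:", ratio_vs_sampling.min(), ratio_vs_sampling.max())
```

In this sketch the ratio between oldpi and the distribution that actually generated the actions is no longer 1, even though the weights are identical. Whether the mismatch is large enough to matter in PPO1 presumably depends on how much the filter statistics move between iterations.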

Berk035 commented 4 years ago

Hello,

I am also wondering about the effect of this line. Bumping this so it stays up to date. Thank you.