Hello, thank you for this wonderful repository. I have a question about this line of the PPO1 code: https://github.com/openai/baselines/blob/9b68103b737ac46bc201dfb3121cfa5df2127e53/baselines/ppo1/pposgd_simple.py#L173
Before `pi` is assigned to `oldpi`, the observation filter is updated. Because of that, `oldpi` no longer outputs the same mean vector that was used for sampling. Shouldn't `oldpi` output the same mean vector as during sampling, so that importance sampling works?
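To make the concern concrete, here is a minimal sketch, not the repository's code. It assumes a 1-D observation, a simplified running mean/variance filter (`RunningMeanStd` here is a hypothetical stand-in), a hypothetical linear policy head `policy_mean` on the normalized observation, and a Gaussian with fixed unit std. It records log-probabilities at sampling time, then updates the filter and copies the statistics to `oldpi` in the order described above.

```python
import copy
import math


class RunningMeanStd:
    """Hypothetical 1-D running mean/variance filter (simplified stand-in)."""

    def __init__(self, eps=1e-4):
        self.mean = 0.0
        self.var = 1.0
        self.count = eps

    def update(self, xs):
        # Merge batch statistics into the running statistics (parallel-variance formula).
        n = len(xs)
        batch_mean = sum(xs) / n
        batch_var = sum((x - batch_mean) ** 2 for x in xs) / n
        delta = batch_mean - self.mean
        total = self.count + n
        m2 = self.var * self.count + batch_var * n + delta ** 2 * self.count * n / total
        self.mean = self.mean + delta * n / total
        self.var = m2 / total
        self.count = total


def policy_mean(ob, rms, w=1.0, b=0.0):
    # Hypothetical linear policy head applied to the filtered observation.
    return w * (ob - rms.mean) / math.sqrt(rms.var) + b


def gaussian_logp(a, mu, std=1.0):
    return -0.5 * ((a - mu) / std) ** 2 - math.log(std) - 0.5 * math.log(2 * math.pi)


def demo():
    rms = RunningMeanStd()
    obs = [2.0, 3.0, 4.0]
    noise = [0.5, -0.5, 0.0]  # fixed "sampled" noise so the example is deterministic

    # Rollout: actions and log-probs come from the filter as it was during sampling.
    mu_sample = [policy_mean(o, rms) for o in obs]
    actions = [m + e for m, e in zip(mu_sample, noise)]
    logp_behavior = [gaussian_logp(a, m) for a, m in zip(actions, mu_sample)]

    # Order in question: update the filter first, then copy pi -> oldpi.
    rms.update(obs)
    old_rms = copy.deepcopy(rms)  # oldpi receives the already-updated statistics

    mu_old = [policy_mean(o, old_rms) for o in obs]
    logp_old = [gaussian_logp(a, m) for a, m in zip(actions, mu_old)]

    # Before any gradient step pi == oldpi, so pi uses the same updated statistics.
    logp_new = [gaussian_logp(a, policy_mean(o, rms)) for a, o in zip(actions, obs)]

    ratio_vs_old = [math.exp(n - o) for n, o in zip(logp_new, logp_old)]
    ratio_vs_behavior = [math.exp(n - b) for n, b in zip(logp_new, logp_behavior)]
    return mu_sample, mu_old, ratio_vs_old, ratio_vs_behavior


if __name__ == "__main__":
    mu_sample, mu_old, r_old, r_beh = demo()
    print("mean at sampling time:", mu_sample)
    print("oldpi mean after filter update:", mu_old)
    print("ratio pi/oldpi:", r_old)
    print("ratio pi/behavior:", r_beh)
```

In this sketch the ratio `pi/oldpi` is exactly 1 before the first update, while the ratio against the distribution the actions were actually drawn from is far from 1. Two possible ways to keep the ratio consistent, under the same assumptions, are copying `pi` to `oldpi` before the filter update, or storing the sampling-time log-probabilities in the rollout and using those in the ratio.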