Off-policy PPO?

lucidrains / anymal-belief-state-encoder-decoder-pytorch

Implementation of the Belief State Encoder / Decoder in the new breakthrough robotics paper from ETH Zürich

MIT License

64 stars 9 forks

[Open] slerman12 opened this issue 2 years ago

slerman12 commented 2 years ago

Doesn't PPO, at least the vanilla variant, only work on-policy? That is, doesn't it learn only from recently collected data, not from an experience replay buffer?

lucidrains commented 2 years ago

@slerman12 I'm far from an expert on reinforcement learning, but that's in line with my thoughts!
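For context on the question: in the original PPO algorithm, each iteration collects a fresh batch of experience with the current policy (pi_theta_old), runs a few epochs of updates on the clipped surrogate objective using that batch, then discards it and sets theta_old to theta. The probability ratio in the objective is measured against the policy that collected the data, and the clip only keeps the new policy close to that collecting policy. This is why vanilla PPO is usually described as on-policy. Below is a minimal NumPy sketch under those assumptions. The names `ppo_clip_objective` and `OnPolicyRolloutBuffer` are hypothetical and are not part of this repository.

```python
import numpy as np

# Hypothetical helper names for illustration only; not from this repository.


def ppo_clip_objective(logp_new, logp_old, advantages, clip_eps=0.2):
    """Clipped surrogate objective from PPO (a quantity to maximize).

    logp_old must come from the same policy that collected the samples
    (pi_theta_old). This dependence is what ties vanilla PPO to on-policy data.
    """
    logp_new = np.asarray(logp_new, dtype=float)
    logp_old = np.asarray(logp_old, dtype=float)
    advantages = np.asarray(advantages, dtype=float)

    # r_t = pi_theta(a|s) / pi_theta_old(a|s), computed in log space
    ratio = np.exp(logp_new - logp_old)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Elementwise minimum gives a pessimistic (lower) bound on the improvement
    return float(np.mean(np.minimum(unclipped, clipped)))


class OnPolicyRolloutBuffer:
    """Hypothetical minimal buffer: holds one batch of fresh rollouts.

    Unlike an experience replay buffer, it is cleared after each policy
    update, so every batch comes from the policy currently being optimized.
    """

    def __init__(self):
        self.clear()

    def add(self, logp_old, advantage):
        # Store the behavior policy's log-prob and the advantage estimate
        self.logp_old.append(logp_old)
        self.advantages.append(advantage)

    def get(self):
        return np.array(self.logp_old, dtype=float), np.array(self.advantages, dtype=float)

    def clear(self):
        self.logp_old = []
        self.advantages = []
```

In a training loop, the buffer would be filled with rollouts from the current policy, used for a few epochs of updates, and then cleared before the next collection. Reusing old entries, as a replay buffer would, means the stored `logp_old` values and advantage estimates come from policies that no longer match the one being optimized, which vanilla PPO is not designed to handle.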