slerman12 opened 2 years ago (status: Open)
Doesn't PPO, at least the vanilla variant, only work on-policy? That is, on data collected by the current (or very recent) policy, not on samples drawn from an experience replay buffer?
@slerman12 I'm far from an expert on reinforcement learning, but that's in line with my thoughts!
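For anyone following the thread, here is a rough sketch of the standard PPO clipped surrogate, which may help show why stale replay data is a problem. It is not taken from this repository: the function name `ppo_clip_terms` and the default `clip_eps=0.2` are illustrative assumptions. The objective weights each sample by the probability ratio between the current policy and the policy that collected it, and clips that ratio to `[1 - clip_eps, 1 + clip_eps]`.

```python
import math

def ppo_clip_terms(logp_new, logp_old, advantages, clip_eps=0.2):
    """Per-sample PPO clipped surrogate terms (hypothetical helper, pure Python).

    logp_new: log-probs of the taken actions under the current policy
    logp_old: log-probs under the policy that collected the data
    Returns (terms, clipped), where clipped[i] is True when the clipped
    branch is the active one, i.e. the sample contributes no gradient.
    """
    terms, clipped = [], []
    for ln, lo, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(ln - lo)  # pi_new(a|s) / pi_old(a|s)
        clipped_ratio = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        unclipped_term = ratio * adv
        clipped_term = clipped_ratio * adv
        terms.append(min(unclipped_term, clipped_term))
        # Strictly smaller clipped term means the ratio left the trust
        # region in the direction the advantage was pushing it.
        clipped.append(clipped_term < unclipped_term)
    return terms, clipped

# Fresh on-policy sample: both policies agree, so the ratio is 1.
fresh = ppo_clip_terms([math.log(0.5)], [math.log(0.5)], [1.0])

# Stale sample: the collecting policy gave the action probability 0.1,
# the current policy gives it 0.5, so the ratio is about 5 and gets clipped.
stale = ppo_clip_terms([math.log(0.5)], [math.log(0.1)], [1.0])
```

With data from the current policy the ratio stays near 1 and every sample contributes to the update. With old replay data the ratio tends to drift outside the clip range, so many samples are clipped and carry no learning signal, which is consistent with the on-policy concern raised above.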