openai / baselines

OpenAI Baselines: high-quality implementations of reinforcement learning algorithms
MIT License
15.68k stars 4.86k forks

eprewmean is much lower after loading a saved model to continue training. #1198

Closed (riflemanl closed this issue 2 years ago)

riflemanl commented 2 years ago

Hi, when I trained a ppo2/mujoco model on the tf2 branch for a long time (more than 24 hours), eprewmean reached around 20. I then tried to save all checkpoints (it looks like the tf2 code cannot save the whole model?). After loading those saved checkpoints with the same seed and continuing training, eprewmean dropped to around 6~7 and took a long time to get back to 20, which is much worse than the model I had saved. Can I save a more complete model? Or is there anything else I can do to improve the saved model? Thanks!!
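
To make "a more complete model" concrete: one possible reason for a drop like this (an assumption, not something confirmed in this thread) is that the checkpoint holds only the network weights, while the optimizer moments and the running observation-normalization statistics start from scratch after loading. The sketch below is plain Python with hypothetical names (`ObsNormalizer`, `save_training_state`, `load_training_state`); it is not the baselines or TensorFlow API, and only illustrates saving and restoring all of that state together.

```python
import math
import os
import pickle
import tempfile


class ObsNormalizer:
    """Hypothetical stand-in for running observation statistics (Welford update)."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the running mean

    def update(self, x):
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    def normalize(self, x):
        # Fall back to unit variance until at least two samples were seen.
        var = self.m2 / self.count if self.count > 1 else 1.0
        return (x - self.mean) / math.sqrt(var + 1e-8)


def save_training_state(path, weights, optimizer_state, normalizer, num_updates):
    """Write everything needed to resume, not only the network weights."""
    state = {
        "weights": weights,
        "optimizer_state": optimizer_state,  # e.g. Adam moments and step count
        "normalizer": vars(normalizer).copy(),
        "num_updates": num_updates,
    }
    with open(path, "wb") as f:
        pickle.dump(state, f)


def load_training_state(path):
    """Restore the full training state written by save_training_state."""
    with open(path, "rb") as f:
        state = pickle.load(f)
    normalizer = ObsNormalizer()
    normalizer.__dict__.update(state["normalizer"])
    return (state["weights"], state["optimizer_state"],
            normalizer, state["num_updates"])


def demo():
    """Compare a restored normalizer with a freshly created one."""
    normalizer = ObsNormalizer()
    for obs in (10.0, 12.0, 14.0):
        normalizer.update(obs)
    path = os.path.join(tempfile.mkdtemp(), "state.pkl")
    save_training_state(path, [0.5, -0.2],
                        {"m": [0.1, 0.0], "v": [0.01, 0.02], "t": 3},
                        normalizer, num_updates=3)
    weights, opt_state, restored, num_updates = load_training_state(path)
    # The same observation looks very different to a fresh normalizer.
    return restored.normalize(12.0), ObsNormalizer().normalize(12.0)


if __name__ == "__main__":
    print(demo())
```

In this toy case the restored normalizer maps an observation of 12.0 to 0.0, while a fresh one maps it to roughly 12.0, so a policy loaded without its normalization statistics would see inputs on a different scale than the ones it was trained on. Whether this applies to the tf2 branch of baselines would need to be checked against its actual save and load code.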