vwxyzjn / cleanrl

High-quality single file implementation of Deep Reinforcement Learning algorithms with research-friendly features (PPO, DQN, C51, DDPG, TD3, SAC, PPG)
http://docs.cleanrl.dev
Other
5.41k stars 616 forks

How to do evaluation for example on PPO #400

Closed qiuruiyu closed 12 months ago

qiuruiyu commented 1 year ago

Problem Description

I am training on a custom environment with ppo_continuous_action.py, saving the Agent's state_dict every save_freq updates. However, when I load the model and its state_dict afterwards, the performance (reward) is far worse than during training, close to random actions. Could you please provide an example for evaluation? I am not sure whether the environment wrappers are the cause.

vwxyzjn commented 1 year ago

See https://github.com/vwxyzjn/cleanrl/issues/310#issuecomment-1317761050
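One possible cause, not confirmed in this thread, is that the training environment is wrapped with an observation normalizer (ppo_continuous_action.py uses gym.wrappers.NormalizeObservation), and its running statistics are not part of the Agent's state_dict. A freshly built evaluation environment then feeds the policy observations on a different scale than it was trained on. The sketch below illustrates the idea in pure Python with scalar observations; `RunningNorm`, its `state_dict` / `load_state_dict` methods, and the `checkpoint` layout are hypothetical names for illustration, not CleanRL or Gym APIs.

```python
import math


class RunningNorm:
    """Hypothetical stand-in for an observation normalizer's running statistics."""

    def __init__(self, eps=1e-8):
        self.mean = 0.0
        self.m2 = 0.0        # running sum of squared deviations (Welford's method)
        self.count = 0
        self.eps = eps
        self.training = True  # set to False to freeze statistics for evaluation

    def update(self, x):
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    def __call__(self, x):
        if self.training:
            self.update(x)
        var = self.m2 / self.count if self.count > 0 else 1.0
        return (x - self.mean) / math.sqrt(var + self.eps)

    def state_dict(self):
        return {"mean": self.mean, "m2": self.m2, "count": self.count}

    def load_state_dict(self, state):
        self.mean = state["mean"]
        self.m2 = state["m2"]
        self.count = state["count"]


# Training: the normalizer sees many observations and adapts to their scale.
train_norm = RunningNorm()
for obs in [10.0, 12.0, 14.0, 16.0]:
    train_norm(obs)

# Assumption: the statistics are stored next to the agent's weights.
checkpoint = {"obs_norm": train_norm.state_dict()}

# Evaluation with restored, frozen statistics.
eval_norm = RunningNorm()
eval_norm.load_state_dict(checkpoint["obs_norm"])
eval_norm.training = False

# Evaluation with a fresh normalizer, as happens when only the agent is reloaded.
fresh_norm = RunningNorm()

restored = eval_norm(16.0)  # same scale the policy saw during training
fresh = fresh_norm(16.0)    # first sample is normalized against itself
print(restored, fresh)
```

With the restored statistics the observation is scaled as it was during training, while the fresh normalizer maps the same observation to a very different value. In a real setup, the equivalent step would be to save and restore the wrapper's running mean and variance together with the Agent's state_dict and to stop updating them during evaluation.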

qiuruiyu commented 1 year ago

Hello, thank you for your message. This is an automatic reply confirming that your email was received. It will be handled as soon as possible. Thank you! Joseph QIU (仇睿瑜)
