Closed: qiuruiyu closed this issue 12 months ago.
Hello! This is an automatic reply confirming that your email was received. Your email will be handled as soon as possible. Thank you! 仇睿瑜 (Joseph QIU)
Problem Description
I train on my own custom env with PPO_continuous_action.py, and I save the Agent's state_dict every save_freq updates. However, when I later load the model and its state_dict, the performance (reward) is far worse than during training, no better than random actions. Could you please provide an example of how to evaluate a saved model? I'm not sure whether the env wrappers are the cause.
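One common cause of this symptom, which this thread does not confirm, is observation normalization. If your copy of the script wraps the env in `gym.wrappers.NormalizeObservation` inside `make_env`, as the CleanRL version does, that wrapper keeps a running mean and variance of observations. Those statistics are not part of `agent.state_dict()`. When evaluation builds a fresh env, the statistics restart at mean 0 and variance 1, so the policy receives inputs on a scale it never saw during training. In the gym/gymnasium versions we know of, the statistics live in the wrapper's `obs_rms` attribute, but check your installed version.

The sketch below is pure Python so it runs without gym or torch. `RunningMeanStd`, `NormalizedObs`, and the checkpoint dict layout are all hypothetical stand-ins chosen for illustration. It shows how saving the statistics next to the weights and restoring them, frozen, at evaluation time changes what the policy sees.

```python
import math
import pickle


class RunningMeanStd:
    """Hypothetical per-dimension running mean/variance tracker, similar in
    spirit to what observation-normalization wrappers keep internally."""

    def __init__(self, dim, epsilon=1e-4):
        self.mean = [0.0] * dim
        self.var = [1.0] * dim
        self.count = epsilon  # small prior count so the first update is stable

    def update(self, x):
        # Merge one observation into the statistics (parallel-variance
        # formula specialised to a batch of size 1).
        new_count = self.count + 1
        for i, xi in enumerate(x):
            delta = xi - self.mean[i]
            m2 = self.var[i] * self.count + delta * delta * self.count / new_count
            self.mean[i] += delta / new_count
            self.var[i] = m2 / new_count
        self.count = new_count

    def normalize(self, x, eps=1e-8):
        return [(xi - m) / math.sqrt(v + eps)
                for xi, m, v in zip(x, self.mean, self.var)]


class NormalizedObs:
    """Hypothetical stand-in for an observation-normalizing env wrapper."""

    def __init__(self, rms, training=True):
        self.rms = rms
        self.training = training  # only update statistics while training

    def __call__(self, raw_obs):
        if self.training:
            self.rms.update(raw_obs)
        return self.rms.normalize(raw_obs)


# --- training: statistics are learned from the raw observations ---
train_norm = NormalizedObs(RunningMeanStd(dim=2))
for _ in range(100):
    for raw in ([10.0, -5.0], [12.0, -4.0], [11.0, -6.0]):
        train_norm(raw)  # the policy would be trained on these outputs

# Save the statistics next to the weights, e.g. in the same checkpoint
# that holds agent.state_dict() (this checkpoint layout is an assumption).
checkpoint = pickle.dumps({"obs_rms": train_norm.rms})

# --- evaluation ---
probe = [11.0, -5.0]
fresh = NormalizedObs(RunningMeanStd(dim=2), training=False)
restored = NormalizedObs(pickle.loads(checkpoint)["obs_rms"], training=False)

fresh_out = fresh(probe)        # ~ the raw values: a scale the policy never saw
restored_out = restored(probe)  # ~ 0: matches the inputs seen during training
print("fresh:", fresh_out)
print("restored:", restored_out)
```

In a real run, the analogous step would be to save the wrapper's `obs_rms` together with `agent.state_dict()` and copy it into the evaluation env's wrapper before running episodes. Freezing the statistics during evaluation is optional but keeps results reproducible. A separate, smaller difference is that sampling actions from the policy distribution adds noise. Evaluating with the distribution mean is a common alternative. In the CleanRL agent that mean comes from `actor_mean`, if your copy follows that structure.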