nikhilbarhate99 / PPO-PyTorch

Minimal implementation of clipped objective Proximal Policy Optimization (PPO) in PyTorch
MIT License

policy.eval() after load_state_dict() #46

Closed. xinqin23 closed this issue 2 years ago

xinqin23 commented 3 years ago

Dear Barhate,

Hi! Thank you very much for sharing the code! I can reproduce your results and they look super cool!

While playing around with the code, may I ask about PPO.py, line 83? After

self.policy_old.load_state_dict(self.policy.state_dict())

do we need to add

self.policy_old.eval()

Would this extra line make any difference, perhaps something you have observed in the past? (A toy sketch of the pattern I mean is below.)
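To make the question concrete, here is a small self-contained sketch of the pattern. The networks are just placeholders, not the repo's actual ActorCritic class:

```python
import torch.nn as nn

# Toy stand-ins for self.policy and self.policy_old (placeholders only,
# not the repo's ActorCritic class).
policy = nn.Sequential(nn.Linear(8, 16), nn.Tanh(), nn.Linear(16, 2))
policy_old = nn.Sequential(nn.Linear(8, 16), nn.Tanh(), nn.Linear(16, 2))

# The existing step in PPO.py: copy the current weights into the old policy.
policy_old.load_state_dict(policy.state_dict())

# The extra call I am asking about: switch the copy to evaluation mode.
policy_old.eval()
```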

I am asking because of this page in the PyTorch tutorials: https://pytorch.org/tutorials/beginner/saving_loading_models.html

It says we should call eval():

"Remember that you must call model.eval() to set dropout and batch normalization layers to evaluation mode before running inference. Failing to do this will yield inconsistent inference results."

I haven't had a chance to fully test the difference yet, so I am asking in case you have already encountered something similar.

Thank you for reading! Best Regards, Xin

nikhilbarhate99 commented 2 years ago

"Remember that you must call model.eval() to set dropout and batch normalization layers to evaluation mode before running inference. Failing to do this will yield inconsistent inference results."

Since we are not using any dropout or batch normalization layers, it does not matter if we call model.eval() or not.
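A quick sketch to illustrate this (toy networks, not the actual actor-critic used here): eval() only changes the behaviour of layers such as Dropout and BatchNorm, so a plain Linear/Tanh network produces identical outputs in both modes.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(4, 8)

# Plain Linear/Tanh network (no dropout or batchnorm): train() vs eval()
# makes no difference to the forward pass.
plain = nn.Sequential(nn.Linear(8, 16), nn.Tanh(), nn.Linear(16, 2))
plain.train()
out_train = plain(x)
plain.eval()
out_eval = plain(x)
print(torch.allclose(out_train, out_eval))  # True

# Network with a Dropout layer: the two modes differ, so eval() would matter.
dropped = nn.Sequential(nn.Linear(8, 16), nn.Dropout(p=0.5), nn.Linear(16, 2))
dropped.train()
out_train_d = dropped(x)
dropped.eval()
out_eval_d = dropped(x)
print(torch.allclose(out_train_d, out_eval_d))  # almost always False
```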