Should there be a `self.policy_old.load_state_dict(self.policy.state_dict())` on line 85 of PPO.py, after the PPO object is initialized? PyTorch's random initialization does not guarantee that the two policies start with the same weights. The same issue applies to PPO_continuous.py.
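To illustrate the concern, here is a minimal sketch (the `Policy` class below is a hypothetical stand-in for the repo's ActorCritic network): two separately constructed networks get independent random weights, so without an explicit sync the ratio pi_theta / pi_theta_old computed on the very first update would compare two different policies.

```python
import torch
import torch.nn as nn

# Hypothetical minimal network standing in for ActorCritic in PPO.py.
class Policy(nn.Module):
    def __init__(self, state_dim=4, action_dim=2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.Tanh(),
            nn.Linear(64, action_dim),
        )

    def forward(self, x):
        return self.net(x)

policy = Policy()
policy_old = Policy()  # independently initialized -> weights differ

# The proposed fix: copy the current policy's weights into the old policy
# so both networks agree before the first update.
policy_old.load_state_dict(policy.state_dict())

# After the sync, both networks produce identical outputs.
x = torch.randn(1, 4)
assert torch.allclose(policy(x), policy_old(x))
```

Note that `load_state_dict` copies parameter values but leaves the two modules as separate objects, so subsequent updates to `policy` do not affect `policy_old` until the next explicit sync.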