pranz24 / pytorch-soft-actor-critic

PyTorch implementation of soft actor critic
MIT License

Action scaling is missing on SAC_V branch #27

Closed: alirezakazemipour closed this issue 4 years ago

alirezakazemipour commented 4 years ago

Hey. The line `action = y_t * self.action_scale + self.action_bias` is missing from the `sample` function on the SAC_V branch.
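
For context, a minimal sketch of what a tanh-squashed `sample()` with action scaling looks like; the helper names, the `epsilon` constant, and the exact return signature are assumptions here, not copied from the repo:

```python
import torch
from torch.distributions import Normal

epsilon = 1e-6  # small constant, assumed here for numerical stability

def sample(self, state):
    # Sketch of a tanh-squashed Gaussian policy sample() with action scaling.
    mean, log_std = self.forward(state)
    std = log_std.exp()
    normal = Normal(mean, std)
    x_t = normal.rsample()                 # reparameterization trick
    y_t = torch.tanh(x_t)                  # squash to (-1, 1)
    action = y_t * self.action_scale + self.action_bias  # the line missing on SAC_V
    log_prob = normal.log_prob(x_t)
    # change-of-variables correction for the tanh squash and the scaling
    log_prob -= torch.log(self.action_scale * (1 - y_t.pow(2)) + epsilon)
    log_prob = log_prob.sum(1, keepdim=True)
    mean = torch.tanh(mean) * self.action_scale + self.action_bias
    return action, log_prob, mean
```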

pranz24 commented 4 years ago

You will get the correct results even without action scaling; we clip the actions at the end anyway. @ku2482 made the argument that gradients and performance will be much smoother with action scaling, which I intuitively agree with, and that's why it was used in the main branch.
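
For reference, a rough sketch of how the scale and bias can be derived from the env's action space so the tanh output covers the full action range; the helper name is hypothetical and the constructor details are assumptions:

```python
import torch

def make_action_scaling(action_space):
    # Hypothetical helper: map a tanh output in (-1, 1) onto [low, high] via
    # action = tanh(u) * scale + bias, with scale = (high - low) / 2 and bias = (high + low) / 2.
    if action_space is None:
        return torch.tensor(1.0), torch.tensor(0.0)
    scale = torch.as_tensor((action_space.high - action_space.low) / 2.0, dtype=torch.float32)
    bias = torch.as_tensor((action_space.high + action_space.low) / 2.0, dtype=torch.float32)
    return scale, bias
```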

You are free to make a PR for this (on the SAC_V branch) if you want. :sweat_smile:

alirezakazemipour commented 4 years ago

Yeah, you have clipped it at the end when the `select_action` method is called, but the problem arises whenever you call the `sample` method of the `GaussianPolicy` directly, like `pi, log_pi, mean, log_std = self.policy.sample(state_batch)`; using that `pi` does not seem correct because it has not been scaled. :smiley: I have implemented SAC based on your repo, which was very readable, and to be honest no PR would be required :smile: except for that scaling part that is missing on the SAC_V branch, and for the effect it has on enforcing action bounds on the master branch too: :point_down: `log_prob -= torch.log(self.action_scale * (1 - y_t.pow(2)) + epsilon)`. That multiplication by `self.action_scale` does not sound correct to me.
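
For reference, a short sketch of where that factor appears if the action is modeled as `a = action_scale * tanh(u) + action_bias` (writing `action_scale` as s, with `epsilon` omitted for clarity):

```latex
% Change of variables for a squashed-and-scaled action a = s * tanh(u) + b:
%   da/du = s * (1 - tanh^2(u))
% so each action dimension contributes a -log|da/du| term to the log-density:
\log \pi(a \mid \mathrm{state}) = \log \mu(u \mid \mathrm{state})
  - \sum_i \log\!\left( s_i \left( 1 - \tanh^2(u_i) \right) \right)
```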

pranz24 commented 4 years ago

In the end, gym envs will by default clip the action to the env's action range even if I don't use clipping. The problem with `pi` will only arise if you initialize the policy weights in the wrong manner; `xavier_uniform_` and `orthogonal_` will work in this case. Wrong initialization, with or without action scaling, will cause problems.
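
For reference, a minimal sketch of applying such an initialization to a network's linear layers; the helper name and gain value are assumptions:

```python
import torch.nn as nn

def init_weights(m):
    # Hypothetical helper: Xavier-uniform weights and zero bias for every Linear layer;
    # nn.init.orthogonal_ could be swapped in for the weight init instead.
    if isinstance(m, nn.Linear):
        nn.init.xavier_uniform_(m.weight, gain=1.0)
        nn.init.constant_(m.bias, 0.0)

# usage: policy.apply(init_weights)
```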

alirezakazemipour commented 4 years ago

Thank you very much.