pranz24 / pytorch-soft-actor-critic

PyTorch implementation of soft actor critic
MIT License

Fix bugs of action re-scaling #15

Closed toshikwa closed 5 years ago

toshikwa commented 5 years ago

Sorry for my misunderstanding.

I think the reason for the degraded performance with action re-scaling is an incorrect calculation of the log likelihood. Because re-scaling changes the likelihood, the calculation needs to be corrected accordingly.

I think it's fixed correctly now, although I haven't checked (I have no computational resources available...). Can you test it?
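The fix being described is the change-of-variables correction: when the action is an affine re-scaling of a tanh-squashed Gaussian sample, the Jacobian of the re-scaling must appear in the log likelihood. A minimal sketch, assuming the usual SAC tanh-Gaussian policy (function and variable names here are illustrative, not the repo's actual code):

```python
import torch
from torch.distributions import Normal

def sample_action(mean, log_std, action_scale, action_bias):
    """Sample a = scale * tanh(u) + bias with the corrected log likelihood."""
    std = log_std.exp()
    normal = Normal(mean, std)
    u = normal.rsample()              # pre-squash sample (reparameterized)
    y = torch.tanh(u)
    action = action_scale * y + action_bias
    # Change of variables: log p(a) = log p(u) - sum log |da/du|,
    # where da/du = scale * (1 - tanh(u)^2).  Without the `scale` factor
    # here, the log likelihood (and hence the entropy term) is wrong.
    log_prob = normal.log_prob(u)
    log_prob = log_prob - torch.log(action_scale * (1 - y.pow(2)) + 1e-6)
    return action, log_prob.sum(-1, keepdim=True)
```

For a fixed scale the extra term only shifts the log likelihood by a constant per dimension, which is what makes the two variants discussed below differ only in the scale of the entropy.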

Anyway, I really appreciate your great work.

toshikwa commented 5 years ago

I pushed another commit.

pranz24 commented 5 years ago

Hmm... I thought the performance was ok.

You only posted this result after making the earlier commit, right? (Performance on Half-Cheetah)

Were there any issues you faced with the performance recently?

Also, you are now re-scaling the action before calculating the log likelihood, whereas before you were re-scaling actions after calculating the log likelihood. (Am I correct?)

According to my earlier tests, SAC works even if you clip the actions rather than re-scaling them. (Although I was clipping actions after calculating the log likelihood. Also, mujoco-py clips the actions for you by default.)

I don't have enough computational resources available at the moment either, but I'll figure something out and let you know the results by next week (if that's fine with you).

Anyway, thank you for taking a deep look into this problem!

toshikwa commented 5 years ago

You only posted this result after making the earlier commit, right?

Yes.

Also, you are now re-scaling the action before calculating the log likelihood, whereas before you were re-scaling actions after calculating the log likelihood. (Am I correct?)

Yes. The difference is the scale of the entropy. I'm not sure which is better, but the newer commit uses the correct entropy. (They can be made equivalent if you modify alpha, though.)
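One way to see the remark about alpha: with a fixed action scale s, the Jacobian correction adds a constant -dim * log(s) to every log likelihood, so the entropy term in the SAC objective only shifts by a constant and the gradients are unchanged. A tiny illustrative calculation (the scale and dimension below are assumptions for the example):

```python
import math

# With action scale s applied per dimension, each log likelihood picks up
# a constant -dim * log(s); equivalently, the entropy shifts by +dim * log(s).
# This constant does not affect policy gradients, which is why the two
# variants can be reconciled through the temperature alpha.
scale, dim = 2.0, 6   # e.g. Half-Cheetah has a 6-dimensional action space
entropy_offset = dim * math.log(scale)
```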

I think clipping makes the gradients inconsistent, so re-scaling is better. (Maybe it's not so important. I will test it.)

Anyway, thank you very much for the helpful feedback.

toshikwa commented 5 years ago

Hi.

There is another interpretation: re-scaling actions after calculating the log likelihood amounts to treating the re-scaling as part of the environment (or the system). In this case, the agent's action is the one before re-scaling, and we get the correct entropy if we feed the pre-re-scaled action into the Q functions.

I think this might be better because it keeps (part of) the inputs to the Q functions normalized to (-1, 1). (Neural networks tend to prefer normalized inputs.) But this may not be so important, and it's something of a heuristic.

However, I haven't tested these assumptions...
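Under this interpretation the policy and the Q functions only ever see actions in (-1, 1), and the affine map to the environment's action range lives on the environment side. A minimal sketch of that env-side map (a standalone helper; the name is illustrative, not the repo's code):

```python
import numpy as np

def rescale_to_env(action, low, high):
    """Map a normalized agent action in (-1, 1) onto the env's (low, high) box.

    The agent, the replay buffer, and the Q functions keep the normalized
    action; only the environment step sees the re-scaled one.
    """
    return low + 0.5 * (action + 1.0) * (high - low)
```

In newer Gym/Gymnasium versions the built-in `RescaleAction` wrapper plays the same role, which matches the "re-scaling as the environment" view described above.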

pranz24 commented 5 years ago

There is another interpretation: re-scaling actions after calculating the log likelihood amounts to treating the re-scaling as part of the environment (or the system). In this case, the agent's action is the one before re-scaling, and we get the correct entropy if we feed the pre-re-scaled action into the Q functions.

Awesome, that is exactly what I was thinking!

Will test and let you know.