Closed: toshikwa closed this issue 5 years ago
I pushed another commit.
Hmm... I thought the performance was ok.
You only posted this result after making the earlier commit, right?
[image: Performance on Half-Cheetah]
Did you face any issues with the performance recently?
Also, you are now rescaling the action before calculating the log-likelihood, whereas before you were rescaling actions after calculating the log-likelihood. (Am I correct?)
According to my earlier tests, SAC works even if you clip the actions rather than rescaling them. (Although I was clipping actions after calculating the log-likelihood. Also, mujoco-py clips actions for you by default.)
I don't have enough computational resources available at the moment either, but I'll figure something out and let you know the results by next week (if that's fine with you).
Anyway, thank you for taking a deep look into this problem!
You only posted this result after making the earlier commit, right?
Yes.
Also, you are now rescaling the action before calculating the log-likelihood, whereas before you were rescaling actions after calculating the log-likelihood. (Am I correct?)
Yes. The difference is the scale of the entropy. I'm not sure which is better, but the newer commit computes the correct entropy. (They can be made equivalent by modifying alpha, though.)
I think clipping makes the gradients inconsistent, so rescaling is better. (Maybe it's not so important; I will test it.)
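For concreteness, here is a minimal numpy sketch (my own illustration, not code from this repository) of the change-of-variables correction when an action is produced as a = scale * tanh(u) + bias. The affine part only adds a constant sum(log(scale)) to the log-likelihood, which is why the two variants differ only in the scale of the entropy and can be matched by adjusting alpha:

```python
import numpy as np

def log_prob_rescaled(u, log_prob_u, scale):
    """Log-likelihood of a = scale * tanh(u) + bias, given log p(u).

    Change of variables:
        log p(a) = log p(u)
                   - sum(log(1 - tanh(u)^2))  # tanh squashing correction
                   - sum(log(scale))          # constant shift from the affine rescaling
    """
    squash_corr = np.sum(np.log(1.0 - np.tanh(u) ** 2 + 1e-6), axis=-1)
    scale_corr = np.sum(np.log(scale))
    return log_prob_u - squash_corr - scale_corr
```

Since the scale term is a constant, omitting it shifts every log-likelihood (and hence the entropy) by the same amount, which an adjusted alpha can absorb.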
Anyway, thank you very much for the helpful feedback.
Hi.
There is another interpretation: re-scaling actions after calculating the log-likelihood treats the re-scaling as part of the environment (or the system). In this case, the agent's action is the one before re-scaling, and we get the correct entropy if we feed the action before re-scaling into the Q-functions.
I think this might be better because it keeps (part of) the inputs of the Q-functions normalized to (-1, 1). (Neural networks tend to prefer normalized inputs.) But this may not be so important and is somewhat heuristic.
However, I haven't tested these assumptions...
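As a sketch of that interpretation (hypothetical names, assuming a gym-style env whose step() takes the raw action), the agent always acts in [-1, 1] and an environment-side wrapper owns the affine re-scaling, so the replay buffer and the Q-function inputs stay in the normalized action space:

```python
import numpy as np

class RescaleActionWrapper:
    """Treats affine re-scaling as part of the environment:
    the agent acts in [-1, 1]^d, the wrapper maps to [low, high]."""

    def __init__(self, env, low, high):
        self.env = env
        self.low = np.asarray(low, dtype=np.float64)
        self.high = np.asarray(high, dtype=np.float64)

    def step(self, action):
        # `action` is the agent's normalized action in [-1, 1]; this is also
        # what would be stored in the replay buffer and fed to the Q-functions.
        scaled = self.low + 0.5 * (action + 1.0) * (self.high - self.low)
        return self.env.step(scaled)
```

With this design the log-likelihood needs only the tanh correction, since the policy's output space never changes.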
There is another interpretation: re-scaling actions after calculating the log-likelihood treats the re-scaling as part of the environment (or the system). In this case, the agent's action is the one before re-scaling, and we get the correct entropy if we feed the action before re-scaling into the Q-functions.
Awesome! That is exactly what I was thinking.
Will test and let you know.
Sorry for my misunderstanding.
I think the reason for the degraded performance with action re-scaling was an incorrect calculation of the log-likelihood. Because re-scaling changes the likelihood, it needs to be corrected accordingly.
I think it's now fixed correctly, although I haven't checked (I have no computational resources available...). Can you test it?
Anyway, I really appreciate your great work.