rail-berkeley / rlkit

Collection of reinforcement learning algorithms
MIT License

alpha convergence issues with discrete actions #75

Closed wcarvalho closed 5 years ago

wcarvalho commented 5 years ago

Hello, great repo. It's well designed and easy to use.

I'm using a discrete action space and seem to be having convergence issues. I read #50 and saw that x <= log(# actions) was a good target entropy. I've seen the following two behaviors:

(a) When x = log(# actions), alpha seems to diverge; I've seen it reach the hundreds before I stopped training. I'm not sure if this is expected behavior, but it causes the policy loss to blow up.

(b) When x < log(# actions), alpha converges to 0, which I believe leads the policy to behave deterministically. I'm also not sure if this is expected behavior.
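For concreteness, here is a minimal sketch of the two settings of x I'm describing (the variable names are just placeholders, not rlkit arguments):

```python
import numpy as np

# Illustrative sketch only; `num_actions` is a placeholder, not an rlkit argument.
num_actions = 6

# The entropy of a categorical distribution over N actions is at most log(N),
# so a target entropy above log(N) can never be reached.
max_entropy = np.log(num_actions)

target_entropy = 0.98 * max_entropy  # case (b): x < log(# actions)
# target_entropy = max_entropy       # case (a): x = log(# actions)
```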

I was wondering if you've come across these problems.

Cheers

vitchyr commented 5 years ago

(a) Since this implements entropy-constrained SAC, and log(# actions) is the maximum possible entropy for a discrete policy, the only policy that can satisfy that constraint is uniform at random. Because another term (the returns) is pushing the policy toward non-uniform action probabilities, alpha has to keep increasing so that the policy effectively pays attention only to the entropy term.

(b) Alpha isn't the entropy itself, but rather the weight on the entropy term. If alpha converges to 0, it means the policy is already "random enough" (its entropy exceeds the target), so the constraint is inactive and alpha can be zero without it mattering. You might want to plot "Log Pi Mean" to see if it's roughly equal to the (negative) target entropy.
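To make the mechanics concrete, here is a rough sketch of the standard automatic-temperature update used in SAC-style implementations (this is not rlkit's exact code; names and hyperparameters are illustrative):

```python
import math
import torch

# Rough sketch of the usual automatic entropy-temperature update in SAC.
# Not rlkit's exact code; names and hyperparameters are illustrative.
log_alpha = torch.zeros(1, requires_grad=True)
alpha_optimizer = torch.optim.Adam([log_alpha], lr=3e-4)

def alpha_loss(log_pi, target_entropy):
    # log_pi: log-probabilities of the sampled actions under the current policy.
    # The gradient increases alpha while the policy's entropy (-mean log_pi) is
    # below the target, and drives alpha toward zero once it is above the target.
    return -(log_alpha * (log_pi + target_entropy).detach()).mean()

# Example update with a uniform policy over 6 actions and target < log(6):
log_pi = torch.log(torch.full((32,), 1.0 / 6))
loss = alpha_loss(log_pi, target_entropy=0.98 * math.log(6))
alpha_optimizer.zero_grad()
loss.backward()
alpha_optimizer.step()  # log_alpha decreases: entropy already exceeds the target
```

Under this update, if target_entropy = log(# actions), the mean of (log_pi + target_entropy) is non-negative and only reaches zero for the uniform policy, so any non-uniform policy keeps pushing log_alpha up (your case (a)); with a smaller target, a policy whose entropy exceeds the target makes that mean negative and drives alpha toward 0 (your case (b)).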

Feel free to re-open this issue if needed. Thanks for the kind words!