rail-berkeley / rlkit

Collection of reinforcement learning algorithms
MIT License

alpha convergence issues with discrete actions #75

Closed wcarvalho closed 5 years ago

wcarvalho commented 5 years ago

Hello, great repo. It's well designed and easy to use.

I'm using a discrete action space and seem to be having convergence issues. I read #50 and saw that x <= log(# actions) was a good target entropy. I've seen the following two behaviors:

(a) When x = log(# actions), alpha seems to diverge; I've seen it reach the hundreds before I stopped training. I'm not sure if this is expected behavior, but it causes the policy loss to blow up.

(b) When x < log(# actions), alpha converges to 0, which I believe leads the policy to behave deterministically. I'm also not sure if this is expected behavior.
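For concreteness, here is a minimal sketch of the two settings of x I'm describing (the variable names are just placeholders, not rlkit arguments):

```python
import numpy as np

# Illustrative sketch only; `num_actions` is a placeholder, not an rlkit argument.
num_actions = 6

# The entropy of a categorical distribution over N actions is at most log(N),
# so a target entropy above log(N) can never be reached.
max_entropy = np.log(num_actions)

target_entropy = 0.98 * max_entropy  # case (b): x < log(# actions)
# target_entropy = max_entropy       # case (a): x = log(# actions)
```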

I was wondering if you've come across these problems.

Cheers

vitchyr commented 5 years ago

(a) Since this implements entropy-constrained SAC, and log(# actions) is the maximum possible entropy for a discrete policy, the only policy that can satisfy that constraint is uniform at random. Because another term (the returns) is pushing the policy toward non-uniform action probabilities, alpha has to keep increasing so that the policy effectively pays attention only to the entropy term.

(b) Alpha isn't the entropy itself, but rather the weight on the entropy term. If alpha converges to 0, it means the policy is already "random enough" (its entropy exceeds the target), so the constraint is inactive and alpha can be zero without it mattering. You might want to plot "Log Pi Mean" to see if it's roughly equal to the (negative) target entropy.
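To make the mechanics concrete, here is a rough sketch of the standard automatic-temperature update used in SAC-style implementations (this is not rlkit's exact code; names and hyperparameters are illustrative):

```python
import math
import torch

# Rough sketch of the usual automatic entropy-temperature update in SAC.
# Not rlkit's exact code; names and hyperparameters are illustrative.
log_alpha = torch.zeros(1, requires_grad=True)
alpha_optimizer = torch.optim.Adam([log_alpha], lr=3e-4)

def alpha_loss(log_pi, target_entropy):
    # log_pi: log-probabilities of the sampled actions under the current policy.
    # The gradient increases alpha while the policy's entropy (-mean log_pi) is
    # below the target, and drives alpha toward zero once it is above the target.
    return -(log_alpha * (log_pi + target_entropy).detach()).mean()

# Example update with a uniform policy over 6 actions and target < log(6):
log_pi = torch.log(torch.full((32,), 1.0 / 6))
loss = alpha_loss(log_pi, target_entropy=0.98 * math.log(6))
alpha_optimizer.zero_grad()
loss.backward()
alpha_optimizer.step()  # log_alpha decreases: entropy already exceeds the target
```

Under this update, if target_entropy = log(# actions), the mean of (log_pi + target_entropy) is non-negative and only reaches zero for the uniform policy, so any non-uniform policy keeps pushing log_alpha up (your case (a)); with a smaller target, a policy whose entropy exceeds the target makes that mean negative and drives alpha toward 0 (your case (b)).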

Feel free to re-open this issue if needed. Thanks for the kind words!