p-christ / Deep-Reinforcement-Learning-Algorithms-with-PyTorch

PyTorch implementations of deep reinforcement learning algorithms and environments

MIT License

Wrong temperature loss implementation for discrete SAC #61

Closed: qiyan98 closed this issue 4 years ago

qiyan98 commented 4 years ago

In the discrete-SAC paper, the temperature loss in Eq. (11) is computed as a direct expectation over the action probabilities rather than as a Monte Carlo estimate, following the same logic as Eq. (10). The implementation, however, still calls calculate_entropy_tuning_loss from SAC.py, which uses .mean().

p-christ commented 4 years ago

Thanks. Are you able to make a pull request fixing the issue so that I can incorporate it?

qiyan98 commented 4 years ago

Sorry, I didn't notice that the expected log_pi is already calculated in the discrete SAC implementation, so the code is fine as it is.
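
For readers following the thread, here is a minimal sketch of why the two forms agree. It assumes the temperature loss for one state has the form pi^T [-alpha * (log pi + H)], and that the log_pi passed to the shared loss function is already the per-state expectation sum(pi * log pi), so that .mean() only averages over the batch. Since the probabilities sum to 1, pi^T [-alpha * (log pi + H)] = -alpha * (sum(pi * log pi) + H). The function names, the example probabilities, and the target entropy value are all hypothetical. The sketch uses alpha directly and plain Python instead of PyTorch, so it is not the repository's code.

```python
import math


def direct_expectation_loss(alpha, probs, target_entropy):
    """Per-state loss written as a direct expectation over actions:
    sum_a pi(a) * [-alpha * (log pi(a) + target_entropy)].
    Assumes every probability is strictly positive."""
    return sum(p * (-alpha * (math.log(p) + target_entropy)) for p in probs)


def expected_log_pi(probs):
    """Per-state expectation of log pi under pi: sum_a pi(a) * log pi(a)."""
    return sum(p * math.log(p) for p in probs)


def batch_mean_loss(alpha, batch_probs, target_entropy):
    """Loss computed from the already-expected log_pi, then averaged
    over the batch (the role assumed here for .mean())."""
    per_state = [
        -alpha * (expected_log_pi(probs) + target_entropy)
        for probs in batch_probs
    ]
    return sum(per_state) / len(per_state)


# Hypothetical batch of two states with three discrete actions each.
batch_probs = [[0.7, 0.2, 0.1], [0.25, 0.25, 0.5]]
alpha = 0.2
target_entropy = 0.98 * math.log(3)  # illustrative value only

direct = sum(
    direct_expectation_loss(alpha, probs, target_entropy)
    for probs in batch_probs
) / len(batch_probs)
via_mean = batch_mean_loss(alpha, batch_probs, target_entropy)

# Both forms agree up to floating-point error.
print(math.isclose(direct, via_mean, rel_tol=1e-12, abs_tol=1e-12))
```

Under these assumptions the batch mean is not a Monte Carlo estimate over sampled actions. The expectation over actions is taken exactly inside log_pi, and the mean only averages over the sampled states.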