philtabor / Youtube-Code-Repository

Repository for most of the code from my YouTube channel

ActorNetwork - sample_normal method log_probs issue #59

Open zenineasa opened 1 year ago

In the following line, the argument of the logarithm becomes negative whenever action.pow(2) exceeds 1 + self.reparam_noise. This can happen if 'self.max_action' is large enough for 'action' to take values greater than 1 in magnitude. The logarithm of a negative number is NaN, so log_probs becomes NaN and the code breaks.

log_probs -= T.log(1-action.pow(2)+self.reparam_noise)

https://github.com/philtabor/Youtube-Code-Repository/blob/a6006478809f3c00026b6ce921a2d4a23b4b1df9/ReinforcementLearning/PolicyGradient/SAC/networks.py#L130
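
To make the failure concrete, here is a minimal sketch in plain Python (no PyTorch needed) that evaluates the argument of the logarithm from the line above. It assumes that 'action' is produced as max_action * tanh(raw_sample); that scaling is my reading of the code, not something quoted here. REPARAM_NOISE = 1e-6 and both function names are hypothetical stand-ins. The second function shows one possible workaround, taking the correction on the unscaled tanh output, which may or may not be what the author wants.

```python
import math

# Assumed stand-in for self.reparam_noise; the real value may differ.
REPARAM_NOISE = 1e-6


def scaled_log_argument(raw_sample, max_action, noise=REPARAM_NOISE):
    """Argument of the log in the quoted line, assuming
    action = max_action * tanh(raw_sample)."""
    action = max_action * math.tanh(raw_sample)
    return 1 - action ** 2 + noise


def unscaled_log_argument(raw_sample, noise=REPARAM_NOISE):
    """Hypothetical alternative: use the tanh output before scaling.

    tanh(raw_sample) lies in [-1, 1], so 1 - tanh^2 is never negative
    and adding the noise term keeps the argument strictly positive,
    whatever max_action is.
    """
    squashed = math.tanh(raw_sample)
    return 1 - squashed ** 2 + noise


if __name__ == "__main__":
    for max_action in (1.0, 2.0):
        for raw_sample in (0.1, 2.0, 20.0):
            scaled = scaled_log_argument(raw_sample, max_action)
            unscaled = unscaled_log_argument(raw_sample)
            # T.log returns NaN for a negative input; math.log would
            # raise ValueError instead, so only the sign is reported.
            print(
                f"max_action={max_action} raw={raw_sample}: "
                f"scaled argument negative={scaled < 0}, "
                f"unscaled argument positive={unscaled > 0}"
            )
```

With max_action = 1.0 the two arguments coincide, so the change would only matter for environments whose action bound is above 1. Note that the scaling by max_action would strictly add a constant log(max_action) term to the correction; the sketch leaves it out because it does not depend on the sample.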