Open zenineasa opened 1 year ago
The following line can produce NaN. If 'self.max_action' is greater than 1, the magnitude of 'action' can exceed 1, so that '1-action.pow(2)+self.reparam_noise' becomes negative. The logarithm of a negative number is NaN, which then propagates into 'log_probs'.
log_probs -= T.log(1-action.pow(2)+self.reparam_noise)
https://github.com/philtabor/Youtube-Code-Repository/blob/a6006478809f3c00026b6ce921a2d4a23b4b1df9/ReinforcementLearning/PolicyGradient/SAC/networks.py#L130
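A minimal sketch to reproduce the problem and one possible workaround. It assumes that 'action' is the tanh of the sampled value multiplied by 'max_action'. The function names, the sample inputs and the noise value of 1e-6 are hypothetical and chosen only for illustration; they are not taken from the repository.

```python
import torch as T

# Assumed stand-in for self.reparam_noise (hypothetical value).
REPARAM_NOISE = 1e-6


def correction_scaled(raw_actions, max_action):
    """Log term as in the reported line, assuming action = tanh(raw) * max_action."""
    action = T.tanh(raw_actions) * max_action
    # If |action| > 1, the argument of the log is negative and the result is NaN.
    return T.log(1 - action.pow(2) + REPARAM_NOISE)


def correction_unscaled(raw_actions):
    """Possible workaround: keep the scaling out of the log argument."""
    squashed = T.tanh(raw_actions)  # always within [-1, 1]
    # 1 - squashed^2 is never negative, so the log stays finite.
    return T.log(1 - squashed.pow(2) + REPARAM_NOISE)


raw = T.tensor([0.0, 1.0, 3.0])
bad = correction_scaled(raw, max_action=2.0)
good = correction_unscaled(raw)

print(bad)   # contains NaN wherever |tanh(raw) * 2.0| > 1
print(good)  # finite for every entry
```

With a = c * tanh(u), the exact change-of-variables term is log(c) + log(1 - tanh(u)^2) per action dimension, so dropping the scale from the logarithm changes the log probability only by a constant. Whether that constant matters for training is not something this sketch checks.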