largelymfs / svpg_REINFORCE

Stein Variational Policy Gradient for REINFORCE
MIT License
17 stars, 5 forks

It seems that there is a +/- sign typo #2

Open heyzude opened 1 year ago

heyzude commented 1 year ago

In vpg_multi_stein.py, at line 212, I think the kernel-gradient term should be subtracted rather than added. That is, the line should be

weights = (1.0 / self.temp) * kernel[:, :, None] * gradient[:, None, :] - kernel_gradient[:, :, :]

and not

weights = (1.0 / self.temp) * kernel[:, :, None] * gradient[:, None, :] + kernel_gradient[:, :, :]

heyzude commented 1 year ago

Sorry, your implementation is right. My statement above was wrong.
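
For anyone checking the sign themselves, here is a minimal NumPy sketch of a Stein variational update direction. It is not the code from vpg_multi_stein.py: the function names, the RBF kernel, the bandwidth h, and the averaging over the first axis are all assumptions made for illustration. Only the shape of the weights expression is taken from the line quoted above. Under these assumptions, the kernel gradient is taken with respect to the first argument, theta_j, and with that convention the "+" sign produces the expected repulsion between particles. Taking the gradient with respect to the other argument flips the sign for an RBF kernel, which may be one reason the sign looks wrong at first glance.

```python
import numpy as np


def rbf_kernel_and_grad(theta, h=1.0):
    """RBF kernel and its gradient w.r.t. the FIRST argument (assumed convention).

    theta: (n, d) array of particles.
    Returns kernel[j, i] = k(theta_j, theta_i) with shape (n, n), and
    kernel_grad[j, i, :] = d k(theta_j, theta_i) / d theta_j with shape (n, n, d).
    """
    diff = theta[:, None, :] - theta[None, :, :]  # diff[j, i] = theta_j - theta_i
    sq_dist = np.sum(diff ** 2, axis=-1)          # (n, n)
    kernel = np.exp(-sq_dist / h)
    kernel_grad = -(2.0 / h) * diff * kernel[:, :, None]
    return kernel, kernel_grad


def svgd_direction(theta, grad_logp, temp=1.0, h=1.0):
    """Hypothetical helper: update direction for each particle, shape (n, d).

    grad_logp: (n, d) array, gradient of the log target at each particle.
    """
    kernel, kernel_grad = rbf_kernel_and_grad(theta, h)
    # Same form as the quoted line, with the "+" sign:
    # driving term (kernel-weighted gradients) plus repulsive term.
    weights = (1.0 / temp) * kernel[:, :, None] * grad_logp[:, None, :] + kernel_grad
    # Average over the source particles j (first axis); this is an assumption.
    return weights.mean(axis=0)


# Two 1-D particles with zero target gradient: only the kernel-gradient term acts.
theta = np.array([[-0.1], [0.1]])
phi = svgd_direction(theta, np.zeros_like(theta))
print(phi)  # first entry is negative, second is positive: the particles move apart
```

With "-" in place of "+" in this sketch, the two particles would be pulled together instead, which defeats the purpose of the repulsive term.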