namiyao closed this issue 8 years ago.
The values of kl and kl_firstfixed are the same, but their derivatives are not. As far as I remember, this trick had to do with computing the natural gradient.
For instance, d/dx (x^2) is not equal to d/dx (x * stop_gradient(x)): the first is 2x, while the second is x.
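A minimal JAX sketch of the point above, assuming JAX is installed. The first part checks the scalar example: both functions have the same value, but the gradients are 2x and x. The second part is an assumption about why the trick helps with the natural gradient: for a hypothetical softmax categorical policy with logits theta, a KL whose first argument is held fixed with stop_gradient has zero value and zero gradient at the current parameters, yet its Hessian equals the Fisher information matrix, which a natural-gradient step needs. The function names here are illustrative and not taken from the original code.

```python
import jax
import jax.numpy as jnp

# Scalar illustration: same value, different derivatives.
def f_plain(x):
    return x ** 2  # d/dx = 2x

def f_fixed(x):
    # The second factor is treated as a constant, so d/dx = x.
    return x * jax.lax.stop_gradient(x)

# Hypothetical categorical "policy" parameterised by logits theta.
def kl_firstfixed(theta):
    # KL(q || p) where q is a copy of p with its gradient blocked.
    log_p = jax.nn.log_softmax(theta)
    log_q = jax.lax.stop_gradient(log_p)
    q = jnp.exp(log_q)
    return jnp.sum(q * (log_q - log_p))

def fisher_softmax(theta):
    # Fisher information of a softmax categorical w.r.t. its logits: diag(p) - p p^T.
    p = jax.nn.softmax(theta)
    return jnp.diag(p) - jnp.outer(p, p)

x = 3.0
print(jax.grad(f_plain)(x), jax.grad(f_fixed)(x))  # 2x and x at x = 3

theta = jnp.array([0.5, -1.0, 2.0])
print(kl_firstfixed(theta))            # value is zero at the current parameters
print(jax.grad(kl_firstfixed)(theta))  # gradient is (numerically) zero as well
# ...but the Hessian matches the Fisher matrix.
print(jnp.allclose(jax.hessian(kl_firstfixed)(theta), fisher_softmax(theta), atol=1e-5))
```

Because the Hessian of the first-fixed KL is the Fisher matrix, Hessian-vector products of it can stand in for Fisher-vector products (for example inside conjugate gradient) without forming the matrix explicitly.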
Hi Wojciech,
I think kl_firstfixed is exactly the same as kl, since the feed is
Why not just use kl instead of kl_firstfixed, for simplicity and to save computation?