Open xmax1 opened 4 years ago
K-FAC shouldn't use centered activation statistics.
However, if you wanted to do something like this in some general context, the correct approach would probably be to take a decayed average of aa^T and subtract \hat{a}\hat{a}^T from it, where \hat{a} is the decayed average of a (computed using the same coefficients as were used for the decayed average of aa^T).
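A minimal NumPy sketch of that suggestion, not code from any K-FAC library. The class name `DecayedCenteredSecondMoment`, the `decay` value, and the zero-debiasing by accumulated weight are all assumptions made for illustration. Activations are assumed to arrive as a `(batch, dim)` array.

```python
import numpy as np


class DecayedCenteredSecondMoment:
    """Hypothetical helper: decayed averages of a a^T and of a, sharing coefficients."""

    def __init__(self, dim, decay=0.95):
        self.decay = decay
        self.second_moment = np.zeros((dim, dim))  # decayed average of a a^T
        self.mean = np.zeros(dim)                  # decayed average of a
        self.weight = 0.0                          # accumulated weight (assumed debiasing)

    def update(self, a):
        """Fold one batch of activations, shape (batch, dim), into both averages."""
        batch_second_moment = a.T @ a / a.shape[0]
        batch_mean = a.mean(axis=0)
        d = self.decay
        # The same coefficients (d, 1 - d) are applied to both statistics.
        self.second_moment = d * self.second_moment + (1.0 - d) * batch_second_moment
        self.mean = d * self.mean + (1.0 - d) * batch_mean
        self.weight = d * self.weight + (1.0 - d)

    def centered(self):
        """Return the decayed average of a a^T minus \\hat{a} \\hat{a}^T."""
        second_moment = self.second_moment / self.weight
        mean = self.mean / self.weight
        return second_moment - np.outer(mean, mean)


rng = np.random.default_rng(0)
estimator = DecayedCenteredSecondMoment(dim=3, decay=0.9)
for _ in range(20):
    estimator.update(rng.normal(size=(8, 3)) + 5.0)  # activations with a nonzero mean
centered_factor = estimator.centered()
```

Because both averages share the same weights, the result is the covariance of a weighted mixture of all samples seen so far, so it should stay positive semi-definite up to floating-point error. With mismatched coefficients that is no longer guaranteed.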
What is the best way to center the moving averages?
If, analytically, the activation Kronecker factor is given by (a - \bar{a})^T(a - \bar{a}), where a are the instantaneous activations, but in practice we use moving averages of the covariances A, how does the centering affect this?
Does it suffice to center the instantaneous activations (as in (a - \bar{a})^T(a - \bar{a}) here), or is there, for example, also a running average of the center, giving something like (A - \bar{A}) (and if so, how would \bar{A} be computed)?
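To make the two options concrete, here is a short worked comparison. It rests on assumptions that are not part of the question as posed: column-vector activations, per-batch second moments S_t and means m_t, and moving-average weights w_t normalized to sum to one. The symbols S_t, m_t, w_t and \hat{m} are introduced only for this illustration.

```latex
% Option 1: center each batch with its own mean, then average.
C_{\text{batch}} = \sum_t w_t \left( S_t - m_t m_t^\top \right)

% Option 2: average the uncentered statistics, then center with the
% running mean \hat{m} = \sum_t w_t m_t (same weights for both averages).
C_{\text{running}} = \sum_t w_t S_t - \hat{m}\hat{m}^\top

% The two differ by the weighted covariance of the batch means:
C_{\text{running}} - C_{\text{batch}}
  = \sum_t w_t m_t m_t^\top - \hat{m}\hat{m}^\top
  = \sum_t w_t \,(m_t - \hat{m})(m_t - \hat{m})^\top \succeq 0
```

Under these assumptions, centering only the instantaneous activations drops the between-batch variation of the mean, so the two estimates agree only when the batch means do not drift.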