tensorflow / kfac

An implementation of KFAC for TensorFlow
Apache License 2.0

How to center a moving average #43

Open · xmax1 opened 4 years ago

xmax1 commented 4 years ago

What is the best way to center the moving averages?

If, analytically, the activation Kronecker factor is given by (a - \bar{a})^T(a - \bar{a}), where a are the instantaneous activations, but in practice we use moving averages A of the covariances, how does the centering affect this?

Does it suffice to center the instantaneous activations (as in (a - \bar{a})^T(a - \bar{a})), or is there, for example, also a running average of the center, (A - \bar{A})? If so, how would \bar{A} be computed?

james-martens commented 4 years ago

K-FAC shouldn't use centered activation statistics.

However, if you wanted to do something like this in some general context, the correct thing would probably be to take a decayed average of aa^T and subtract from it \hat{a}\hat{a}^T, where \hat{a} is the decayed average of a (computed using the same coefficients as were used for the decayed average of aa^T).
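A minimal NumPy sketch of the estimator described in the reply, assuming activations arrive as (batch, dim) arrays and that both decayed averages are initialized from the first minibatch (an assumption; the reply does not say how they are initialized). The class `DecayedMoments` and its methods are hypothetical illustrations, not part of the kfac library. `uncentered()` is the plain decayed average of aa^T (the kind of statistic the reply says K-FAC should use); `centered()` subtracts \hat{a}\hat{a}^T as suggested for the general case.

```python
import numpy as np


class DecayedMoments:
    """Decayed averages of a and a a^T for one layer (hypothetical helper)."""

    def __init__(self, dim, decay=0.95):
        self.decay = decay
        self.mean = np.zeros(dim)           # decayed average of a, i.e. \hat{a}
        self.second = np.zeros((dim, dim))  # decayed average of a a^T
        self.initialized = False

    def update(self, acts):
        # acts: (batch, dim) activations from one minibatch.
        batch_mean = acts.mean(axis=0)
        batch_second = acts.T @ acts / acts.shape[0]
        if not self.initialized:
            # Assumption: start from the first batch so both averages
            # carry the same total weight (no zero-init bias).
            self.mean, self.second = batch_mean, batch_second
            self.initialized = True
        else:
            d = self.decay
            # Same coefficients for both averages, as the reply suggests.
            self.mean = d * self.mean + (1 - d) * batch_mean
            self.second = d * self.second + (1 - d) * batch_second

    def uncentered(self):
        # Decayed average of a a^T, with no centering.
        return self.second

    def centered(self):
        # Decayed average of a a^T minus \hat{a} \hat{a}^T.
        return self.second - np.outer(self.mean, self.mean)
```

Using the same decay for both averages means `centered()` is exactly the covariance of the activations under the moving-average weighting. As a result it stays positive semidefinite and is unchanged when a constant is added to every activation; with mismatched coefficients neither property is guaranteed.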