ConvInputKroneckerFactor different from kfc paper

tensorflow / kfac

An implementation of KFAC for TensorFlow

Apache License 2.0

197 stars, 41 forks

ConvInputKroneckerFactor different from kfc paper #41

Closed: xmax1 closed this issue 4 years ago

xmax1 commented 4 years ago

Is the implementation of the input Kronecker factor different from the KFC paper?

In Equation 32 of the paper (https://arxiv.org/pdf/1602.01407.pdf), \Omega is not divided by num_spatial_locations, whereas line 1673 of fisher_factors.py (https://github.com/tensorflow/kfac/blob/cf6265590944b5b937ff0ceaf4695a72c95a02b9/kfac/python/ops/fisher_factors.py#L1673) clearly indicates (and the code implements) that the expectation is taken over both the batch size and the spatial locations.

Thanks for any clarifications,

Max

james-martens commented 4 years ago

The docstring is wrong/misleading here, and I'll change it. This is what it should say:

Note that this is related to Omega in https://arxiv.org/abs/1602.01407 except that here we normalize by the number of locations (k). By setting the renormalization coefficient ("_renorm_coeff") in the block class to k we get the same overall block approximation from the paper.

xmax1 commented 4 years ago

F = (k / k) [ (1/M) A A^T \otimes (1/(Mk)) S S^T ]

F = k [ (1/(Mk)) A A^T \otimes (1/(Mk)) S S^T ]

Like this?

james-martens commented 4 years ago

Something like that.
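For readers who want to check the renormalization numerically, here is a minimal NumPy sketch of the identity discussed above. It is not code from the kfac library: the sizes, the random data, and every variable name (`omega_paper`, `omega_lib`, `gamma`, `renorm_coeff`) are assumptions made for illustration, and the normalizations simply follow the two expressions written in this thread. Samples are stored as rows, so the thread's A A^T appears as `A.T @ A`.

```python
import numpy as np

rng = np.random.default_rng(0)

M, k = 8, 5          # batch size and number of spatial locations (illustrative)
d_in, d_out = 3, 2   # hypothetical patch and output-gradient dimensions

# One row per (example, spatial location) pair.
A = rng.normal(size=(M * k, d_in))    # input patch vectors
S = rng.normal(size=(M * k, d_out))   # back-propagated output gradients

# Normalization as written for the paper in this thread:
# the input factor is divided by M only, the gradient factor by M * k.
omega_paper = A.T @ A / M
gamma = S.T @ S / (M * k)

# Normalization described for the library: the input factor is an
# average over both the batch and the spatial locations.
omega_lib = A.T @ A / (M * k)

# Scaling the library-style block by k recovers the paper-style block,
# because kron(c * X, Y) == c * kron(X, Y) for any scalar c.
renorm_coeff = k
block_paper = np.kron(omega_paper, gamma)
block_lib = renorm_coeff * np.kron(omega_lib, gamma)

assert np.allclose(block_paper, block_lib)
```

The check relies only on the Kronecker product being linear in each argument, so the two conventions differ by a single scalar factor k on the whole block.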