xmax1 closed this issue 4 years ago
The docstring is wrong/misleading here, and I'll change it. This is what it should say:
Note that this is related to Omega in https://arxiv.org/abs/1602.01407 except that here we normalize by the number of locations (k). By setting the renormalization coefficient ("_renorm_coeff") in the block class to k we get the same overall block approximation from the paper.
F = (k / k) [ (1/M) A A^T \otimes (1 / (Mk)) S S^T ]
F = k [ (1/(Mk)) A A^T \otimes (1 / (Mk)) S S^T ]
Like this?
Something like that.
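For anyone who wants to convince themselves that the two forms of F above agree, here is a small sanity check in plain Python. It is only a sketch under assumptions: A and S are taken to be matrices with one column per (example, location) pair, so M * k columns each, and the toy data and helper names (`gram`, `scale`, `kron`) are made up for illustration and are not part of the kfac library.

```python
from fractions import Fraction


def gram(X):
    """Return X X^T for a matrix given as a list of rows."""
    return [[sum(a * b for a, b in zip(r1, r2)) for r2 in X] for r1 in X]


def scale(c, X):
    """Multiply every entry of X by the scalar c."""
    return [[c * x for x in row] for row in X]


def kron(X, Y):
    """Kronecker product of two matrices given as lists of rows."""
    return [[x * y for x in xrow for y in yrow] for xrow in X for yrow in Y]


M, k = 3, 4  # batch size and number of spatial locations (toy values)

# Toy integer data (assumption): one column per (example, location) pair.
A = [[(i + 2 * j) % 5 - 2 for j in range(M * k)] for i in range(2)]
S = [[(3 * i + j) % 7 - 3 for j in range(M * k)] for i in range(3)]

AAt, SSt = gram(A), gram(S)

# First form: (k / k) [ (1/M) A A^T (x) (1/(Mk)) S S^T ]
first_form = kron(scale(Fraction(1, M), AAt), scale(Fraction(1, M * k), SSt))

# Both factors normalized by M * k, with no renormalization coefficient.
both_normalized = kron(scale(Fraction(1, M * k), AAt),
                       scale(Fraction(1, M * k), SSt))

# Second form: k [ (1/(Mk)) A A^T (x) (1/(Mk)) S S^T ]
second_form = scale(k, both_normalized)

print(first_form == second_form)      # the two forms agree exactly
print(both_normalized == first_form)  # without the factor k they do not
```

Exact fractions are used instead of floats so the comparison does not depend on rounding. The point is just that the Kronecker product is bilinear, so dividing the input factor by an extra k and multiplying the whole block by k cancel out.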
Is the implementation of the input Kronecker factor different from the one in the KFC paper?
In Equation 32 of https://arxiv.org/pdf/1602.01407.pdf, \Omega is not divided by num_spatial_locations. However, line 1673 of https://github.com/tensorflow/kfac/blob/cf6265590944b5b937ff0ceaf4695a72c95a02b9/kfac/python/ops/fisher_factors.py#L1673 clearly states (and the code implements) that the expectation is taken over both the batch and the spatial locations.
Thanks for any clarifications,
Max