Closed noahgolmant closed 5 years ago
The basic power iteration seems to be okay; this test uses Wishart matrices. As expected, the eigenvector estimates have higher variance, but the eigenvalue estimates are good. The cosine similarity is non-monotonic because I sorted the pairs by eigenvalue magnitude.
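For reference, the Wishart sanity check can be sketched like this (a minimal standalone version, not the repo's actual test code; `power_iteration` and the iteration count are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def power_iteration(A, num_iters=5000):
    """Estimate the top eigenvalue/eigenvector of a symmetric matrix A."""
    v = rng.normal(size=A.shape[0])
    v /= np.linalg.norm(v)
    for _ in range(num_iters):
        v = A @ v
        v /= np.linalg.norm(v)
    return v @ A @ v, v  # Rayleigh quotient, unit eigenvector estimate

# Wishart matrix: X X^T with Gaussian X is symmetric PSD.
n, p = 50, 100
X = rng.normal(size=(n, p))
W = X @ X.T

est_val, est_vec = power_iteration(W)
true_vals, true_vecs = np.linalg.eigh(W)       # symmetric variant of np.linalg.eig
true_val, true_vec = true_vals[-1], true_vecs[:, -1]

print(abs(est_val - true_val) / true_val)      # relative eigenvalue error
print(abs(est_vec @ true_vec))                 # cosine similarity (sign-invariant)
```

The eigenvalue error converges roughly twice as fast (in the exponent) as the eigenvector error, which is consistent with eigenvalue estimates looking better than eigenvector estimates.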
Eigenvalue computation for the Hessian on the full dataset appears to be okay. Eigenvector estimates are worse, though. "True" eigenvalues are computed using np.linalg.eig.
This was computed using a linear model with MSE loss on randomly generated data.
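One nice property of this setup is that the ground-truth Hessian is available in closed form: for a linear model with MSE loss, the Hessian is constant in the weights. A minimal sketch of that setup (dimensions and names are illustrative, not the experiment's actual values):

```python
import numpy as np

rng = np.random.default_rng(1)

# Randomly generated regression data, as in the experiment above.
n, d = 100, 20
X = rng.normal(size=(n, d))
y = rng.normal(size=n)

# For a linear model f(w) = Xw with MSE loss L(w) = mean((Xw - y)^2),
# the Hessian does not depend on w or y: H = (2/n) X^T X.
H = (2.0 / n) * X.T @ X

# "True" eigenvalues, sorted by magnitude (eigvalsh is the symmetric variant
# of np.linalg.eig; it returns eigenvalues in ascending order).
true_vals = np.linalg.eigvalsh(H)[::-1]
print(true_vals[:3])
```

Since H is PSD here, sorting by magnitude and sorting by value coincide.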
Here are the estimates with a stochastic gradient (100 samples, batch size 10).
Here it is using a fixed mini-batch:
The current eigenvalue estimates have extremely high variance since the mini-batch size required for a stable Hessian estimate seems to be very large for reasonable datasets/models (issues #22 and #17).
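A quick way to see how the estimate's variance scales with mini-batch size, using the linear-model MSE Hessian where the mini-batch Hessian is exactly (2/|B|) X_B^T X_B (a hypothetical numpy sketch, not the repo's code; `batch_top_eigval` is an illustrative name):

```python
import numpy as np

rng = np.random.default_rng(3)

n, d = 1000, 20
X = rng.normal(size=(n, d))

def batch_top_eigval(X, batch_size):
    """Top eigenvalue of the mini-batch Hessian (2/|B|) X_B^T X_B."""
    idx = rng.choice(len(X), batch_size, replace=False)
    Xb = X[idx]
    return np.linalg.eigvalsh((2.0 / batch_size) * Xb.T @ Xb)[-1]

# Spread of the top-eigenvalue estimate across repeated mini-batches,
# as a function of batch size; batch_size == n recovers the full Hessian.
stds = {}
for batch_size in (10, 100, 1000):
    ests = [batch_top_eigval(X, batch_size) for _ in range(30)]
    stds[batch_size] = np.std(ests)
    print(batch_size, np.mean(ests), stds[batch_size])
```

The spread should shrink as the batch size approaches the dataset size, consistent with the large batch sizes needed for a stable estimate.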
TODO:
1. Verify that setting the mini-batch size to the size of the dataset (with chunking) is equivalent to vanilla power iteration in the HessianFlow repo.
2. Calculate the variance of the top-k eigenvalue estimates using the current technique as a function of mini-batch size.
3. Test eigenvalue estimate averaging using a fixed mini-batch size.
4. Test running the full procedure on a fixed mini-batch, repeating, then creating a full estimate from this.
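For TODO items (3)/(4), the repeat-and-average idea could be sketched like this (hypothetical code under the linear-model MSE setup; `minibatch_hessian` and `top_eigval` are illustrative names, not repo functions):

```python
import numpy as np

rng = np.random.default_rng(2)

def minibatch_hessian(X, idx):
    """Mini-batch Hessian of MSE loss for a linear model: (2/|B|) X_B^T X_B."""
    Xb = X[idx]
    return (2.0 / len(idx)) * Xb.T @ Xb

def top_eigval(H, num_iters=200):
    """Top eigenvalue of symmetric H via power iteration (Rayleigh quotient)."""
    v = rng.normal(size=H.shape[0])
    v /= np.linalg.norm(v)
    for _ in range(num_iters):
        v = H @ v
        v /= np.linalg.norm(v)
    return v @ H @ v

n, d, batch_size = 100, 10, 10
X = rng.normal(size=(n, d))

# Run the full power-iteration procedure on a fresh fixed mini-batch each
# repeat, then aggregate the resulting top-eigenvalue estimates.
estimates = [
    top_eigval(minibatch_hessian(X, rng.choice(n, batch_size, replace=False)))
    for _ in range(50)
]
print(np.mean(estimates), np.std(estimates))  # aggregated estimate and its spread
```

Note that averaging the per-batch top eigenvalues is not unbiased for the full-Hessian top eigenvalue (the max is convex), so the averaged estimate would still need to be compared against the full-dataset value.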