Closed noahgolmant closed 5 years ago
The basic power iteration seems to be okay; this test uses Wishart matrices. As expected, the eigenvector estimates have higher variance, but the eigenvalue estimates are good. The cosine similarity is non-monotonic because I sorted the pairs by eigenvalue magnitude.
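For reference, the Wishart sanity check can be sketched like this (a minimal standalone version, not the repo's actual test code; `power_iteration` and the iteration count are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def power_iteration(A, num_iters=5000):
    """Estimate the top eigenvalue/eigenvector of a symmetric matrix A."""
    v = rng.normal(size=A.shape[0])
    v /= np.linalg.norm(v)
    for _ in range(num_iters):
        v = A @ v
        v /= np.linalg.norm(v)
    return v @ A @ v, v  # Rayleigh quotient, unit eigenvector estimate

# Wishart matrix: X X^T with Gaussian X is symmetric PSD.
n, p = 50, 100
X = rng.normal(size=(n, p))
W = X @ X.T

est_val, est_vec = power_iteration(W)
true_vals, true_vecs = np.linalg.eigh(W)       # symmetric variant of np.linalg.eig
true_val, true_vec = true_vals[-1], true_vecs[:, -1]

print(abs(est_val - true_val) / true_val)      # relative eigenvalue error
print(abs(est_vec @ true_vec))                 # cosine similarity (sign-invariant)
```

The eigenvalue error converges roughly twice as fast (in the exponent) as the eigenvector error, which is consistent with eigenvalue estimates looking better than eigenvector estimates.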
Eigenvalue computation for the Hessian on the full dataset appears to be okay. Eigenvector estimates are worse, though. "True" eigenvalues are computed using np.linalg.eig.
This was computed using a linear model with MSE loss on randomly generated data.
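One nice property of this setup is that the ground-truth Hessian is available in closed form: for a linear model with MSE loss, the Hessian is constant in the weights. A minimal sketch of that setup (dimensions and names are illustrative, not the experiment's actual values):

```python
import numpy as np

rng = np.random.default_rng(1)

# Randomly generated regression data, as in the experiment above.
n, d = 100, 20
X = rng.normal(size=(n, d))
y = rng.normal(size=n)

# For a linear model f(w) = Xw with MSE loss L(w) = mean((Xw - y)^2),
# the Hessian does not depend on w or y: H = (2/n) X^T X.
H = (2.0 / n) * X.T @ X

# "True" eigenvalues, sorted by magnitude (eigvalsh is the symmetric variant
# of np.linalg.eig; it returns eigenvalues in ascending order).
true_vals = np.linalg.eigvalsh(H)[::-1]
print(true_vals[:3])
```

Since H is PSD here, sorting by magnitude and sorting by value coincide.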
Here are the estimates with a stochastic gradient (100 samples, batch size 10).
Here it is using a fixed mini-batch:
The current eigenvalue estimates have extremely high variance since the mini-batch size required for a stable Hessian estimate seems to be very large for reasonable datasets/models (issues #22 and #17).
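A quick way to see how the estimate's variance scales with mini-batch size, using the linear-model MSE Hessian where the mini-batch Hessian is exactly (2/|B|) X_B^T X_B (a hypothetical numpy sketch, not the repo's code; `batch_top_eigval` is an illustrative name):

```python
import numpy as np

rng = np.random.default_rng(3)

n, d = 1000, 20
X = rng.normal(size=(n, d))

def batch_top_eigval(X, batch_size):
    """Top eigenvalue of the mini-batch Hessian (2/|B|) X_B^T X_B."""
    idx = rng.choice(len(X), batch_size, replace=False)
    Xb = X[idx]
    return np.linalg.eigvalsh((2.0 / batch_size) * Xb.T @ Xb)[-1]

# Spread of the top-eigenvalue estimate across repeated mini-batches,
# as a function of batch size; batch_size == n recovers the full Hessian.
stds = {}
for batch_size in (10, 100, 1000):
    ests = [batch_top_eigval(X, batch_size) for _ in range(30)]
    stds[batch_size] = np.std(ests)
    print(batch_size, np.mean(ests), stds[batch_size])
```

The spread should shrink as the batch size approaches the dataset size, consistent with the large batch sizes needed for a stable estimate.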
TODO:
1. Verify that setting the mini-batch size to the size of the dataset (with chunking) is equivalent to vanilla power iteration in the HessianFlow repo.
2. Calculate the variance of the top-k eigenvalue estimates using the current technique as a function of mini-batch size.
3. Test eigenvalue estimate averaging using a fixed mini-batch size.
4. Test running the full procedure on a fixed mini-batch, repeating, then creating a full estimate from this.
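For TODO items (3)/(4), the repeat-and-average idea could be sketched like this (hypothetical code under the linear-model MSE setup; `minibatch_hessian` and `top_eigval` are illustrative names, not repo functions):

```python
import numpy as np

rng = np.random.default_rng(2)

def minibatch_hessian(X, idx):
    """Mini-batch Hessian of MSE loss for a linear model: (2/|B|) X_B^T X_B."""
    Xb = X[idx]
    return (2.0 / len(idx)) * Xb.T @ Xb

def top_eigval(H, num_iters=200):
    """Top eigenvalue of symmetric H via power iteration (Rayleigh quotient)."""
    v = rng.normal(size=H.shape[0])
    v /= np.linalg.norm(v)
    for _ in range(num_iters):
        v = H @ v
        v /= np.linalg.norm(v)
    return v @ H @ v

n, d, batch_size = 100, 10, 10
X = rng.normal(size=(n, d))

# Run the full power-iteration procedure on a fresh fixed mini-batch each
# repeat, then aggregate the resulting top-eigenvalue estimates.
estimates = [
    top_eigval(minibatch_hessian(X, rng.choice(n, batch_size, replace=False)))
    for _ in range(50)
]
print(np.mean(estimates), np.std(estimates))  # aggregated estimate and its spread
```

Note that averaging the per-batch top eigenvalues is not unbiased for the full-Hessian top eigenvalue (the max is convex), so the averaged estimate would still need to be compared against the full-dataset value.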