noahgolmant / pytorch-hessian-eigenthings

Efficient PyTorch Hessian eigendecomposition tools!
MIT License

Computing the smallest eigenpairs #11

Closed themightyoarfish closed 3 years ago

themightyoarfish commented 5 years ago

Is it possible to use this method to compute the smallest eigenpairs as well? I would like to see whether this is useful in the context of this work: Negative Eigenvalues of the Hessian in Deep Neural Networks.

themightyoarfish commented 5 years ago

This SE answer (and the comments) seems to suggest that performing the same power iteration on B = A − λ_max * I (i.e. subtracting the largest eigenvalue from the diagonal) would yield the smallest (most negative) eigenvalue (the question asks about the smallest-magnitude eigenvalue, but one comment covers the most-negative case).

Comments below this other SE answer suggest that simply running power iteration on -A will converge to the most negative eigenvalue.

Can you confirm this? Is it possible without computing the largest one first?

noahgolmant commented 5 years ago

You can't just use power iteration on -A: power iteration converges to the eigenvalue of largest magnitude, and negating A only flips signs, so if A's dominant eigenvalue is positive you just recover its negation rather than the most negative eigenvalue. The first SE answer has worked in my experience; I have the code lying around somewhere. Per the first answer in the second SE link, you can only skip computing the largest eigenvalue first if you already have an upper bound on the global Lipschitz constant of your problem, which serves as the shift. If you just set it to some arbitrarily huge value, this can introduce numerical instability.
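For concreteness, a small numpy sketch (a synthetic symmetric matrix with a known spectrum standing in for the Hessian, not the library's HVPOperator) showing why power iteration on -A fails when the dominant eigenvalue of A is positive:

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic symmetric "Hessian" with eigenvalues evenly spaced in
# [-1, 2]: the most negative eigenvalue is -1, but the eigenvalue of
# largest *magnitude* is +2.
Q, _ = np.linalg.qr(rng.standard_normal((40, 40)))
A = Q @ np.diag(np.linspace(-1.0, 2.0, 40)) @ Q.T

def power_iteration(matvec, dim, steps=800):
    # Plain power iteration: converges to the eigenvalue of largest
    # magnitude, estimated here via the Rayleigh quotient.
    v = rng.standard_normal(dim)
    v /= np.linalg.norm(v)
    for _ in range(steps):
        w = matvec(v)
        v = w / np.linalg.norm(w)
    return v @ matvec(v)

# On -A, power iteration still locks onto the largest-magnitude
# eigenvalue, which is -(+2) = -2 here -- NOT the most negative
# eigenvalue of A, which is -1.
est = power_iteration(lambda x: -(A @ x), 40)
```

Power iteration only sees magnitudes, so negation alone cannot isolate the negative end of the spectrum.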

noahgolmant commented 5 years ago

There is a good amount of recent theoretical work on using negative curvature to find second-order stationary points, e.g. noisy negative curvature descent. I am also interested in creating a practical implementation of one of these algorithms at some point.

noahgolmant commented 5 years ago

Implementation-wise (if you are up to it), this amounts to:

1. Compute the largest principal eigenvalue; call it p.
2. Create a LambdaOperator that just computes HVPOperator.apply(x) - p * x.
3. Run the deflated power iteration routine with this new operator.
4. Add p back to all of the eigenvalues at the end.
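The steps above can be sketched end-to-end with a plain numpy power iteration (a synthetic matrix stands in for the real Hessian-vector product; the names here are illustrative, not the library's API):

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in Hessian: symmetric, eigenvalues evenly spaced in [-1, 2].
Q, _ = np.linalg.qr(rng.standard_normal((40, 40)))
H = Q @ np.diag(np.linspace(-1.0, 2.0, 40)) @ Q.T
hvp = lambda x: H @ x  # plays the role of HVPOperator.apply

def power_iteration(matvec, dim, steps=800):
    # Converges to the largest-magnitude eigenvalue (Rayleigh quotient).
    v = rng.standard_normal(dim)
    v /= np.linalg.norm(v)
    for _ in range(steps):
        w = matvec(v)
        v = w / np.linalg.norm(w)
    return v @ matvec(v)

# Step 1: largest eigenvalue p of H (dominant in magnitude here).
p = power_iteration(hvp, 40)
# Steps 2-3: power iteration on the shifted operator H - p*I. All of
# its eigenvalues are <= 0, so the largest-magnitude one is
# lambda_min - p, the shifted image of the most negative eigenvalue.
shifted = power_iteration(lambda x: hvp(x) - p * x, 40)
# Step 4: undo the shift.
lam_min = shifted + p
```

With the spectrum above, p comes out near 2 and lam_min near -1, matching the construction.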

themightyoarfish commented 5 years ago

So basically: use the existing code to obtain p, subclass HVPOperator, and override apply() so that it returns super().apply(vec) - p * vec? That seems simple enough.

noahgolmant commented 5 years ago

There's no need to subclass HVPOperator. You can create a LambdaOperator where the provided lambda function is lambda x: hvp_op.apply(x) - p * x.