This SE answer (and the comments) seems to suggest that performing the same power iteration on B = A − λ_max * I (i.e. subtracting the largest eigenvalue from the diagonal) would yield the smallest (most negative) eigenvalue (the question there is about the smallest-magnitude eigenvalue, but one of the comments makes the claim for the most negative one).
Comments below this other SE answer suggest that simply running power iteration on -A will converge to the most negative eigenvalue.
Can you confirm this? Is it possible without computing the largest one first?
You can't just use power iteration on -A: power iteration converges to the eigenvalue of largest magnitude, so on -A you simply recover the negation of A's dominant eigenvalue (usually the largest positive one), not the most negative one. The approach in the first SE answer has worked in my experience; I have the code lying around somewhere. Per the first answer in the second SE link, you can only avoid computing the largest eigenvalue first if you already have an upper bound on the global Lipschitz constant of your problem, and if you just set that bound to some arbitrarily huge value, it can introduce numerical instability.
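A toy NumPy illustration (nothing to do with this repo's code, just a 3x3 example) of why -A doesn't give you what you want when the top eigenvalue dominates in magnitude:

```python
import numpy as np

A = np.diag([5.0, 2.0, -1.0])   # symmetric toy "Hessian"; eigenvalues 5, 2, -1
v = np.ones(3) / np.sqrt(3)
for _ in range(200):             # power iteration on -A
    w = -A @ v
    v = w / np.linalg.norm(w)
print(v @ (-A @ v))              # ~ -5.0: negation of the dominant eigenvalue,
                                 # not the most negative eigenvalue of A (-1.0)
```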
There is a good amount of recent theoretical work on using negative curvature to find second-order stationary points, e.g. noisy negative curvature descent. I am also interested in creating a practical implementation of one of these algorithms at some point.
Implementation-wise (if you are up to it), this amounts to:
(1) Computing the largest principal eigenvalue, call it `p`.
(2) Creating a `LambdaOperator` that just computes `HVPOperator.apply(x) - p * x`. Run the deflated power iteration routine with this new operator. Add `p` to all the eigenvalues at the end.
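A self-contained sketch of those two steps, with a plain NumPy matvec standing in for the Hessian-vector product (the names here are illustrative only, not this repo's API):

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.diag([5.0, 2.0, -1.0])      # stand-in Hessian; eigenvalues 5, 2, -1
hvp = lambda x: A @ x              # stand-in for the Hessian-vector product

def power_iteration(matvec, dim, steps=500):
    """Return the largest-magnitude (Rayleigh quotient, vector) pair of matvec."""
    v = rng.standard_normal(dim)
    v /= np.linalg.norm(v)
    for _ in range(steps):
        w = matvec(v)
        v = w / np.linalg.norm(w)
    return v @ matvec(v), v

# (1) largest principal eigenvalue p of A
p, _ = power_iteration(hvp, 3)                     # ~ 5.0

# (2) power iteration on the shifted operator x -> A x - p x; its
#     dominant-magnitude eigenvalue is (lambda_min - p), which is <= 0
mu, v_min = power_iteration(lambda x: hvp(x) - p * x, 3)

print(mu + p)                                      # ~ -1.0: most negative eigenvalue of A
```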
So basically, use the existing code to obtain `p`, subclass `HVPOperator`, and override `apply()` with `return super().apply(vec) - p * vec`? That would seem simple enough.
There's no need to subclass `HVPOperator`. You can create a `LambdaOperator` where the provided lambda function is `lambda x: hvp_op.apply(x) - p * x`.
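For concreteness, that wiring might look roughly like the following. This is a sketch only: it assumes `LambdaOperator` is constructed from an apply function plus the operator size, that `deflated_power_iteration` accepts any such operator, and that `hvp_op` and `p` have already been computed; the import paths and exact signatures should be double-checked against the repo.

```python
from hessian_eigenthings.operator import LambdaOperator
from hessian_eigenthings.power_iter import deflated_power_iteration

# hvp_op: an already-constructed HVPOperator; p: the largest eigenvalue from a
# first power-iteration pass (both assumed to exist at this point).
shifted_op = LambdaOperator(lambda x: hvp_op.apply(x) - p * x, hvp_op.size)

# Eigenvalues of the shifted operator are shifted down by p, so add p back
# at the end to recover the bottom of the Hessian's spectrum.
eigenvals, eigenvecs = deflated_power_iteration(shifted_op, num_eigenthings=5)
eigenvals = [e + p for e in eigenvals]
```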
Is it possible to use this method to compute the smallest eigenpairs as well? I would like to see whether this is useful in the context of this work: Negative Eigenvalues of the Hessian in Deep Neural Networks.