[Discussion] Sella Falling into Local Minima During Transition State Search: Questions and Issues

Issue Description: I encountered some issues while testing the transition state of water dissociation on a copper surface using Sella. I attempted to reproduce previous calculations by using VASP + VaspInteractive as the ASE calculator and adjusting the transition state bond H···O in the previous transition state structure, compressing its distance by 0.1 Å as the initial structure for testing. A total of three tests were conducted, with the details as follows:

1. Test1: Default hyperparameters were used, and Sella failed to locate the transition state, instead optimizing to a local minimum. 2. Test2: I adjusted the hyperparameter eta to 0.005, and Sella successfully found the transition state. 3. Test3: The H···O bond was shortened by 0.3 Å as the initial structure, with eta adjusted to 0.005 and other hyperparameters kept as default. Sella still failed to find the transition state and optimized to a local minimum.

Questions and Issues:

From the tests and my reading of the literature, I have several questions:

1. Impact of the Initial Hessian Matrix: When the FOSP (First Order Saddle Point) of the initial structure is far from the true saddle point and the initial Hessian matrix has no negative eigenvalues, Sella often falls into a local minimum rather than moving towards the saddle point. Although Sella checks the negative eigenvalues of the updated Hessian matrix $$B_{k+1}$$ and re-diagonalizes if none are found, I am unsure whether this strategy is always effective, especially when the initial structure is far from the saddle point. 2. Checking Negative Eigenvalues of the Hessian: How does Sella check whether the Hessian matrix contains negative eigenvalues at each iteration? Are there any variables, attributes, or logs that can be checked, which would help me monitor whether the optimization direction is moving towards the saddle point? 3. Explanation for the Effect of eta in the Tests: In Test2, increasing eta allowed Sella to find the transition state. However, typically, a smaller eta is considered better. My hypothesis is that if eta is too small, the gradient calculated by VASP may contain excessive noise, leading to a poor initial approximation of the Hessian matrix, and thus affecting the optimization. Is this explanation reasonable? 4. Appearance of Perturbed Structures: In the trajectory file, I observed numerous perturbed structures following the initial structure. Are these perturbations due to Sella using finite difference methods to calculate the initial Hessian matrix? 5. Saving of Perturbed Structures: These perturbed structures are being saved in the trajectory file by Sella. Is there a way to avoid saving these perturbations in the trajectory, to simplify the file? 6. Recommended Hyperparameters for VASP: Are there any recommended hyperparameters when using VASP as the calculator with Sella? Specifically, settings that could help avoid falling into local minima during transition state searches due to inaccurate Hessian matrix approximations.

I would greatly appreciate any advice or solutions regarding these issues.

One of the design philosophies behind Sella is to be a simple local optimizer, without fancy algorithms to escape from minima wells, e.g. growing string method. What this means is that as a user of Sella, it is your responsibility to initialize the structure as close to the desired saddle point as possible. Ideally, the PES Hessian of your initial structure should have exactly one imaginary frequency that is non-orthogonal to the desired reaction. This means you should relax atomic degrees of freedom that are not directly relevant to your reaction.

This can be accomplished by staging your optimization in two steps: First, perform a constrained minimization, with some inter-atomic coordinate that relates to your desired reaction fixed to a strained configuration (e.g. in your case, fix your O-H bond to a distance over 1 Angstrom). This can be done with Sella using standard ASE constraints (see this wiki page for more detail) and by setting the order=0 parameter. Then, once that has converged, perform an unconstrained saddle point optimization. Note that you should try to keep the number of constrained DOF to a minimum, ideally just 1. The more coordinates you constrain, the more unrelaxed coordinates your initial guess for the FOSP optimization will have. If necessary, you can define custom constraints as an arbitrary function of the atomic coordinates (so long as your function is twice-differentiable), though this is rarely necessary.

To answer some of your other questions:

I believe my message above answers this, but to be clear, you are correct that Sella's strategy does not work well when you start in a local minimum well. We do not claim to solve the local minimum escape problem.
Sella checks whether the approximate Hessian contains at least 1 negative mode at every iteration, and rediagonalizes only if it does not (with the hope that the approximate Hessian is wrong and the true Hessian does have a negative mode. If we've actually drifted into a minimum-energy well, then there's not a lot we can do). At one point there were debug logs that printed the leftmost eigenvalue of the approximate Hessian, but I believe those have been removed... Perhaps we should add that to the standard optimization log.
eta is the step size used for the finite difference iterative diagonalization step. There is a balance to be struck here. If eta is too small, then the difference in the gradient will be dominated by noise, and you will be unable to gather useful information about the PES curvature. If eta is to large, then the finite difference approximation will not be accurate. By default, Sella uses a value of eta that is on the high end of optimal for a PES that has analytical (exact) gradients, like a force field. When your PES is solved self-consistently (e.g. wavefunction based methods or DFT), it may be necessary to tighten your SCF convergence criteria to get good results. This is most definitely true of VASP, which has poor-quality gradients using its default SCF convergence criteria.
Correct, those are the iterative diagonalization steps.

You can try saving your own trajectory file containing just the optimization steps:

mytraj = Trajectory("my_trajectory.traj", "w")
for converged in opt.irun(...):
mytraj.write(atoms)

VASP, like most DFT codes, does not provide a keyword to control the precision of the gradient (GPAW is the one code I'm aware of with this feature). The best you can do is tighten EDIFF, which controls precision of the energy. The VASP wiki suggests using EDIFF=1e-6 for finite difference calculations. You can try that, but I suspect 1e-7 or even 1e-8 would perform better (if you can get the SCF to converge that tightly).

One of the design philosophies behind Sella is to be a simple local optimizer, without fancy algorithms to escape from minima wells, e.g. growing string method. What this means is that as a user of Sella, it is your responsibility to initialize the structure as close to the desired saddle point as possible. Ideally, the PES Hessian of your initial structure should have exactly one imaginary frequency that is non-orthogonal to the desired reaction. This means you should relax atomic degrees of freedom that are not directly relevant to your reaction.

This can be accomplished by staging your optimization in two steps: First, perform a constrained minimization, with some inter-atomic coordinate that relates to your desired reaction fixed to a strained configuration (e.g. in your case, fix your O-H bond to a distance over 1 Angstrom). This can be done with Sella using standard ASE constraints (see this wiki page for more detail) and by setting the order=0 parameter. Then, once that has converged, perform an unconstrained saddle point optimization. Note that you should try to keep the number of constrained DOF to a minimum, ideally just 1. The more coordinates you constrain, the more unrelaxed coordinates your initial guess for the FOSP optimization will have. If necessary, you can define custom constraints as an arbitrary function of the atomic coordinates (so long as your function is twice-differentiable), though this is rarely necessary.

To answer some of your other questions:

I believe my message above answers this, but to be clear, you are correct that Sella's strategy does not work well when you start in a local minimum well. We do not claim to solve the local minimum escape problem.

Sella checks whether the approximate Hessian contains at least 1 negative mode at every iteration, and rediagonalizes only if it does not (with the hope that the approximate Hessian is wrong and the true Hessian does have a negative mode. If we've actually drifted into a minimum-energy well, then there's not a lot we can do). At one point there were debug logs that printed the leftmost eigenvalue of the approximate Hessian, but I believe those have been removed... Perhaps we should add that to the standard optimization log.

eta is the step size used for the finite difference iterative diagonalization step. There is a balance to be struck here. If eta is too small, then the difference in the gradient will be dominated by noise, and you will be unable to gather useful information about the PES curvature. If eta is to small, then the finite difference approximation will not be accurate. By default, Sella uses a value of eta that is on the high end of optimal for a PES that has analytical (exact) gradients, like a force field. When your PES is solved self-consistently (e.g. wavefunction based methods or DFT), it may be necessary to tighten your SCF convergence criteria to get good results. This is most definitely true of VASP, which has poor-quality gradients using its default SCF convergence criteria.

Correct, those are the iterative diagonalization steps.

You can try saving your own trajectory file containing just the optimization steps:
mytraj = Trajectory("my_trajectory.traj", "w")
for converged in opt.irun(...):
    mytraj.write(atoms)
VASP, like most DFT codes, does not provide a keyword to control the precision of the gradient (GPAW is the one code I'm aware of with this feature). The best you can do is tighten EDIFF, which controls precision of the energy. The VASP wiki suggests using EDIFF=1e-6 for finite difference calculations. You can try that, but I suspect 1e-7 or even 1e-8 would perform better (if you can get the SCF to converge that tightly).

Thanks for your reply, all my doubts have been clarified, which has been very helpful to me. Additionally, I have also sent you an email, which you can ignore if you see it, as the questions are the same. Thank you once again for your answers.

One of the design philosophies behind Sella is to be a simple local optimizer, without fancy algorithms to escape from minima wells, e.g. growing string method. What this means is that as a user of Sella, it is your responsibility to initialize the structure as close to the desired saddle point as possible. Ideally, the PES Hessian of your initial structure should have exactly one imaginary frequency that is non-orthogonal to the desired reaction. This means you should relax atomic degrees of freedom that are not directly relevant to your reaction.

This can be accomplished by staging your optimization in two steps: First, perform a constrained minimization, with some inter-atomic coordinate that relates to your desired reaction fixed to a strained configuration (e.g. in your case, fix your O-H bond to a distance over 1 Angstrom). This can be done with Sella using standard ASE constraints (see this wiki page for more detail) and by setting the order=0 parameter. Then, once that has converged, perform an unconstrained saddle point optimization. Note that you should try to keep the number of constrained DOF to a minimum, ideally just 1. The more coordinates you constrain, the more unrelaxed coordinates your initial guess for the FOSP optimization will have. If necessary, you can define custom constraints as an arbitrary function of the atomic coordinates (so long as your function is twice-differentiable), though this is rarely necessary.

To answer some of your other questions:

I believe my message above answers this, but to be clear, you are correct that Sella's strategy does not work well when you start in a local minimum well. We do not claim to solve the local minimum escape problem.

Sella checks whether the approximate Hessian contains at least 1 negative mode at every iteration, and rediagonalizes only if it does not (with the hope that the approximate Hessian is wrong and the true Hessian does have a negative mode. If we've actually drifted into a minimum-energy well, then there's not a lot we can do). At one point there were debug logs that printed the leftmost eigenvalue of the approximate Hessian, but I believe those have been removed... Perhaps we should add that to the standard optimization log.

eta is the step size used for the finite difference iterative diagonalization step. There is a balance to be struck here. If eta is too small, then the difference in the gradient will be dominated by noise, and you will be unable to gather useful information about the PES curvature. If eta is to large, then the finite difference approximation will not be accurate. By default, Sella uses a value of eta that is on the high end of optimal for a PES that has analytical (exact) gradients, like a force field. When your PES is solved self-consistently (e.g. wavefunction based methods or DFT), it may be necessary to tighten your SCF convergence criteria to get good results. This is most definitely true of VASP, which has poor-quality gradients using its default SCF convergence criteria.

Correct, those are the iterative diagonalization steps.

You can try saving your own trajectory file containing just the optimization steps:
mytraj = Trajectory("my_trajectory.traj", "w")
for converged in opt.irun(...):
    mytraj.write(atoms)
VASP, like most DFT codes, does not provide a keyword to control the precision of the gradient (GPAW is the one code I'm aware of with this feature). The best you can do is tighten EDIFF, which controls precision of the energy. The VASP wiki suggests using EDIFF=1e-6 for finite difference calculations. You can try that, but I suspect 1e-7 or even 1e-8 would perform better (if you can get the SCF to converge that tightly).

Hello! I noticed this reply, and I'm exploring the possibilities for combining Sella with NEB and soft-scan (with constraint optimization), do you have more experience for combining Sella with other TS tools to get a better TS exploration workflow? (especially in PBC system ?)

Sella has been successfully used in the https://github.com/zadorlab/pynta workflow. In Pynta we use our own algorithm called HFSP in PBC calculations to construct good saddle guesses. Sella has also been used in conjunction with NEB. In fact, this is the intent of Sella: use an algorithm to get close to a saddle point, and then use Sella to optimize it to a true saddle.

One of the design philosophies behind Sella is to be a simple local optimizer, without fancy algorithms to escape from minima wells, e.g. growing string method. What this means is that as a user of Sella, it is your responsibility to initialize the structure as close to the desired saddle point as possible. Ideally, the PES Hessian of your initial structure should have exactly one imaginary frequency that is non-orthogonal to the desired reaction. This means you should relax atomic degrees of freedom that are not directly relevant to your reaction.

This can be accomplished by staging your optimization in two steps: First, perform a constrained minimization, with some inter-atomic coordinate that relates to your desired reaction fixed to a strained configuration (e.g. in your case, fix your O-H bond to a distance over 1 Angstrom). This can be done with Sella using standard ASE constraints (see this wiki page for more detail) and by setting the order=0 parameter. Then, once that has converged, perform an unconstrained saddle point optimization. Note that you should try to keep the number of constrained DOF to a minimum, ideally just 1. The more coordinates you constrain, the more unrelaxed coordinates your initial guess for the FOSP optimization will have. If necessary, you can define custom constraints as an arbitrary function of the atomic coordinates (so long as your function is twice-differentiable), though this is rarely necessary.

To answer some of your other questions:

I believe my message above answers this, but to be clear, you are correct that Sella's strategy does not work well when you start in a local minimum well. We do not claim to solve the local minimum escape problem.

Sella checks whether the approximate Hessian contains at least 1 negative mode at every iteration, and rediagonalizes only if it does not (with the hope that the approximate Hessian is wrong and the true Hessian does have a negative mode. If we've actually drifted into a minimum-energy well, then there's not a lot we can do). At one point there were debug logs that printed the leftmost eigenvalue of the approximate Hessian, but I believe those have been removed... Perhaps we should add that to the standard optimization log.

eta is the step size used for the finite difference iterative diagonalization step. There is a balance to be struck here. If eta is too small, then the difference in the gradient will be dominated by noise, and you will be unable to gather useful information about the PES curvature. If eta is to large, then the finite difference approximation will not be accurate. By default, Sella uses a value of eta that is on the high end of optimal for a PES that has analytical (exact) gradients, like a force field. When your PES is solved self-consistently (e.g. wavefunction based methods or DFT), it may be necessary to tighten your SCF convergence criteria to get good results. This is most definitely true of VASP, which has poor-quality gradients using its default SCF convergence criteria.

Correct, those are the iterative diagonalization steps.

You can try saving your own trajectory file containing just the optimization steps:
mytraj = Trajectory("my_trajectory.traj", "w")
for converged in opt.irun(...):
    mytraj.write(atoms)
VASP, like most DFT codes, does not provide a keyword to control the precision of the gradient (GPAW is the one code I'm aware of with this feature). The best you can do is tighten EDIFF, which controls precision of the energy. The VASP wiki suggests using EDIFF=1e-6 for finite difference calculations. You can try that, but I suspect 1e-7 or even 1e-8 would perform better (if you can get the SCF to converge that tightly).

After my recent research, I have a clearer understanding of Sella's optimization strategy. The core steps of this strategy include:

Partition Rationalization Function: First, diagonalize the Hessian matrix.
Coordinate Transformation: Transform the gradient in Cartesian coordinates to a new space spanned by local principal axes using a unitary matrix.
Displacement Calculation: Calculate the displacement for each iteration based on the properties of eigenvalues and eigenvectors.

However, this strategy has an issues: if the initial point is not selected in the negative curvature region, the optimization process easily slides into local minima, and the search direction for saddle points cannot be effectively controlled.

In the field of chemistry, the displacement direction of a molecule transitioning from a stable state to a transition state can usually be approximated by the breaking or formation of chemical bonds. For example, if it is clear which chemical bond will break, the next iteration direction should increase the length of that chemical bond. Therefore, I am trying to effectively utilize this heuristic direction information in the saddle point optimization.

To express this idea more academically, we can introduce some mathematical formulas:

Definition of Transfer Direction: Let the transfer direction be $$\mathbf{e}$$, which is a unit vector.
Construction of Cone Constraint: Construct a cone with $$\mathbf{e}$$ as the axis and $$\phi$$ as the angle. The displacement for each iteration must be within this cone.

Specific mathematical expressions are as follows:

Cone Constraint Condition: $$\mathbf{d}_{k}\cdot\mathbf{e}\geq \Vert \mathbf{d}_{k} \Vert cos(\phi)$$ where $$\mathbf{d}_{k}$$ is the displacement vector for the k-th step.
Trust Region Method Adapted to Cone Constraint: $$\min _{\mathbf{d}_{k}} F\left(\mathbf{x}_{k}+\mathbf{d}{k}\right) \quad \text { s.t. } \quad\left|\mathbf{d}_{k}\right| \leq \Delta_{k} \quad \text { and } \quad \mathbf{d}_{k} \cdot \mathbf{e} \geq\left|\mathbf{d}_{k}\right| \cos (\phi)$$ where $$\Delta\{k}$$ is the trust region radius.

Although the idea was clear, my math was poor, and some of the ideas might not be right, or not as easy to implement. Have you ever wondered about this question? I'd like to hear your opinion.

There is huge literature out there to try to come up with all sorts of methods to find saddles. NEB is a popular one, but there are many other ideas, each might be performing better for certain cases, have failures for different edge cases, and try to solve a slightly different problem. Since atomistic PESs are quite complicated, there is no one good solution. For instance, what happens to the proposed method if there are two very nearby saddles. And what happens, if the true saddle is outside the cone. Nevertheless, the good news is that you are welcome to experiment with your ideas using Sella. We have sophisticated constraints that can be employed, including inequality ones, so all you have to do is to code up your equations and test it on the systems you want.

zadorlab / sella

[Discussion] Sella Falling into Local Minima During Transition State Search: Questions and Issues #50