annulen opened this issue 1 year ago
I've also run an ORCA optimization with QN; it converged in 7 steps: PMDA_chair_dimer2_vdzp_orca_qn.tar.gz
GDIIS took 8 steps, like RFO.
So I suspect that the culprit may be incorrect handling of the Hessian in geomeTRIC. In particular, I see that geomeTRIC by default gets rid of negative eigenvalues by regenerating the guess Hessian, while ORCA continues despite having a negative eigenvalue in the Hessian.
Thanks for sharing this. I agree geomeTRIC is not behaving properly in this calculation when you have provided a high-quality Hessian at the start. I would expect it to take several good steps from an exact initial Hessian, but Step 2 is already low quality. I am curious what happens if you restrict the step using `--tmax 0.03` or something like that?
I am seeing one other difference. In the beginning of the optimization, it appears several Hessian eigenvalues are negative:
```
Step 0 : Gradient = 3.257e-03/7.213e-03 (rms/max) Energy = -122.0432564665
Hessian Eigenvalues: -5.67264e-03 -4.86780e-03 -3.65937e-03 ... 4.11571e-01 4.20647e-01 4.20864e-01
Step 1 : Displace = 3.618e-02/5.932e-02 (rms/max) Trust = 1.000e-01 (=) Grad = 1.728e-04/4.523e-04 (rms/max) E (change) = -122.0441496198 (-8.932e-04) Quality = 0.855
Hessian Eigenvalues: -7.65921e-03 -4.89374e-03 -3.70047e-03 ... 4.11227e-01 4.19520e-01 4.20647e-01
Step 2 : Displace = 1.427e-01/2.310e-01 (rms/max) Trust = 1.414e-01 (+) Grad = 3.257e-03/7.006e-03 (rms/max) E (change) = -122.0431869362 (+9.627e-04) Quality = -0.219
Hessian Eigenvalues: -4.80642e-03 -3.69764e-03 -2.78662e-03 ... 4.17171e-01 4.20647e-01 5.33110e-01
```
However, this is not the behavior seen in ORCA as there is only one negative eigenvalue. There may be an issue with conversion of the Hessian into internal coordinates. If that is true, it's a more serious problem that needs fixing.
I think this will require some time for me to diagnose, thanks a lot for sharing the data.
> I am curious what happens if you restrict the step using `--tmax 0.03` or something like that?
Will check it now. BTW, at least in ORCA there is a clear separation between the maximum step and the trust radius. The former is a maximum for changes of individual internal coordinates, while the trust radius limits the length of the step vector. Both limits can be set separately, though I didn't have much success with tight limits on MaxStep (I guess because it distorts the step direction too much).
With `--tmax 0.03`, the minimization converges, but only after step 86. The initial steps look much better, but later it begins to struggle.
PMDA_chair_dimer2_tmax.tar.gz
An issue I have observed in the past that could be related to what you're seeing is that sometimes the BFGS Hessian update results in the appearance of negative eigenvalues in the Hessian matrix. I think BFGS is supposed to preserve positive definiteness so there may be something going on there.
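For context, BFGS preserves positive definiteness only when the curvature condition (the gradient change having positive overlap with the step) holds. A minimal textbook sketch of the update and that guard, not geomeTRIC's actual code:

```python
import numpy as np

def bfgs_update(H, dy, dg):
    """Textbook BFGS Hessian update (a sketch, not geomeTRIC's implementation).
    dy: coordinate step; dg: gradient change over that step."""
    curv = float(dg @ dy)
    if curv <= 0:
        # Curvature condition violated: on a non-convex region the update
        # can introduce negative eigenvalues, so skip (or damp) it.
        return H
    Hdy = H @ dy
    return (H
            + np.outer(dg, dg) / curv          # rank-1 correction from gradients
            - np.outer(Hdy, Hdy) / float(dy @ Hdy))  # remove old curvature along dy
```

If the curvature condition is satisfied and the incoming H is positive definite, the updated matrix stays positive definite; when it fails (as it can near a shoulder of the PES), negative eigenvalues can appear unless the update is skipped or modified.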
It would also be helpful if you could let me know whether geomeTRIC performs differently using a different coordinate system (try `--coordsys cart` or `--coordsys dlc`). Meanwhile, I will work on getting my local environment set up to use ASE / ORCA as the engine.
> An issue I have observed in the past that could be related to what you're seeing is that sometimes the BFGS Hessian update results in the appearance of negative eigenvalues in the Hessian matrix.
FWIW, I see this happening in ORCA too, both with model and pre-computed Hessians.
Interestingly, with `--coordsys cart` all eigenvalues on Step 0 are positive, while with `--coordsys dlc` there are three negative ones, similar to TRIC.
`--coordsys cart`:

```
Step 0 : Gradient = 3.257e-03/7.213e-03 (rms/max) Energy = -122.0432564665
Hessian Eigenvalues: 1.09285e-08 1.41624e-08 3.06648e-08 ... 9.28710e-01 1.04992e+00 1.05461e+00
```
`--coordsys dlc`:

```
Step 0 : Gradient = 3.257e-03/7.213e-03 (rms/max) Energy = -122.0432564665
Hessian Eigenvalues: -2.86597e-02 -8.31328e-03 -5.53354e-03 ... 4.20077e-01 4.20665e-01 4.30463e-01
```
Results: Cartesian converged after step 100, DLC after step 20. PMDA_chair_dimer2_cart.tar.gz PMDA_chair_dimer2_dlc.tar.gz
I've re-run DLC without `--tmax 0.03`; it managed to converge in 18 steps, but convergence was very shaky, with lots of missteps and quality reaching -25.
> An issue I have observed in the past that could be related to what you're seeing is that sometimes the BFGS Hessian update results in the appearance of negative eigenvalues in the Hessian matrix. I think BFGS is supposed to preserve positive definiteness so there may be something going on there.
I've found this in the manual (https://geometric.readthedocs.io/en/latest/how-it-works.html#bfgs-hessian-update):

> A possible way to improve performance could be to modify the optimizer to perform a line search in cases where BFGS does not return a positive definite Hessian matrix, until the Wolfe conditions are satisfied.

So it looks like you were already aware of this kind of issue.
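For reference, the Wolfe conditions mentioned in that passage can be checked in a few lines. A textbook sketch (not geomeTRIC code), with the usual `c1`/`c2` parameters:

```python
import numpy as np

def wolfe_ok(f, g, x, p, alpha, c1=1e-4, c2=0.9):
    """Check the (weak) Wolfe conditions for a trial step `alpha` along
    search direction `p`. f and g return the energy and its gradient."""
    fx, gx = f(x), g(x)
    x_new = x + alpha * p
    slope = float(gx @ p)  # directional derivative at x (negative for descent)
    sufficient_decrease = f(x_new) <= fx + c1 * alpha * slope
    curvature = float(g(x_new) @ p) >= c2 * slope
    return sufficient_decrease and curvature
```

A line search would shrink or grow `alpha` until this returns True; satisfying the curvature condition is exactly what guarantees the subsequent BFGS update preserves positive definiteness.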
Thanks a lot for this data. I think it's a clear example of a need for improving the optimization performance. I previously did not run into such an issue because I don't usually have an exact Hessian at the start of an energy minimization. It might take me a few weeks to resolve the problem, since it would require setting up the environment on my side, (possibly) making a test case that runs in a short time, and implementing the improvements themselves; the end of the term is getting pretty busy over here with graduate admissions and final exams, but this is important.
Okay, no problem. Thanks!
I have a few ideas for improving the performance or diagnosing problems for your job (or more generally, energy minimizations that have an exact Hessian at the start). You might be able to try some of these ideas before I can get to them.
1) Try running a job with `--fdcheck yes`. I suspect there could be a problem with the internal coordinate Hessian for this molecule that is not there for other molecules. It runs a total of four checks, and if there are any problems, the `check_internal_hess` function should print out significant differences between the analytic and finite-difference Hessian (say, greater than 1e-3).
2) Try using the `get_hessian_update_tsbfgs` or `get_hessian_update_msp` functions to get the Hessian updates. They are intended for TS optimization and are designed to be used with Hessians that are not positive definite.
3) Try using `get_delta_prime_rfo` in place of `get_delta_prime_trm` for the optimization step. This is not very likely to succeed because I haven't extensively tested `get_delta_prime_rfo` and it might not be correct. However, it is closer to what the transition state optimization uses, and I have not had serious problems with providing an exact initial Hessian for TS optimizations.
4) Try to reproduce the problem using a very inexpensive level of theory, perhaps even ASE/XTB; that would help with diagnosing and resolving the problem a lot faster.
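The finite-difference cross-check in item 1 boils down to differentiating the gradient numerically. A minimal sketch of the idea (the real `check_internal_hess` works in internal coordinates and is more involved):

```python
import numpy as np

def fd_hessian(grad_fn, x, h=1e-4):
    """Central-difference Hessian from a gradient function: the same idea
    fdcheck uses to cross-check an analytic Hessian."""
    n = len(x)
    H = np.zeros((n, n))
    for i in range(n):
        xp = x.copy(); xp[i] += h
        xm = x.copy(); xm[i] -= h
        H[:, i] = (grad_fn(xp) - grad_fn(xm)) / (2 * h)
    return 0.5 * (H + H.T)  # symmetrize to damp numerical asymmetry
```

Large, systematic disagreement between this and the analytic result points at a bug rather than at finite-difference noise.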
Let me know if you're interested in trying these; otherwise, I should have more time by the end of this week to get to them.
Trying `--fdcheck yes` with `--coordsys dlc` and no `tmax` now (as that was the quickest convergence but still obviously flawed).
> or more generally, energy minimizations that have an exact Hessian at the start
To be fair, my Hessian is not "exact" in the strict sense. This molecule was optimized at the B3LYP/6-31G** level, and the starting geometry and Hessian are taken from there. In my understanding, the Hessian would be "exact" if it were computed at the same level of theory on the starting geometry; in that case it would also likely have large negative eigenvalues with the corresponding modes pointing towards the minimum. This is not the case with my Hessian, which is positive definite initially.
I've tried xtb (GFN2) using ORCA (with the same optimizer settings as before) and geomeTRIC (with dlc coordinates) as optimizers, with the same starting geometry and Hessian. As a result, geomeTRIC converged faster: 31 steps vs 43 steps. geomeTRIC_xtb.tar.gz orca_xtb.tar.gz
Does `fdcheck` actually use computed gradients at the "auxiliary" points? If it could calculate only energies at those points, it would be 1.5-2 times faster.
Correct, at the moment geomeTRIC can only request gradients from external codes. In the next version I plan on adding the ability to request energies and analytic Hessians (if available).
Here are the results of the fdcheck/dlc run: fdcheck.tar.gz
Trying tsbfgs with this patch now:
```diff
diff --git a/geometric/step.py b/geometric/step.py
index d8c755d..0548648 100644
--- a/geometric/step.py
+++ b/geometric/step.py
@@ -357,7 +357,7 @@ def update_hessian(IC, H0, xyz_seq, gradx_seq, params, trust_limit=False, max_up
 if ts_bfgs: Hup = get_hessian_update_tsbfgs(Dy, Dg, H)
 else: Hup = get_hessian_update_msp(Dy, Dg, H, verbose=params.verbose)
 else:
-    Hup = get_hessian_update_bfgs(Dy, Dg, H)
+    Hup = get_hessian_update_tsbfgs(Dy, Dg, H)
 # Compute some diagnostics.
 if params.verbose:
     Eig = sorted_eigh(H, asc=True)[0]
```
Update: I needed to add the following patch to avoid errors when the code attempts to access `self.H`:
```diff
diff --git a/geometric/step.py b/geometric/step.py
index d8c755d..8cf3424 100644
--- a/geometric/step.py
+++ b/geometric/step.py
@@ -278,8 +278,8 @@ def get_hessian_update_tsbfgs(Dy, Dg, H): # pragma: no cover
 # yk = col(self.G - self.Gprev)
 yk = Dg
 dk = Dy
-jk = yk - np.dot(self.H, dk)
-B = force_positive_definite(self.H)
+jk = yk - np.dot(H, dk)
+B = force_positive_definite(H)
 # Scalar 1: dk^T |Bk| dk
 s1 = multi_dot([dk.T, B, dk])
 # Scalar 2: (yk^T dk)^2 + (dk^T |Bk| dk)^2
```
Results for the run with the above patches: tsbfgs.tar.gz
It didn't converge after 81 steps (unlike the `get_hessian_update_bfgs` version, which converged after 18 steps; see https://github.com/leeping/geomeTRIC/issues/186#issuecomment-1817410336). Note that `get_hessian_update_bfgs` has 3 missteps early on. tsbfgs seems to be monotonic in the beginning, but it has missteps at steps 7 and 8, and from step 9 on the tsbfgs steps have worse energy.
I've activated MSP updates with this patch:

```diff
diff --git a/geometric/step.py b/geometric/step.py
index d8c755d..b8dad9d 100644
--- a/geometric/step.py
+++ b/geometric/step.py
@@ -357,7 +357,7 @@ def update_hessian(IC, H0, xyz_seq, gradx_seq, params, trust_limit=False, max_up
 if ts_bfgs: Hup = get_hessian_update_tsbfgs(Dy, Dg, H)
 else: Hup = get_hessian_update_msp(Dy, Dg, H, verbose=params.verbose)
 else:
-    Hup = get_hessian_update_bfgs(Dy, Dg, H)
+    Hup = get_hessian_update_msp(Dy, Dg, H, verbose=params.verbose)
 # Compute some diagnostics.
 if params.verbose:
     Eig = sorted_eigh(H, asc=True)[0]
```
Result: converged in 10 steps, which is much better than anything above! It's still worse than ORCA (7-8 steps, using BFGS); it also features 3 missteps and ends with two steps with positive energy changes. msp.tar.gz
Interestingly, MSP at step 8 was really close in energy to ORCA's final result (-122.0441832), but the last two steps made the energy higher (of course, this says nothing about geometric proximity).
Thanks a lot. I think the cause of the bad performance could be one of the following:
1) Our BFGS implementation is unable to update a Hessian properly when there are negative eigenvalues. MSP works better, but it's mainly intended to be used for transition state optimizations. I need to do some research about what is supposed to happen during a BFGS update when the underlying PES is not positive definite.
2) The transformed Hessian has negative eigenvalues, which could indicate a problem with one of the ICs. The `--fdcheck` job prints out information in four blocks, and the second block might have the information I need. Could you share the output file you have so far?
> Our BFGS implementation is unable to update a Hessian properly when there are negative eigenvalues. MSP works better, but it's mainly intended to be used for transition state optimizations. I need to do some research about what is supposed to happen during a BFGS update when the underlying PES is not positive definite.
If you are interested, I can record the Hessian updates made by ORCA after each step so you could compare their implementation of BFGS with yours.
Switching to MSP is indeed a bad idea for the generic case. I've actually seen it slow down convergence in a different case, when I forgot that my geomeTRIC installation was patched.
> The transformed Hessian has negative eigenvalues, which could indicate a problem with one of the ICs. The --fdcheck job prints out information in four blocks, and the second block might have the information I need. Could you share the output file you have so far?
I think the fdcheck.tar.gz I attached above contained the complete log (as fdcheck/PMDA_chair_dimer2.log). Is anything missing from it?
Sorry, I must have missed that you posted the log above. Thanks!
At the moment I don't think we need to check the Hessian update vs. Orca. The step size and gradient in Orca will probably be different from the one in geomeTRIC, due to differences in how the IC system is constructed. If you want to do this, you would need to pass Orca's Hessian, step, and gradient change into geomeTRIC's Hessian update function, and check that it gives the same update. However I don't think that's the most likely cause of the problem.
From reading the log, I can see the second derivatives of the internal coordinates w/r.t. Cartesian coordinates have good agreement with finite difference. However, the second derivatives of the energy in internal coordinates have poor agreement with finite difference (see selected lines):
```
IC1 Name                 IC2 Name            :  Analytic  Numerical  Abs-Diff
Angle 8-19-28            Distance 3-11       :  0.007707  -0.000280  0.007987
Distance 1-2             Distance 7-20       :  0.002636   0.000647  0.001989
Angle 8-19-28            Angle 8-19-28       :  0.072896   0.111734 -0.038838
Dihedral 6-1-8-19        Angle 8-19-28       :  0.001434   0.015125 -0.013691
```
Errors of this size are big enough to cause negative eigenvalues to appear in the IC Hessian when there shouldn't be any. There might be a bug in the construction of the IC Hessian, but I think we need to try two more things to be sure:
1) Remove `.Prims` from these lines of code around line 965 in optimize.py:

```python
if fdcheck:
    IC.Prims.checkFiniteDifferenceGrad(coords)
    IC.Prims.checkFiniteDifferenceHess(coords)
    check_internal_grad(coords, M, IC.Prims, engine, dirname, verbose)
    check_internal_hess(coords, M, IC.Prims, engine, dirname, verbose)
    return
```
Thanks, and sorry for the trouble.
I've made another run with DLC, the regular BFGS update, and `--check 1`. Results are here: dlc_check.tar.gz
Convergence is much smoother than without the check. Yes, it takes 19 steps instead of 18, but it converges to a lower energy, there are far fewer missteps, and no horrors like a quality of -25.
My current hypothesis is that the initial ICs are suboptimal and quickly become unsuitable for stable calculations, and that this instability is causing the missteps and the re-appearance of negative Hessian eigenvalues. However, it still doesn't explain the overall slow convergence in comparison to ORCA.
Running fdcheck with the following patch now:
```diff
diff --git a/geometric/ic_tools.py b/geometric/ic_tools.py
index e1bcf8d..00fbe55 100644
--- a/geometric/ic_tools.py
+++ b/geometric/ic_tools.py
@@ -126,6 +126,10 @@ def check_internal_hess(coords, molecule, IC, engine, dirname, verbose=0):
     logger.info("Hessian Eigenvalues (Internal):\n")
     for i in range(len(Eigq)):
         logger.info("% 10.5f %s" % (Eigq[i], "\n" if i%9 == 8 else ""))
+    Eigqf = sorted(np.linalg.eigh(Hq_f)[0])
+    logger.info("Hessian Eigenvalues (Hq_f):\n")
+    for i in range(len(Eigqf)):
+        logger.info("% 10.5f %s" % (Eigqf[i], "\n" if i%9 == 8 else ""))
     return Hq, Hq_f

 def write_displacements(coords, M, IC, dirname, ic_select="all", displace_range=(-0.3, 0.3, 7), verbose=0):
diff --git a/geometric/optimize.py b/geometric/optimize.py
index 73d2733..c90dc17 100644
--- a/geometric/optimize.py
+++ b/geometric/optimize.py
@@ -959,10 +959,10 @@ def run_optimizer(**kwargs):
     fdcheck = kwargs.get('fdcheck', False) # Check internal coordinate gradients using finite difference..
     if fdcheck:
-        IC.Prims.checkFiniteDifferenceGrad(coords)
-        IC.Prims.checkFiniteDifferenceHess(coords)
-        check_internal_grad(coords, M, IC.Prims, engine, dirname, verbose)
-        check_internal_hess(coords, M, IC.Prims, engine, dirname, verbose)
+        IC.checkFiniteDifferenceGrad(coords)
+        IC.checkFiniteDifferenceHess(coords)
+        check_internal_grad(coords, M, IC, engine, dirname, verbose)
+        check_internal_hess(coords, M, IC, engine, dirname, verbose)
         return
     # Print out information about the coordinate system
```
Results of fdcheck with the patch above: fdcheck2.tar.gz
Thanks a lot. From looking at your latest test, I can see a potential major issue with the IC Hessian.
Here are the lowest four eigenvalues of the IC Hessian generated by converting the Cartesian Hessian. In normal operation, geomeTRIC calculates the IC Hessian by converting a Cartesian Hessian that is either generated by finite difference on the Cartesian gradient or supplied by the user. (Note that fdcheck always calculates the Cartesian Hessian itself, so if you provided a Hessian file it is not used.)
```
Hessian Eigenvalues (Internal):
-0.01015 -0.00244 0.00027 0.00127
```
Here are the lowest four eigenvalues of the IC Hessian generated using finite-difference on the IC gradient.
```
Hessian Eigenvalues (Hq_f):
0.00046 0.00359 0.00396 0.00403
```
This indicates there may be something wrong with the conversion of the Cartesian Hessian into the IC Hessian, i.e. this function: https://github.com/leeping/geomeTRIC/blob/master/geometric/internal.py#L1986
That function is supposed to implement equation 6 in the attached paper for converting the Cartesian Hessian to the IC Hessian. That equation requires calculating $\mathbf{B}^\prime$, the second derivatives of the ICs w/r.t. Cartesians. The second of the four tests in fdcheck confirms that $\mathbf{B}^\prime$ is correct. However, the fourth test is failing, which indicates to me that $\mathbf{H}_q$ is not being assembled correctly. The mistake might be something like a matrix being transposed when it shouldn't be.
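For concreteness, an Eq. 6-style conversion can be written out as a few lines of linear algebra. This is a sketch of the textbook formula under the usual Wilson B-matrix definitions, not necessarily geomeTRIC's exact code, and the symbol names are my own:

```python
import numpy as np

def hessian_cart_to_ic(Hx, B, Bprime, gx):
    """Convert a Cartesian Hessian to internal coordinates (Eq. 6 style).
    Hx     : (n_cart, n_cart)        Cartesian Hessian
    B      : (n_ic, n_cart)          Wilson B-matrix, dq/dx
    Bprime : (n_ic, n_cart, n_cart)  second derivatives d2q/dx2
    gx     : (n_cart,)               Cartesian gradient
    """
    Ginv = np.linalg.pinv(B @ B.T)          # G^- = (B B^T)^-
    gq = Ginv @ B @ gx                      # gradient in internal coordinates
    K = np.einsum('k,kab->ab', gq, Bprime)  # the term "inside the parentheses"
    return Ginv @ B @ (Hx - K) @ B.T @ Ginv
```

A transpose slipped into any of the outer factors here would produce exactly the kind of systematic disagreement with finite difference that fdcheck is reporting.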
To resolve the issue, it would be nice to run the fdcheck with the above patch using an optimized geometry with relatively tight criteria (say gmax ~ 1e-5). If there is still significant disagreement in the Hessian, then we know the problem isn't solely due to the second term in parentheses in Eq. 6, i.e. it could be in the matrix multiplications "outside" the parentheses.
I will also install ORCA to attempt to reproduce the issue on my side.
I can think of a different approach: make a unit test for each part of the calculation to ensure that all operations produce correct results.
Some tests are located in https://github.com/leeping/geomeTRIC/blob/master/geometric/tests/test_hessian.py but evidently they did not catch this problem. One problem that complicates the standard unit test approach is that there are always numerical precision errors in comparing the finite difference vs. analytic gradient. For example, the fdcheck shows plenty of errors on the order of 1e-5 for the IC gradient, but I don't think that is a problem; on the other hand, I believe the appearance of negative eigenvalues in the IC Hessian is a real problem.
I think to find the bug, we first need to determine whether there is a problem in the `InternalCoordinates.calcHess()` function (https://github.com/leeping/geomeTRIC/blob/master/geometric/internal.py#L1986). If that is indeed the source of the problem, we can fix the bug and get better agreement between "the IC Hessian converted from Cartesians" and "the IC Hessian computed by finite difference". Then we can create a unit test to ensure it always produces the correct result.
> One problem that complicates the standard unit test approach is that there are always numerical precision errors in comparing the finite difference vs. analytic gradient. For example, the fdcheck shows plenty of errors on the order of 1e-5 for the IC gradient, but I don't think that is a problem
FWIW, that's what `numpy.testing.assert_array_almost_equal` is designed for. By default it compares arrays to 6 decimal places, but the tolerance can be configured.
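A small sketch of how such a check could look; the matrix values below are purely illustrative, and `assert_allclose` is used here since it gives explicit `atol`/`rtol` control (it is generally preferred over `assert_array_almost_equal` in current NumPy):

```python
import numpy as np
from numpy.testing import assert_allclose

# Hypothetical analytic vs finite-difference Hessian blocks for a unit test
H_analytic = np.array([[0.0830, -0.0497], [-0.0497, 0.1204]])
H_numeric  = np.array([[0.0829, -0.0498], [-0.0497, 0.1203]])

# Passes only if every element agrees to within the absolute tolerance
assert_allclose(H_numeric, H_analytic, atol=1e-3)
```

With the current code the IC Hessian comparison would fail such a test, which is exactly what a regression test should capture once the bug is fixed.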
Okay, I see what you mean. I think the test of the IC Hessian should pass if the "absolute differences" in the final column (below) are less than 1e-4, maybe 1e-3 (I'm not sure of the absolute threshold to use). The differences are certainly a lot bigger right now.
```
-=# Now checking Hessian of the energy in internal coordinates using finite difference on gradient #=-
IC1 Name   IC2 Name  :  Analytic  Numerical  Abs-Diff
DLC 1      DLC 1     :  0.082989   0.163078 -0.080088
DLC 1      DLC 2     : -0.049717  -0.041980 -0.007737
DLC 1      DLC 3     :  0.022504   0.007129  0.015374
DLC 1      DLC 4     :  0.017262   0.013478  0.003784
DLC 1      DLC 5     : -0.024098  -0.020622 -0.003476
DLC 1      DLC 6     :  0.043281   0.039945  0.003335
DLC 1      DLC 7     :  0.012610   0.010526  0.002084
DLC 1      DLC 8     :  0.015706   0.014942  0.000764
DLC 1      DLC 9     : -0.008275  -0.009837  0.001561
```
Results of fdcheck at the minimum point: fdcheck_minimum.tar.gz
Thanks a lot. There is still a significant error. I will spend some time on reproducing the issue here and attempt a fix.
I've gotten your example to run on my server. I'd like to know whether it's possible to increase the accuracy of the ORCA gradient further, in order to eliminate other possible issues. The output below indicates a gradient error on the order of 1e-4, and I'd like to make it one order of magnitude lower. Could you suggest which parameters I should adjust? (I'm not an expert ORCA user.)
```
-=# Now checking gradient of the energy in internal coordinates vs. finite difference #=-
IC Name  :  Analytic  Numerical  Abs-Diff
DLC 1    :  0.000020  -0.000093  0.000112
```
> I'd like to know whether it's possible to increase the accuracy of the ORCA gradient further, in order to eliminate other possible issues.
It's possible to disable RIJCOSX by replacing `defgrid3 nofinalgridx` with `NORI NOCOSX` (letter case doesn't matter, btw). However, it will behave like a different level of theory, i.e. slightly different energy, slightly different minimum point, and so on.
Are you using ORCA 5.0.4? The manual says about version 5 that

> The COSX gradient has been strongly improved as well, resulting in an algorithm that is significantly faster and numerically more precise
BTW, it would be great if geomeTRIC could work correctly with lower-quality gradients. There are other QC methods which can introduce gradient noise, e.g. solvation models, and there are methods which don't have analytical gradients at all. I think it could make sense to add a debugging option to geomeTRIC that adds a pseudorandom noise vector of a chosen norm to the gradient, to verify that the optimizer can still converge reasonably.
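A sketch of what such a debugging hook might look like (a hypothetical helper, not an existing geomeTRIC option):

```python
import numpy as np

def noisy_gradient(grad, noise_norm, seed=None):
    """Add a pseudorandom noise vector of a chosen norm to a gradient,
    to test an optimizer's robustness against gradient noise."""
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(grad.shape)
    noise *= noise_norm / np.linalg.norm(noise)  # rescale to the requested norm
    return grad + noise
```

Wrapping the engine's gradient call with this and sweeping `noise_norm` from, say, 1e-6 to 1e-3 would show at what noise level convergence starts to break down.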
Thanks for the tip! I'm using ORCA 5.0.1 here; the version is older than 5.0.4 because I installed it a while ago.
I agree it would be great if geomeTRIC could converge reliably with low-quality gradients, but I am not aware of any software or code designed to do that. The underlying assumption of any gradient-based optimization code (that I know of) is that the gradient is "correct". There might be examples of optimization methods that use noisy gradients in the literature; if you or anyone could suggest some, I can look into it. It would be a major feature, though.
> I'm using ORCA 5.0.1 here; the version is older than 5.0.4 because I installed it a while ago.
I guess it should be fine. IIRC the only major thing fixed recently was a bug in D4, which is not used here.
> I agree it would be great if geomeTRIC could converge reliably with low-quality gradients, but I am not aware of any software or code designed to do that.
However, it seems that ORCA can cope well with RIJCOSX gradients (at least for small molecules like this one). Maybe it's because of a better Hessian update, or maybe because of some fine-tuning of the RFO parameters.
Those are good points. If there is a bug in the Hessian conversion or the Hessian update, it will cause convergence issues whether the gradient is exact or not, so the current priority is to find whether there's a bug and fix it. Tuning parameters could also improve performance, but one needs to tune them in a way that improves performance for the currently-bad cases like this one without worsening performance for other cases where it's working well. We can revisit that later.
In the meantime, I got this result when comparing Hessian eigenvalues. Somehow, there isn't a significant difference in the smallest eigenvalues on my end. It could be because I was running on a different branch (interpolate). I will switch back to the master branch and try again. Some selected values are reproduced below.
```
Hessian Eigenvalues (Cartesian):
-0.00009 -0.00005 -0.00000 0.00001 0.00008 0.00010 0.00017 0.00021 0.00034
Hessian Eigenvalues (Internal):
0.00057 0.00282 0.00378 0.00400 0.00406 0.00428 0.00482 0.00490 0.00563
Hessian Eigenvalues (Hq_f):
-0.00033 0.00102 0.00241 0.00289 0.00331 0.00364 0.00434 0.00492 0.00566
```
Here are two minimizations, the former performed with the built-in optimizer of ORCA 5.0.4 (redundant internal coordinates, RFO) and the latter by geomeTRIC (unmodified master + 0001-Okay-patch.patch.txt). This is the same system as in #180, but a high-quality Hessian (B3LYP/6-31G**) was used from start to end to aid convergence. `defgrid3 nofinalgridx` was used in both cases to improve the quality of gradients compared to the default settings.

In the case of geomeTRIC, I've used the `--reset=False` option, otherwise the high-quality Hessian would be dropped right at the beginning because of negative eigenvalues created by the Hessian update. I've also used `--subfrctor 2` to avoid whole-system rotations, with the intent of keeping the Hessian more relevant to the current geometry.

ORCA's inputs and outputs: PMDA_chair_dimer2_vdzp_orca.tar.gz
geomeTRIC's command.sh, inputs and outputs: PMDA_chair_dimer2_geometric.tar.gz