zadorlab / sella

A Python software package for saddle point optimization and minimization of atomic systems.
https://www.ecc-project.org/

Inconsistency in Eq(22) between Paper and Bofill's 2003 TS-BFGS Correction #52

Closed yinkaaiwu closed 1 week ago

yinkaaiwu commented 1 week ago

Description:

In your paper titled "Accelerated Saddle Point Refinement through Full Exploitation of Partial Hessian Diagonalization," the calculation for $\mathbf{M}_k$ in Equation 22 is given as:

$\mathbf{M}_k=\mathbf{y}_k\mathbf{y}_k^T+|\mathbf{B}_k| \mathbf{s}_k\mathbf{s}_k^T |\mathbf{B}_k| \qquad$ eq(1)

However, in Bofill's 2003 paper on the correction of TS-BFGS, the formula for $\mathbf{M}_k$ is presented as:

$\mathbf{M}_k=\mathbf{y}_k\mathbf{y}_k^T+(\mathbf{s}_k^T |\mathbf{B}_k| \mathbf{s}_k) |\mathbf{B}_k| \qquad$ eq(2)

These two formulations are not mathematically equivalent. Despite the fact that eq(1) still appears to be effective in practice, I am curious about the rationale behind this modification.

Questions:

Rationale for Modification: Could you kindly explain the reasoning behind the choice to use eq(1) instead of the formulation proposed by Bofill in 2003?

Practical Impact: Have you observed any significant differences in performance or convergence between the two formulations in your experiments?

Theoretical Justification: Is there any theoretical justification or additional context that supports the use of the modified formula in your paper?

Additional Context:

Paper Reference: Accelerated Saddle Point Refinement through Full Exploitation of Partial Hessian Diagonalization.

Bofill's Paper: Remarks on the updated Hessian matrix methods.

Thank you for your attention to this matter. I look forward to your insights and clarification.

ehermes commented 1 week ago

These are equivalent, because M is only used to construct u, in which all instances of the matrix are right-multiplied by s, and the two distinct expressions you gave for M result in the same vector value for Ms.
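To spell out the step: right-multiplying eq(1) by $\mathbf{s}_k$ gives $\mathbf{y}_k(\mathbf{y}_k^T\mathbf{s}_k)+|\mathbf{B}_k|\mathbf{s}_k(\mathbf{s}_k^T|\mathbf{B}_k|\mathbf{s}_k)$, and right-multiplying eq(2) by $\mathbf{s}_k$ gives $\mathbf{y}_k(\mathbf{y}_k^T\mathbf{s}_k)+(\mathbf{s}_k^T|\mathbf{B}_k|\mathbf{s}_k)|\mathbf{B}_k|\mathbf{s}_k$, which is the same vector because $\mathbf{s}_k^T|\mathbf{B}_k|\mathbf{s}_k$ is a scalar. A minimal numerical sketch of this check follows. It is not taken from Sella's source code: the helper names (`abs_matrix`, `M_eq1`, `M_eq2`) are hypothetical, the inputs are random test data, and $|\mathbf{B}_k|$ is assumed to mean the matrix absolute value obtained by taking absolute values of the eigenvalues of a symmetric $\mathbf{B}_k$.

```python
import numpy as np


def abs_matrix(B):
    """Matrix absolute value |B| of a symmetric B (assumed definition):
    same eigenvectors as B, eigenvalues replaced by their absolute values."""
    w, V = np.linalg.eigh(B)
    return (V * np.abs(w)) @ V.T


def M_eq1(B, s, y):
    """eq(1): y y^T + |B| s s^T |B|."""
    absB_s = abs_matrix(B) @ s
    # |B| is symmetric, so |B| s s^T |B| is the outer product of |B| s with itself.
    return np.outer(y, y) + np.outer(absB_s, absB_s)


def M_eq2(B, s, y):
    """eq(2): y y^T + (s^T |B| s) |B|."""
    absB = abs_matrix(B)
    return np.outer(y, y) + (s @ absB @ s) * absB


# Random symmetric "Hessian" and random step / gradient-difference vectors.
rng = np.random.default_rng(0)
n = 5
A = rng.standard_normal((n, n))
B = 0.5 * (A + A.T)
s = rng.standard_normal(n)
y = rng.standard_normal(n)

M1 = M_eq1(B, s, y)
M2 = M_eq2(B, s, y)

# The matrices themselves generally differ, but their action on s agrees.
matrices_equal = np.allclose(M1, M2)
products_equal = np.allclose(M1 @ s, M2 @ s)
print("M1 == M2:    ", matrices_equal)
print("M1 s == M2 s:", products_equal)
```

For generic inputs the two matrices differ as matrices (eq(1) has rank at most 2, while eq(2) is generally full rank), yet any quantity that depends on $\mathbf{M}_k$ only through $\mathbf{M}_k\mathbf{s}_k$ comes out the same.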

yinkaaiwu commented 1 week ago

These are equivalent, because M is only used to construct u, in which all instances of the matrix are right-multiplied by s, and the two distinct expressions you gave for M result in the same vector value for Ms.

Oh, you're right, how silly of me. I forgot about Ms. Thank you so much.