pdfo / paper


Global convergence and convergence rate #2

Closed: ragonneau closed this issue 2 years ago

ragonneau commented 2 years ago

\paragraph{Global convergence}

Most of the convergence results in terms of first- or second-order criticality rely on sufficient decrease conditions on the iterates, such as the Cauchy decrease condition, the Armijo-Goldstein-Wolfe conditions, or their analogues. It has been proven, for sufficiently smooth functions, that direct-search algorithms achieve $$\liminf_{k \to \infty} \norm{\nabla \obj(\itr)} = 0,$$ provided that the step sizes fulfill a sufficient decrease condition measured by a forcing function~\cite{Kolda_Lewis_Torczon_2003,Torczon1997}, where~$\set{\itr}_{k = 0}^{\infty}$ denotes the sequence of iterates. General convergence results for direct-search methods are summarized in~\cite{Kolda_Lewis_Torczon_2003}. Trust-region methods in the gradient-based context have also been proven convergent under the Cauchy decrease condition, first by Powell~\cite{Powell_1970a,Powell_1970b,Powell_1975b}. These results have later been extended to derivative-free trust-region algorithms, and convergence for unconstrained optimization has been established for both first- and second-order criticality~\cite{Conn_Scheinberg_Toint_1997a,Conn_Scheinberg_Vicente_2009a}. A crucial point of trust-region methods, particularly in the derivative-free context, is the management of the models, which must satisfy some accuracy conditions to ensure convergence. It suffices to assume that the models are fully linear or fully quadratic during most iterations to ensure convergence to first- or second-order criticality, respectively. Such conditions can be guaranteed for models obtained by polynomial interpolation and regression~\cite{Conn_Scheinberg_Vicente_2008a,Conn_Scheinberg_Vicente_2008b}.
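For intuition, here is a minimal Python sketch of a coordinate direct-search method whose step acceptance uses a sufficient decrease test governed by a forcing function of the form $\rho(t) = c t^2$. It is only an illustration of the mechanism discussed above, not the algorithm implemented in PDFO, and the names `direct_search`, `c`, `theta`, and `gamma` are illustrative choices rather than notation from the cited papers.

```python
import numpy as np

def direct_search(f, x0, alpha0=1.0, c=1e-4, theta=0.5, gamma=2.0, max_evals=500):
    """Coordinate direct search with a forcing-function sufficient decrease test.

    A trial point x + alpha * d is accepted only if it decreases f by at least
    rho(alpha) = c * alpha**2; otherwise the step size alpha is contracted.
    This sketches the mechanism behind the liminf-type result quoted above,
    not the algorithm implemented in PDFO.
    """
    x = np.asarray(x0, dtype=float)
    fx = f(x)
    n_evals = 1
    alpha = alpha0
    # Poll along the positive spanning set {e_1, -e_1, ..., e_n, -e_n}.
    directions = [s * e for e in np.eye(x.size) for s in (1.0, -1.0)]
    while n_evals < max_evals:
        improved = False
        for d in directions:
            trial = x + alpha * d
            f_trial = f(trial)
            n_evals += 1
            if f_trial <= fx - c * alpha**2:  # sufficient decrease test
                x, fx, improved = trial, f_trial, True
                break
        # Expand the step after a successful poll, contract otherwise.
        alpha = gamma * alpha if improved else theta * alpha
    return x, fx

# Usage on a smooth quadratic: the iterates approach the minimizer (1, 1, 1).
x_best, f_best = direct_search(lambda x: float(np.sum((x - 1.0) ** 2)), np.zeros(3))
```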

\paragraph{Convergence rates}

It is established that the first-order stationarity condition~$\norm{\nabla \obj(\itr)} \le \epsilon$ can be achieved with a worst-case complexity of~$\mathcal{O}(\epsilon^{-2})$ function evaluations for both direct-search methods~\cite{Vicente_2013} and derivative-free trust-region methods~\cite{Garmanjani_Judice_Vicente_2016}. This bound improves to~$\mathcal{O}(\epsilon^{-1})$ in the presence of convexity and to~$\mathcal{O}(-\log \epsilon)$ in the presence of strong convexity for direct-search methods~\cite{Dodangeh_Vicente_2016}. Trust-region methods in the gradient-based context also enjoy such convergence rates~\cite{Grapiglia_Yuan_Yuan_2016}, and~\citeauthor{Garmanjani_Judice_Vicente_2016} conjecture that the convex case can be extended to derivative-free trust-region algorithms~\cite{Garmanjani_Judice_Vicente_2016}. It is also proven that the second-order criticality condition~$\max \set{\norm{\nabla \obj(\itr)}, -\lambda_k} \le \epsilon$, where~$\lambda_k$ denotes the least eigenvalue of~$\nabla^2 \obj(\itr)$, can be achieved with a worst-case complexity bound of~$\mathcal{O}(\epsilon^{-3})$ for derivative-free trust-region methods~\cite{Gratton_Royer_Vicente_2020}. The stochastic counterparts of direct-search methods~\cite{Gratton_Etal_2015} and trust-region methods~\cite{Bandeira_Scheinberg_Vicente_2014} enjoy similar worst-case complexities under some martingale-like conditions. For trust-region methods, this implies that the accuracy of the models should be guaranteed with a certain probability, conditioned on the past~\cite{Cartis_Scheinberg_2018,Gratton_Etal_2018}. An important remark is that these studies describe the global convergence properties of the methods, not their local behaviors. Global performance is important in the \gls{dfo} context because most algorithms must be stopped prematurely in practice due to a limited computational budget, before reaching a region where local convergence rates can be observed.
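To make the practical consequence of these bounds concrete, the following sketch turns a target accuracy $\epsilon$ into a rough evaluation budget under each regime. The function name `evaluation_budget`, the constant `c`, and the $n^2$ factor are placeholders for illustration only, not the sharp constants derived in the cited analyses.

```python
import math

def evaluation_budget(epsilon, n, regime="nonconvex", c=1.0):
    """Illustrative scaling of worst-case evaluation counts with epsilon.

    The constant c and the n**2 factor (typical of 2n-direction polling or
    fully linear models) are placeholders, not the sharp constants from the
    cited papers.
    """
    if regime == "nonconvex":        # O(epsilon^{-2})
        return c * n**2 / epsilon**2
    if regime == "convex":           # O(epsilon^{-1})
        return c * n**2 / epsilon
    if regime == "strongly_convex":  # O(log(1/epsilon))
        return c * n**2 * math.log(1.0 / epsilon)
    raise ValueError(f"unknown regime: {regime}")

# Halving epsilon quadruples the nonconvex budget but only doubles the convex one,
# which is why global rates matter under a limited evaluation budget.
for eps in (1e-2, 1e-3, 1e-4):
    budgets = {r: round(evaluation_budget(eps, n=10, regime=r))
               for r in ("nonconvex", "convex", "strongly_convex")}
    print(eps, budgets)
```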

zaikunzhang commented 2 years ago

@article{Dodangeh_Vicente_Zhang_2016,
    author  = {Dodangeh, M. and Vicente, L. N. and Zhang, Z.},
    journal = optl,
    pages   = {699--708},
    title   = {On the optimal order of worst case complexity of direct search},
    volume  = {10},
    year    = {2016},
}