An important consideration for PDE-constrained optimisation methods is "mesh dependence".
The standard approach in firedrake_adjoint is to extract the vector data that underlies the control field and to apply the SciPy implementation of the optimisation method (e.g. BFGS), with inner products and norms computed in the $\ell^2$ sense. The problem with this approach is that the convergence of the optimisation method can depend strongly on the mesh used to represent the control field. This is not a problem if the control is in R-space (a single real value, constant in space), but it is for spatially varying controls. It is likely to be even more problematic if the mesh adapts during the optimisation.
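To make the distinction concrete, here is a minimal sketch in plain Python (not firedrake_adjoint code). It assumes a uniform 1D mesh of the unit interval with piecewise-linear (P1) elements, and the helper names are hypothetical. It compares the $\ell^2$ norm of the nodal values of $u(x) = x$ with the $L^2$ norm of the function those values represent.

```python
import math


def l2_vector_norm(values):
    """Euclidean (ell^2) norm of the raw nodal values."""
    return math.sqrt(sum(v * v for v in values))


def L2_function_norm(values, h):
    """L^2(0, 1) norm of the piecewise-linear function with these nodal values.

    Each cell contributes [a, b] (h/6) [[2, 1], [1, 2]] [a, b]^T,
    which simplifies to (h/3) * (a^2 + a*b + b^2).
    """
    total = 0.0
    for a, b in zip(values, values[1:]):
        total += h / 3.0 * (a * a + a * b + b * b)
    return math.sqrt(total)


def norms(n):
    """Return (ell^2 norm, L^2 norm) of u(x) = x on a mesh with n cells."""
    h = 1.0 / n
    u = [i * h for i in range(n + 1)]  # nodal values of u(x) = x
    return l2_vector_norm(u), L2_function_norm(u, h)


for n in (10, 100, 1000):
    vec_norm, fun_norm = norms(n)
    print(n, vec_norm, fun_norm)
```

In this setting the $L^2$ norm is the same on every mesh, because the P1 mass matrix integrates the square of a linear function exactly. The $\ell^2$ norm grows as the mesh is refined, since it simply sums over more nodes.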
An approach to addressing the mesh dependence problem is to write the optimisation routines using a more suitable norm, one related to the finite element function rather than to its underlying vector data. The $L^2$ norm seems like a reasonable default choice to start with. This has already been done for gradient descent, but we should do the same for the other methods, too.
Note that this is (probably) why gradient descent, which already uses the $L^2$ norm, seems to converge unexpectedly better than the second-order methods in some recent results.
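The effect of the choice of inner product on a gradient step can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the existing gradient descent implementation: the mesh is a uniform 1D mesh with P1 elements, the functional is the hypothetical $J(u) = \frac{1}{2}\|u - d\|_{L^2}^2$ with $d(x) = \sin(\pi x)$, and all function names are made up for the example. The derivative of $J$ with respect to the nodal values is $M(u - d)$, where $M$ is the mass matrix. Using it directly as the search direction is the $\ell^2$ choice, whereas the $L^2$ gradient is obtained by solving with $M$.

```python
import math


def mass_matvec(v, h):
    """Apply the P1 mass matrix on a uniform 1D mesh with cell size h."""
    out = [0.0] * len(v)
    for e in range(len(v) - 1):
        a, b = v[e], v[e + 1]
        out[e] += h / 6.0 * (2.0 * a + b)
        out[e + 1] += h / 6.0 * (a + 2.0 * b)
    return out


def mass_solve(rhs, h):
    """Solve M x = rhs for the same tridiagonal matrix (Thomas algorithm)."""
    n = len(rhs)
    diag = [2.0 * h / 3.0] * n
    diag[0] = diag[-1] = h / 3.0
    off = h / 6.0
    c = [0.0] * n  # modified super-diagonal
    d = [0.0] * n  # modified right-hand side
    c[0] = off / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - off * c[i - 1]
        c[i] = off / denom
        d[i] = (rhs[i] - off * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def L2_norm(v, h):
    """L^2 norm of the P1 function with nodal values v."""
    return math.sqrt(sum(h / 3.0 * (a * a + a * b + b * b)
                         for a, b in zip(v, v[1:])))


def error_after_one_step(n, use_riesz_map):
    """L^2 distance to the minimiser after one unit-length descent step."""
    h = 1.0 / n
    target = [math.sin(math.pi * i * h) for i in range(n + 1)]
    u = [0.0] * (n + 1)
    # Derivative of J with respect to the nodal values: M (u - d).
    dJ = mass_matvec([ui - ti for ui, ti in zip(u, target)], h)
    # ell^2 choice: use dJ as is. L^2 choice: apply the Riesz map M^{-1}.
    g = mass_solve(dJ, h) if use_riesz_map else dJ
    u = [ui - gi for ui, gi in zip(u, g)]
    return L2_norm([ui - ti for ui, ti in zip(u, target)], h)


for n in (10, 100):
    print(n, error_after_one_step(n, False), error_after_one_step(n, True))
```

For this particular functional, the $L^2$ gradient is $u - d$, so a unit step reaches the minimiser on any mesh. The $\ell^2$ direction scales with the cell size, so the same step makes less progress as the mesh is refined.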
See "Mesh dependence in PDE-constrained optimisation", chapter 2, for a detailed discussion and for how to implement some of the methods.