hiddenSymmetries / simsopt

Simons Stellarator Optimizer Code
https://simsopt.readthedocs.io
MIT License

`sync()` method for optimizables to be called before jacobian evaluation #400

Open smiet opened 3 months ago

smiet commented 3 months ago

I am running massively parallel simsopt optimizations of SPEC equilibria, and I am encountering the following problem:

Especially for sensitive target functions, such as the island residue, a minute difference in the stopping point leads to differences in island width. When the finite-difference step is very small (default 1e-7), the stopping-condition tolerance needs to be infeasibly small to yield reasonable finite differences after the first Jacobian evaluation.

It helps to set the finite-difference step large (~1e-4), so that the differences in the objective are larger while the finite-difference Jacobian is still reasonable.
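For reference, the workaround I use right now is simply to pass a larger step to the MPI finite-difference solve, along these lines (assuming abs_step is the relevant keyword of least_squares_mpi_solve, and with prob standing in for the LeastSquaresProblem built in my script):

from simsopt.solve import least_squares_mpi_solve
from simsopt.util import MpiPartition

mpi = MpiPartition()
# prob: the LeastSquaresProblem containing the SPEC Residue targets.
# A larger finite-difference step keeps the differences in the objective
# well above the noise coming from the SPEC stopping condition.
least_squares_mpi_solve(prob, mpi, grad=True, abs_step=1e-4)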

What would be even better: add an Optimizable.sync() method that is called on every Optimizable in the dependency tree of the problem before the Jacobian is evaluated. The default does nothing, but the user can overload it with a custom method that synchronizes all workers with the leader, giving a much better Jacobian evaluation. In SPEC this would be mpi.comm calls to synchronize the initial geometry.

This could also be useful for other optimizables with stopping conditions that are run with MPI finite differencing.

Opening this issue for discussion on usefulness and method of implementation. Do you think it would work to add

class Optimizable:
    ...

    def sync(self):
        # Recursively sync all ancestors first, then this object.
        for p in self.parents:
            p.sync()
        self.sync_fn()

    def sync_fn(self):
        # Default: do nothing. Users overload this to e.g. broadcast
        # state from the leader to the workers before a Jacobian evaluation.
        pass

and to call the Jacobian through an adapted wrapper (as @mishapadidar implemented for constrained_mpi_solve)? [Based on the code from line 201 in least_squares_mpi_solve.py, function least_squares_mpi_solve]:

            def obj_jac(x):
                # Wrapper around the finite-difference Jacobian: synchronize
                # all optimizables before each Jacobian evaluation.
                fd.opt.sync()
                return fd.jac(x)

            if mpi.proc0_world:
                # proc0_world does this block, running the optimization.
                x0 = np.copy(prob.x)
                logger.info("Using finite difference method implemented in "
                            "SIMSOPT for evaluating gradient")
                result = least_squares(_f_proc0, x0, jac=obj_jac, verbose=2,
                                       **kwargs)
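For SPEC specifically, I imagine the override would look roughly like this (a minimal sketch of the proposal: the class name, the initial_geometry attribute, and the mpi attribute are placeholders for whatever the SPEC optimizable actually stores, and mpi is assumed to be an MpiPartition whose comm_groups is an mpi4py communicator):

from simsopt._core.optimizable import Optimizable

class SpecLikeOptimizable(Optimizable):
    """Hypothetical SPEC wrapper; only the proposed sync hook is sketched."""

    def sync_fn(self):
        # Broadcast the leader's initial geometry to all workers in the
        # group, so that every finite-difference evaluation of the residue
        # starts the SPEC run from exactly the same state.
        self.initial_geometry = self.mpi.comm_groups.bcast(
            self.initial_geometry, root=0)

The recursive sync() in the base class would then ensure this runs on every such object in the dependency tree before fd.jac is evaluated.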

Is this the best implementation? Is such a feature wanted by others, or am I wrong in wanting such a thing? Are there any other features that such a function should allow for before I start implementing?

@landreman @mbkumar what do you think?