Choosing between scipy and cvxopt solvers

When running the version of polytope in branch scipysolver (as of 524f4c59782ccc31ab776ddf605ead4cb3c6de5e) with both scipy and cvxopt installed, it may be beneficial to have the ability to easily choose which solver is to be used. Below is a possible implementation of this to go in polytope.py:

def set_solver(solver):
    if solver == 'scipy':
        from scipy import optimize
        global optimize
        global lp_solver
        lp_solver = 'scipy'
    if solver == 'glpk':
        try:
            from cvxopt import matrix, solvers
            import cvxopt.glpk
            from .esp import esp
            global matrix
            global solvers
            global esp
            global lp_solver
            lp_solver = 'glpk'
            # Hide optimizer output
            solvers.options['show_progress'] = False
            solvers.options['LPX_K_MSGLEV'] = 0
            solvers.options['msg_lev'] = 'GLP_MSG_OFF'
        except:
            raise Exception("unable to import cvxopt.glpk")

Changing a global variable (lp_solver) may not be the best solution to switch between solvers but, given the current implementation, it is a relatively simple one. Are there any other thoughts on a safer or better way to implement this?

A simple and flat (PEP 20) approach is to pass the solver as a keyword argument to the function called. Setting globals is not advisable. Also, I do not know what could happen if one runs two threads with separate solver settings. With the keyword argument approach there is no danger in that case.

The only “headache” is that the solver settings have to descend throughout the call graph. However, using global variables to establish a solution environment is not an alternative. The polytope package needs a redesign (and re-implementation at some places), so this wouldn't be so big a problem. A redesign (but not re-implementation) has occurred on branch overhaul, but is unmerged because it was quickly applied and adversely affected performance, so it needs tracing when that happened in history.

Regarding passing around a solver environment, I recommend creating a data structure (using a dict is the simplest solution, in absence of arguments for something more) of solver settings (which solver, tolerance, etc. in the future). This is trivially extensible for future adaptation, without the need to add keyword arguments to all functions every time some variation is needed. It would replace (and contain) the argument abs_tol that currently propagates throughout the call graph.

A side comment is that global should always be avoided. In the fragment above, it appears that from foo import hehe as _hehe and settings['lp_solver'] = 'glpk' would result in the same functionality (though I need to consider this in more detail in context within polytope.polytope).

The difficulty in these cases arises because a choice of solver at a global scope constitutes module state. This is implicit. However, if the solver is abstracted by a function within the module, then this state is restricted to the function. Fans of object orientation might want to create a class in this case, but that would only nest things and not simplify anything.

The explicit approach is to pass solver options as described above and have them propagate through the call graph. This happens everywhere in polytope though, it can be impractical and reduces readability of the entire module.

So, a configuration function as the OP proposes appears to be a good abstraction that offers a compromise between practicality and explicitness (not necessarily with the function body proposed in the OP). The closest thing that comes to mind is matplotlib.use. Given that this approach is used in matplotlib, it suggests that it has been found practical. This settles whether a function should be used.

Sidenote: there do exist objects passed around anyway: the polytopes. So, the state "solver choice" could be stored in them. However, then we can have issues with mismatch of solver choices in two different polytopes passed as arguments to the same function, so this approach would create more things for the user to worry about.

There is something that bothers me, and is orthogonal. Implicitly, we change solver from scipy to cvxopt.glpk if the latter is found. This does not fix a default, and I fear it will cause problems:

in tulip, there is a fixed default for the solver (omega) that is always available, and changes should happen explicitly.
in dd, you choose which module to import, based on what you know you have installed on your machine. The modules have the same API.
in tulip.interfaces.omega initially I wrote it so that it tries to use dd.cudd, and if that is absent, it uses dd.bdd. Later, errors went unnoticed because of this implicit switch (I think they escaped testing on my machine, and were uncovered on Travis CI), so I fixed the default to dd.bdd, and offered dd.cudd as an option through a function argument.
in matplotlib, there is a default backend (from the errors that I see I think that it usually is Qt, unless changed in the user configuration file).
cvxopt makes this choice explicit, by the module that you choose to import.

Based on these, I think that it is time to fix scipy as the default backend, and offer a mechanism as above for explicitly changing the configuration. This will affect tulip.

The simplest abstraction would probably be to confine the option setting to be shallow. What I mean is this: if you want to change the backend of matplotlib, you do not import package some.nice.plotting.this.is.a.module.nested.really.deep and select the backend there. You import matplotlib solely for the purpose of setting matplotlib.use('Agg') or something similar. So, we shouldn't nest things, by making this a choice within tulip. Probably the best would be to let tulip be oblivious of this choice (for calls that involve polytope), and have the user define it directly for polytope.

Of course, this leaves out quadratic programming in tulip.abstract (which would need its own explicit configuration -- though there is only a single choice currently: cvxopt, so not much of an option), but, strictly speaking, that is a different solver, somewhere else, solving a different problem (QPs instead of LPs).

One advantage is that this will show up in the tulip examples, thus quickly making users aware of two things:

that there is a choice
how that choice can be changed

The choice of GR(1) solver is already explicit. Note that the state of a "problem instance" is a very useful notion in this case, because it is the only place where these choices fit.

The choice of BDD solver, where there are alternatives, is currently explicit. However, it is nested, because it goes through tulip.interfaces.omega, instead of being selected directly as the backend of dd (ala matplotlib). So, perhaps we should consider changing that part of the interface too.

This issue concerns two aspects:

local choice of solver (when calling the function lpsolve)
default choice of solver (global)

The default constitutes global state, but a reasonable default helps users. Changing the default is possible via the variable that defines it (more details below).

A way to change the solver used in the entire module is needed, because otherwise this choice would have to propagate down the entire call graph, which would clutter the source code (see also comment above). A global variable is present for this purpose anyway, in order to record what happened at import time. Reusing this variable is good enough. Simple is better than complex [PEP 20].

Default choice

The module's default is decided automatically at import time. The best installed solver is selected. This is reasonable, because it frees user code of unnecessary boilerplate. I reject my suggestion above to fix the default solver to scipy (which is slower).

Changing the default choice at runtime

Python tries to avoid setter functions whenever possible. The OP is implemented by letting the user changing the value of the module variable lp_solver. If only scipy is installed there is no other choice. If the user changes lp_solver, they will just cause an exception later. If both solvers are installed, then lp_solver = 'glpk' by default. If the user wishes to use scipy instead, they can do:

from polytope import polytope as _pt

_pt.lp_solver = 'scipy'

This possibility has been documented in the docstring of the module polytope.polytope, and a hint has been inserted also in the docstring of the function polytope.polytope.lpsolve (5dc5be67f4905c55f513532f00b1b280ce3c21aa).

The choice of solver is by name, not by True/False, so that more than 2 solvers be selectable (in the future), and for readability of user code. So the same amount of information is needed to set the variable lp_solver as would be needed to pass when calling a setter function. So there is no more cognitive burden placed on the user.

Explicitly selecting a solver locally

For calls to the solver interface polytope.polytope.lpsolve, the keyword argument solver provides an explicit choice that overrides the module's default (b9af8086b0a0324f22f93e3c84e5041e8ea7c831).

Other issues

The QP solver choice within tulip (mentioned above) is a separate concern than what is examined here. There is but only one choice there for now. So there is nothing more to be done about that here.

Regarding the default choice, I commented above:

A global variable is present for this purpose anyway, in order to record what happened at import time. Reusing this variable is good enough.

This comment refers to the variable lp_solver:

https://github.com/tulip-control/polytope/blob/f4c0cb4ce7bf60ad32d90e34770d4cefc2ce7af0/polytope/polytope.py#L87

Changing this variable does change the default solver. However, the following corner case raises an error:

install GLPK and cvxopt.glpk
import polytope, so lp_solver == 'glpk'
set the module variable lp_solver = 'scipy'
call polytope.lpsolve(..., solver='glpk')

The error will be raised because the module now thinks that the best available solver is scipy, and that GLPK is unavailable, here:

https://github.com/tulip-control/polytope/blob/f4c0cb4ce7bf60ad32d90e34770d4cefc2ce7af0/polytope/polytope.py#L2293

So changing lp_solver does change the package's default, but actually prevents direct calls to GLPK via lpsolve. There is no reason for this limitation. To avoid it, two separate module variables should be used:

a hidden set-valued variable that records all solvers found installed at import time
a visible variable that records the current default choice (e.g., lp_solver)

Setting the default can remain as simple as it is now, with 2 choices, even when more solvers become available:

if scipy is present but GLPK absent, choose scipy as default
if scipy and GLPK are present, choose GLPK as default

About switching from a pure Python to an accelerated implementation (so a specific case of choosing among implementations of the same API), as of Python 3, CPython automatically uses the accelerated one when available, and hides (_*) the accelerated one. We, too, arrived at this approach:

https://github.com/tulip-control/polytope/blob/367d458078514615d7775c9f5442bcad481d078b/polytope/solvers.py#L54-L59

This discussion cites this relevant excerpt:

A common pattern in Python 2.x is to have one version of a module implemented in pure Python, with an optional accelerated version implemented as a C extension; for example, pickle and cPickle. This places the burden of importing the accelerated version and falling back on the pure Python version on each user of these modules. In Python 3.0, the accelerated versions are considered implementation details of the pure Python versions.

So it may be worth considering using this approach also at other places where the API allows doing so.

This comment is relevant to discussion above.

tulip-control / polytope