sagemath / sage

Main repository of SageMath
https://www.sagemath.org
Other
1.43k stars 479 forks source link

make MIP backend interface more Python-ic #10341

Closed malb closed 13 years ago

malb commented 13 years ago

Sage 4.6.1 will contain a new set of backend classes for mixed integer programming, which will make it easier to write interfaces for other solvers. There has been some off-list discussion about this interface and the follow changes were agreed upon:

The patches are to be applied in this order:

CC: @nathanncohen

Component: linear programming

Author: Martin Albrecht, Nathann Cohen

Reviewer: Martin Albrecht, Nathann Cohen

Merged: sage-4.6.2.alpha3

Issue created by migration from https://trac.sagemath.org/ticket/10341

malb commented 13 years ago
comment:1

Attachment: mip_interface_changes.patch.gz

The attached patch makes the necessary changes to the GenericBackend and the GLPK class. Coin and CPLEX are not done yet.

6bdad4c1-1e26-4f2f-a442-a01a2292c181 commented 13 years ago
comment:2

And here are the modifications to CPLEX and Coin. The doctests pass, plus all the methods of Graph/Digraph/GenericGraph pass with both GLPK, Coin and CPLEX.

I agreed with you patch almost everywhere ! The only modification I made is the "constraints" argument you had placed in the add_constraint method. I really didn't get why you picked this name. I made it "coefficients", as the pairs are actually the coefficient of the sparse matrix, but of course we can discuss it during the review :-)

Nathann

6bdad4c1-1e26-4f2f-a442-a01a2292c181 commented 13 years ago

Attachment: trac_10431-part2.patch.gz

6bdad4c1-1e26-4f2f-a442-a01a2292c181 commented 13 years ago

Changed author from Martin Albrecht to Martin Albrecht, Nathann Cohen

6bdad4c1-1e26-4f2f-a442-a01a2292c181 commented 13 years ago

Description changed:

--- 
+++ 
@@ -5,3 +5,8 @@
 * change :func:`add_linear_constraint` to accept any iterable of the form`` [(c,v) ...]``
 * ``min`` and ``max`` should be lower bound (or ``lb``) and upper bound (or ``ub``) to conform to MIP conventions
 * allow parameters in :func:`add_variable`
+
+The patches are to be applied in this order:
+
+* mip_interface_changes.patch
+* trac_10431-part2.patch
malb commented 13 years ago
comment:4

Sure, your naming makes much more sense. Are you reviewing mine first, or is your patch your review? I can review your's then?

malb commented 13 years ago
comment:5

I noticed two more interface "issues" we may decide to "fix":

Neither of those changes I view as mandatory, just throwing the idea out there to see what others think.

6bdad4c1-1e26-4f2f-a442-a01a2292c181 commented 13 years ago
comment:6

Sure, your naming makes much more sense. Are you reviewing mine first, or is your patch your review?

Well, I checked yours while working on it and fixed this "constraints->coefficients" inside of my patch ... If you review mine, both files will have be read by two sets of eyes, so I guess it would count as a valid review :-)

Nathann

6bdad4c1-1e26-4f2f-a442-a01a2292c181 commented 13 years ago
comment:7

Replying to @malb:

I noticed two more interface "issues" we may decide to "fix":

  • p.row() returns two lists (indices,coeffs) right now, we could change it to return [(v1,c1),...,(vn,cn)].
  • We could allow an optional obj parameter which contains the coefficient of the new variable in the objective function?

Neither of those changes I view as mandatory, just throwing the idea out there to see what others think.

Though they sound right... :-)

Nathann

malb commented 13 years ago
comment:8

One more thing: you assume in your interface that one can change column and row names after those have been created. However, this isn't easy in SCIP (I could possibly hack around it by hacking around the API). Can we change that, e.g. by adding name as an optional parameter to add_variable() and add_linear_constraint()?

6bdad4c1-1e26-4f2f-a442-a01a2292c181 commented 13 years ago
comment:9

Well, if that's how SCIP's interface is written, I see no way around that .... :-/

I just hope all these optional parameters won't hurt the interface's efficiency. I am eager to have a working stable version of it in Sage, that would let me rebase other patches that have been made invalid because of the work we are doing here. I may spend some time trying to do some performance analysis on the interface afterwards :-D

I'm already sick of these names... They are totally useless for all practical purposes, their only aim is to produce different results in p.show() (which doesn't even use them for the moment) or when writing the LP to files, which is totally orthogonal to actually solving them. Besides, each solver has a different way of managing them, especially Coin. I spent nights just fighting with it, for nothing in the end (oh yes.. it compiles, and there was no regression since the first version of LP.. Who the hell cares ?)

Just to know : are you yourself using names for variables and constraints ?

Nathann

P.S. : Oh, yes.. Let's add this optional argument if there is no way around. If at some point we find a trick, it won't be as hard to remove an optional argument from an interface as any other, as most of the method calls don't include it.

So in the end

malb commented 13 years ago
comment:10

I usually set them (e.g. when converting from polynomial systems), but I guess I'm not really using them usually.

6bdad4c1-1e26-4f2f-a442-a01a2292c181 commented 13 years ago
comment:11

So what are they useful for in SCIP ? Can it produce outputs at .lp or .mpw files ?

Nathann

malb commented 13 years ago
comment:12

Yep, and other formats. You can also query a variable (which is a C struct) for its name in your code. For example, you can search for variables by name.

6bdad4c1-1e26-4f2f-a442-a01a2292c181 commented 13 years ago
comment:13

Replying to @malb:

Yep, and other formats.

Nice !!

You can also query a variable (which is a C struct) for its name in your code. For example, you can search for variables by name.

Same in GLPK. For the moment, I do not do any work on the matrix once it has been defined. Or rather, I only add more information, but never remove/read anything from it O_o

Nathann

malb commented 13 years ago
comment:14

Okay, what remains to be done then:

Anything I missed?

6bdad4c1-1e26-4f2f-a442-a01a2292c181 commented 13 years ago
comment:15

Oh... Hem.. Actually, if we are to actually drop these methods... -_-

I can not stand these names... Here is the problem : right now, there are no names when you add variables or constraints in MixedIntegerLinearProgram. When p.show() is called, the names are filled with something which makes them at least distinct and recognizable... But if we want to drop these methods, then p.show() has to be rewritten too.. -_-

Nathann

malb commented 13 years ago
comment:16

I would suggest an optional parameter "name=None" for add_variable()?

6bdad4c1-1e26-4f2f-a442-a01a2292c181 commented 13 years ago
comment:17

Replying to @malb:

I would suggest an optional parameter "name=None" for add_variable()?

If you need it for SCIP... :-/

malb commented 13 years ago

Attachment: trac_10431-part3.patch.gz

malb commented 13 years ago

Description changed:

--- 
+++ 
@@ -10,3 +10,4 @@

 * mip_interface_changes.patch
 * trac_10431-part2.patch
+* trac_10431-part3.patch
malb commented 13 years ago
comment:19

I did the changes below for the GenericBackend and GLPK. Nathann, can you do Coin and CPLEX? I'll update my SCIP interface to see whether now I can pass all doctests with it?

6bdad4c1-1e26-4f2f-a442-a01a2292c181 commented 13 years ago
comment:20

Hello !!!

Martin, the point is that I do not know how to get rid of setting names through col_name and row_name for the moment ! It would mean show() would have to be rewritten, and I do not know how to do it nicely ! :-/

If you have any idea...

Nathann

6bdad4c1-1e26-4f2f-a442-a01a2292c181 commented 13 years ago
comment:21

My best bet would be to keep the name argument in row/col_name, and have SCIP ignore it when it is given or raise an exception. It is already the case for Coin which does not accept names for everything. We could then define a name when the variable/constraint is added (which would work for SCIP), or later on (which wouldn't for SCIP)..

Then again, dropping everything name-related may be the best idea in the end. It's totally useless, it adds arguments/methods everywhere, and no two solvers handle it the same way. What do you think of it ?

Nathann

malb commented 13 years ago
comment:22

Did you check my patch? For some reason p.show() still works, or is the doctest not enough to ensure this?

6bdad4c1-1e26-4f2f-a442-a01a2292c181 commented 13 years ago
comment:23

Replying to @malb:

Did you check my patch? For some reason p.show() still works, or is the doctest not enough to ensure this?

Well, if your doctests pass then I guess the docstring is not enough. Anyway, for the moment the variable's names are just "cached" inside of MixedIntegerLinearProgram, because I thought stupid to call routines to updates the names unless someone wants to show() it or to export it to a file (which one never does when actually trying to solve them). There is a method named update_variable_name, which you commented in your patch, which does just that : before print show() or exporting the result, it updates the variable's names. This is the only way for the variables to have a name, and as it is usually done after the LP has been generated, it has to use the set_col_name commands. I remember having written somewhere that it was broken at some point, but I am never too eager to deal with this name mess.

Oh, of course everything works fine if you comment #update_variable_names. The .show() method will just print x_1, x_2, etc and the .lp and .mps files will have the same kind of names. I do not think it is that awful, and removing all this name-related stuff from the class will just make it easier to work with.

I mean, it would have been nice to be able to deal with names, but :

It looks like a lot of work for something we're not even sure we can use...

Nathann

malb commented 13 years ago
comment:24

So what about:

6bdad4c1-1e26-4f2f-a442-a01a2292c181 commented 13 years ago
comment:25

Replying to @malb:

So what about:

  • If the user wants to use names then (s)he has to provide them at creation time of variables or constraints.
  • Solvers which don't support them, simply ignore them.
  • In show, if there is a custom name set (queried by backend.col_name() and backend.row_name()) then it is used for printing
  • When writing lp we do the same?

.....

Sounds logic enough :-p

It just exposes I had a stupid idea when I wrote this set_row_name in the first place. And I guess it's only fitting that I clean my own mess. Please comment the code from mip.py you need to make your patch run and pass doctests, and I'll change what is needed there when dealing with Coin and CPLEX !

Nathann

malb commented 13 years ago
comment:26

part3 passes doctests for me. However, so far it simply ignores names.

malb commented 13 years ago
comment:27

PS: Can you give me some examples which take about 1 minute to compute with GLPK from the graph module? I want to test the performance of the SCIP solver for those.

6bdad4c1-1e26-4f2f-a442-a01a2292c181 commented 13 years ago
comment:28

Oh ! To compare performances, you could just try a sage -t on graph.py digraph.py and generic_graph.py.

I guess the performances would then depend on the kind of problem : all the solvers I tried until now are just awful for the minor method :

sage: %time graphs.Grid2dGraph(5,5).minor(graphs.CompleteGraph(4), solver = "GLPK")
CPU times: user 10.59 s, sys: 0.01 s, total: 10.60 s
Wall time: 10.62 s
{0: [(1, 3), (1, 2), (3, 3), (4, 4), (1, 4), (2, 3), (3, 4)], 1: [(2, 1), (2, 2), (3, 2), (4, 2), (2, 0)], 2: [(0, 0), (1, 0), (0, 1), (0, 2)], 3: [(1, 1)]}

The methods raises an exception when 4 becomes 5, which is natural as the problem has no solution, but it takes one lifetime to get this answer from GLPK. If SCIP can get it quickly, that's just a miracle :-D

Another problem of the same kind :

n = 7
graphs.Grid2dGraph(n,n).disjoint_routed_paths([((0,0), (n-1,n-1)), ((0, n-1), (n-1, 0))])

This already takes 10 seconds with CPLEX. Try n=8 if you are patient !

The answer is also an infeasibility, and the solvers are having a hard time proving it.. But I guess it just comes from my bad LP formulation ^^;

A different type of LP formulation, on random graphs :

from sage.graphs.graph_coloring import edge_coloring
%timeit edge_coloring(graphs.RandomGNP(30,.3), solver = "GLPK", verbose = 2)
5 loops, best of 3: 4.47 s per loop

Nathann

malb commented 13 years ago
comment:29

On my machine:

sage: %time graphs.Grid2dGraph(5,5).minor(graphs.CompleteGraph(4), solver = "GLPK", verbose=0)
CPU times: user 2.81 s, sys: 0.00 s, total: 2.81 s
Wall time: 2.82 s
{0: [(1, 3), (1, 2), (1, 1), (2, 3), (2, 4)], 1: [(3, 0), (1, 0), (3, 1), (2, 0)], 2: [(2, 1)], 3: [(4, 4), (2, 2), (3, 2), (4, 2), (4, 3), (3, 4)]}

It seems there's still a bug in my SCIP interface:

sage: %time graphs.Grid2dGraph(5,5).minor(graphs.CompleteGraph(4), solver = "SCIP", verbose=0)
CPU times: user 1.88 s, sys: 0.04 s, total: 1.92 s
Wall time: 1.94 s
{0: [(0, 0), (0, 1)], 1: [(1, 1)], 2: [(2, 1), (2, 2), (1, 0), (2, 0)], 3: [(1, 2), (0, 2)]}
6bdad4c1-1e26-4f2f-a442-a01a2292c181 commented 13 years ago
comment:30

Replying to @malb:

It seems there's still a bug in my SCIP interface:

Why so ? The answer is not unique ! :-)

Nathann

malb commented 13 years ago
comment:31

That's a critical piece of information I missed :)

6bdad4c1-1e26-4f2f-a442-a01a2292c181 commented 13 years ago
comment:32

Plus CPLEX, plus Coin ! :-)

Nathann

6bdad4c1-1e26-4f2f-a442-a01a2292c181 commented 13 years ago

Description changed:

--- 
+++ 
@@ -11,3 +11,4 @@
 * mip_interface_changes.patch
 * trac_10431-part2.patch
 * trac_10431-part3.patch
+* trac_10431-part4.patch
malb commented 13 years ago
comment:34

The patch looks good, I'm running long doctests on sage.math now. Can I assume you ran doctests for CPLEX and Coin (i.e. with the optional packages installed)? I think the only remaining issue is:

sage: p = MixedIntegerLinearProgram()
sage: p.new_variable(name="foo")                                                                                                                                                                               
MIPVariable of dimension 1.                                                                                                                                                                                    
sage: x = _
sage: p.add_constraint(0 <= x[1] <= 2)
sage: p.show()
Maximization:

Constraints:
  x_0 <= 2.0
Variables:
  x_0 is a real variable (min=0.0, max=+oo)

For some reason MIPVariable ignores the optional name parameter and hardcodes "x". Is there a reason for this?

6bdad4c1-1e26-4f2f-a442-a01a2292c181 commented 13 years ago
comment:35

Is there a reason for this?

Not really. Technically, you will find this 'x' inside of the __repr__ method in LinearFunction, which does not care at all whether the variable has a name or not (it does not even know the MIPVariable object itself, just its id in the LP, and the MIPVariable itself does not know its name (only the LP does). All of is has to be rewritten, it is ugly and inefficient code, but would like to find a way to deal with this that does not make every LP waste time on totally useless name-related operations.

I still think we should just remove anything in these classes which is name-related for the same eternal reasons. The code is ugly, hard to understand, it slows the class down, and it is totally useless for solving LP anyway :-/

Nathann

6bdad4c1-1e26-4f2f-a442-a01a2292c181 commented 13 years ago
comment:36

Oh, and I of course checked all the patches on CPLEX and Coin each time. All tests do not pass, because of old bugs :

In the end, the remaining bugs (and they have been there for quite a while, but I did not really worry....) are all about displaying stuff. One is about numerical noise on a very specific case (Coin + Flows. One also has to know that by default Sage uses a Cython implementation of the Ford-Fulkerson algorithm to compute a flow, and not the LP unless asked otherwise -- I made the docstring of Graph.flow mention this noise).

Short of this, all tests pass in graph.py, digraph.py and generic_graph.py. So all the methods there do not see the difference between GLPK, Coin or CPLEX (except for the time spent testing the docstrings) ;-)

Would it be possible to address them in a future ticket ? And could I please have your thoughts about removing all these names from the class ? :-)

Nathann

malb commented 13 years ago
comment:37

Would it be possible to address them in a future ticket ?

Agreed.

And could I please have your thoughts about removing all these names from the class ? :-)

I'm unsure about this. But it is certainly undesirable to allow "name" as a parameter in new_variable() and then to ignore it. My main motivation for keeping names around would be to make debugging easier: I can check whether my mixed integer program looks like I want it to look without getting confused about variable name matching.

6bdad4c1-1e26-4f2f-a442-a01a2292c181 commented 13 years ago
comment:38

I'm unsure about this. But it is certainly undesirable to allow "name" as a parameter in new_variable() and then to ignore it. My main motivation for keeping names around would be to make debugging easier: I can check whether my mixed integer program looks like I want it to look without getting confused about variable name matching.

It can be very useful like that for sure.... But I really can not stand that the performances of a solver could somehow suffer because of something which is not used 99% of the time : when the LP are known to work. I really hope this could be fixed soon :-/

Nathann

malb commented 13 years ago
comment:39

How does performance suffer exactly? Names aren't used except for showing and writing lp files?

6bdad4c1-1e26-4f2f-a442-a01a2292c181 commented 13 years ago
comment:40

names aren't used except in these cases, but when the mip.py file will have been rewritten any creation of variable will mean storing+concatenating names, any call to add_variables/constraints already creates variables that are to receive potential names. I still do not know the loss between calls of cdef methods and cpdef, and one of the reasons why we have so many optional arguments is to deal with names. And so much in the backends is about names (I can not stand to look at code when I know it is never used. I immediately want to remove it)

I mean, all the LP interface does is create constraints and variables. When you add name considerations to all of these things, you just add time to earn nothing. I will have fun profiling all this when it will be in to check what we could earn by removing them or writing it more smartly. Perhaps I'm just complaining for nothing as usual... Even though cutting the code by half seems tempting just for its sake :-D

Nathann

malb commented 13 years ago
comment:41

I don't care much about writing LP files but printing of MIPVariables (p.show()) should allow custom names. I'm not saying it should be supported in the backends but the frontend should. It's just the level of convenience users will expect.

6bdad4c1-1e26-4f2f-a442-a01a2292c181 commented 13 years ago
comment:42

Here is another patch to fix the .show method and the names in general. Even the .lp files exported through Sage bear the good names !

I also fixed the CPLEX problem with infinite bounds, so that all test pass with CPLEX too !

Nathann

malb commented 13 years ago
comment:43

You accidentally hardcoded GLPK it seems in generic_backend.pyx?

  solver = "GLPK" 
6bdad4c1-1e26-4f2f-a442-a01a2292c181 commented 13 years ago
comment:44

Gloops.... Thank you for spotting it, this would have been nasty ^^;

I use that line to test LP with GLPK even though I have CPLEX installed....

Patch updated !

Nathann

malb commented 13 years ago
comment:45

Do doctests sill pass with CPLEX?

6bdad4c1-1e26-4f2f-a442-a01a2292c181 commented 13 years ago
comment:46

Of course of course !! I always do a last check with GLPK when I am done with the patch, as it is the "most important" solver for Sage :-)

Nathann

malb commented 13 years ago

Reviewer: Martin Albrecht, Nathann Cohen