sagemath / sage

Main repository of SageMath
https://www.sagemath.org

Fractional Chromatic Index test fails with GLPK #23798

Open jdemeyer opened 7 years ago

jdemeyer commented 7 years ago

The test

            sage: g = graphs.PetersenGraph()
            sage: g.fractional_chromatic_index(solver='GLPK')
            3.0

added in src/sage/graphs/graph.py by #23658 fails with GLPK-4.63 on 32-bit.

As a workaround, we use PPL by default in #24099.

CC: @dcoudert

Component: graph theory

Author: David Coudert

Branch/Commit: public/graphs/23798_fractional_chromatic_index @ 43e8873

Reviewer: Dima Pasechnik

Issue created by migration from https://trac.sagemath.org/ticket/23798

jdemeyer commented 7 years ago

Description changed:

--- 
+++ 
@@ -5,4 +5,4 @@
             sage: g.fractional_chromatic_index(solver='GLPK')
             3.0

-added by #23658 fails with GLPK on 32-bit.
+added by #23658 fails with GLPK-4.63 on 32-bit.

jdemeyer commented 7 years ago

Description changed:

--- 
+++ 
@@ -5,4 +5,4 @@
             sage: g.fractional_chromatic_index(solver='GLPK')
             3.0

-added by #23658 fails with GLPK-4.63 on 32-bit.
+added in src/sage/graphs/graph.py by #23658 fails with GLPK-4.63 on 32-bit.

dcoudert commented 7 years ago
comment:3

I suspect that we need to change if M.solve(log = verbose) <= 1: to if M.solve(log = verbose) <= 1 + tol:, where tol = 0 if solver=='PPL' else 1e-6. I don't like this solution, but I don't know what else we can do.

I don't have access to a 32-bit machine and so cannot test.
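A minimal sketch of the tolerance test proposed above, in plain Python rather than Sage. The helper name `within_bound` and its parameters are hypothetical. Only the values `tol = 0` for PPL and `1e-6` otherwise come from the comment.

```python
def within_bound(value, bound=1, solver="GLPK", tol=1e-6):
    """Return True if value <= bound, with slack tol for numerical solvers.

    Hypothetical helper: PPL is treated as exact (no slack); any other
    solver name is treated as a floating-point solver.
    """
    slack = 0 if solver == "PPL" else tol
    return value <= bound + slack


# A value slightly above 1 passes with a numerical solver but not with PPL.
print(within_bound(1.0000000001, solver="GLPK"))
print(within_bound(1.0000000001, solver="PPL"))
```

Whether `1e-6` is a safe slack for this algorithm is exactly what the rest of the thread debates.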

jdemeyer commented 7 years ago
comment:4

You could also forbid using a non-exact solver for this problem.

dcoudert commented 7 years ago
comment:5

Sure, we can force PPL, but it is generally much slower (although it can sometimes be faster on small graphs).

sage: G = graphs.Grid2dGraph(6,6)
sage: %time G.fractional_chromatic_index(solver='GLPK')
CPU times: user 43.4 ms, sys: 4.9 ms, total: 48.3 ms
Wall time: 52.1 ms
4.0
sage: %time G.fractional_chromatic_index(solver='PPL')
CPU times: user 1min 11s, sys: 256 ms, total: 1min 11s
Wall time: 1min 12s
4

I agree that using a tolerance gap is not a nice solution either.

dcoudert commented 6 years ago
comment:6

I don't see a better solution than making PPL the default solver here.


New commits:

7485007 trac #23798: set PPL has default solver

dcoudert commented 6 years ago

Commit: 7485007

dcoudert commented 6 years ago

Author: David Coudert

dcoudert commented 6 years ago

Branch: u/dcoudert/23798

jdemeyer commented 6 years ago
comment:7

"Be aware that this method may loop endlessly when using some non exact solvers on 32-bits". I doubt that this is a problem specific to 32 bits. The wording seems to imply that it's safe to use non-exact solvers on 64-bit machines.

jdemeyer commented 6 years ago
comment:8

Also, this isn't quite correct:

Tickets :trac:`23658` and :trac:`23798` are fixed::

followed by a test with GLPK.

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 6 years ago

Changed commit from 7485007 to 910fb83

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 6 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

910fb83 trac #23798: reviewers comments

dcoudert commented 6 years ago
comment:10

Is this more appropriate ?

jdemeyer commented 6 years ago
comment:11

Well, it depends. Do you consider the code here to be a fix or a workaround? I am asking because you need to decide what to do with

sage: g.fractional_chromatic_index(solver='GLPK') # known bug (#23798)

You cannot say that this ticket is a known bug while at the same time fixing this ticket.

dcoudert commented 6 years ago
comment:13

The problem is not fixed. That's why I changed the text to Issue reported in :trac:`23658` and :trac:`23798` with non exact solvers::. What else can I write to be more correct/specific?

jdemeyer commented 6 years ago

Changed commit from 910fb83 to none

jdemeyer commented 6 years ago

Description changed:

--- 
+++ 
@@ -6,3 +6,5 @@
             3.0

 added in src/sage/graphs/graph.py by #23658 fails with GLPK-4.63 on 32-bit.
+
+As a workaround, we use PPL by default in #24099.

jdemeyer commented 6 years ago

Changed branch from u/dcoudert/23798 to none

jdemeyer commented 6 years ago

Changed author from David Coudert to none

jdemeyer commented 6 years ago
comment:15

Replying to @dcoudert:

The problem is not fixed.

Then I'm moving your branch to a new ticket: #24099.

dcoudert commented 6 years ago
comment:16

OK, thanks.

dcoudert commented 4 years ago
comment:17

Since #24824, we use GLPK 4.65. Does anyone with access to a 32-bit machine still see the bug ?

mkoeppe commented 3 years ago
comment:20

Setting new milestone based on a cursory review of ticket status, priority, and last modification date.

DaveWitteMorris commented 3 years ago
comment:21

Replying to @dcoudert:

Since #24824, we use GLPK 4.65. Does anyone with access to a 32-bit machine still see the bug ?

I still see the bug (on a 32-bit debian virtual machine). The default solver seems instantaneous, but I let solver='GLPK' run for about 15 minutes and did not get an answer.

dcoudert commented 3 years ago
comment:22

This is unfortunate.

The only solutions I see are:

mkoeppe commented 3 years ago
comment:23

Replying to @dcoudert:

I suspect that we need to change if M.solve(log = verbose) <= 1: to if M.solve(log = verbose) <= 1 + tol:, where tol = 0 if solver=='PPL' else 1e-6. I don't like this solution, but I don't know what else we can do.

Using a tolerance is exactly the right solution. The test for exact <= 1 and == 1 is meaningless with a numerical LP solver. LP solvers use perturbations systematically. It is not a bug if the result is not an exact integer.
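The point about exact comparisons can be illustrated without any LP solver. The sketch below is plain Python, not Sage code. It sums ten tenths once in floating point and once with exact rationals; the `1e-9` tolerance is an arbitrary value chosen for the illustration.

```python
from fractions import Fraction

# Ten tenths in binary floating point versus exact rational arithmetic.
float_total = sum([0.1] * 10)
exact_total = sum([Fraction(1, 10)] * 10)

print(float_total == 1)               # exact test on a float
print(exact_total == 1)               # exact test on a rational
print(abs(float_total - 1) <= 1e-9)   # tolerance test on the float
```

Only the rational sum passes the exact test. The floating-point sum passes only the tolerance test, which is the situation a numerical LP solver puts `<= 1` and `== 1` in.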

mkoeppe commented 3 years ago
comment:24

See also my explanations in #30635 comment:20 and following.

dimpase commented 3 years ago
comment:25

there are two LPs involved, one of them for a maximum weight matching, something that can be instead done by a combinatorial algorithm, see e.g. Blossom V in http://pub.ist.ac.at/~vnk/software.html

dimpase commented 3 years ago
comment:26

If I force PPL on the inner (matching) LP:

--- a/src/sage/graphs/graph_coloring.pyx
+++ b/src/sage/graphs/graph_coloring.pyx
@@ -825,7 +825,7 @@ def fractional_chromatic_index(G, solver="PPL", verbose_constraints=False, verbo
     frozen_edges = [frozenset(e) for e in G.edges(labels=False, sort=False)]

     # Initialize LP for maximum weight matching
-    M = MixedIntegerLinearProgram(solver=solver, constraint_generation=True)
+    M = MixedIntegerLinearProgram(solver="PPL", constraint_generation=True)

     # One variable per edge
     b = M.new_variable(binary=True, nonnegative=True)

then on a 32-bit system it's all fine (GLPK from the system, unpatched, so these extra messages)

sage: G=graphs.PetersenGraph()
sage: G.fractional_chromatic_index(solver="GLPK")
Long-step dual simplex will be used
Long-step dual simplex will be used
Long-step dual simplex will be used
Long-step dual simplex will be used
Long-step dual simplex will be used
Long-step dual simplex will be used
3.0

dcoudert commented 3 years ago

Commit: ebcde7c

dcoudert commented 3 years ago

Author: David Coudert

dcoudert commented 3 years ago

Branch: public/graphs/23798_fractional_chromatic_index

dcoudert commented 3 years ago
comment:27

Following above discussion, I added a tolerance gap for numerical LP solvers.

Note that we can use the networkx implementation of the blossom algorithm via the matching method, but it does not solve the issue. Actually, it's slower and worse for the rounding, as I observe the issue on a 64-bit machine...


New commits:

ebcde7c trac #23798: add tolerance gap for numerical LP solvers

dimpase commented 3 years ago
comment:28

I don’t like this approach. Without explicit guarantees that these tolerances are correct, it is replacing correct algorithms with heuristics.

mkoeppe commented 3 years ago
comment:29

         matching = [fe for fe in frozen_edges if M.get_values(b[fe]) == 1]

This line also needs changing because the test "== 1" is not robust.
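One possible way to make that selection robust is sketched below in plain Python, under the assumption that the solver returns each binary variable as a float close to 0 or 1. The function name `select_matching`, the dictionary input, and the `0.5` threshold are all hypothetical and are not taken from the ticket's branch.

```python
def select_matching(values, threshold=0.5):
    """Return the edges whose (nominally binary) value is above threshold.

    `values` maps each edge to the float reported by a numerical solver.
    Comparing against a midpoint avoids the fragile test `== 1`.
    """
    return [edge for edge, value in values.items() if value > threshold]


# Values as a floating-point solver might report them (made-up numbers).
reported = {
    frozenset((0, 1)): 0.9999999999,
    frozenset((1, 2)): 1e-12,
    frozenset((2, 3)): 1.0,
}
print(select_matching(reported))
```

With `== 1` the first edge would be dropped silently. With the threshold it is kept.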

dimpase commented 3 years ago
comment:30

I don’t see how one can make the oracle (the inner LP) inexact, without potentially returning a very wrong answer.

The oracle checks that there is no maximum weight matching of weight >1. Say, we let it error by epsilon, i.e we terminate with oracle returning 1+epsilon. Potentially, there could be K maximum matchings with this weight, if they are disjoint this means that the final error is K times epsilon, oops…
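The arithmetic in that scenario can be written out with exact rationals. This is only an illustration of the K times epsilon estimate in the comment, not an analysis of the actual Sage code. The function name and the sample values of `k` and `eps` are made up.

```python
from fractions import Fraction


def total_overshoot(k, eps):
    """Excess total weight when k disjoint matchings each weigh 1 + eps.

    With an exact oracle each matching would weigh at most 1, so the
    total over the k matchings would be at most k.
    """
    total = k * (1 + eps)
    return total - k


# Three disjoint matchings, each accepted with an error of one millionth.
print(total_overshoot(3, Fraction(1, 10**6)))
```

The per-matching error is small, but the error in the total grows linearly with the number of disjoint matchings.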

dimpase commented 3 years ago

Reviewer: Dima Pasechnik

dcoudert commented 3 years ago
comment:32

I don't like this solution either but I don't know what to do when a solver returns 0.99999... instead of 1 although we have set the variable type to binary. The solvers are aware of the type of the variable and so should return a value with the correct type and not a double. The solution might be in the backends.

dimpase commented 3 years ago
comment:33

Replying to @dcoudert:

I don't like this solution either but I don't know what to do when a solver returns 0.99999... instead of 1 although we have set the variable type to binary. The solvers are aware of the type of the variable and so should return a value with the correct type and not a double. The solution might be in the backends.

No, my point is that without a special analysis it's not possible to argue that solving the oracle problem (with non-integer objective function) inexactly provides a correct result, even if you "correctly" round 0.9999... to 1. It's because a small oracle error may get amplified a lot in the main LP. Welcome to floating point hell :-)

dimpase commented 3 years ago
comment:34

Replying to @dcoudert:

Following above discussion, I added a tolerance gap for numerical LP solvers.

Note that we can use the networkx implementation of the blossom algorithm via the matching method, but it does not solve the issue. Actually, it's slower and worse for the rounding, as I observe the issue on a 64-bit machine...

The oracle implementation here is naive, and bound to get very slow; it's an integer LP without Edmonds' constraints, instead of a "normal" LP over the matching polytope with Edmonds' constraints (aka blossom inequalities). So this would need yet another oracle (as there are exponentially many inequalities there), but well, it's polynomial time then. The generated constraints can stay, so this should be fast.


New commits:

ebcde7c trac #23798: add tolerance gap for numerical LP solvers

mkoeppe commented 3 years ago
comment:35

I took a quick look at the function now. I would suggest the following changes:

  1. Before adding a new constraint to the master problem, verify that matching is indeed a matching. In this way, the master problem will always be a correct relaxation, even if an inexact oracle is used.

  2. When the numerical solver that is used for solving the separation problem does not find a matching of value greater than 1 + epsilon, you can switch to PPL - then, with a bit of luck, it can prove the bound <= 1.

  3. It will make sense to have separate parameters for the solver used for the master problem and the one(s) used for the separation problem.
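The check in suggestion 1 could look roughly like the sketch below. It is plain Python, assuming edges are represented as two-element frozensets as in the `frozen_edges` list of the patch quoted earlier. The function name `is_matching` is hypothetical.

```python
def is_matching(edges):
    """Return True if no two edges in `edges` share an endpoint.

    Each edge is expected to be a frozenset of two distinct vertices;
    anything else (for example a loop) is rejected.
    """
    seen = set()
    for edge in edges:
        endpoints = tuple(edge)
        if len(endpoints) != 2:
            return False
        if endpoints[0] in seen or endpoints[1] in seen:
            return False
        seen.update(endpoints)
    return True


print(is_matching([frozenset((0, 1)), frozenset((2, 3))]))
print(is_matching([frozenset((0, 1)), frozenset((1, 2))]))
```

Running such a check before adding a constraint means an inexact oracle can at worst produce a useless cut, never an invalid one, since every constraint added to the master problem then corresponds to a genuine matching.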

dimpase commented 3 years ago
comment:36

Actually, it seems that even with PPL, the code is just wrong, as PPL does not do MILP, it only does LP, right?

mkoeppe commented 3 years ago
comment:37

The PPL does have a (very limited) MIP solver.

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 2 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

1926be5 trac #23798: merged with 9.5.beta5
43e8873 trac #23798: ideas from comment 35

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 2 years ago

Changed commit from ebcde7c to 43e8873

dcoudert commented 2 years ago
comment:40

I tried the ideas from #comment:35. I have left some debugging code in, as the code may loop forever when using GLPK for both master and separation problems. The patchbot will complain...

We should search for another method not relying on LP solvers, if any...