sagemath / sage

Main repository of SageMath
https://www.sagemath.org
Other
1.44k stars 481 forks source link

Weird timeouts in doctesting generic_graph with 4.5.2.alpha0 on some systems #9584

Closed e14f4152-4982-4ace-8c95-73a0599b109b closed 14 years ago

e14f4152-4982-4ace-8c95-73a0599b109b commented 14 years ago

Reported by John Palmieri and Leif Leonhardy on sage-release:

On 07/23/2010 12:18 AM, leif wrote:
> John H Palmieri wrote:
>> On iras (ia64-Linux-suse), after continuing the build, one failure:
>>
>> sage -t -long "devel/sage/sage/graphs/genus.pyx"
>> **********************************************************************
>> File "/home/palmieri/iras/sage-4.5.2.alpha0/devel/sage/sage/graphs/
>> genus.pyx", line 129:
>>     sage: get_memory_usage(t)
>> Expected:
>>     0.0
>> Got:
>>     -0.28125
>> **********************************************************************
> 
> So whenever you run out of memory on that machine, start a few instances
> of that program... :D :D :D
> 
>> Several machines (cleo, iras) get timeouts on generic_graph.py.
> 
> (Doesn't terminate within an hour here on 32-bit Ubuntu, Pentium 4...)

These may be unrelated to each other, however.


The doctest error in sage/graphs/generic_graph.py on 32-bit systems is now #9594.

CC: @jhpalmieri @nexttime @nathanncohen

Component: doctest coverage

Keywords: generic_graph, generic graph, time-out, time out

Author: Leif Leonhardy

Reviewer: John Palmieri

Merged: sage-4.5.2.alpha1

Issue created by migration from https://trac.sagemath.org/ticket/9584

6bdad4c1-1e26-4f2f-a442-a01a2292c181 commented 14 years ago
comment:49

Nathann, do you open a new ticket?

Well, I've never been able to produce this error on my computer actually ... :-/

This being said, if Sage is exporting invalid linear programs, that's a very bad news, as it does it using GLPK. So if the problem stored in the file is invalid, then the problem solved by GLPK should be too, as they behave the same way : to solve it, Sage first feeds it to GLPK using method A, then calls "solve". To export it, Sage first feeds it to GLPK using method A, then calls the export method from GLPK.

So if the files are invalid, that's a very bad news indeed !!

Nathann

83660e46-0051-498b-a8c1-f7a7bd232b5a commented 14 years ago
comment:50

Replying to @nathanncohen:

[...] if Sage is exporting invalid linear programs, that's a very bad news, as it does it using GLPK. So if the problem stored in the file is invalid, then the problem solved by GLPK should be too, as they behave the same way [...]

Well, Sage does not run glpsol on a generated file, but uses the library interface, so there's a slight difference, and: names are only set when calling write_mps() or write_lp() - which goes wrong somehow in at least two cases. (I'm calling both p.write_mps() and p.write_lp() right before p.solve() in GenericGraph.edge_disjoint_spanning_trees(), and afterwards run glpsol (--freemps|--lp) on those files.)

$ local/bin/glpsol --freemps ../trac_9584-edge_disjoint_spanning_trees-5.mps
GLPSOL: GLPK LP/MIP Solver, v4.44
Parameter(s) specified in the command line:
 --freemps ../trac_9584-edge_disjoint_spanning_trees-5.mps
Reading problem data from `../trac_9584-edge_disjoint_spanning_trees-5.mps'...
../trac_9584-edge_disjoint_spanning_trees-5.mps:8: warning: missing model name in field 3
Objective: R0000000
../trac_9584-edge_disjoint_spanning_trees-5.mps:1735: duplicate coefficient in row `R0000001'
MPS file processing error
$ local/bin/glpsol --lp ../trac_9584-edge_disjoint_spanning_trees-5.lp
GLPSOL: GLPK LP/MIP Solver, v4.44
Parameter(s) specified in the command line:
 --lp ../trac_9584-edge_disjoint_spanning_trees-5.lp
Reading problem data from `../trac_9584-edge_disjoint_spanning_trees-5.lp'...
../trac_9584-edge_disjoint_spanning_trees-5.lp:7: multiple use of variable `V0(None)((0,_1))' not allowed
CPLEX LP file processing error

So if the files are invalid, that's a very bad news indeed !!

Indeed, though the GLPK library works on th 64-bit system. On the 32-bit system where the doctest did not terminate, it's actually the solver that runs "forever" (trying to find a solution) - I do not yet know why... :(

jhpalmieri commented 14 years ago
comment:51

If I run the doctest on iras with "sage -t -verbose" and with the argument "log=3", I get this:


Trying:
    arborescences = g.edge_disjoint_spanning_trees(k, log=Integer(3))###line 3429:_sage_    >>> arborescences = g.edge_disjoint_spanning_trees(k, log=3)
Expecting nothing
GLPK Integer Optimizer, v4.44
1719 rows, 3650 columns, 10040 non-zeros
1370 integer variables, all of which are binary
Preprocessing...
1670 rows, 3605 columns, 9790 non-zeros
1325 integer variables, all of which are binary
Scaling...
 A: min|aij| =  1.000e+00  max|aij| =  1.000e+00  ratio =  1.000e+00
Problem data seem to be well scaled
Constructing initial basis...
Size of triangular part = 1670
Solving LP relaxation...
GLPK Simplex Optimizer, v4.44
1670 rows, 3605 columns, 9790 non-zeros
      0: obj =   0.000000000e+00  infeas =  2.610e+02 (0)
    500: obj =   0.000000000e+00  infeas =  2.610e+02 (0)
   1000: obj =   0.000000000e+00  infeas =  2.330e+02 (0)
   1500: obj =   0.000000000e+00  infeas =  1.172e+00 (0)
*  1589: obj =   0.000000000e+00  infeas =  0.000e+00 (0)
OPTIMAL SOLUTION FOUND
Integer optimization begins...
+  1589: mip =     not found yet <=              +inf        (1; 0)
+  3341: mip =     not found yet <=   0.000000000e+00        (12; 0)
+  4583: mip =     not found yet <=   0.000000000e+00        (25; 0)
+  5915: mip =     not found yet <=   0.000000000e+00        (39; 0)
+  7295: mip =     not found yet <=   0.000000000e+00        (53; 0)
+  9064: mip =     not found yet <=   0.000000000e+00        (68; 0)
+ 10470: mip =     not found yet <=   0.000000000e+00        (86; 0)
+ 12453: mip =     not found yet <=   0.000000000e+00        (104; 0)
+ 15172: mip =     not found yet <=   0.000000000e+00        (122; 0)
+ 19359: mip =     not found yet <=   0.000000000e+00        (143; 1)
+ 25674: mip =     not found yet <=   0.000000000e+00        (147; 19)
+ 30995: mip =     not found yet <=   0.000000000e+00        (156; 40)
Time used: 60.0 secs.  Memory used: 4.5 Mb.
+ 36038: mip =     not found yet <=   0.000000000e+00        (178; 73)
+ 41514: mip =     not found yet <=   0.000000000e+00        (188; 126)
+ 45778: mip =     not found yet <=   0.000000000e+00        (213; 179)
+ 51616: mip =     not found yet <=   0.000000000e+00        (232; 244)
+ 56429: mip =     not found yet <=   0.000000000e+00        (215; 395)
+ 61588: mip =     not found yet <=   0.000000000e+00        (220; 483)
+ 66918: mip =     not found yet <=   0.000000000e+00        (219; 610)
+ 71927: mip =     not found yet <=   0.000000000e+00        (225; 691)
+ 77942: mip =     not found yet <=   0.000000000e+00        (218; 802)
+ 82716: mip =     not found yet <=   0.000000000e+00        (219; 910)
+ 87340: mip =     not found yet <=   0.000000000e+00        (216; 1034)
+ 92302: mip =     not found yet <=   0.000000000e+00        (215; 1149)
Time used: 120.0 secs.  Memory used: 5.1 Mb.
+ 98199: mip =     not found yet <=   0.000000000e+00        (216; 1267)
+103543: mip =     not found yet <=   0.000000000e+00        (209; 1380)
+109695: mip =     not found yet <=   0.000000000e+00        (205; 1475)
+115375: mip =     not found yet <=   0.000000000e+00        (194; 1626)
+120734: mip =     not found yet <=   0.000000000e+00        (191; 1775)
+126850: mip =     not found yet <=   0.000000000e+00        (192; 1890)
+132308: mip =     not found yet <=   0.000000000e+00        (195; 1994)
+137437: mip =     not found yet <=   0.000000000e+00        (200; 2119)
+142222: mip =     not found yet <=   0.000000000e+00        (185; 2285)
+147918: mip =     not found yet <=   0.000000000e+00        (194; 2358)
+152167: mip =     not found yet <=   0.000000000e+00        (211; 2444)
+157139: mip =     not found yet <=   0.000000000e+00        (188; 2631)
Time used: 180.0 secs.  Memory used: 5.2 Mb.
+162974: mip =     not found yet <=   0.000000000e+00        (194; 2717)
+167746: mip =     not found yet <=   0.000000000e+00        (185; 2875)
+173860: mip =     not found yet <=   0.000000000e+00        (189; 2945)
+179993: mip =     not found yet <=   0.000000000e+00        (185; 3036)
+184852: mip =     not found yet <=   0.000000000e+00        (186; 3167)
+190815: mip =     not found yet <=   0.000000000e+00        (184; 3260)
+195443: mip =     not found yet <=   0.000000000e+00        (191; 3340)
+200019: mip =     not found yet <=   0.000000000e+00        (199; 3441)
+205299: mip =     not found yet <=   0.000000000e+00        (193; 3550)
+211558: mip =     not found yet <=   0.000000000e+00        (190; 3643)
+217128: mip =     not found yet <=   0.000000000e+00        (205; 3721)
+222749: mip =     not found yet <=   0.000000000e+00        (193; 3851)
Time used: 240.1 secs.  Memory used: 5.2 Mb.
+228284: mip =     not found yet <=   0.000000000e+00        (192; 3966)
+234369: mip =     not found yet <=   0.000000000e+00        (187; 4068)
+239753: mip =     not found yet <=   0.000000000e+00        (186; 4177)
+245044: mip =     not found yet <=   0.000000000e+00        (203; 4260)
+251418: mip =     not found yet <=   0.000000000e+00        (191; 4393)
+257305: mip =     not found yet <=   0.000000000e+00        (184; 4511)
+262969: mip =     not found yet <=   0.000000000e+00        (192; 4590)
+268135: mip =     not found yet <=   0.000000000e+00        (187; 4714)
+274100: mip =     not found yet <=   0.000000000e+00        (192; 4792)
+279175: mip =     not found yet <=   0.000000000e+00        (185; 4929)
+285635: mip =     not found yet <=   0.000000000e+00        (177; 5033)
+291442: mip =     not found yet <=   0.000000000e+00        (189; 5090)
Time used: 300.1 secs.  Memory used: 5.2 Mb.
*** *** Error: TIMED OUT! PROCESS KILLED! *** ***
*** *** Error: TIMED OUT! *** ***

Increasing the timeout threshold just lets this continue longer.

83660e46-0051-498b-a8c1-f7a7bd232b5a commented 14 years ago
comment:52

Replying to @jhpalmieri:

Increasing the timeout threshold just lets this continue longer.

I'm at 9500s+ and it keeps writing to the screen, though sage -t ... has already terminated... :D

6bdad4c1-1e26-4f2f-a442-a01a2292c181 commented 14 years ago
comment:53

Well, it is not a surprise that such LP can run for a long time... Right now, a LP has been running on one of our lab's machine (Something like 8 CPU, some fearful amount of RAM) for more that 2 days straight. And there is some hope it will finish ! That's one of the things I accept coming from LP. Sometimes, they are too slow ;-)

Nathann

83660e46-0051-498b-a8c1-f7a7bd232b5a commented 14 years ago
comment:54

Replying to @nathanncohen:

Well, it is not a surprise that such LP can run for a long time...

As mentioned above, the whole test (-long) of generic_graph.py - with in theory the same LP program (30 vertices) - took 281 seconds on a much slower machine.

I still don't know if the GLPK library or the Sage interface/code that builds the LP program structure is broken on the other (very similar, btw) machine.

(I've killed the Python process running the solver after 42138+ seconds, which was still searching for a solution.)

Nathann, could you take a look at the variable name generation which causes the MPS and LP file errors? (These should be reproducible on all systems I think, at least I've met them on a 64-bit system, running Ubuntu 9.04.)

6bdad4c1-1e26-4f2f-a442-a01a2292c181 commented 14 years ago
comment:55

The corresponding ticket is #9617 . Thank you for finding this one ! :-)

Nathann

jdemeyer commented 13 years ago
comment:56

The error in the ticket (negative get_memory_usage()) still occurs today on the Skynet machine iras with sage-4.6.1.alpha3.

e14f4152-4982-4ace-8c95-73a0599b109b commented 13 years ago
comment:57

Replying to @jdemeyer:

The error in the ticket (negative get_memory_usage()) still occurs today on the Skynet machine iras with sage-4.6.1.alpha3.

Please see #9863 for this problem.