sagemath / sage

Main repository of SageMath
https://www.sagemath.org
Other
1.3k stars 447 forks source link

rubiks fails doctest with gcc 4.6.0 and -O2 optimisation. #11168

Closed bac7d3ea-3f1b-4826-8464-f0b53d5e12d2 closed 13 years ago

bac7d3ea-3f1b-4826-8464-f0b53d5e12d2 commented 13 years ago

On a Sun Ultra 27 running OpenSolaris 06/2009, the rubiks-20070912.p12 package builds, but fails a doctest

drkirkby@hawk:~/sage-4.7.alpha3$ ./sage -t  -long -force_lib devel/sage/sage/interfaces/rubik.py
sage -t -long -force_lib "devel/sage/sage/interfaces/rubik.py"
**********************************************************************
File "/export/home/drkirkby/sage-4.7.alpha3/devel/sage/sage/interfaces/rubik.py", line 132:
    sage: solver = OptimalSolver() # long time
Exception raised:
    Traceback (most recent call last):
      File "/export/home/drkirkby/sage-4.7.alpha3/local/bin/ncadoctest.py", line 1231, in run_one_test
        self.run_one_example(test, example, filename, compileflags)
      File "/export/home/drkirkby/sage-4.7.alpha3/local/bin/sagedoctest.py", line 38, in run_one_example
        OrigDocTestRunner.run_one_example(self, test, example, filename, compileflags)
      File "/export/home/drkirkby/sage-4.7.alpha3/local/bin/ncadoctest.py", line 1172, in run_one_example
        compileflags, 1) in test.globs
      File "<doctest __main__.example_3[4]>", line 1, in <module>
        solver = OptimalSolver() # long time###line 132:
    sage: solver = OptimalSolver() # long time
      File "/export/home/drkirkby/sage-4.7.alpha3/local/lib/python/site-packages/sage/interfaces/rubik.py", line 98, in __init__
        self.ready()
      File "/export/home/drkirkby/sage-4.7.alpha3/local/lib/python/site-packages/sage/interfaces/rubik.py", line 117, in ready
        self.child.expect('enter cube')
      File "/export/home/drkirkby/sage-4.7.alpha3/local/lib/python/site-packages/pexpect.py", line 912, in expect
        return self.expect_list(compiled_pattern_list, timeout, searchwindowsize)
      File "/export/home/drkirkby/sage-4.7.alpha3/local/lib/python/site-packages/pexpect.py", line 978, in expect_list
        raise EOF (str(e) + '\n' + str(self))
    EOF: End Of File (EOF) in read_nonblocking(). Empty string style platform.
    <pexpect.spawn instance at 0xc5367ac>
    version: 2.0 ($Revision: 1.151 $)
    command: /export/home/drkirkby/sage-4.7.alpha3/local/bin/optimal
    args: ['/export/home/drkirkby/sage-4.7.alpha3/local/bin/optimal']
    patterns:
        enter cube
    buffer (last 100 chars): 
    before (last 100 chars): olutions

    initializing transformation tables

    init_fulledge_to_edgequot : too many  edgequot's

    after: <class 'pexpect.EOF'>
    match: None
    match_index: None
    exitstatus: None
    flag_eof: 1
    pid: 10994
    child_fd: 3
    timeout: None
    delimiter: <class 'pexpect.EOF'>
    logfile: None
    maxread: 2000
    searchwindowsize: None
    delaybeforesend: 0.1
**********************************************************************
1 items had failures:
   1 of  12 in __main__.example_3
***Test Failed*** 1 failures.
For whitespace errors, see the file /export/home/drkirkby/.sage//tmp/.doctest_rubik.py
         [9.4 s]

----------------------------------------------------------------------
The following tests failed:

        sage -t -long -force_lib "devel/sage/sage/interfaces/rubik.py"
Total time for all tests: 9.4 seconds

However, if the optimisation level is changed from -O2 to -O1, the program behaves properly.

drkirkby@hawk:~/sage-4.7.alpha3$ ./sage -t  -long -force_lib devel/sage/sage/interfaces/rubik.py
sage -t -long -force_lib "devel/sage/sage/interfaces/rubik.py"
         [26.7 s]

----------------------------------------------------------------------
All tests passed!
Total time for all tests: 26.7 seconds

Similar problems have been observed on other platforms too. A revised package, which works around the gcc 4.6.0 bug by adding the flag -fno-ivopts:

http://boxen.math.washington.edu/home/jdemeyer/spkg/rubiks-20070912.p16.spkg

Reported upstream to gcc: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48702

For other gcc 4.6.0 specific problems see #11216

Upstream: Fixed upstream, but not in a stable release.

CC: @jhpalmieri

Component: porting: Solaris

Author: David Kirkby, Jeroen Demeyer

Reviewer: John Palmieri, David Kirkby

Merged: sage-4.7.rc2

Issue created by migration from https://trac.sagemath.org/ticket/11168

bac7d3ea-3f1b-4826-8464-f0b53d5e12d2 commented 13 years ago

Description changed:

--- 
+++ 
@@ -69,4 +69,18 @@

 However, if the optimisation level is changed from -O2 to -O1, the program behaves properly. 

+```
+drkirkby@hawk:~/sage-4.7.alpha3$ ./sage -t  -long -force_lib devel/sage/sage/interfaces/rubik.py
+sage -t -long -force_lib "devel/sage/sage/interfaces/rubik.py"
+         [26.7 s]
+ 
+----------------------------------------------------------------------
+All tests passed!
+Total time for all tests: 26.7 seconds
+```

+A revised package, which reduces the optimisation level on Solaris to -O1 can be found at
+
+http://boxen.math.washington.edu/home/kirkby/patches/rubiks-20070912.p13.spkg
+
+
bac7d3ea-3f1b-4826-8464-f0b53d5e12d2 commented 13 years ago

Author: David Kirkby

bac7d3ea-3f1b-4826-8464-f0b53d5e12d2 commented 13 years ago

Description changed:

--- 
+++ 
@@ -1,4 +1,4 @@
-On a Sun Ultra 27 running OpenSolaris06/2009, the rubiks-20070912.p12 package builds 
+On a Sun Ultra 27 running OpenSolaris 06/2009, the rubiks-20070912.p12 package builds, but fails a doctest

drkirkby@hawk:~/sage-4.7.alpha3$ ./sage -t -long -force_lib devel/sage/sage/interfaces/rubik.py

jhpalmieri commented 13 years ago
comment:3

The patch is very simple. The new spkg builds on both hawk (OpenSolaris on x86, with gcc 4.5.0 and gcc 4.6.0) and t2 (Solaris on sparc, with gcc 4.5.1 and 4.6.0), and tests pass in all cases. With the old version, tests failed on both machines using gcc 4.6.0 (but everything worked fine with the earlier version of gcc).

Note: to build with gcc 4.6.0 on t2, I had to set LD_LIBRARY_PATH and unset LD_OPTIONS; I couldn't get LD_OPTIONS to work. But I don't really know how to properly use these variables anyway...

jhpalmieri commented 13 years ago

Reviewer: John Palmieri

jdemeyer commented 13 years ago

Merged: sage-4.7.alpha5

jdemeyer commented 13 years ago

Changed merged from sage-4.7.alpha5 to none

jdemeyer commented 13 years ago
comment:6

Marking this as needs_work because there are more systems with the same symptom. In particular, on the buildbot Fedora 14-32 (cicero).

jdemeyer commented 13 years ago
comment:7

Also, I really dislike simply lowering the optimization level. That is wiping the problem under the carpet, not solving it. Also should be reported upstream.

bac7d3ea-3f1b-4826-8464-f0b53d5e12d2 commented 13 years ago
comment:8

Replying to @jdemeyer:

Also, I really dislike simply lowering the optimization level. That is wiping the problem under the carpet, not solving it. Also should be reported upstream.

In the short term, I don't see we can do anything other than just lower the optimisation.

It should be noted that there are quite a few Sage patches applied - some directly to the source code, which should not have been done, but has been done. This package has been patched 13 times already!

Dave

bac7d3ea-3f1b-4826-8464-f0b53d5e12d2 commented 13 years ago

Description changed:

--- 
+++ 
@@ -79,7 +79,7 @@
 Total time for all tests: 26.7 seconds

-A revised package, which reduces the optimisation level on Solaris to -O1 can be found at +Similar problems have been observed on other platforms too. A revised package, which reduces the optimisation level to -O1 can be found at

http://boxen.math.washington.edu/home/kirkby/patches/rubiks-20070912.p13.spkg

bac7d3ea-3f1b-4826-8464-f0b53d5e12d2 commented 13 years ago
comment:9

Here's a revised package, which reduces the optimisation level to -O1 (as my previous patch), but this time only on all platforms, and not just Solaris.

Dave

bac7d3ea-3f1b-4826-8464-f0b53d5e12d2 commented 13 years ago
comment:10

Replying to @sagetrac-drkirkby:

Here's a revised package, which reduces the optimisation level to -O1 (as my previous patch), but this time only on all platforms, and not just Solaris.

Dave

The grammar was not very good there! The point being, this patch changes the optimisation to -O1 on all platforms.

Dave

a9bc4e33-7b98-4180-affc-8dfcef89e22b commented 13 years ago
comment:11

I believe only the file src/reid/optimal.c needs to be reduced to -O1 due to an optimization regression in gcc-4.6.0. See gcc bugzilla # 48702 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48702

a9bc4e33-7b98-4180-affc-8dfcef89e22b commented 13 years ago

Changed upstream from Not yet reported upstream; Will do shortly. to Reported upstream. Little or no feedback.

a9bc4e33-7b98-4180-affc-8dfcef89e22b commented 13 years ago

Changed upstream from Reported upstream. Little or no feedback. to Reported upstream. Developers acknowledge bug.

jdemeyer commented 13 years ago
comment:13

Replying to @sagetrac-mariah:

I believe only the file src/reid/optimal.c needs to be reduced to -O1 due to an optimization regression in gcc-4.6.0. See gcc bugzilla # 48702 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48702

Many thanks for sorting this out!!!

I'm not so sure whether it makes sense to change the optimization level for just 1 file. Maybe the whole directory src/reid yes, but one file?

bac7d3ea-3f1b-4826-8464-f0b53d5e12d2 commented 13 years ago
comment:14

I don't know for sure, but I don't believe any of this is going to be too time-critical. I can't somehow seeing one wanting to solve lots of rubiks cubes and given solving them takes very little time, I don't see this as very important if we lose a few clock cycles here or there.

dave

jdemeyer commented 13 years ago
comment:15

Since this is clearly a gcc 4.6.0 bug, I suggest to check in spkg-install whether we are using gcc 4.6.0 and only reduce the optimization level to -O1 if so.

Another annoying thing about this package is that some of the patches in patches/ are the wrong way around (they are patches from the patched version to the original version).

bac7d3ea-3f1b-4826-8464-f0b53d5e12d2 commented 13 years ago
comment:16

I've now changed this so instead of reducing the optimisation on Solaris, it does it on all platforms with gcc 4.6.0. First it tests for gcc using $SAGE_LOCAL/bin/testcc.sh, then if we are using gcc, a second test is performed for gcc 4.6.0 by adding the 'dumpversion' option to gcc.

The following is the critical bit of new code.

OPTIMIZATION_FLAGS="-O2"
if [ "x`$SAGE_LOCAL/bin/testcc.sh $CC`" = xGCC ] && [ "x`$CC -dumpversion`" = x4.6.0 ] ; then 
   echo "Warning: Reducing the optimisation level to -O1 due to a bug in gcc 4.6.0"
   OPTIMIZATION_FLAGS="-O1"
fi

There are some typos corrected too - see the attached patch.

Dave

bac7d3ea-3f1b-4826-8464-f0b53d5e12d2 commented 13 years ago

Description changed:

--- 
+++ 
@@ -79,8 +79,8 @@
 Total time for all tests: 26.7 seconds

-Similar problems have been observed on other platforms too. A revised package, which reduces the optimisation level to -O1 can be found at +Similar problems have been observed on other platforms too. A revised package, which reduces the optimisation level to -O1 if using gcc 4.6.0 can be found at

-http://boxen.math.washington.edu/home/kirkby/patches/rubiks-20070912.p13.spkg +http://boxen.math.washington.edu/home/kirkby/patches/rubiks-20070912.p14.spkg

bac7d3ea-3f1b-4826-8464-f0b53d5e12d2 commented 13 years ago

Attachment: rubiks-optimisation-problem.patch.gz

Patch which changes optimisation to -O1 if using gcc 4.6.0 (for review purposes only - changes are in the .spkg)

bac7d3ea-3f1b-4826-8464-f0b53d5e12d2 commented 13 years ago

Description changed:

--- 
+++ 
@@ -83,4 +83,4 @@

 http://boxen.math.washington.edu/home/kirkby/patches/rubiks-20070912.p14.spkg

-
+For other gcc 4.6.0 specific problems see #11216
jdemeyer commented 13 years ago

Changed author from David Kirkby to David Kirkby, Jeroen Demeyer

jdemeyer commented 13 years ago
comment:18

David, in principle I agree with your changes. However, the gcc report indicates a problem with gcc/tree-ssa-loop-ivopts.c and because of this it suffices to disable -fivopts. New spkg which does this (and also simplifies the gcc version check): http://boxen.math.washington.edu/home/jdemeyer/spkg/rubiks-20070912.p15.spkg

jdemeyer commented 13 years ago

Description changed:

--- 
+++ 
@@ -79,8 +79,8 @@
 Total time for all tests: 26.7 seconds

-Similar problems have been observed on other platforms too. A revised package, which reduces the optimisation level to -O1 if using gcc 4.6.0 can be found at +Similar problems have been observed on other platforms too. A revised package, which works around the gcc 4.6.0 bug by adding the flag -fno-ivopts:

-http://boxen.math.washington.edu/home/kirkby/patches/rubiks-20070912.p14.spkg +http://boxen.math.washington.edu/home/jdemeyer/spkg/rubiks-20070912.p15.spkg

For other gcc 4.6.0 specific problems see #11216

bac7d3ea-3f1b-4826-8464-f0b53d5e12d2 commented 13 years ago
comment:19

Replying to @jdemeyer:

David, in principle I agree with your changes. However, the gcc report indicates a problem with gcc/tree-ssa-loop-ivopts.c and because of this it suffices to disable -fivopts. New spkg which does this (and also simplifies the gcc version check): http://boxen.math.washington.edu/home/jdemeyer/spkg/rubiks-20070912.p15.spkg

I think my test

if [ "x`$SAGE_LOCAL/bin/testcc.sh $CC`" = xGCC ] && [ "x`$CC -dumpversion`" = x4.6.0 ] ; then  

for the version is preferable. I can't see the point in calling a C compiler with --version then greping the output, where there is a documented option (-dumpversion) to show just the version,

Also, this could well fail for another compiler with a version of 4.6.0 - I think its safer to test for gcc first.

Dave

jdemeyer commented 13 years ago
comment:20

Replying to @sagetrac-drkirkby:

Replying to @jdemeyer:

David, in principle I agree with your changes. However, the gcc report indicates a problem with gcc/tree-ssa-loop-ivopts.c and because of this it suffices to disable -fivopts. New spkg which does this (and also simplifies the gcc version check): http://boxen.math.washington.edu/home/jdemeyer/spkg/rubiks-20070912.p15.spkg

I think my test

if [ "x`$SAGE_LOCAL/bin/testcc.sh $CC`" = xGCC ] && [ "x`$CC -dumpversion`" = x4.6.0 ] ; then  

for the version is preferable. I can't see the point in calling a C compiler with --version then greping the output, where there is a documented option (-dumpversion) to show just the version,

Also, this could well fail for another compiler with a version of 4.6.0 - I think its safer to test for gcc first.

My test does test for gcc in the --version output.

If I change the gcc test to use your original test code, would the rest of my patch get positive_review?

jdemeyer commented 13 years ago
comment:21

David, another minor issue with your test: I would change [ "x`$CC -dumpversion`" = x4.6.0 ] to [ "x`$CC -dumpversion 2>/dev/null`" = x4.6.0 ] in case that $CC does not support -dumpversion.

bac7d3ea-3f1b-4826-8464-f0b53d5e12d2 commented 13 years ago
comment:22

Replying to @jdemeyer:

David, another minor issue with your test: I would change [ "x`$CC -dumpversion`" = x4.6.0 ] to [ "x`$CC -dumpversion 2>/dev/null`" = x4.6.0 ] in case that $CC does not support -dumpversion.

I need to be doing something else now (gardening), so don't have chance to do much with this, but a few quick notes.

Assuming you can confirm the new compiler option given does not cause a dramatic slowdown, and change the test, then feel free to change the ticket to positive review - don't wait for me. Issues about SPKG.txt and /dev/null are not too critical - make what changes you feel are appropriate.

Dave

jdemeyer commented 13 years ago
comment:23

Replying to @sagetrac-drkirkby:

can you check this option has not slowed things significantly. If it slower than -O1, then I think it would be better to revert to -O1.

On my machine (Linux arcanis 2.6.34-gentoo-r12 #2 SMP Mon Dec 6 17:16:04 CET 2010 x86_64 Intel(R) Core(TM)2 Duo CPU T5870 @ 2.00GHz GenuineIntel GNU/Linux) with gcc 4.6.0, the long test sage/interfaces/rubik.py takes:

  1. p14 (using -O1): 39.9 s
  2. p15 (using -O2 -fno-ivopts): 37.5 s
jdemeyer commented 13 years ago
comment:24

Replying to @sagetrac-drkirkby:

  • My test does not use the -dumpversion option unless the compiler is first confirmed to be gcc, since the second part of the test (the version) will not be executed unless the first part (the type of compiler), is confirmed to be GCC. So I don't really see the need to send stderr to /dev/null, but I'm not bothered about that fact.

Can you guarantee that every compiler which defines __GNUC__ understands the option -dumpversion?

jdemeyer commented 13 years ago

Attachment: rubiks-20070912.p14-p15.diff.gz

diff p14 -> p15 (for reviewing only)

bac7d3ea-3f1b-4826-8464-f0b53d5e12d2 commented 13 years ago
comment:25

Replying to @jdemeyer:

Replying to @sagetrac-drkirkby:

  • My test does not use the -dumpversion option unless the compiler is first confirmed to be gcc, since the second part of the test (the version) will not be executed unless the first part (the type of compiler), is confirmed to be GCC. So I don't really see the need to send stderr to /dev/null, but I'm not bothered about that fact.

Can you guarantee that every compiler which defines __GNUC__ understands the option -dumpversion?

I think if any compiler is going to aim to be GCC compatible then it would need to support the '-dumpversion' option. If it does not, then I would consider that a compiler bug. The Intel compiler supports '-dumpversion' - see

http://software.intel.com/sites/products/documentation/hpc/compilerpro/en-us/cpp/lin/main_cls_lin.pdf

Clang does too:

https://llvm.org/viewvc/llvm-project/cfe/trunk/include/clang/Driver/Options.td?sortby=date&r1=106956&r2=106955&pathrev=106956&diff_format=s

GCC For Sun Systems uses the gcc front end and takes the GNU options. It actually uses the GCC source code, so I assume that would take the '-dumpversion' option too.

I'm not aware of any other GCC compatible compiler. But if they really are GCC compatible, they should take the '-dumpversion' option.

I've just checked gcc 3.4.3, and that takes the '-dumpversion' option.

drkirkby@hawk:~$ /usr/bin/gcc -dumpversion
3.4.3
drkirkby@hawk:~$ /usr/bin/g++ -dumpversion
3.4.3
drkirkby@hawk:~$ 

Just out of interest, I checked on t2.math (a SPARC system), and that supports -dumpversion' too.

kirkby@t2:32 ~$ /usr/sfw/bin/gcc -dumpversion
3.4.3

I would use more caution if it was the Fortran compiler we were testing, as the option gave a different output (same as --version) in older versions of gfortran. But that bug was fixed a year or two ago, so that now 'gfortran' gives just the version number if given the option '-dumpversion'. But that test in the rubiks package only tests the C compiler, so it's immaterial what gfortran does.

I would also note we are using the -dumpversion' compiler option already. The 'prereq' script uses two autoconf macros (ax_gcc_version.m4 & ax_gxx_version.m4) which use the option. That is run to check the versions of gcc and g++ are the same. So I think if there was a problem using the -dumpversion' compiler option, we would have known about it before.

Dave

jdemeyer commented 13 years ago
comment:26

David, I agree with everything you say, but I hope you can agree that the 2>/dev/null doesn't hurt anyway. Can it get positive_review now?

bac7d3ea-3f1b-4826-8464-f0b53d5e12d2 commented 13 years ago
comment:27

Replying to @jdemeyer:

David, I agree with everything you say, but I hope you can agree that the 2>/dev/null doesn't hurt anyway. Can it get positive_review now?

Sure.

jdemeyer commented 13 years ago

Merged: sage-4.7.rc1

jdemeyer commented 13 years ago

Changed merged from sage-4.7.rc1 to none

jdemeyer commented 13 years ago

Description changed:

--- 
+++ 
@@ -81,6 +81,6 @@

 Similar problems have been observed on other platforms too. A revised package, which works around the gcc 4.6.0 bug by adding the flag -fno-ivopts:

-[http://boxen.math.washington.edu/home/jdemeyer/spkg/rubiks-20070912.p15.spkg](http://boxen.math.washington.edu/home/jdemeyer/spkg/rubiks-20070912.p15.spkg)
+[http://boxen.math.washington.edu/home/jdemeyer/spkg/rubiks-20070912.p16.spkg](http://boxen.math.washington.edu/home/jdemeyer/spkg/rubiks-20070912.p16.spkg)

 For other gcc 4.6.0 specific problems see #11216
jdemeyer commented 13 years ago

Changed reviewer from John Palmieri to John Palmieri, David Kirkby

jdemeyer commented 13 years ago

Attachment: rubiks-20070912.p15-p16.diff.gz

Diff for the rubiks spkg, for reviewing only

a9bc4e33-7b98-4180-affc-8dfcef89e22b commented 13 years ago
comment:32

The patch will be applied for later versions than just gcc-4.6.0. Recommend that the patch only be applied for gcc-4.6.0.

jdemeyer commented 13 years ago
comment:33

Replying to @sagetrac-mariah:

The patch will be applied for later versions than just gcc-4.6.0. Recommend that the patch only be applied for gcc-4.6.0.

Why?

There already exist pre-release versions of gcc 4.6.1 manifesting the same problem.

I think that having the workaround for all versions 4.6.x doesn't really hurt. It is certainly a safer alternative than applying the workaround only for gcc 4.6.0. If at some point, this bug is fixed in a later gcc 4.6.x, then we can still change the spkg accordingly.

jdemeyer commented 13 years ago

Description changed:

--- 
+++ 
@@ -83,4 +83,6 @@

 [http://boxen.math.washington.edu/home/jdemeyer/spkg/rubiks-20070912.p16.spkg](http://boxen.math.washington.edu/home/jdemeyer/spkg/rubiks-20070912.p16.spkg)

+Reported upstream to gcc: [http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48702](http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48702)
+
 For other gcc 4.6.0 specific problems see #11216
jdemeyer commented 13 years ago
comment:35

Can somebody please review? Thanks.

bac7d3ea-3f1b-4826-8464-f0b53d5e12d2 commented 13 years ago
comment:36

Replying to @jdemeyer:

Can somebody please review? Thanks.

Sure:

drkirkby@hawk:~/sage-4.7.rc0$  ./sage -t  -long -force_lib devel/sage/sage/interfaces/rubik.py
sage -t -long -force_lib "devel/sage/sage/interfaces/rubik.py"
     [25.5 s]

----------------------------------------------------------------------
All tests passed!
Total time for all tests: 25.6 seconds

so that option is working to resolve the bug on OpenSolaris too. This looks good to me.

Dave

jdemeyer commented 13 years ago

Merged: sage-4.7.rc2

jdemeyer commented 13 years ago

Changed upstream from Reported upstream. Developers acknowledge bug. to Fixed upstream, but not in a stable release.

jdemeyer commented 13 years ago
comment:39

See #11437 for a follow-up.