sagemath / sage

Main repository of SageMath
https://www.sagemath.org

Pynac module not initialized before being used. This causes a crash on 64-bit OpenSolaris. #11116

Closed bac7d3ea-3f1b-4826-8464-f0b53d5e12d2 closed 4 years ago

bac7d3ea-3f1b-4826-8464-f0b53d5e12d2 commented 13 years ago

Sage builds fully 64-bit on Solaris 10 (SPARC).

On 64-bit OpenSolaris or Solaris 10, the stats package R fails to build. Since R is an external program, one can just touch SAGE_ROOT/spkg/installed/r-$versions and get an almost complete Sage.

However, this 64-bit Sage crashes at startup with OpenSolaris on x86, as discussed at:

http://groups.google.com/group/sage-devel/browse_thread/thread/efc864c79fed92df?hl=en

(one would expect similar on Solaris 10 x86 and probably SPARC too).

A backtrace with gdb on a Sun Ultra 27 running OpenSolaris 06/2009 shows:

drkirkby@hawk:~/64/sage-4.7.alpha3$ ./sage -gdb
Building Sage on Solaris in 64-bit mode
Creating SAGE_LOCAL/lib/sage-64.txt since it does not exist
Detected SAGE64 flag
Building Sage on Solaris in 64-bit mode
----------------------------------------------------------------------
| Sage Version 4.7.alpha3, Release Date: 2011-03-31                  |
| Type notebook() for the GUI, and license() for information.        |
----------------------------------------------------------------------
**********************************************************************
*                                                                    *
* Warning: this is a prerelease version, and it may be unstable.     *
*                                                                    *
**********************************************************************
/export/home/drkirkby/64/sage-4.7.alpha3/local/bin/sage-ipython
GNU gdb 6.8
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "i386-pc-solaris2.11"...
warning: Lowest section in /lib/amd64/libdl.so.1 is .dynamic at 00000000000000b0
Python 2.6.4 (r264:75706, Apr  1 2011, 15:07:52)
[GCC 4.5.0] on sunos5
Type "help", "copyright", "credits" or "license" for more information.
warning: Lowest section in /lib/amd64/libintl.so.1 is .dynamic at 00000000000000b0
warning: Lowest section in /lib/amd64/libpthread.so.1 is .dynamic at 00000000000000b0

Program received signal SIGSEGV, Segmentation fault.
0x00000000003eb0a5 in ?? ()
(gdb) bt
#0  0x00000000003eb0a5 in ?? ()
#1  0xfffffd7fff2ac5d1 in _Unwind_RaiseException_Body () from /lib/64/libc.so.1
#2  0xfffffd7fff2ac855 in _Unwind_RaiseException () from /lib/64/libc.so.1
#3  0xfffffd7ff91d6729 in __cxa_throw (obj=<value optimized out>, tinfo=<value optimized out>, dest=<value optimized out>)
    at ../../../../../gcc-4.5.0/libstdc++-v3/libsupc++/eh_throw.cc:78
#4  0xfffffd7fcec6d5ff in GiNaC::function::find_function (name=@0x4a359b0, nparams=2) at function.cpp:1446
#5  0xfffffd7fce9454ad in __pyx_f_4sage_8symbolic_8function_15BuiltinFunction__is_registered (__pyx_v_self=0x4a142f0) at sage/symbolic/function.cpp:7301
#6  0xfffffd7fce950755 in __pyx_pf_4sage_8symbolic_8function_8Function___init__ (__pyx_v_self=0x4a142f0, __pyx_args=<value optimized out>,
    __pyx_kwds=<value optimized out>) at sage/symbolic/function.cpp:2374
#7  0xfffffd7fffde7a70 in ?? ()
#8  0x00000016745f5f63 in ?? ()
#9  0x0000000004a0e5a8 in ?? ()
#10 0x0000000004a142f0 in ?? ()
#11 0x000000000000000b in ?? ()
#12 0x0000000004a0e5a8 in ?? ()
#13 0x0000000002c913e8 in ?? ()
#14 0xfffffd7fd76c2b30 in module_members () from /export/home/drkirkby/64/sage-4.7.alpha3/local/lib//libpython2.6.so.1.0
#15 0x0000002752657572 in ?? ()
#16 0xfffffd7fd76d5c60 in ?? () from /export/home/drkirkby/64/sage-4.7.alpha3/local/lib//libpython2.6.so.1.0
#17 0x2800000040520000 in ?? ()
#18 0xfffffd7fd76d5920 in ?? () from /export/home/drkirkby/64/sage-4.7.alpha3/local/lib//libpython2.6.so.1.0
#19 0x0000000000000005 in ?? ()
#20 0x00000000049e4db8 in ?? ()
---Type <return> to continue, or q <return> to quit---
#21 0x0000000000000000 in ?? ()

Burcin Erocal produced this Python call stack.

  File "/export/home/burcin/sage-4.7.alpha3/local/bin/sage-ipython", line 21, in <module>
    ipy_sage = IPython.Shell.start()
  File "/export/home/burcin/sage-4.7.alpha3/local/lib/python2.6/site-packages/IPython/Shell.py", line 1233, in start
    return shell(user_ns = user_ns)
  File "/export/home/burcin/sage-4.7.alpha3/local/lib/python2.6/site-packages/IPython/Shell.py", line 78, in __init__
    debug=debug,shell_class=shell_class)
  File "/export/home/burcin/sage-4.7.alpha3/local/lib/python2.6/site-packages/IPython/ipmaker.py", line 644, in make_IPython
    force_import(profmodname)
  File "/export/home/burcin/sage-4.7.alpha3/local/lib/python2.6/site-packages/IPython/ipmaker.py", line 66, in force_import
    __import__(modname)
  File "ipy_profile_sage.py", line 7, in <module>
    import sage.all_cmdline
  File "/export/home/burcin/sage-4.7.alpha3/local/lib/python2.6/site-packages/sage/all_cmdline.py", line 14, in <module>
    from sage.all import *
  File "/export/home/burcin/sage-4.7.alpha3/local/lib/python2.6/site-packages/sage/all.py", line 75, in <module>
    from sage.schemes.all    import *
  File "/export/home/burcin/sage-4.7.alpha3/local/lib/python2.6/site-packages/sage/schemes/all.py", line 25, in <module>
    from hyperelliptic_curves.all import *
  File "/export/home/burcin/sage-4.7.alpha3/local/lib/python2.6/site-packages/sage/schemes/hyperelliptic_curves/all.py", line 1, in <module>
    from constructor import HyperellipticCurve
  File "/export/home/burcin/sage-4.7.alpha3/local/lib/python2.6/site-packages/sage/schemes/hyperelliptic_curves/constructor.py", line 11, in <module>
    from sage.schemes.generic.all import ProjectiveSpace
  File "/export/home/burcin/sage-4.7.alpha3/local/lib/python2.6/site-packages/sage/schemes/generic/all.py", line 4, in <module>
    from affine_space     import AffineSpace, is_AffineSpace
  File "/export/home/burcin/sage-4.7.alpha3/local/lib/python2.6/site-packages/sage/schemes/generic/affine_space.py", line 24, in <module>
    import algebraic_scheme
  File "/export/home/burcin/sage-4.7.alpha3/local/lib/python2.6/site-packages/sage/schemes/generic/algebraic_scheme.py", line 143, in <module>
    import toric_variety
  File "/export/home/burcin/sage-4.7.alpha3/local/lib/python2.6/site-packages/sage/schemes/generic/toric_variety.py", line 236, in <module>
    from sage.geometry.cone import Cone, is_Cone
  File "/export/home/burcin/sage-4.7.alpha3/local/lib/python2.6/site-packages/sage/geometry/cone.py", line 174, in <module>
    from sage.combinat.posets.posets import FinitePoset
  File "/export/home/burcin/sage-4.7.alpha3/local/lib/python2.6/site-packages/sage/combinat/posets/posets.py", line 24, in <module>
    from sage.graphs.all import DiGraph
  File "/export/home/burcin/sage-4.7.alpha3/local/lib/python2.6/site-packages/sage/graphs/all.py", line 16, in <module>
    from graph_editor import graph_editor
  File "/export/home/burcin/sage-4.7.alpha3/local/lib/python2.6/site-packages/sage/graphs/graph_editor.py", line 22, in <module>
    from sagenb.misc.support import EMBEDDED_MODE
  File "/export/home/burcin/sage-4.7.alpha3/devel/sagenb/sagenb/misc/support.py", line 563, in <module>
    from sage.symbolic.all import Expression, SR
  File "/export/home/burcin/sage-4.7.alpha3/local/lib/python2.6/site-packages/sage/symbolic/all.py", line 9, in <module>
    from sage.symbolic.relation import solve, solve_mod, solve_ineq
  File "/export/home/burcin/sage-4.7.alpha3/local/lib/python2.6/site-packages/sage/symbolic/relation.py", line 314, in <module>
    from sage.calculus.calculus import maxima
  File "/export/home/burcin/sage-4.7.alpha3/local/lib/python2.6/site-packages/sage/calculus/calculus.py", line 374, in <module>
    from sage.symbolic.integration.integral import indefinite_integral, \
  File "/export/home/burcin/sage-4.7.alpha3/local/lib/python2.6/site-packages/sage/symbolic/integration/integral.py", line 129, in <module>
    indefinite_integral = IndefiniteIntegral()
  File "/export/home/burcin/sage-4.7.alpha3/local/lib/python2.6/site-packages/sage/symbolic/integration/integral.py", line 62, in __init__
    BuiltinFunction.__init__(self, "integrate", nargs=2)

Burcin writes on sage-devel

'' It seems that cones.py looks for posets.py, which needs the graphs module, which initializes the graph_editor. The graph editor tries to see if it's in the notebook or the command line, but sagenb imports SR and Expression from sage.symbolic.all (line 563 of sagenb/misc/support.py). This tries to initialize the functions (integrate in this case) before pynac is initialized...''

''We need a better solution for making sure modules are initialized properly before anything is imported from them. I thought putting an `__init__.py` file in sage/symbolic/ with "import pynac" would solve the problem. However, it seems that python just ignores that file. ''

This is one of the very few issues preventing a complete 64-bit build on Solaris/OpenSolaris, so it would be nice to crack this one.

CC: @burcin @jhpalmieri @robertwb @gagern @vbraun @dimpase

Component: porting: Solaris

Reviewer: Dima Pasechnik

Issue created by migration from https://trac.sagemath.org/ticket/11116

bac7d3ea-3f1b-4826-8464-f0b53d5e12d2 commented 13 years ago
comment:1

#7029 is semi-related to this.

kiwifb commented 13 years ago
comment:2

That's interesting you get this backtrace. That looks so familiar, I thought it wouldn't happen on a vanilla sage install. https://github.com/cschwan/sage-on-gentoo/issues#issue/40

bac7d3ea-3f1b-4826-8464-f0b53d5e12d2 commented 13 years ago
comment:3

Replying to @kiwifb:

That's interesting you get this backtrace. That looks so familiar, I thought it wouldn't happen on a vanilla sage install. https://github.com/cschwan/sage-on-gentoo/issues#issue/40

Yes, it does look very similar. Robert Bradshaw had some ideas how to solve it, but it needs someone with more knowledge of the subject to sort this out.

Dave

bac7d3ea-3f1b-4826-8464-f0b53d5e12d2 commented 13 years ago
comment:4

Replying to @sagetrac-drkirkby:

Replying to @kiwifb:

That's interesting you get this backtrace. That looks so familiar, I thought it wouldn't happen on a vanilla sage install. https://github.com/cschwan/sage-on-gentoo/issues#issue/40

Yes, it does look very similar. Robert Bradshaw had some ideas how to solve it, but it needs someone with more knowledge of the subject to sort this out.

Dave

More knowledge than me I mean - not more knowledge than Robert.

Dave

kiwifb commented 13 years ago
comment:5

Well, Martin, who reported the backtrace on github (though it wasn't the first report of it), tracked it down to what he thinks is a bug in glibc (http://sources.redhat.com/bugzilla/show_bug.cgi?id=12453). I don't think Solaris uses glibc (but you can confirm that), so you getting it there is very interesting. There is a test script in the glibc bug report above to test for the problem; it would be interesting if you could run it. In sage-on-gentoo it started on amd64 and spread into x86 land later; the exact mix triggering the problem is still unknown. But if Robert can find a pure Python solution, that would be a relief. We are now giving users instructions on patching their glibc, which isn't nice.

bac7d3ea-3f1b-4826-8464-f0b53d5e12d2 commented 13 years ago

Description changed:

--- 
+++ 
@@ -125,7 +125,7 @@
     BuiltinFunction.__init__(self, "integrate", nargs=2)

-Burchin writes on sage-devel
+Burcin writes on sage-devel

'' It seems that cones.py looks for posets.py, which needs the graphs

bac7d3ea-3f1b-4826-8464-f0b53d5e12d2 commented 13 years ago
comment:7

Replying to @kiwifb:

Well Martin who reported the backtrace on github (but it wasn't the first report of it) tracked it down to what he thinks is a bug in glibc http://sources.redhat.com/bugzilla/show_bug.cgi?id=12453

I doubt this is a bug in glibc, as I'm 99% sure GCC will use the Sun C library and not the GNU one. I think Burcin's diagnosis of the problem is more likely to be correct. I posted a comment to that effect on the Gentoo site.

There's a test program on the Redhat glibc site, but I can't get that to run. Probably a bashism that needs a newer version of bash than I have.

Dave

kiwifb commented 13 years ago
comment:8

That's why the fact you get it is so interesting. Patching glibc solved everyone's problem in Gentoo, but it may be that this is just a workaround. Or maybe it is more complex than that: the problem will only happen if a combination of elements is not behaving as it should.

Note it has been observed to happen after updates that are seemingly unrelated to sage packages, which suggests that there is indeed a complex mix leading to the failure. Fixing any one component of the mix is probably enough to prevent the problem altogether.

bac7d3ea-3f1b-4826-8464-f0b53d5e12d2 commented 13 years ago
comment:9

Just to confirm, I created a 64-bit hello world program with gcc and used 'ldd' to find which libraries are linked:

drkirkby@hawk:~$ gcc -m64 test.c
drkirkby@hawk:~$ ldd a.out
    libc.so.1 =>     /lib/64/libc.so.1
    libm.so.2 =>     /lib/64/libm.so.2
drkirkby@hawk:~$ 

so only the Sun libraries are being used.

I'm hoping Robert or Burcin can come up with a reliable fix for this at the Python level. At least I know that will solve the problem on OpenSolaris.

As with many things, if the behavior is not defined, one can get mysterious and not necessarily reproducible bugs. It sounds like there are a couple of things which are not defined fully.

Dave

8d15854a-f726-4f6b-88e7-82ec1970fbba commented 13 years ago
comment:10

Replying to @kiwifb:

the exact mix triggering the problem is still unknown.

Ingredients to the Linux/libc issue as far as I know them:

  1. Dynloading of a library with dependencies, so that multiple so files are loaded in response to a single dlopen call. The gtk python module is a likely candidate here.
  2. One of the deps must use the initial-exec flavour of thread-local variables. The proprietary nvidia OpenGL drivers (libnvidia-tls.so) do this. Iirc a line mentioning "R_X86_64_TPOFF64" in the output from "objdump -R" is a good indication for this.
  3. Another of the deps (from the same dlopen call) must be the place where things will later go wrong. In our case that was the C++ library, libstdc++, shipped with gcc.
  4. The latter dep must make use of its thread local vars (of local-dynamic flavour). In our case that was the C++ exception handling mechanism. So if no exception gets thrown, we won't encounter the issue here.

Replying to @kiwifb:

Or may be it is more complex than that: the problem will only happen if a combination of elements are not behaving like they should. Fixing any one components of the mix is probably enough to prevent the problem altogether.

I'd agree to both. So you can either fix the exception handling mechanism, or avoid the exception being thrown. Either one makes the problem vanish, although the other half of the problem might well resurface somewhere else later on.

Is there a chance of linking that OpenSolaris backtrace to actual code lines from the Sun C library, to see what's happening there?

8d15854a-f726-4f6b-88e7-82ec1970fbba commented 13 years ago

Replying to @sagetrac-drkirkby:

Seems people using Python in combination with boost have encountered similar things before. Does their diagnosis or workaround apply to you in some way?

bac7d3ea-3f1b-4826-8464-f0b53d5e12d2 commented 13 years ago
comment:12

Replying to @gagern:

Replying to @sagetrac-drkirkby:

Seems people using Python in combination with boost have encountered similar things before. Does their diagnosis or workaround apply to you in some way?

If I understand correctly, his problem was that he had compiled boost & his application with a different compiler to the Sun-compiled Python. But that is not the case here, as Python, boost and the rest of Sage are all built with the same compiler - Sage is not using Sun's Python.

Robert Bradshaw had some suggestions about how to solve this, but I don't know sufficient Python to implement them myself.

There are some other issues remaining when I comment out the code that's causing the crash, but I'm not sure if those are in any way related to the fact I've commented out a section of code.

Dave

kiwifb commented 13 years ago
comment:13

The issue from sage-on-gentoo seems to have disappeared on one of my machines. I am not completely sure if Gentoo included Martin's patch already or if pynac-0.2.3 shipped in sage-4.7.1_alpha4 is responsible. It is probably worth giving 4.7.1_alpha4 a spin.

burcin commented 13 years ago
comment:14

The solution to this would be to import one of the objects mentioned in the chain I described lazily:

It seems that cones.py looks for posets.py, which needs the graphs module, which initializes the graph_editor. The graph editor tries to see if it's in the notebook or the command line, but sagenb imports SR and Expression from sage.symbolic.all (line 563 of sagenb/misc/support.py). This tries to initialize the functions (integrate in this case) before pynac is initialized...

This can be done by

Note that quite a while ago, I wrote a script (sage-test-import in local/bin) to test for these problems. It imports each module in the Sage library individually, and checks if we got any errors. It would be a significant achievement if we can get a release to pass this test. This would go a long way towards making Sage more modular.
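The idea behind such an import test can be sketched as follows. This is not the actual sage-test-import script; it is a minimal illustration in modern Python 3, and the function name `check_imports` is hypothetical. It assumes each module is imported in a fresh interpreter, so that no earlier import can hide an ordering dependency.

```python
import subprocess
import sys


def check_imports(module_names):
    """Import each module in its own fresh interpreter.

    Returns a dict mapping each module name that failed to import
    to the last line of the child's error output.
    """
    failures = {}
    for name in module_names:
        # A fresh process guarantees that nothing else was imported
        # first, so hidden ordering dependencies show up.
        result = subprocess.run(
            [sys.executable, "-c", "import " + name],
            capture_output=True,
            text=True,
        )
        if result.returncode != 0:
            lines = result.stderr.strip().splitlines()
            # A hard crash may leave stderr empty, so fall back to
            # reporting the exit code.
            failures[name] = (
                lines[-1] if lines else "exit code %d" % result.returncode
            )
    return failures


if __name__ == "__main__":
    # Standard-library names are used only so the sketch runs without
    # Sage; with Sage one would pass names such as
    # "sage.symbolic.integration.integral".
    for module, error in check_imports(["json", "no_such_module_xyz"]).items():
        print(module, "->", error)
```

Running each import in a separate process is slow, but it also means a crash in one module is reported as a failure rather than ending the whole test run.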

bac7d3ea-3f1b-4826-8464-f0b53d5e12d2 commented 13 years ago
comment:15

I'll give the latest alpha a try if there's a chance the problem may have been fixed.

It would be really good to get this resolved, as basically I am having to give up an attempt at a 64-bit Solaris port due to this bug. I can't do anything until this is solved, and I don't have the knowledge to do it myself. Hence you may have noticed my absence on sage-devel. I really can't make any useful contribution to Sage until this issue is resolved.

I'll give the latest alpha a build 64-bit. If this can be resolved, then there's a good chance of completing a 64-bit Solaris port, but without it solved, the port is effectively stalled.

Dave

burcin commented 13 years ago
comment:16

Attachment: trac_11116-fix_imports.patch.gz

attachment: trac_11116-fix_imports.patch is a first attempt to clear up the circular dependencies. However, it still doesn't fix this problem.

Whatever I do, it seems that the initialization for libpynac.so is not run by the time modules in sage.functions are loaded. Is there a trick to make sure the library is initialized sooner?

I added Volker to the CC list, since he mentioned exactly this problem while working on pynac at SD31. :)

vbraun commented 13 years ago
comment:17

Having spent the whole day yesterday worrying about import ordering, I must say that we have way too many circular imports. This is also an issue because we currently call Cython with --disable-function-redefinition that changes the import ordering for cython files to an old and obsolete behavior. But Sage relies on it, otherwise many of its circular imports break.

It would be the wrong approach to require module X to load before module Y, this will just cause maintenance headaches down the road. Really, the problem is that module initializers do too much too early. If you start up Sage under a debugger then there are lots of non-trivial computations done in module initializers. Do we really need to construct some degree-20 polynomials every time Sage starts up? I don't think so. I would deposit that

To my mind, the problem here in this ticket is that the sage.symbolic.integration.integral module instantiates its IndefiniteIntegral class,

indefinite_integral = IndefiniteIntegral()

which in turn calls into pynac to register itself. Really there is no reason for this to be immediate, and it opens a can of worms about initialization order.

One could try to kludge around this and make sage.symbolic.function.Function.__init__ delay the function registration with pynac until pynac is ready, or initialize pynac explicitly. But then somebody will find a way to not only initialize a pynac function, but also use it inside a module initializer in a nontrivial way, and it would crash again.
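For concreteness, the delayed-registration kludge could look roughly like the sketch below. `Backend`, `register` and `initialize` are hypothetical names chosen for illustration; they are not pynac's API.

```python
class Backend:
    """Hypothetical stand-in for a library that needs explicit initialization."""

    def __init__(self):
        self.ready = False
        self.registered = []
        self._pending = []

    def register(self, name, nargs):
        if self.ready:
            self.registered.append((name, nargs))
        else:
            # Too early: remember the request instead of calling into
            # an uninitialized library.
            self._pending.append((name, nargs))

    def initialize(self):
        self.ready = True
        # Replay everything that was requested before initialization.
        for name, nargs in self._pending:
            self.registered.append((name, nargs))
        self._pending.clear()
```

The queue only postpones registration. Anything that tries to use a function before `initialize()` has run would still hit the uninitialized library, which is the objection raised above.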

bac7d3ea-3f1b-4826-8464-f0b53d5e12d2 commented 13 years ago
comment:18

Replying to @vbraun:

Really, the problem is that module initializers do too much too early. If you start up Sage under a debugger then there are lots of non-trivial computations done in module initializers. Do we really need to construct some degree-20 polynomials every time Sage starts up? I don't think so.

If this sort of stupidity is occurring, it is no wonder there are complaints of Sage starting slowly. For some people Sage is taking minutes to start.

If I am understanding you correctly, it seems this problem I noticed on OpenSolaris is just a symptom of a more serious implementation issue, which is the result of a lack of thought in the design of Sage.

jhpalmieri commented 13 years ago
comment:19

I wonder if the patch at #11043 might help.

kiwifb commented 13 years ago
comment:20

Replying to @jhpalmieri:

I wonder if the patch at #11043 might help.

It cannot hurt to try it. It is hard to know how far these imports are reaching.

vbraun commented 13 years ago
comment:21

Replying to @sagetrac-drkirkby:

If this sort of stupidity is occurring, it is no wonder there are complaints of Sage starting slowly. For some people Sage is taking minutes to start.

I think that's unrelated and essentially due to hard drives or NFS. The CPU can still run circles around any filesystem access.

If I am understanding you correctly, it seems this problem I noticed on OpenSolaris is just a symptom of a more serious implementation issue, which is the result of a lack of thought in the design of Sage.

Well compared to the C++ static initializer hell this is a piece of cake ;-). We have the tools to easily make initializers lazy, we just need to use them.

#11043 doesn't touch symbolic stuff so I doubt it'll do anything.

mkoeppe commented 4 years ago
comment:26

Outdated, should be closed

dimpase commented 4 years ago

Reviewer: Dima Pasechnik