Closed bac7d3ea-3f1b-4826-8464-f0b53d5e12d2 closed 4 years ago
That's interesting you get this backtrace. That looks so familiar, I thought it wouldn't happen on a vanilla sage install. https://github.com/cschwan/sage-on-gentoo/issues#issue/40
Replying to @kiwifb:
That's interesting you get this backtrace. That looks so familiar, I thought it wouldn't happen on a vanilla sage install. https://github.com/cschwan/sage-on-gentoo/issues#issue/40
Yes, it does look very similar. Robert Bradshaw had some ideas how to solve it, but it needs someone with more knowledge of the subject to sort this out.
Dave
Replying to @sagetrac-drkirkby:
Replying to @kiwifb:
That's interesting you get this backtrace. That looks so familiar, I thought it wouldn't happen on a vanilla sage install. https://github.com/cschwan/sage-on-gentoo/issues#issue/40
Yes, it does look very similar. Robert Bradshaw had some ideas how to solve it, but it needs someone with more knowledge of the subject to sort this out.
Dave
More knowledge than me I mean - not more knowledge than Robert.
Dave
Well, Martin, who reported the backtrace on GitHub (though his was not the first report of it), tracked it down to what he thinks is a bug in glibc (http://sources.redhat.com/bugzilla/show_bug.cgi?id=12453). I don't think Solaris uses glibc (but you can confirm that), so the fact that you get it there is very interesting. There is a test script in the glibc bug report above to test for the problem; it would be interesting if you could run it. In sage-on-gentoo it started on amd64 and spread into x86 land later; the exact mix triggering the problem is still unknown. But if Robert can find a pure Python solution, that would be a relief. We are now giving users instructions on patching their glibc, which isn't nice.
Description changed:
---
+++
@@ -125,7 +125,7 @@
BuiltinFunction.__init__(self, "integrate", nargs=2)
-Burchin writes on sage-devel
+Burcin writes on sage-devel
'' It seems that cones.py looks for posets.py, which needs the graphs
Replying to @kiwifb:
Well Martin who reported the backtrace on github (but it wasn't the first report of it) tracked it down to what he thinks is a bug in glibc http://sources.redhat.com/bugzilla/show_bug.cgi?id=12453
I doubt this is a bug in glibc, as I'm 99% sure GCC will use the Sun C library and not the GNU one. I think Burcin's diagnosis of the problem is more likely to be correct. I posted a comment to that effect on the Gentoo site.
There's a test program on the Red Hat glibc site, but I can't get it to run. It probably contains a bashism that needs a newer version of bash than I have.
Dave
That's why the fact you get it is so interesting. Patching glibc solved everyone's problem in Gentoo, but it may be that this is just a workaround. Or maybe it is more complex than that: the problem will only happen if a combination of elements are not behaving like they should.
Note that it has been observed to happen after updates that are seemingly unrelated to Sage packages, which suggests that there is indeed a complex mix leading to the failure. Fixing any one component of the mix is probably enough to prevent the problem altogether.
Just to confirm, I created a 64-bit hello world program with gcc and used 'ldd' to find which libraries are linked.
drkirkby@hawk:~$ gcc -m64 test.c
drkirkby@hawk:~$ ldd a.out
libc.so.1 => /lib/64/libc.so.1
libm.so.2 => /lib/64/libm.so.2
drkirkby@hawk:~$
so only the Sun libraries are being used.
I'm hoping Robert or Burcin can come up with a reliable fix for this at the Python level. At least I know that will solve the problem on OpenSolaris.
As with many things, if the behavior is not defined, one can get mysterious and not necessarily reproducible bugs. It sounds like there are a couple of things which are not defined fully.
Dave
Replying to @kiwifb:
the exact mix triggering the problem is still unknown.
Ingredients to the Linux/libc issue as far as I know them:
Replying to @kiwifb:
Or maybe it is more complex than that: the problem will only happen if a combination of elements are not behaving like they should. Fixing any one component of the mix is probably enough to prevent the problem altogether.
I'd agree to both. So you can either fix the exception handling mechanism, or avoid the exception being thrown. Either one makes the problem vanish, although the other half of the problem might well resurface somewhere else later on.
Is there a chance of linking that OpenSolaris backtrace to actual code lines from the Sun C library, to see what's happening there?
Replying to @sagetrac-drkirkby:
Seems people using Python in combination with boost have encountered similar things before. Does their diagnosis or workaround apply to you in some way?
Replying to @gagern:
Replying to @sagetrac-drkirkby:
Seems people using Python in combination with boost have encountered similar things before. Does their diagnosis or workaround apply to you in some way?
If I understand correctly, his problem was that he had compiled boost and his application with a different compiler from the one used for the Sun-compiled Python. But that is not the case here, as Python, boost and the rest of Sage are all built with the same compiler. Sage is not using Sun's Python.
Robert Bradshaw had some suggestions about how to solve this, but I don't know sufficient Python to implement them myself.
There are some other issues remaining when I comment out the code that's causing the crash, but I'm not sure if those are in any way related to the fact I've commented out a section of code.
Dave
The issue from sage-on-gentoo seems to have disappeared on one of my machines. I am not completely sure whether Gentoo has already included Martin's patch or whether pynac-0.2.3, shipped in sage-4.7.1_alpha4, is responsible. It is probably worth giving 4.7.1_alpha4 a spin.
The solution to this would be to import one of the objects mentioned in the chain I described lazily:
It seems that cones.py looks for posets.py, which needs the graphs module, which initializes the graph_editor. The graph editor tries to see if it's in the notebook or the command line, but sagenb imports SR and Expression from sage.symbolic.all (line 563 of sagenb/misc/support.py). This tries to initialize the functions (integrate in this case) before pynac is initialized...
This can be done by
Note that quite a while ago, I wrote a script (sage-test-import in local/bin) to test for these problems. It imports each module in the Sage library individually, and checks if we got any errors. It would be a significant achievement if we can get a release to pass this test. This would go a long way towards making Sage more modular.
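For illustration only, the idea of importing each module individually can be sketched as below. This is not the actual sage-test-import script; the helper names `check_import` and `check_all` are hypothetical. The sketch assumes that running each import in a fresh interpreter is acceptable, so that a module imported earlier cannot mask an ordering problem.

```python
import subprocess
import sys


def check_import(module_name, python=sys.executable):
    """Import one module in a fresh interpreter; return (ok, stderr_text).

    Hypothetical helper, not part of Sage. A crash of the child
    interpreter (for example a segfault) also counts as a failure,
    because the return code is then nonzero.
    """
    proc = subprocess.run(
        [python, "-c", "import " + module_name],
        capture_output=True,
        text=True,
    )
    return proc.returncode == 0, proc.stderr


def check_all(module_names):
    """Return a dict mapping each failing module name to its error output."""
    failures = {}
    for name in module_names:
        ok, err = check_import(name)
        if not ok:
            failures[name] = err
    return failures
```

Calling `check_all(["sage.symbolic.all", "sage.functions.all"])` with Sage's own Python would then report which of those modules cannot be imported on its own.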
I'll give the latest alpha a try if there's a chance the problem may have been fixed.
It would be really good to get this resolved, as basically I am having to give up an attempt at a 64-bit Solaris port due to this bug. I can't do anything until this is solved, and I don't have the knowledge to do it myself. Hence you may have noticed my absence on sage-devel. I really can't make any useful contribution to Sage until this issue is resolved.
I'll give the latest alpha a build 64-bit. If this can be resolved, then there's a good chance of completing a 64-bit Solaris port, but without it solved, the port is effectively stalled.
Dave
Attachment: trac_11116-fix_imports.patch.gz
attachment: trac_11116-fix_imports.patch is a first attempt to clear up the circular dependencies. However, it still doesn't fix this problem.
Whatever I do, it seems that the initialization for libpynac.so is not run by the time modules in sage.functions are loaded. Is there a trick to make sure the library is initialized sooner?
I added Volker to the CC list, since he mentioned exactly this problem while working on pynac at SD31. :)
Having spent the whole day yesterday worrying about import ordering, I must say that we have way too many circular imports. This is also an issue because we currently call Cython with --disable-function-redefinition, which changes the import ordering for Cython files to an old and obsolete behavior. But Sage relies on it, otherwise many of its circular imports break.
It would be the wrong approach to require module X to load before module Y; this will just cause maintenance headaches down the road. Really, the problem is that module initializers do too much too early. If you start up Sage under a debugger, you will see that lots of non-trivial computations are done in module initializers. Do we really need to construct some degree-20 polynomials every time Sage starts up? I don't think so. I would posit that
To my mind, the problem here in this ticket is that the sage.symbolic.integration.integral module instantiates its IndefiniteIntegral class,
indefinite_integral = IndefiniteIntegral()
which in turn calls into pynac to register itself. Really there is no reason for this to be immediate, and it opens a can of worms about initialization order.
One could try to kludge around this and make sage.symbolic.function.Function.__init__ delay the function registration with pynac until pynac is ready, or initialize pynac explicitly. But then somebody will find a way to not only initialize a pynac function, but also use it inside a module initializer in a nontrivial way, and it would crash again.
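To make the kludge concrete, here is a toy sketch of delayed registration. It does not use the real pynac or Sage API: `Registry` and `Function` below are hypothetical stand-ins, and the only assumption is that the backend exposes some point at which it becomes ready.

```python
class Registry:
    """Toy stand-in for a backend that must be initialized before use."""

    def __init__(self):
        self.ready = False
        self.registered = []  # names actually registered with the backend
        self._pending = []    # names queued while the backend is not ready

    def register(self, name):
        # Register immediately if possible, otherwise queue for later.
        if self.ready:
            self.registered.append(name)
        else:
            self._pending.append(name)

    def initialize(self):
        """Mark the backend ready and flush queued registrations in order."""
        self.ready = True
        pending, self._pending = self._pending, []
        self.registered.extend(pending)


class Function:
    """Toy function object that registers itself on construction."""

    def __init__(self, name, registry):
        self.name = name
        registry.register(name)


registry = Registry()
integrate = Function("integrate", registry)  # created before the backend is ready
registry.initialize()                        # queued registration happens here
sin = Function("sin", registry)              # registered immediately
```

The sketch only covers construction. Anything that actually uses `integrate` before `registry.initialize()` has run would still hit an uninitialized backend, which is the failure mode described above.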
Replying to @vbraun:
Really, the problem is that module initializers do too much too early. If you start up Sage under a debugger then there are lots of non-trivial computations done in module initializers. Do we really need to construct some degree-20 polynomials every time Sage starts up? I don't think so.
If this sort of stupidity is occurring, it is no wonder there are complaints of Sage starting slowly. For some people Sage is taking minutes to start.
If I am understanding you correctly, it seems this problem I noticed on OpenSolaris is just a symptom of a more serious implementation issue, which is the result of a lack of thought in the design of Sage.
I wonder if the patch at #11043 might help.
Replying to @jhpalmieri:
I wonder if the patch at #11043 might help.
It cannot hurt to try it. It is hard to know how far these imports are reaching.
Replying to @sagetrac-drkirkby:
If this sort of stupidity is occurring, it is no wonder there are complaints of Sage starting slowly. For some people Sage is taking minutes to start.
I think that's unrelated and essentially due to hard drives or NFS. The CPU can still run circles around any filesystem access.
If I am understanding you correctly, it seems this problem I noticed on OpenSolaris is just a symptom of a more serious implementation issue, which is the result of a lack of thought in the design of Sage.
Well compared to the C++ static initializer hell this is a piece of cake ;-). We have the tools to easily make initializers lazy, we just need to use them.
Outdated, should be closed
Reviewer: Dima Pasechnik
Sage builds fully 64-bit on Solaris 10 (SPARC).
On 64-bit OpenSolaris or Solaris 10, the stats package R fails to build. Since R is an external program, one can just touch SAGE_ROOT/spkg/installed/r-$versions and get an almost complete Sage.
However, this 64-bit Sage crashes at startup with OpenSolaris on x86, as discussed at:
http://groups.google.com/group/sage-devel/browse_thread/thread/efc864c79fed92df?hl=en
(one would expect similar on Solaris 10 x86 and probably SPARC too).
A backtrace with gdb on a Sun Ultra 27 running OpenSolaris 06/2009 shows:
Burcin Erocal produced this Python call stack.
Burcin writes on sage-devel
'' It seems that cones.py looks for posets.py, which needs the graphs module, which initializes the graph_editor. The graph editor tries to see if it's in the notebook or the command line, but sagenb imports SR and Expression from sage.symbolic.all (line 563 of sagenb/misc/support.py). This tries to initialize the functions (integrate in this case) before pynac is initialized...''
''We need a better solution for making sure modules are initialized properly before anything is imported from them. I thought putting an __init__.py file in sage/symbolic/ with "import pynac" would solve the problem. However, it seems that Python just ignores that file.''
This is one of the very few issues preventing a complete 64-bit build on Solaris/OpenSolaris, so it would be nice to crack this one.
CC: @burcin @jhpalmieri @robertwb @gagern @vbraun @dimpase
Component: porting: Solaris
Issue created by migration from https://trac.sagemath.org/ticket/11116