sagemath / sage

Main repository of SageMath
https://www.sagemath.org
Other
1.31k stars 449 forks source link

Refcounting for Singular rings #11339

Closed 8d15854a-f726-4f6b-88e7-82ec1970fbba closed 12 years ago

8d15854a-f726-4f6b-88e7-82ec1970fbba commented 13 years ago

Ref counting for Singular rings

Python makes no guarantees that destructors are called if circular references are involved. This patch implements an extra Python proxy layer that correctly refcounts Singular rings. In contrast to our earlier code, it will release the singular rings once they are no longer in use. This triggers various issues in Sage/libSingular that are dealt with in the follow-up ticket #10903

Historic discussion

As described for the Sage on Gentoo project, there is a problem in how parts of sage use __deallocate__ in their Cython code. I've actually traced an instance of groebner_strategy.pyx causing segmentation faults under Python 2.7.

What happens is that the garbage collector invokes the tp_clear function for the object. Its implementation is provided by Cython, and one of its effects is that every python reference will be set to None. A bit later on, tp_dealloc is called and invokes the __deallocate__ method. By that time, _parent is None, so accessing _parent._ring is a bad idea, and in this case it turns out to be null.

Others have had similar problems before. There are crude workarounds floating around. I guess a proper solution would be twaking cython to allow custom code in the tp_clear function. In other words, have a "magic" mehod __clear__ similar to the magic __deallocate__. But I'll wait for comments here first, before taking this to cython upstream. Perhaps people with more experience have better solutions to offer. And I won't mind if you decide to take this to Cython devs yourselves.

Apply

Apply:

Depends on #10903

Dependencies: #10903 (for doctests)

CC: @kiwifb @robertwb @burcin @malb @simon-king-jena @nexttime

Component: algebra

Keywords: sd31

Author: Volker Braun, Martin von Gagern

Reviewer: François Bissey, Steven Trogdon, Martin Albrecht

Merged: sage-4.7.2.alpha4

Issue created by migration from https://trac.sagemath.org/ticket/11339

vbraun commented 13 years ago
comment:2

The documentation is fairly clear, you are only allowed to touch C data structures in __dealloc__: http://docs.cython.org/src/userguide/special_methods.html#finalization-method-dealloc

Really, this shows the difference between the C++ and the Python object model. In Python, executing constructors/destructors in derived classes is optional and leaves a lot of freedom to the programmer. Because of the garbage collection you can allocate resources wherever you want. By comparison, C++ is very strict and essentially forces you to use RAII. This is why there is no try/finally in C++. Which is why there is necessarily some tension in Cython between C++ and Python object instantiation/finalization.

Having said that, it seems like the cleanest solution would be if Cython would also call the Python destructor __del__ according to the normal Python rules (why __clear__?). Then the Python and C++ object model could just peacefully coexist. The lifetime of a cdef class would be

__cinit__()    # always called
__init__()     # not necessarily called for derived classes
... do something ...
__del__()      # not necessarily called for derived classes
__dealloc__()  # always called
kiwifb commented 13 years ago
comment:3

Thanks for your input Volker. I think we should definitely put some work in libs/singular to have some __del__ and __dealloc__ functions. I think the current state of things in there is a bit messy. I remember reading about __del__ in the python doc about garbage collection and wondering why singular_ring_delete in libs/singular/rings.pyx wasn't a __del__ method instead as its intent is clear.

vbraun commented 13 years ago
comment:4

Just to clarify, right now Cython extension classes do not call the Python destructor __del__ upon finalization, so Sage can't rely on this mechanism. I'm not quite sure if it is an intentional thing or if the Cython devs just haven't gotten around to implementing it. Maybe we can hack on that on Sage Days 31 since both Robert and Burcin will be around :-)

kiwifb commented 13 years ago
comment:5

OK, well something as to be done about this if we want to move to python-2.7. I think this problem is probably the biggest issue in the way of #9958 there are a few other things to look at but that's the only one producing a segfault.

8d15854a-f726-4f6b-88e7-82ec1970fbba commented 13 years ago
comment:6

I notice that PyTypeObject has a member tp_del which is not included in the documentation. So perhaps it's as simple as associating a function with that slot. Otoh, perhaps there is a reason that member isn't documented. The single reply to Python issue 4934 doesn't rule out that possibility.

8d15854a-f726-4f6b-88e7-82ec1970fbba commented 13 years ago

Workaround patch

8d15854a-f726-4f6b-88e7-82ec1970fbba commented 13 years ago

Attachment: bug11339a.patch.gz

Workaround hg bundle

kiwifb commented 13 years ago
comment:8

Attachment: bug11339a.bundle.gz

For the record this works with sage-on-gentoo with python-2.7.1 so as far as I am concerned it works as advertised so far. I can do a review against a vanilla sage using python-2.6 later this week provided that I get the all clear to go back to work (we got a couple more earthquakes down here - the magnitude 6 one was scary but very few people got hurt).

What is the "workaround hg bundle" Martin? I can reformat things if needed.

8d15854a-f726-4f6b-88e7-82ec1970fbba commented 13 years ago
comment:9

Replying to @kiwifb:

What is the "workaround hg bundle" Martin? I can reformat things if needed.

It's what I perceived to be the hg equivalent of a "bzr send": a file which describes a number of hg commits inclusing metadata, so that other hg users can pull from it. See hg manual for unbundle and related documentation.

kiwifb commented 13 years ago
comment:10

Hi Martin, Steve sent me a note about it earlier this morning. The patch is the only thing necessary here, am I right?

strogdon commented 13 years ago
comment:11

Replying to @kiwifb:

For the record this works with sage-on-gentoo with python-2.7.1 so as far as I am concerned it works as advertised so far. I can do a review against a vanilla sage using python-2.6 later this week provided that I get the all clear to go back to work (we got a couple more earthquakes down here - the magnitude 6 one was scary but very few people got hurt). What is the "workaround hg bundle" Martin? I can reformat things if needed.

This patch works fine for me too on sage-on-gentoo with python-2.7.1. I have applied this patch to the vanilla sage-4.7.1.alpha2 (python-2.6.4.p10) spkg and all long tests pass. I have also applied the patch along with the patches and required spkgs in ticket #9958 to the sage-4.7.1.alpha2 (python-2.7.1) spkg and all the tests that failed without this patch because of segfaults, i.e.

sage -t  -force_lib "devel/sage/sage/schemes/generic/scheme.py"

sage -t  -force_lib "devel/sage/sage/rings/morphism.pyx"    

sage -t  -force_lib "devel/sage/sage/rings/homset.py"

now pass.

kiwifb commented 13 years ago

Description changed:

--- 
+++ 
@@ -3,3 +3,6 @@
 What happens is that the garbage collector invokes the [tp_clear](http://docs.python.org/c-api/typeobj.html#tp_clear) function for the object. Its implementation is provided by Cython, and one of its effects is that every python reference will be set to `None`. A bit later on, [tp_dealloc](http://docs.python.org/c-api/typeobj.html#tp_dealloc) is called and invokes the `__deallocate__` method. By that time, `_parent` is `None`, so accessing `_parent._ring` is a bad idea, and in this case it turns out to be null.

 Others have had similar problems before. There are [crude workarounds](http://stackoverflow.com/questions/4501938/how-can-c-object-lifetimes-be-correctly-managed-in-cython) floating around. I guess a proper solution would be twaking cython to allow custom code in the tp_clear function. In other words, have a "magic" mehod `__clear__` similar to the magic `__deallocate__`. But I'll wait for comments here first, before taking this to cython upstream. Perhaps people with more experience have better solutions to offer. And I won't mind if you decide to take this to Cython devs yourselves.
+
+Apply:
+* [attachment: bug11339a.patch](https://github.com/sagemath/sage-prod/files/10652862/bug11339a.patch.gz)
kiwifb commented 13 years ago

Reviewer: François Bissey, Steven Trogdon

kiwifb commented 13 years ago
comment:12

I am giving this a positive review on my and Steve's observations. I unfortunately won't be able to run some extra tests before Monday it seems.

vbraun commented 13 years ago
comment:13

As it is, the patch is spectacularly ugly. I agree that it works, but we should investigate alternatives. It is not a particularly pressing issue since Sage is not going to switch to Python-2.7 this week. I'll have a closer look at it with Burcin duing this week at SD31.

kiwifb commented 13 years ago
comment:14

I am ok with this. I don't plan to push python-2.7.1 in 4.7.1.

I would like most of the other pending bits and pieces that will make testing easier to be in 4.7.1 however. It would be nice if at least some testing could be done in 4.7.2 and really land it by the next release.

8d15854a-f726-4f6b-88e7-82ec1970fbba commented 13 years ago
comment:15

Replying to @vbraun:

As it is, the patch is spectacularly ugly. [...] we should investigate alternatives.

I agree, that's why I titled the patch "workaround" instead of "solution". I can't think of a way to address this in sage alone, though, as in my opinion a proper solution would entail modifications to cython as discussed above.

vbraun commented 13 years ago

Description changed:

--- 
+++ 
@@ -5,4 +5,4 @@
 Others have had similar problems before. There are [crude workarounds](http://stackoverflow.com/questions/4501938/how-can-c-object-lifetimes-be-correctly-managed-in-cython) floating around. I guess a proper solution would be twaking cython to allow custom code in the tp_clear function. In other words, have a "magic" mehod `__clear__` similar to the magic `__deallocate__`. But I'll wait for comments here first, before taking this to cython upstream. Perhaps people with more experience have better solutions to offer. And I won't mind if you decide to take this to Cython devs yourselves.

 Apply:
-* [attachment: bug11339a.patch](https://github.com/sagemath/sage-prod/files/10652862/bug11339a.patch.gz)
+* [attachment: trac_11339_illegal_use_of__deallocate__in_libsingular.patch](https://github.com/sagemath/sage-prod/files/10652864/trac_11339_illegal_use_of__deallocate__in_libsingular.patch.gz)
vbraun commented 13 years ago
comment:16

In the case at hand we don't really need the Cython class GroebnerStrategy._parent, only the C struct GroebnerStrategy._parent._ring is used in __deallocate__(). I've made a minimal patch that adds a cdef ring *_parent_ring data member to GroebnerStrategy to keep track of the ring only. Because it refers to a C struct, it is safe to access it the Cython destructor. Because parents are immutable we don't have to worry about the ring * pointer becoming invalid.

The patch should fix the crash, but I don't have a Sage-on-Python-2.7 install to test it. Can somebody give it a try?

vbraun commented 13 years ago

Author: Volker Braun, Martin von Gagern

vbraun commented 13 years ago

Changed keywords from none to sd31

vbraun commented 13 years ago

Initial patch

8d15854a-f726-4f6b-88e7-82ec1970fbba commented 13 years ago
comment:17

Attachment: trac_11339_illegal_use_ofdeallocatein_libsingular.patch.gz

Replying to @vbraun:

I had considered keeping only that C pointer, but decided against it.

Because parents are immutable we don't have to worry about the ring * pointer becoming invalid.

Can you explain this in more detail? Why does the parent object being immutable help in keeping the pointer valid? Do you mean that the parent cython object doesn't change it's ring member? That is correct, but not enough to make the approach safe: if you don't keep a reference to the parent python object around, python might decide to garbage-collect that first, leading to a call to the singula function rDelete. I guess that function at least frees the memory, so if you are unlucky, some thread will overwrite the memory that ring points to with completely different data. Of course, in most cases you'd be lucky, so you wouldn't notice the problem. But unless singular does some reference counting of its own in such a way that a rDelete call on the parent won't have immediate effect, your approach isn't safe.

strogdon commented 13 years ago
comment:18

Replying to @vbraun:

The patch should fix the crash, but I don't have a Sage-on-Python-2.7 install to test it. Can somebody give it a try?

Here, the new patch doesn't solve the segfault problems when applied to sage-on-gentoo. In testing vanilla Sage with python-2.7.1 I've been patching the Sage spkg and then building. So my sage-main isn't a pristine sage-4.7.1.alpha2 with python-2.7.1. However the last patch that I've applied is the bug11339a patch. So, I cloned Sage to the patchset just prior to this last patch, apply the new patch and issued 'sage -ba'. The 3 tests listed #comment:11 all segfault. Let me know if you think my results are "overly" tainted.

kiwifb commented 13 years ago
comment:19

Replying to @vbraun:

In the case at hand we don't really need the Cython class GroebnerStrategy._parent, only the C struct GroebnerStrategy._parent._ring is used in __deallocate__(). I've made a minimal patch that adds a cdef ring *_parent_ring data member to GroebnerStrategy to keep track of the ring only. Because it refers to a C struct, it is safe to access it the Cython destructor. Because parents are immutable we don't have to worry about the ring * pointer becoming invalid.

The patch should fix the crash, but I don't have a Sage-on-Python-2.7 install to test it. Can somebody give it a try?

Does it depend on other patches that are post 4.7.1.alpha2? The feedback so far is that it doesn't work but I noticed there was some work in singular and rings in alpha3 and alpha4. Would it rely on any of these fixes?

kiwifb commented 13 years ago
comment:20

hum, after looking at the tickets there are not so many that could have an influence (#11218 and #11186 and they would be long shot). Must have been stuff that I have seen on trac but not yet merged.

vbraun commented 13 years ago
comment:21

By immutable I meant that the ring of the Parent object can't change. But I wasn't aware that the parent is part of the cycle that the garbage collector is trying to break, then it won't work of course.

Actually, if the cycle is Parent <-> GroebnerStrategy then I don't understand Martin's orginal patch. By manually increasing the refcount of Parent by one, the garbage collector should conclude that an external reference is keeping the cycle alive, right? Hence __dealloc__ would never be run. This fixes the crash, of course, but at the cost of leaking memory?

I'm leaning towards refcounting the libsingular rings in C. This would be rather easy to implement, I think.

Also, Burcin told me that the need to explicitly reset the ring after the groebner strategy is going to go away in the future. But I would wager that Sage is going to transition to Python 3 first ;-)

8d15854a-f726-4f6b-88e7-82ec1970fbba commented 13 years ago
comment:22

Replying to @vbraun:

But I wasn't aware that the parent is part of the cycle that the garbage collector is trying to break, then it won't work of course.

Actually, if the cycle is Parent <-> GroebnerStrategy then I don't understand Martin's orginal patch.

I'm not sure there really is a cycle. I had interpreted the situation in such a way that tp_clear will be called for unreachable objects just in case there might be some kind of cycle between them. So there might be no actual cycle involved at all, or it might be with some other member object. One requirement is that the function does cut all cycles, which should work as the parent shouldn't be referring back to the Gröbner strategy. So by keeping the reference in one direction, we satisfy tp_clear and still manage to finalize things. As a bonus, we ensure their finalization in the correct order.

But I might be completely off target with my above understanding. Someone with intricate knowledge of python, its evolution, as well as cython, might know better.

Until then, someone could add breakpoints to see if the finalizer is executed with my patch in place. Don't have the time just now, but might do it myself in a week or so if you remind me.

vbraun commented 13 years ago
comment:23

The Python docs say "The tp_clear member function is used to break reference cycles in cyclic garbage detected by the garbage collector.", so there must have been a cycle involving the GroebnerStrategy instance.

I looked a bit at the source and we definitely have cycles ideal<->GroebnerStrategy. Both refer to the parent ring, and this explains why tp_clear was called on GroebnerStrategy. I haven't found a place where the parent refers back to the cycle. In this situation Martin's patch should work as we can get rid of the cycle while the parent has positive refcount. It could be that Python 2.7 got smarter and deletes the parent together with the cycle while Python 2.6 first deleted the cycle and only then noticed that nothing else refers to the parent any more. This would explain why Martin's patch works but my attempt of blindly accessing the ring C structure fails on Python 2.7.

There is a lot of stuff that could conceivably create a larger cycle involving the parent even though I haven't found one. For one, the parent keeps a reference to the "one" element. Even then, I think it is too dangerous to make the hidden assumption that there are no complete cycles involving the parents. Somebody is bound to trip over this sooner or later.

vbraun commented 13 years ago
comment:24

We got this crash in the parent destructor, but look whats in the element destructor:

    def __dealloc__(self):
        # TODO: Warn otherwise!
        # for some mysterious reason, various things may be NULL in some cases
        if self._parent is not <ParentWithBase>None and (<MPolynomialRing_libsingular>self._parent)._ring != NULL and self._poly != NULL:
            p_Delete(&self._poly, (<MPolynomialRing_libsingular>self._parent)._ring)
vbraun commented 13 years ago

Description changed:

--- 
+++ 
@@ -5,4 +5,6 @@
 Others have had similar problems before. There are [crude workarounds](http://stackoverflow.com/questions/4501938/how-can-c-object-lifetimes-be-correctly-managed-in-cython) floating around. I guess a proper solution would be twaking cython to allow custom code in the tp_clear function. In other words, have a "magic" mehod `__clear__` similar to the magic `__deallocate__`. But I'll wait for comments here first, before taking this to cython upstream. Perhaps people with more experience have better solutions to offer. And I won't mind if you decide to take this to Cython devs yourselves.

 Apply:
-* [attachment: trac_11339_illegal_use_of__deallocate__in_libsingular.patch](https://github.com/sagemath/sage-prod/files/10652864/trac_11339_illegal_use_of__deallocate__in_libsingular.patch.gz)
+* [attachment: trac_11339_refcount_singular_rings.patch](https://github.com/sagemath/sage-prod/files/10652868/trac_11339_refcount_singular_rings.patch.gz)
+* [attachment: trac_11339_refcount_singular_polynomials.patch](https://github.com/sagemath/sage-prod/files/10652869/trac_11339_refcount_singular_polynomials.patch.gz)
+
vbraun commented 13 years ago
comment:25

Ok this should fix it now for good. The first patch introduces refcounting for Singular ring pointers and uses it in GroebnerStrategy, and the second updates MPolynomialRing_libsingular and MPolynomial_libsingular. Please give it a try with Python-2.7...

kiwifb commented 13 years ago
comment:26

I have boldly pushed it to the sage-on-gentoo tree and can report that it works fine with python-2.7.1 there on 4.7.1.alpha2. It solved the problem we had but another doctest crashed that looked completely unrelated until I looked at the backtrace:

sage -t -long  -force_lib devel/sage-main/sage/crypto/mq/sr.py # Killed/crashed

and here is it says when I add -verbose

Trying:
    from sage.crypto.mq.sr import test_consistency###line 3336:_sage_    >>> from sage.crypto.mq.sr import test_consistency
Expecting nothing
ok
Trying:
    test_consistency(Integer(1))  # long time (80s on sage.math, 2011)###line 3337:_sage_    >>> test_consistency(1)  # long time (80s on sage.math, 2011)
Expecting:
    True
/usr/lib64/libcsage.so(print_backtrace+0x24)[0x7f5259023554]
/usr/lib64/libcsage.so(sigdie+0x1d)[0x7f52590235ed]
/usr/lib64/libcsage.so(sage_signal_handler+0x131)[0x7f5259023761]
/lib64/libpthread.so.0(+0xfee0)[0x7f525ca80ee0]
/usr/lib64/libsingular.so.3(_Z7na_CopyP7snumberP9sip_sring+0x79)[0x7f523cc17d47]
/usr/lib64/libsingular.so.3(_Z6pSubstP8spolyreciS0_+0x535)[0x7f523cc41734]
/usr/lib64/python2.7/site-packages/sage/rings/polynomial/multi_polynomial_libsingular.so(+0x548e9)[0x7f523c5de8e9]
/usr/lib64/libpython2.7.so.1.0(PyCFunction_Call+0x76)[0x7f525cd0e1ab]
/usr/lib64/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x53b6)[0x7f525cd68414]
/usr/lib64/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x85f)[0x7f525cd69b65]
/usr/lib64/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x4d3a)[0x7f525cd67d98]
/usr/lib64/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x85f)[0x7f525cd69b65]
/usr/lib64/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x4d3a)[0x7f525cd67d98]
/usr/lib64/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x85f)[0x7f525cd69b65]
/usr/lib64/libpython2.7.so.1.0(PyEval_EvalCode+0x3b)[0x7f525cd69c23]
/usr/lib64/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x2597)[0x7f525cd655f5]
/usr/lib64/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x85f)[0x7f525cd69b65]
/usr/lib64/libpython2.7.so.1.0(+0x6ca97)[0x7f525ccfba97]
/usr/lib64/libpython2.7.so.1.0(PyObject_Call+0x75)[0x7f525ccd9539]
/usr/lib64/libpython2.7.so.1.0(+0x552a9)[0x7f525cce42a9]
/usr/lib64/libpython2.7.so.1.0(PyObject_Call+0x75)[0x7f525ccd9539]
/usr/lib64/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x4ea0)[0x7f525cd67efe]
/usr/lib64/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x85f)[0x7f525cd69b65]
/usr/lib64/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x4d3a)[0x7f525cd67d98]
/usr/lib64/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x85f)[0x7f525cd69b65]
/usr/lib64/libpython2.7.so.1.0(+0x6ca97)[0x7f525ccfba97]
/usr/lib64/libpython2.7.so.1.0(PyObject_Call+0x75)[0x7f525ccd9539]
/usr/lib64/libpython2.7.so.1.0(+0x552a9)[0x7f525cce42a9]
/usr/lib64/libpython2.7.so.1.0(PyObject_Call+0x75)[0x7f525ccd9539]
/usr/lib64/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x4ea0)[0x7f525cd67efe]
/usr/lib64/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x85f)[0x7f525cd69b65]
/usr/lib64/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x4d3a)[0x7f525cd67d98]
/usr/lib64/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x85f)[0x7f525cd69b65]
/usr/lib64/libpython2.7.so.1.0(+0x6ca97)[0x7f525ccfba97]
/usr/lib64/libpython2.7.so.1.0(PyObject_Call+0x75)[0x7f525ccd9539]
/usr/lib64/libpython2.7.so.1.0(+0x552a9)[0x7f525cce42a9]
/usr/lib64/libpython2.7.so.1.0(PyObject_Call+0x75)[0x7f525ccd9539]
/usr/lib64/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x4ea0)[0x7f525cd67efe]
/usr/lib64/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x85f)[0x7f525cd69b65]
/usr/lib64/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x4d3a)[0x7f525cd67d98]
/usr/lib64/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x85f)[0x7f525cd69b65]
/usr/lib64/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x4d3a)[0x7f525cd67d98]
/usr/lib64/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x85f)[0x7f525cd69b65]
/usr/lib64/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x4d3a)[0x7f525cd67d98]
/usr/lib64/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x85f)[0x7f525cd69b65]
/usr/lib64/libpython2.7.so.1.0(PyEval_EvalCode+0x3b)[0x7f525cd69c23]
/usr/lib64/libpython2.7.so.1.0(+0xf3a14)[0x7f525cd82a14]
/usr/lib64/libpython2.7.so.1.0(PyRun_FileExFlags+0xb1)[0x7f525cd83719]
/usr/lib64/libpython2.7.so.1.0(PyRun_SimpleFileExFlags+0x33c)[0x7f525cd8421a]
/usr/lib64/libpython2.7.so.1.0(PyRun_AnyFileExFlags+0x6e)[0x7f525cd84ac1]
/usr/lib64/libpython2.7.so.1.0(Py_Main+0xb92)[0x7f525cd9424a]
/usr/bin/python2.7(main+0x7f)[0x4009e3]
/lib64/libc.so.6(__libc_start_main+0xfd)[0x7f525c70acdd]
/usr/bin/python2.7[0x4008a9]

------------------------------------------------------------------------
Unhandled SIGSEGV: A segmentation fault occurred in Sage.
This probably occurred because a *compiled* component of Sage has a bug
in it and is not properly wrapped with sig_on(), sig_off(). You might
want to run Sage under gdb with 'sage -gdb' to debug this.
Sage will now terminate.
------------------------------------------------------------------------

It should be double checked with a vanilla install in case it just reveals a problem with my libsingular ebuild.

vbraun commented 13 years ago
comment:27

I had similar problems, but I thought I fixed them and it did pass tests on my machine. In my experience, they always go away if we leak the ring (which is essentially what we did before) instead of cleaning up after we are done with it. I have a suspicion that, although almost all take a ring as argument, some Singular functions require currRing to be set. This is why I added the inlined _check_ring() function to each MPolynomial_libsingular method to make sure that we have the right currRing.

I've added a superfluous p_Normalize call to MPolynomial_libsingular.__dealloc__, this is likely the point where you are crashing. You'll get a more useful backtrace if you paste the offending doctest into sage -gdb and then use "bt".

Maybe somebody who is more familiar with the Singular api can chime in and explain precisely where you have to call rChangeCurrRing and where you don't need to.

kiwifb commented 13 years ago
comment:28

It is one of these annoying one which doesn't seem to happen when you run under gdb. May be I'll have to try a bigger portion of the sequence of commands tested.

strogdon commented 13 years ago
comment:29

Replying to @vbraun:

Ok this should fix it now for good. The first patch introduces refcounting for Singular ring pointers and uses it in GroebnerStrategy, and the second updates MPolynomialRing_libsingular and MPolynomial_libsingular. Please give it a try with Python-2.7...

OK, I applied both the singular_rings and singular_polynomials patches to the 4.7.1.alpha2 that I have and built Sage as in comment:18 . As noted above this is not a pristine 4.7.1.alpha2, but has other python-2.7 related patches included; tickets #11377, #11244, #9958, #11376 and #10764. There are no segfaults. The testsuite looks like the results when the bug11339a.patch was applied. And I don't get the Killed doctest mentioned by François. Now with just the singular_rings patch I also get no segfaults or crashes. Should I get a crash? I can't see that any of the applied patches could affect things. I will build from scratch, just to make sure.

kiwifb commented 13 years ago
comment:30

OK so a proper backtrace. Note that the failing test is marked "long".

Program received signal SIGSEGV, Segmentation fault.
0x00007fffd79c2d47 in p_Copy (p=0x1, r=0x7ffff445fc38) at pInline2.h:441
441     pInline2.h: No such file or directory.
        in pInline2.h
(gdb) bt
#0  0x00007fffd79c2d47 in p_Copy (p=0x1, r=0x7ffff445fc38) at pInline2.h:441
#1  na_Copy (p=0x1, r=0x7ffff445fc38) at longalg.cc:1005
#2  0x00007fffd79ec734 in p_Head (p=<value optimized out>, n=<value optimized out>, e=0x7fffd0d832d0) at pInline1.h:152
#3  pSubst (p=<value optimized out>, n=<value optimized out>, e=0x7fffd0d832d0) at polys.cc:970
#4  0x00007fffd7387998 in __pyx_pf_4sage_5rings_10polynomial_28multi_polynomial_libsingular_23MPolynomial_libsingular_33subs (
    __pyx_v_self=0x4d1b370, __pyx_args=<value optimized out>, __pyx_kwds=<value optimized out>)
    at sage/rings/polynomial/multi_polynomial_libsingular.cpp:19922
#5  0x00007ffff7aac1ab in PyCFunction_Call (func=0x4ce4908, arg=0x4ea41d0, kw=<value optimized out>) at Objects/methodobject.c:85
#6  0x00007ffff7b06414 in ext_do_call (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:4322
#7  PyEval_EvalFrameEx (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:2704
#8  0x00007ffff7b07b65 in PyEval_EvalCodeEx (co=0x1798b30, globals=<value optimized out>, locals=<value optimized out>, 
    args=<value optimized out>, argcount=2, kws=0x5244750, kwcount=0, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:3252
#9  0x00007ffff7b05d98 in fast_function (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:4108
#10 call_function (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:4033
#11 PyEval_EvalFrameEx (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:2665
#12 0x00007ffff7b07b65 in PyEval_EvalCodeEx (co=0x3cc92b0, globals=<value optimized out>, locals=<value optimized out>, 
    args=<value optimized out>, argcount=1, kws=0x523abe8, kwcount=0, defs=0x3ccb6a8, defcount=1, closure=0x0)
    at Python/ceval.c:3252
#13 0x00007ffff7b05d98 in fast_function (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:4108
#14 call_function (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:4033
#15 PyEval_EvalFrameEx (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:2665
#16 0x00007ffff7b07b65 in PyEval_EvalCodeEx (co=0x4e985b0, globals=<value optimized out>, locals=<value optimized out>, 
    args=<value optimized out>, argcount=0, kws=0x0, kwcount=0, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:3252
#17 0x00007ffff7b07c23 in PyEval_EvalCode (co=<value optimized out>, globals=<value optimized out>, locals=<value optimized out>)
    at Python/ceval.c:666
#18 0x00007ffff7b035f5 in exec_statement (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:4709
#19 PyEval_EvalFrameEx (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:1879
#20 0x00007ffff7b07b65 in PyEval_EvalCodeEx (co=0x4bddc30, globals=<value optimized out>, locals=<value optimized out>, 
    args=<value optimized out>, argcount=5, kws=0x0, kwcount=0, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:3252
#21 0x00007ffff7a99a97 in function_call (func=0x4bafc08, arg=0x552fdd0, kw=0x0) at Objects/funcobject.c:526
#22 0x00007ffff7a77539 in PyObject_Call (func=0x4bafc08, arg=0x552fdd0, kw=0x0) at Objects/abstract.c:2529
#23 0x00007ffff7a822a9 in instancemethod_call (func=0x4bafc08, arg=0x552fdd0, kw=0x0) at Objects/classobject.c:2578
#24 0x00007ffff7a77539 in PyObject_Call (func=0x5450410, arg=0x552fdd0, kw=0x0) at Objects/abstract.c:2529
#25 0x00007ffff7b05efe in do_call (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:4230
#26 call_function (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:4035
#27 PyEval_EvalFrameEx (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:2665
#28 0x00007ffff7b07b65 in PyEval_EvalCodeEx (co=0x4c38c30, globals=<value optimized out>, locals=<value optimized out>, 
    args=<value optimized out>, argcount=5, kws=0x52444f8, kwcount=0, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:3252
#29 0x00007ffff7b05d98 in fast_function (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:4108
#30 call_function (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:4033
#31 PyEval_EvalFrameEx (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:2665
#32 0x00007ffff7b07b65 in PyEval_EvalCodeEx (co=0x4bddd30, globals=<value optimized out>, locals=<value optimized out>, 
    args=<value optimized out>, argcount=4, kws=0x0, kwcount=0, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:3252
#33 0x00007ffff7a99a97 in function_call (func=0x4bafc80, arg=0x5522050, kw=0x0) at Objects/funcobject.c:526
#34 0x00007ffff7a77539 in PyObject_Call (func=0x4bafc80, arg=0x5522050, kw=0x0) at Objects/abstract.c:2529
#35 0x00007ffff7a822a9 in instancemethod_call (func=0x4bafc80, arg=0x5522050, kw=0x0) at Objects/classobject.c:2578
#36 0x00007ffff7a77539 in PyObject_Call (func=0x5453500, arg=0x5522050, kw=0x0) at Objects/abstract.c:2529
#37 0x00007ffff7b05efe in do_call (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:4230
#38 call_function (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:4035
#39 PyEval_EvalFrameEx (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:2665
#40 0x00007ffff7b07b65 in PyEval_EvalCodeEx (co=0x7ffff289de30, globals=<value optimized out>, locals=<value optimized out>, 
    args=<value optimized out>, argcount=4, kws=0x5181330, kwcount=0, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:3252
#41 0x00007ffff7b05d98 in fast_function (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:4108
#42 call_function (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:4033
#43 PyEval_EvalFrameEx (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:2665
#44 0x00007ffff7b07b65 in PyEval_EvalCodeEx (co=0x4bddeb0, globals=<value optimized out>, locals=<value optimized out>, 
    args=<value optimized out>, argcount=2, kws=0x4c3b618, kwcount=3, defs=0x4bae108, defcount=3, closure=0x0)
    at Python/ceval.c:3252
#45 0x00007ffff7a99a97 in function_call (func=0x4bafde8, arg=0x4b2e680, kw=0x473aa50) at Objects/funcobject.c:526
#46 0x00007ffff7a77539 in PyObject_Call (func=0x4bafde8, arg=0x4b2e680, kw=0x473aa50) at Objects/abstract.c:2529
#47 0x00007ffff7a822a9 in instancemethod_call (func=0x4bafde8, arg=0x4b2e680, kw=0x473aa50) at Objects/classobject.c:2578
#48 0x00007ffff7a77539 in PyObject_Call (func=0x37d84b0, arg=0x4b2e680, kw=0x473aa50) at Objects/abstract.c:2529
#49 0x00007ffff7b05efe in do_call (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:4230
#50 call_function (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:4035
#51 PyEval_EvalFrameEx (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:2665
#52 0x00007ffff7b07b65 in PyEval_EvalCodeEx (co=0x7ffff7ec41b0, globals=<value optimized out>, locals=<value optimized out>, 
    args=<value optimized out>, argcount=2, kws=0x4c68d60, kwcount=0, defs=0x49f56a8, defcount=3, closure=0x0)
    at Python/ceval.c:3252
#53 0x00007ffff7b05d98 in fast_function (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:4108
#54 call_function (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:4033
#55 PyEval_EvalFrameEx (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:2665
#56 0x00007ffff7b07b65 in PyEval_EvalCodeEx (co=0x4bd88b0, globals=<value optimized out>, locals=<value optimized out>, 
    args=<value optimized out>, argcount=0, kws=0x46212b0, kwcount=10, defs=0x4bb8288, defcount=10, closure=0x0)
    at Python/ceval.c:3252
#57 0x00007ffff7b05d98 in fast_function (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:4108
#58 call_function (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:4033
#59 PyEval_EvalFrameEx (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:2665
#60 0x00007ffff7b07b65 in PyEval_EvalCodeEx (co=0x4c38ab0, globals=<value optimized out>, locals=<value optimized out>, 
    args=<value optimized out>, argcount=1, kws=0x6cae88, kwcount=3, defs=0x4bb8068, defcount=10, closure=0x0)
    at Python/ceval.c:3252
#61 0x00007ffff7b05d98 in fast_function (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:4108
#62 call_function (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:4033
#63 PyEval_EvalFrameEx (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:2665
#64 0x00007ffff7b07b65 in PyEval_EvalCodeEx (co=0x7ffff7eb7eb0, globals=<value optimized out>, locals=<value optimized out>, 
    args=<value optimized out>, argcount=0, kws=0x0, kwcount=0, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:3252
#65 0x00007ffff7b07c23 in PyEval_EvalCode (co=<value optimized out>, globals=<value optimized out>, locals=<value optimized out>)
    at Python/ceval.c:666
#66 0x00007ffff7b20a14 in run_mod (mod=<value optimized out>, filename=<value optimized out>, globals=0x640f60, locals=0x640f60, 
    flags=<value optimized out>, arena=<value optimized out>) at Python/pythonrun.c:1346
#67 0x00007ffff7b21719 in PyRun_FileExFlags (fp=0x6a0530, filename=0x7fffffffddc1 "/home/fbissey/.sage/tmp/.doctest_sr.py", 
    start=257, globals=0x640f60, locals=0x640f60, closeit=1, flags=0x7fffffffd870) at Python/pythonrun.c:1332
#68 0x00007ffff7b2221a in PyRun_SimpleFileExFlags (fp=0x6a0530, filename=<value optimized out>, closeit=1, flags=0x7fffffffd870)
    at Python/pythonrun.c:936
#69 0x00007ffff7b22ac1 in PyRun_AnyFileExFlags (fp=0x6a0530, filename=0x7fffffffddc1 "/home/fbissey/.sage/tmp/.doctest_sr.py", 
    closeit=1, flags=0x7fffffffd870) at Python/pythonrun.c:740
#70 0x00007ffff7b3224a in Py_Main (argc=<value optimized out>, argv=0x7fffffffd9b8) at Modules/main.c:606
#71 0x00000000004009e3 in main (argc=2, argv=0x7fffffffd9b8) at ./Modules/python.c:46

I didn't realize you could do something like

sage -t -long  -force_lib devel/sage-main/sage/crypto/mq/sr.py -gdb

handy...

kiwifb commented 13 years ago
comment:31

Since we can even add "-verbose" I should add this bit of output:

Trying:
    test_consistency(Integer(1))  # long time (80s on sage.math, 2011)###line 3337:_sage_    >>> test_consistency(1)  # long time (80s on sage.math, 2011)
Expecting:
    True

Program received signal SIGSEGV, Segmentation fault.
0x00007fffd79c2d47 in p_Copy (p=0x1, r=0x7ffff445fc38) at pInline2.h:441
441     pInline2.h: No such file or directory.
        in pInline2.h

So it looks like the segfault is after the test has completed, probably when sage is quitting as happened to Steve when he tried to run the test while disabling the garbage collection (before we had any patches what so ever).

strogdon commented 13 years ago
comment:32

I've just completed the testsuite after building vanilla sage-4.7.1.alpha2 from scratch. The patches from tickets #11377, #11244, #9958, #11376 and #10764 along with the patches of this ticket were used to create a patched sage-4.7.1.alpha2 spkg before the build. The results are identical to those reported earlier. There are no segfaults and no Killed/crashed tests.

On my sage-on-gentoo install on x86 in prefix there no segfaults; however, I have one Killed/crashed doctest which I didn't have when using the bug11339a patch.

sage -t -long -force_lib "devel/sage-main/sage/libs/singular/option.pyx" -gdb


Program received signal SIGSEGV, Segmentation fault.
0xb3b16d29 in _nlDelete_NoImm(snumber**) () from /storage/strogdon/gentoo/usr/lib/libsingular.so.3
(gdb) bt
#0  0xb3b16d29 in _nlDelete_NoImm(snumber**) () from /storage/strogdon/gentoo/usr/lib/libsingular.so.3
#1  0xb3bd831d in p_Delete__FieldQ_LengthGeneral_OrdGeneral () from /storage/strogdon/gentoo/usr/lib/libsingular.so.3
#2  0xb3acde01 in id_Delete(sip_sideal**, sip_sring*) () from /storage/strogdon/gentoo/usr/lib/libsingular.so.3
#3  0xb3a70736 in sleftv::CleanUp(sip_sring*) () from /storage/strogdon/gentoo/usr/lib/libsingular.so.3
#4  0xb37f490e in __pyx_f_4sage_4libs_8singular_8function_free_leftv(sleftv*) () from /storage/strogdon/gentoo/usr/lib/python2.7/site-packages/sage/libs/singular/function.so
#5  0xb3805653 in __pyx_f_4sage_4libs_8singular_8function_call_function(__pyx_obj_4sage_4libs_8singular_8function_SingularFunction*, _object*, __pyx_obj_4sage_5rings_10polynomial_28multi_polynomial_libsingular_MPolynomialRing_libsingular*, __pyx_opt_args_4sage_4libs_8singular_8function_call_function*) () from /storage/strogdon/gentoo/usr/lib/python2.7/site-packages/sage/libs/singular/function.so
#6  0xb3806843 in __pyx_pf_4sage_4libs_8singular_8function_16SingularFunction_2__call__(_object*, _object*, _object*) () from /storage/strogdon/gentoo/usr/lib/python2.7/site-packages/sage/libs/singular/function.so
#7  0xb7e9fffb in PyObject_Call () from /storage/strogdon/gentoo/usr/lib/libpython2.7.so.1.0
#8  0xb7f3a0a4 in PyEval_EvalFrameEx () from /storage/strogdon/gentoo/usr/lib/libpython2.7.so.1.0
#9  0xb7f3c8e1 in PyEval_EvalCodeEx () from /storage/strogdon/gentoo/usr/lib/libpython2.7.so.1.0
#10 0xb7f3ca34 in PyEval_EvalCode () from /storage/strogdon/gentoo/usr/lib/libpython2.7.so.1.0
#11 0xb7f3bbb2 in PyEval_EvalFrameEx () from /storage/strogdon/gentoo/usr/lib/libpython2.7.so.1.0
#12 0xb7f3c8e1 in PyEval_EvalCodeEx () from /storage/strogdon/gentoo/usr/lib/libpython2.7.so.1.0
#13 0xb7ec7c58 in function_call () from /storage/strogdon/gentoo/usr/lib/libpython2.7.so.1.0
#14 0xb7e9fffb in PyObject_Call () from /storage/strogdon/gentoo/usr/lib/libpython2.7.so.1.0
#15 0xb7eafaae in instancemethod_call () from /storage/strogdon/gentoo/usr/lib/libpython2.7.so.1.0
#16 0xb7e9fffb in PyObject_Call () from /storage/strogdon/gentoo/usr/lib/libpython2.7.so.1.0
#17 0xb7f3a0a4 in PyEval_EvalFrameEx () from /storage/strogdon/gentoo/usr/lib/libpython2.7.so.1.0
#18 0xb7f3c8e1 in PyEval_EvalCodeEx () from /storage/strogdon/gentoo/usr/lib/libpython2.7.so.1.0
#19 0xb7f3ab11 in PyEval_EvalFrameEx () from /storage/strogdon/gentoo/usr/lib/libpython2.7.so.1.0
#20 0xb7f3c8e1 in PyEval_EvalCodeEx () from /storage/strogdon/gentoo/usr/lib/libpython2.7.so.1.0
#21 0xb7ec7c58 in function_call () from /storage/strogdon/gentoo/usr/lib/libpython2.7.so.1.0
#22 0xb7e9fffb in PyObject_Call () from /storage/strogdon/gentoo/usr/lib/libpython2.7.so.1.0
#23 0xb7eafaae in instancemethod_call () from /storage/strogdon/gentoo/usr/lib/libpython2.7.so.1.0
#24 0xb7e9fffb in PyObject_Call () from /storage/strogdon/gentoo/usr/lib/libpython2.7.so.1.0
#25 0xb7f3a0a4 in PyEval_EvalFrameEx () from /storage/strogdon/gentoo/usr/lib/libpython2.7.so.1.0
#26 0xb7f3c8e1 in PyEval_EvalCodeEx () from /storage/strogdon/gentoo/usr/lib/libpython2.7.so.1.0
#27 0xb7f3ab11 in PyEval_EvalFrameEx () from /storage/strogdon/gentoo/usr/lib/libpython2.7.so.1.0
#28 0xb7f3c8e1 in PyEval_EvalCodeEx () from /storage/strogdon/gentoo/usr/lib/libpython2.7.so.1.0
#29 0xb7ec7d43 in function_call () from /storage/strogdon/gentoo/usr/lib/libpython2.7.so.1.0
#30 0xb7e9fffb in PyObject_Call () from /storage/strogdon/gentoo/usr/lib/libpython2.7.so.1.0
#31 0xb7eafaae in instancemethod_call () from /storage/strogdon/gentoo/usr/lib/libpython2.7.so.1.0
#32 0xb7e9fffb in PyObject_Call () from /storage/strogdon/gentoo/usr/lib/libpython2.7.so.1.0
#33 0xb7f3a0a4 in PyEval_EvalFrameEx () from /storage/strogdon/gentoo/usr/lib/libpython2.7.so.1.0
#34 0xb7f3c8e1 in PyEval_EvalCodeEx () from /storage/strogdon/gentoo/usr/lib/libpython2.7.so.1.0
#35 0xb7f3ab11 in PyEval_EvalFrameEx () from /storage/strogdon/gentoo/usr/lib/libpython2.7.so.1.0
#36 0xb7f3c8e1 in PyEval_EvalCodeEx () from /storage/strogdon/gentoo/usr/lib/libpython2.7.so.1.0
#37 0xb7f3ab11 in PyEval_EvalFrameEx () from /storage/strogdon/gentoo/usr/lib/libpython2.7.so.1.0
#38 0xb7f3c8e1 in PyEval_EvalCodeEx () from /storage/strogdon/gentoo/usr/lib/libpython2.7.so.1.0
#39 0xb7f3ab11 in PyEval_EvalFrameEx () from /storage/strogdon/gentoo/usr/lib/libpython2.7.so.1.0
#40 0xb7f3c8e1 in PyEval_EvalCodeEx () from /storage/strogdon/gentoo/usr/lib/libpython2.7.so.1.0
#41 0xb7f3ca34 in PyEval_EvalCode () from /storage/strogdon/gentoo/usr/lib/libpython2.7.so.1.0
#42 0xb7f5659c in run_mod () from /storage/strogdon/gentoo/usr/lib/libpython2.7.so.1.0
#43 0xb7f57483 in PyRun_FileExFlags () from /storage/strogdon/gentoo/usr/lib/libpython2.7.so.1.0
#44 0xb7f5815c in PyRun_SimpleFileExFlags () from /storage/strogdon/gentoo/usr/lib/libpython2.7.so.1.0
#45 0xb7f58cf2 in PyRun_AnyFileExFlags () from /storage/strogdon/gentoo/usr/lib/libpython2.7.so.1.0
#46 0xb7f6a35f in Py_Main () from /storage/strogdon/gentoo/usr/lib/libpython2.7.so.1.0
#47 0x0804888b in main ()
malb commented 13 years ago
comment:33

Replying to @vbraun:

Hi, first: thanks for cleaning up the mess that no-doubt I made.

I had similar problems, but I thought I fixed them and it did pass tests on my machine. In my experience, they always go away if we leak the ring (which is essentially what we did before) instead of cleaning up after we are done with it.

Ah, this might also explain #5949 then.

I have a suspicion that, although almost all take a ring as argument, some Singular functions require currRing to be set.

The API is as follows: pXXX does assume the currRing to be set, whereas p_XXX does not. However, sometimes there are bugs, i.e. you call p_XXX and it crashes unless currRing is set.

This is why I added the inlined _check_ring() function to each MPolynomial_libsingular method to make sure that we have the right currRing.

TBH, I don't like this blanket approach, did you run into specific problems? If so, I'd add a check to those functions instead of hiding bugs by forcing the ring.

vbraun commented 13 years ago
comment:34

Replying to @malb:

The API is as follows: pXXX does assume the currRing to be set, whereas p_XXX does not. However, sometimes there are bugs, i.e. you call p_XXX and it crashes unless currRing is set.

I guessed that much, but a documented list of exceptions to that rule might be nice :-)

TBH, I don't like this blanket approach, did you run into specific problems? If so, I'd add a check to those functions instead of hiding bugs by forcing the ring.

In principle I agree, but its probably more productive to first produce a working version and then see if it still works if we remove the blanket rChangeCurrRing. Right now I'm not sure if that is the underlying problem. I didn't find any 100% reliable testcase but in some cases adding the extra ring switch fixed segfaults, but only to reappear in other places.

vbraun commented 13 years ago
comment:35

The updated polynomials patch fixes another place where potentially the wrong singular ring was used, which would explain the data corruption. Please give it a try!

I'm pretty sure that p_Normalize requires currRing to be set. In the polynomials destructor, I have

    ### If you suspect that the poly data was corrupted, then this is a good way to trigger a segfault:
    rChangeCurrRing(self._parent_ring)
    p_Normalize(self._poly, self._parent_ring)

This is useful for debugging because p_Normalize will touch all coefficients of the polynomial. The destructor is typically called long after the actual computations so it is very likely that the currRing has changed by then. If I reverse the order of the two lines, then I get segfaults all over the Sage library. However, because it relies on the finalization order, they are usually in varying places and often disappear if you just paste the doctest into Sage.

kiwifb commented 13 years ago
comment:36

It seems to have taken care of my problem. Thanks Volker. Now we should wait to see if it did something for Steve.

strogdon commented 13 years ago
comment:37

Replying to @vbraun:

The updated polynomials patch fixes another place where potentially the wrong singular ring was used, which would explain the data corruption. Please give it a try!

My vanilla sage-4.7.1.alpha2 has no segfaults or Killed/crashed tests with the new patch. The patch seems to cure the doctest failure (option.pyx) I had on my sage-on-gentoo install on x86. However on amd64 (sage-on-gentoo) I have a Segmentation fault with the doctest

sage -t -long  -force_lib devel/sage-main/sage/schemes/elliptic_curves/ell_curve_isogeny.py

I'll attach files of the Segmentation fault and debug/backtrace.

strogdon commented 13 years ago

Segmentation fault associated with

strogdon commented 13 years ago

Attachment: ell_curve_isogeny.segfault.bz2.gz

ell_curve_isogeny.py debug/backtrace

83660e46-0051-498b-a8c1-f7a7bd232b5a commented 13 years ago
comment:38

Attachment: ell_curve_isogeny.gdb.bz2.gz

vbraun commented 13 years ago
comment:39

I built a sage spkg with Singular's polynomials debugging turned on (PDEBUG=2) and now Singular spits out tons of memory allocation errors during Sage startup (turns out Sage does some high-degree polynomial computations during startup). Can somebody who understands Singular better comment on whether those are real errors?

http://www.stp.dias.ie/~vbraun/Sage/spkg/singularDEBUG-3-1-1-4.p9.spkg

Note that its not as easy as turning on SAGE_DEBUG; the Singular spkg-install does set a configure option in that case but its not doing anything. I also had to fix up an instance where an #ifdef PDEBUG replaced a libSingular function by a macro.

malb commented 13 years ago
comment:40

I sent and e-mail to [libsingular-devel], cf. http://groups.google.com/group/libsingular-devel/browse_thread/thread/71fcd01208f107ca

vbraun commented 13 years ago
comment:41

The omalloc error gets triggered by fraction field conversions. If you take out some stuff from ell_curve_isogeny.py then you can start up Sage just fine. I'll attach a patch for debugging purposes.

The omalloc errors appear with or without my patches on this ticket, by the way.