Open tornaria opened 9 months ago
my eyes bleed from this line:
rnd = [ randint(-10,10) for i in range(randint(0,7)) ]
How about
rnd = random.choices(range(-10,10), k=randint(0,7))
?
See https://docs.python.org/3/library/random.html#random.choices
maybe it segfaults on libgap.Sum(*rnd)
when rnd==[]
?
This also pops up on Fedora 39 with system Python 3.12, see e.g. https://groups.google.com/g/sage-release/c/oTnTrEnB3YU/m/tZiLm66ZAwAJ
But I can't reproduce this with Python 3.12.2
Is it possible that both @jaapspies and me are missing some python3 system package?
At some point this segfault occured only in tests, now it happens also in the command line
my eyes bleed from this line:
rnd = [ randint(-10,10) for i in range(randint(0,7)) ]
How about
rnd = random.choices(range(-10,10), k=randint(0,7))
?
See https://docs.python.org/3/library/random.html#random.choices
Tried that but it does not have an effect
sage: import random ## line 2490 ##
sage: for i in range(100):
rnd = random.choices(range(-10,10), k=randint(0,7))
# compute the sum in GAP
_ = libgap.Sum(rnd)
try:
libgap.Sum(*rnd)
print('This should have triggered a ValueError')
print('because Sum needs a list as argument')
except ValueError:
pass ## line 2491 ##
------------------------------------------------------------------------
/usr/lib/python3.12/site-packages/cysignals/signals.cpython-312-x86_64-linux-gnu.so(+0xa0d4)[0x7fbe2865c0d4]
/usr/lib/python3.12/site-packages/cysignals/signals.cpython-312-x86_64-linux-gnu.so(+0xa196)[0x7fbe2865c196]
/usr/lib/python3.12/site-packages/cysignals/signals.cpython-312-x86_64-linux-gnu.so(+0xcbf6)[0x7fbe2865ebf6]
/usr/lib64/libc.so.6(+0x3a210)[0x7fbe2945f210]
There's a more detailed trace in https://github.com/sagemath/sage/pull/36407#issuecomment-2068709126
Caused by GCC commit https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=049ec9b981d1f4f97736061d5cf7d0ae990b57d7
Upstream report: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114872
Can be reproduced by calling any libgap function in exactly 3 parameters, eg. libgap.AbelianGroup(0,0,0)
@sheerluck has kindly disassembled broken element....so
, resulting from compilation of element.c
. I report the result here:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114872#c12
(that is, code generation is broken, the call to GAP_CallFunc3Args()
is formed with one parameter less than it should be)
Fixing gcc is above my payrate, though :-)
more discussion on https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114872 seems to indicate that gcc is actually not broken, it's our code that is (missing volatile declarations). See also https://stackoverflow.com/questions/7996825/why-volatile-works-for-setjmp-longjmp
If I am not mistaken, we have
sig_on()
...
GAP_CallFunc3Args(foo0,...,foo3)
...
sign_off()
and the longmp()
triggered somewhere in libgap
, called by GAP_CallFunc3Args
, invalidates the registers' contents.
Now, if foo3
, say, was held in a register, it is gone for good. Thus unwinding this call, which involves a GAP object
held in foo3
, is no longer possible, and triggers a segfault.
Looks like a lot of volatile
to add all over the place: (anywhere libgap, singular (which also uses setjmp/longjmp) is called, any other setjmp/longjmp-ing upstream code?
See https://trofi.github.io/posts/312-the-sagemath-saga.html for a detailed analysis. Not sure whether the detail that GAP uses setjmp/longjmp to return from GAP_CallFunc3Args()
here in the 1st place.
By the way, the Python 3.12 is only needed to produce a crash. Otherwise, we get a memory leak, as these temporary GAP objects are not cleared.
$ ./sage
┌────────────────────────────────────────────────────────────────────┐
│ SageMath version 10.4.beta6, Release Date: 2024-05-12 │
│ Using Python 3.11.9. Type "help()" for help. │
└────────────────────────────────────────────────────────────────────┘
sage: libgap.count_GAP_objects()
0
sage: libgap.Sum(1,2,42)
---------------------------------------------------------------------------
GAPError Traceback (most recent call last)
Cell In[2], line 1
----> 1 libgap.Sum(Integer(1),Integer(2),Integer(42))
File /home/scratch/scratch2/dimpase/sage/sage/src/sage/libs/gap/element.pyx:2514, in sage.libs.gap.element.GapElement_Function.__call__()
2512 try:
2513 sig_GAP_Enter()
-> 2514 sig_on()
2515 if n == 0:
2516 result = GAP_CallFunc0Args(self.value)
GAPError: Error, no method found! Error, no 1st choice method found for `SumOp' on 3 arguments
sage: libgap.count_GAP_objects()
4
sage: libgap.Sum(3,5,7)
---------------------------------------------------------------------------
GAPError Traceback (most recent call last)
Cell In[4], line 1
----> 1 libgap.Sum(Integer(3),Integer(5),Integer(7))
File /home/scratch/scratch2/dimpase/sage/sage/src/sage/libs/gap/element.pyx:2514, in sage.libs.gap.element.GapElement_Function.__call__()
2512 try:
2513 sig_GAP_Enter()
-> 2514 sig_on()
2515 if n == 0:
2516 result = GAP_CallFunc0Args(self.value)
GAPError: Error, no method found! Error, no 1st choice method found for `SumOp' on 3 arguments
sage: libgap.count_GAP_objects()
7
I'm afraid this is still happening in 10.4.beta7, which is supposed to include 72e6b66b1c9699cef63922c988c40031a5fc5925
we should do rigorous fixing, not just an ad hoc like this.
We need a stateless way to create GAP objects, which can be automatically destroyed.
I'm afraid this is still happening in 10.4.beta7, which is supposed to include 72e6b66
I'm also seeing this problem on NixOS CI. With the patch in #37951 applied, the temporary variable used to hold <GapElement>a[1]
seems to suffer from the same problem that affected the <GapElement>a[2]
temporary: GCC removes a null check, and at the same time proves the variable is always null, and the end result is that it dereferences a null pointer. This feels like a big minefield. I think I will just apply the following patch for the time being.
diff --git a/src/sage/libs/gap/element.pyx b/src/sage/libs/gap/element.pyx
index b2b0681c052..b2e5807392c 100644
--- a/src/sage/libs/gap/element.pyx
+++ b/src/sage/libs/gap/element.pyx
@@ -35,6 +35,13 @@ from sage.structure.coerce cimport coercion_model as cm
### helper functions to construct lists and records ########################
############################################################################
+
+cdef extern from *:
+ """
+ #pragma GCC optimize("O1")
+ """
+
+
cdef Obj make_gap_list(sage_list) except NULL:
"""
Convert Sage lists into Gap lists
This is back:
$ sage
┌────────────────────────────────────────────────────────────────────┐
│ SageMath version 10.5.beta6, Release Date: 2024-09-29 │
│ Using Python 3.12.6. Type "help()" for help. │
└────────────────────────────────────────────────────────────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Warning: this is a prerelease version, and it may be unstable. ┃
┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛
sage: libgap.AbelianGroup(0,0,0)
...
Unhandled SIGSEGV: A segmentation fault occurred.
This probably occurred because a *compiled* module has a bug
in it and is not properly wrapped with sig_on(), sig_off().
Python will now terminate.
------------------------------------------------------------------------
Segmentation fault
I am getting this:
┌────────────────────────────────────────────────────────────────────┐
│ SageMath version 10.5.beta6, Release Date: 2024-09-29 │
│ Using Python 3.12.6. Type "help()" for help. │
└────────────────────────────────────────────────────────────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Warning: this is a prerelease version, and it may be unstable. ┃
┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛
sage: libgap.AbelianGroup(0,0,0)
---------------------------------------------------------------------------
GAPError Traceback (most recent call last)
Cell In[1], line 1
----> 1 libgap.AbelianGroup(Integer(0),Integer(0),Integer(0))
File /usr/local/sagedev/src/sage/libs/gap/element.pyx:2520, in sage.libs.gap.element.GapElement_Function.__call__()
2518 try:
2519 sig_GAP_Enter()
-> 2520 sig_on()
2521 if n == 0:
2522 result = GAP_CallFunc0Args(self.value)
The same output in Sage 10.4
Perhaps more volatile
s have to be added, like it was done there:
72e6b66b1c96 (Dima Pasechnik 2024-05-06 23:53:47 +0100 2488) cdef volatile Obj v2
Compilers get more efficient in optimising register allocation, killing our stateful interfaces with GAP and probably more...
I'm sure it is still system-dependent since it was originally "triggered" by a minor GCC update, but I've got it on two very different machines with both gcc-13 and gcc-14 now.
I did try adding more volatile
s for the other arguments to the GAP function calls, but it doesn't seem to help.
What GAP versions do you test with?
What GAP versions do you test with?
Usually with 4.13.1 from Gentoo, but this is happening with the 4.12.2 SPKG as well.
OK, so I have those two machines one is "sage-on-gentoo" release (currently 10.4) and the other is "sage-on-gentoo" cutting edge (Volker's merging branch). I observe the segfault in the release one and @enriqueartal's output in the cutting edge one.
Of interest to me is the explicit coercion of 0
to Integer(0)
in the second case which is stuff that I have seen appear after incorporating the elimination of sage's mpmath backend in #38113 which is currently applied in "sage-on-gentoo cutting edge".
I see @enriqueartal 's output with 10.5.beta6 and Sage's GAP.
Same output if I explicitly use libgap(0)
rather than 0
in the call.
I get a coredump in Linux x86_64 conda-installed, with Python 3.12, Sage 10.4.
Testing https://github.com/sagemath/sage/pull/38804 with the new gap-4.13.1 SPKG:
sage -t --long --warn-long 77.9 --random-seed=14617378659005491983447022349844162666\
9 src/sage/libs/gap/element.pyx
**********************************************************************
File "src/sage/libs/gap/element.pyx", line 2440, in sage.libs.gap.element.GapElement\
_Function.__call__
Failed example:
sorted(a(b))
Expected:
[Group(()),
Sym( [ 1 .. 4 ] ),
Alt( [ 1 .. 4 ] ),
Group([ (1,4)(2,3), (1,2)(3,4) ])]
Got:
[Group(()),
Sym( [ 1 .. 4 ] ),
Alt( [ 1 .. 4 ] ),
Group([ (1,4)(2,3), (1,3)(2,4) ])]
Killed due to segmentation fault
**********************************************************************
...which of course comes right after
sage: for i in range(100):
rnd = [ randint(-10,10) for i in range(randint(0,7)) ]
# compute the sum in GAP
_ = libgap.Sum(rnd)
try:
libgap.Sum(*rnd)
print('This should have triggered a ValueError')
print('because Sum needs a list as argument')
except ValueError:
pass ## line 2494 ##
Concerning the error in
src/sage/libs/gap/element.pyx
, it happens in line 2489:getting
Killed due to segmentation fault
. Running this code in the terminal works, the error appears only in tests and with python 3.12.Originally posted by @enriqueartal in https://github.com/sagemath/sage/issues/37011#issuecomment-1880050744