sagemath / sage


GIAC kills process 1 in ctrl_c_signal_handler #33706

Closed mkoeppe closed 2 years ago

mkoeppe commented 2 years ago

On ubuntu-jammy-standard (https://github.com/sagemath/sage/runs/5962279659)

sage -t --random-seed=156404901056981760924144629149815074678 src/sage/tests/cmdline.py
    [216 tests, 73.54 s]
----------------------------------------------------------------------
All tests passed!
----------------------------------------------------------------------
Total time for all tests: 3739.0 seconds
    cpu time: 6427.0 seconds
    cumulative wall time: 10155.6 seconds
Features detected for doctesting: gfan,nauty,palp,sage.combinat,sage.geometry.polyhedron,sage.graphs,sage.groups,sage.plot,sage.rings.number_field,sage.rings.padics,sage.rings.real_double,sage.symbolic,sagemath_doc_html,sphinx
============================= test session starts ==============================
platform linux -- Python 3.10.4, pytest-7.1.1, pluggy-1.0.0
rootdir: /sage/src, configfile: tox.ini
collected 26 items / 104 skipped

src/sage/manifolds/differentiable/symplectic_form_test.py .............. [ 53%]
........                                                                 [ 84%]
src/sage/manifolds/differentiable/examples/symplectic_space_test.py .... [100%]

================= 26 passed, 104 skipped in 102.79s (0:01:42) ==================
The command '/bin/sh -c make SAGE_SPKG="sage-spkg -y -o" ${USE_MAKEFLAGS} ${TARGETS_OPTIONAL} || echo "(error ignored)"' returned a non-zero code: 130

Likewise on gitpod-standard, debian-bookworm-standard, linuxmint-20.1-standard.

On linuxmint-19-standard (https://github.com/sagemath/sage/runs/5962280173?check_suite_focus=true), the exit code is 2.

CC: @tobiasdiez @orlitzky @sagetrac-parisse @tornaria @vbraun @dimpase

Component: doctest framework

Author: Matthias Koeppe

Branch: ecd3895

Reviewer: Volker Braun

Issue created by migration from https://trac.sagemath.org/ticket/33706

mkoeppe commented 2 years ago

Description changed:

--- 
+++ 
@@ -1,4 +1,4 @@
-https://github.com/sagemath/sage/runs/5962279659?check_suite_focus=true
+On `ubuntu-jammy-standard` (https://github.com/sagemath/sage/runs/5962279659)

sage -t --random-seed=156404901056981760924144629149815074678 src/sage/tests/cmdline.py
@@ -23,3 +23,9 @@
The command '/bin/sh -c make SAGE_SPKG="sage-spkg -y -o" ${USE_MAKEFLAGS} ${TARGETS_OPTIONAL} || echo "(error ignored)"' returned a non-zero code: 130


+Likewise on `gitpod-standard`, `debian-bookworm-standard`, `linuxmint-20.1-standard`.
+
+
+On `linuxmint-19-standard` (https://github.com/sagemath/sage/runs/5962280173?check_suite_focus=true), the exit code is 2. 
+
+
mkoeppe commented 2 years ago
comment:2

The exit code 130 from the shell is rather mysterious. Not sure if it really has anything to do with pytest.

It appears to be non-deterministic. I have reproduced it in the container (docker run -it docker.pkg.github.com/sagemath/sage/sage-docker-gitpod-standard-with-targets-optional:9.6.rc0-failed bash) after adding some print calls to src/bin/sage-runtests to see what the return values of the Sage doctester and pytest are.

gitpod ~/sage $ /bin/sh -c 'make SAGE_SPKG="sage-spkg -y -o" ${USE_MAKEFLAGS} ptest-nodoc || echo "(error ignored)"'
----------------------------------------------------------------------
sage -t --warn-long 102.2 --random-seed=17067963460848597109909214796612832081 src/sage/manifolds/differentiable/tensorfield.py  # Timed out
sage -t --warn-long 102.2 --random-seed=17067963460848597109909214796612832081 src/sage/interfaces/expect.py  # 2 doctests failed
----------------------------------------------------------------------
Total time for all tests: 1677.6 seconds
    cpu time: 13605.8 seconds
    cumulative wall time: 24340.6 seconds
Features detected for doctesting: gfan,imagemagick,nauty,palp,sage.combinat,sage.geometry.polyhedron,sage.graphs,sage.groups,sage.plot,sage.rings.number_field,sage.rings.padics,sage.rings.real_double,sage.symbolic,sagemath_doc_html,sphinx
ERR=5
============================================================ test session starts =============================================================
platform linux -- Python 3.8.10, pytest-7.1.1, pluggy-1.0.0
rootdir: /home/gitpod/sage/src, configfile: tox.ini
collected 26 items / 104 skipped                                                                                                             

src/sage/manifolds/differentiable/symplectic_form_test.py ......................                                                       [ 84%]
src/sage/manifolds/differentiable/examples/symplectic_space_test.py ....                                                               [100%]

================================================ 26 passed, 104 skipped in 104.61s (0:01:44) =================================================
exit_code_pytest=0
make: *** [Makefile:287: ptest-nodoc] Error 5
(error ignored)

gitpod ~/sage $ echo $?
130
gitpod ~/sage $ ls -l /bin/sh
lrwxrwxrwx 1 root root 4 Jul 18  2019 /bin/sh -> dash

I think it may be specific to the shell that is used on the systems where the failure was observed.

mkoeppe commented 2 years ago
comment:3

The problem can be reproduced about 1 in 5 times using /bin/bash -c './sage -t src/sage/calculus/calculus.py'.

A giac process sends SIGINT to the whole process group, sometimes succeeding in taking down the calling bash, which then gives exit code 130.
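For context on the number 130: bash and dash report a command killed by signal N as status 128 + N, and SIGINT is signal 2 on Linux. The following is only a minimal sketch, not part of Sage; the helper name `shell_status` is made up for illustration. It maps the signed return code that Python's `subprocess` reports to the value a shell would show in `$?`.

```python
import signal
import subprocess
import sys

def shell_status(returncode):
    """Translate a subprocess return code into the value a shell shows in $?.

    Hypothetical helper for illustration only.
    subprocess reports death by signal N as -N; bash and dash report 128 + N.
    """
    if returncode < 0:
        return 128 - returncode
    return returncode

# A child that resets SIGINT to its default action and then signals itself,
# standing in for a shell that is taken down by a stray SIGINT.
child_code = (
    "import os, signal; "
    "signal.signal(signal.SIGINT, signal.SIG_DFL); "
    "os.kill(os.getpid(), signal.SIGINT)"
)
proc = subprocess.run([sys.executable, "-c", child_code])
status = shell_status(proc.returncode)
print(status)
```

Seen this way, an exit code of 130 says only that something was killed by SIGINT, not which test failed.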

mkoeppe commented 2 years ago
comment:4

This might be breakage from #8784

mkoeppe commented 2 years ago
comment:5

An easy way to reproduce:

while true; do /bin/bash -c './sage -t src/sage/calculus/calculus.py'; done

This nominally infinite loop ends after a finite number of iterations.
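A bounded variant of the same idea can be sketched in Python, so that the run number and return code of the first failure are recorded. This is an illustrative helper only; `run_until_failure` and `max_runs` are made-up names, not part of Sage.

```python
import subprocess
import sys

def run_until_failure(cmd, max_runs=50):
    """Run cmd repeatedly.

    Returns (run_number, returncode) for the first run that exits with a
    nonzero return code, or None if all max_runs runs succeed.
    Hypothetical helper for illustration only.
    """
    for run in range(1, max_runs + 1):
        returncode = subprocess.run(cmd).returncode
        if returncode != 0:
            return run, returncode
    return None

# Possible use, from the top of a Sage checkout:
# run_until_failure(["/bin/bash", "-c", "./sage -t src/sage/calculus/calculus.py"])
```

Note that `subprocess` reports a child killed by SIGINT as return code -2 rather than 130.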

mkoeppe commented 2 years ago
comment:6

With binary search I have obtained the following simple reproducer:

while true; do ./sage -c "k, n = var('k,n'); from sage.calculus.calculus import symbolic_sum; print(symbolic_sum(1/(1+k^2), k, -oo, oo, algorithm = 'giac')); print(gp.eval('intnum(x=17,42,exp(-x^2)*log(x))'))"; done

mkoeppe commented 2 years ago
comment:7

Even simpler:

while true; do ./sage -c "print(giac.eval('1')); print(gp.eval('2'))"; done

mkoeppe commented 2 years ago
comment:8

gp can also be replaced with maxima or singular, with the same results.

Some stracing:

$ while true; do rm -f STRAC*; strace -ff -o STRACE ./sage -c "print(giac.eval('1')); print(gp.eval('2'))"; done
1
2

gitpod ~/sage $ echo $?
130
$ grep kill STRA*
...
STRACE.99696:kill(1, SIGINT)                         = 0
...
STRACE.99732:kill(-99700, SIGCONT)                   = 0
STRACE.99732:kill(-99700, SIGINT)                    = -1 ESRCH (No such process)
STRACE.99733:kill(-99696, SIGCONT)                   = 0
STRACE.99733:kill(-99696, SIGINT)                    = 0
STRACE.99733:kill(-99696, SIGHUP)                    = -1 ESRCH (No such process)

99696 is the giac process.
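To sift through such strace files, here is a small sketch that extracts `kill` calls and flags those aimed at pid 1 or at a process group. The regular expression is written only against the grep output shown above, and `parse_kill` is a made-up name.

```python
import re

# Matches lines such as "STRACE.99696:kill(1, SIGINT)      = 0"
KILL_RE = re.compile(
    r"^(?P<file>[^:]+):kill\((?P<target>-?\d+), (?P<sig>SIG[A-Z0-9]+)\)"
    r"\s*=\s*(?P<ret>-?\d+)"
)

def parse_kill(line):
    """Parse one 'grep kill' line; return a dict, or None if it does not match."""
    match = KILL_RE.match(line)
    if match is None:
        return None
    target = int(match.group("target"))
    return {
        "file": match.group("file"),
        "target": target,
        "signal": match.group("sig"),
        "ok": int(match.group("ret")) == 0,  # 0 means the signal was sent
        "process_group": target < -1,        # kill(-pgid, sig) addresses a group
        "pid_one": target == 1,              # the suspicious call in this ticket
    }

sample = [
    "STRACE.99696:kill(1, SIGINT)                         = 0",
    "STRACE.99733:kill(-99696, SIGINT)                    = 0",
    "STRACE.99733:kill(-99696, SIGHUP)                    = -1 ESRCH (No such process)",
]
for line in sample:
    print(parse_kill(line))
```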

mkoeppe commented 2 years ago
comment:9

@sagetrac-parisse This is happening in https://github.com/geogebra/giac/blob/c2058a0c8921af8a762f6fbede1354b974bf5a70/src/giac/cpp/global.cc#L3761 (although we are still on GIAC 1.6). Somehow child_id is 1 and it ends up killing the whole process group with SIGINT.
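The failure mode suggests guarding the signal call against a bogus target. The following is a Python sketch of that idea only; it assumes nothing about giac's actual C++ code beyond what is described above, and `interrupt_child` is a hypothetical name.

```python
import os
import signal

def interrupt_child(child_id):
    """Send SIGINT to child_id unless the pid is one that must not be signalled.

    Hypothetical helper, not giac's API. Returns True if a signal was sent.
    """
    # pid 1 is init (or a container's top-level process). For kill(), pid 0
    # addresses the caller's process group, -1 every process the caller may
    # signal, and other negative values a whole process group.
    if child_id <= 1:
        return False
    try:
        os.kill(child_id, signal.SIGINT)
    except ProcessLookupError:
        # The child has already exited; there is nothing to interrupt.
        return False
    return True
```

With such a guard, an uninitialized or stale `child_id` of 1 becomes a no-op instead of an interrupt delivered to process 1.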

sheerluck commented 2 years ago
comment:11

Replying to @mkoeppe:

Even simpler:

while true; do ./sage -c "print(giac.eval('1')); print(gp.eval('2'))"; done

For me, that loop never ends when giac is 1.7.0.47.

mkoeppe commented 2 years ago
comment:12

I'll try with the upgrade ticket #31563 on this platform. Unfortunately, the upgrade is stuck.

tornaria commented 2 years ago
comment:13

Replying to @mkoeppe:

Even simpler:

while true; do ./sage -c "print(giac.eval('1')); print(gp.eval('2'))"; done

No failure here, tested with giac 1.7.0-53 or 1.9.0-5.

mkoeppe commented 2 years ago

Branch: u/mkoeppe/giac_kills_process_1_in_ctrl_c_signal_handler

mkoeppe commented 2 years ago

New commits:

ecd3895 build/pkgs/giac/patches/0001-src-global.cc-Do-not-send-SIGINT-to-process-1.patch: New

mkoeppe commented 2 years ago

Commit: ecd3895

mkoeppe commented 2 years ago

Author: Matthias Koeppe

mkoeppe commented 2 years ago
comment:16

We won't be able to do the upgrade for Sage 9.6 because Cygwin support is unresolved. So here is a hotfix.

vbraun commented 2 years ago

Reviewer: Volker Braun

mkoeppe commented 2 years ago
comment:19

Thanks!

vbraun commented 2 years ago

Changed branch from u/mkoeppe/giac_kills_process_1_in_ctrl_c_signal_handler to ecd3895

mkoeppe commented 2 years ago

Changed commit from ecd3895 to none

mkoeppe commented 2 years ago
comment:21

This (unsurprisingly) still happens in 9.6.rc3 on systems where the system giac is used: ubuntu-jammy-standard (https://github.com/sagemath/sage/runs/6236167662, 1.7.0.39+dfsg2-1build2) and debian-sid-standard.

mkoeppe commented 2 years ago
comment:22

Follow-up = #33848