Closed mkoeppe closed 2 years ago
Description changed:
---
+++
@@ -1,4 +1,4 @@
-https://github.com/sagemath/sage/runs/5962279659?check_suite_focus=true
+On `ubuntu-jammy-standard` (https://github.com/sagemath/sage/runs/5962279659)
sage -t --random-seed=156404901056981760924144629149815074678 src/sage/tests/cmdline.py
@@ -23,3 +23,9 @@
The command '/bin/sh -c make SAGE_SPKG="sage-spkg -y -o" ${USE_MAKEFLAGS} ${TARGETS_OPTIONAL} || echo "(error ignored)"' returned a non-zero code: 130
+Likewise on `gitpod-standard`, `debian-bookworm-standard`, `linuxmint-20.1-standard`.
+
+
+On `linuxmint-19-standard` (https://github.com/sagemath/sage/runs/5962280173?check_suite_focus=true), the exit code is 2.
+
+
The exit code 130 from the shell is rather mysterious. Not sure if it really has anything to do with pytest.
It appears to be non-deterministic. I have reproduced it in the container (`docker run -it docker.pkg.github.com/sagemath/sage/sage-docker-gitpod-standard-with-targets-optional:9.6.rc0-failed bash`) after adding some print calls to `src/bin/sage-runtests` to see what the return values of the Sage doctester and pytest are.
gitpod ~/sage $ /bin/sh -c 'make SAGE_SPKG="sage-spkg -y -o" ${USE_MAKEFLAGS} ptest-nodoc || echo "(error ignored)"'
----------------------------------------------------------------------
sage -t --warn-long 102.2 --random-seed=17067963460848597109909214796612832081 src/sage/manifolds/differentiable/tensorfield.py # Timed out
sage -t --warn-long 102.2 --random-seed=17067963460848597109909214796612832081 src/sage/interfaces/expect.py # 2 doctests failed
----------------------------------------------------------------------
Total time for all tests: 1677.6 seconds
cpu time: 13605.8 seconds
cumulative wall time: 24340.6 seconds
Features detected for doctesting: gfan,imagemagick,nauty,palp,sage.combinat,sage.geometry.polyhedron,sage.graphs,sage.groups,sage.plot,sage.rings.number_field,sage.rings.padics,sage.rings.real_double,sage.symbolic,sagemath_doc_html,sphinx
ERR=5
============================================================ test session starts =============================================================
platform linux -- Python 3.8.10, pytest-7.1.1, pluggy-1.0.0
rootdir: /home/gitpod/sage/src, configfile: tox.ini
collected 26 items / 104 skipped
src/sage/manifolds/differentiable/symplectic_form_test.py ...................... [ 84%]
src/sage/manifolds/differentiable/examples/symplectic_space_test.py .... [100%]
================================================ 26 passed, 104 skipped in 104.61s (0:01:44) =================================================
exit_code_pytest=0
make: *** [Makefile:287: ptest-nodoc] Error 5
(error ignored)
gitpod ~/sage $ echo $?
130
gitpod ~/sage $ ls -l /bin/sh
lrwxrwxrwx 1 root root 4 Jul 18 2019 /bin/sh -> dash
I think it may be specific to the shell that is used on the systems where the failure was observed.
The problem can be reproduced about 1 in 5 times using `/bin/bash -c './sage -t src/sage/calculus/calculus.py'`.
A `giac` process sends SIGINT to the whole process group, sometimes succeeding in taking down the calling `bash`, which then gives exit code 130.
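For readers puzzled by the number 130: `bash` and `dash` report a command that was terminated by signal `n` as exit status `128 + n`, and SIGINT is signal 2 on Linux. The following is a minimal sketch of that decoding; the helper names `shell_status_from_signal` and `describe_shell_status` are made up for illustration and are not part of Sage or giac.

```python
import signal


def shell_status_from_signal(signum: int) -> int:
    """Exit status that bash/dash report for a command killed by `signum`."""
    return 128 + signum


def describe_shell_status(status: int) -> str:
    """Best-effort reading of a shell exit status (hypothetical helper)."""
    if status > 128:
        try:
            # Statuses above 128 conventionally encode "killed by a signal".
            return f"killed by {signal.Signals(status - 128).name}"
        except ValueError:
            pass  # not a known signal number; fall through
    return f"exited with status {status}"


if __name__ == "__main__":
    print(describe_shell_status(130))
    print(describe_shell_status(5))
```

Under this convention, the `Error 5` reported by `make` is an ordinary exit status, while 130 points at SIGINT.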
This might be breakage from #8784
An easy way to reproduce: `while true; do /bin/bash -c './sage -t src/sage/calculus/calculus.py'; done`. This nominally infinite loop ends after a finite number of iterations.
With binary search I have obtained the following simple reproducer:
while true; do ./sage -c "k, n = var('k,n'); from sage.calculus.calculus import symbolic_sum; print(symbolic_sum(1/(1+k^2), k, -oo, oo, algorithm = 'giac')); print(gp.eval('intnum(x=17,42,exp(-x^2)*log(x))'))"; done
Even simpler:
while true; do ./sage -c "print(giac.eval('1')); print(gp.eval('2'))"; done
`gp` can also be replaced with `maxima` or `singular`, with the same results.
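The shell loops above stop only because the looping shell itself gets interrupted. A bounded variant can be sketched in Python as follows. This is an illustration, not something used in this ticket: `run_until_interrupted` is a hypothetical helper, and running the command in its own session (`start_new_session=True`) is an assumption intended to keep a group-wide SIGINT away from the harness. Python reports a child killed by SIGINT as return code `-2`, whereas a shell that merely passes on the status reports 130, so both are accepted.

```python
import signal
import subprocess

# Return codes treated as "taken down by SIGINT": the shell convention (130)
# and Python's own encoding for a child killed by a signal (-SIGINT).
INTERRUPTED = (128 + signal.SIGINT, -signal.SIGINT)


def run_until_interrupted(cmd, max_runs=100):
    """Run `cmd` up to `max_runs` times.

    Returns the 1-based number of the first run that ended with one of the
    INTERRUPTED return codes, or None if that never happened.
    """
    for attempt in range(1, max_runs + 1):
        completed = subprocess.run(cmd, start_new_session=True)
        if completed.returncode in INTERRUPTED:
            return attempt
    return None


if __name__ == "__main__":
    # The reproducer from this ticket; must be run from the Sage root.
    sage_code = "print(giac.eval('1')); print(gp.eval('2'))"
    print(run_until_interrupted(["./sage", "-c", sage_code]))
```

Since the session isolation changes the process-group layout, the failure rate may differ from what the plain shell loop shows.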
Some stracing:
$ while true; do rm -f STRAC*; strace -ff -o STRACE ./sage -c "print(giac.eval('1')); print(gp.eval('2'))"; done
1
2
gitpod ~/sage $ echo $?
130
$ grep kill STRA*
...
STRACE.99696:kill(1, SIGINT) = 0
...
STRACE.99732:kill(-99700, SIGCONT) = 0
STRACE.99732:kill(-99700, SIGINT) = -1 ESRCH (No such process)
STRACE.99733:kill(-99696, SIGCONT) = 0
STRACE.99733:kill(-99696, SIGINT) = 0
STRACE.99733:kill(-99696, SIGHUP) = -1 ESRCH (No such process)
99696 is the `giac` process.
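To make the `grep kill` output above easier to scan, a small parser along the following lines could be used. It is only a sketch: the regular expression assumes the `FILE.PID:kill(TARGET, SIGNAL) = RESULT` shape seen in the lines quoted here, and `parse_kill_calls` is a made-up name.

```python
import re

# Matches lines such as "STRACE.99733:kill(-99696, SIGINT) = 0".
KILL_LINE = re.compile(
    r"^\S+?\.(?P<caller>\d+):kill\((?P<target>-?\d+), (?P<sig>SIG[A-Z0-9]+)\)"
    r"\s*=\s*(?P<result>-?\d+)"
)


def parse_kill_calls(lines):
    """Extract kill() calls from `grep kill STRACE.*` style output."""
    calls = []
    for line in lines:
        m = KILL_LINE.match(line)
        if m is None:
            continue  # skip "..." and anything that is not a kill() line
        target = int(m["target"])
        calls.append({
            "caller": int(m["caller"]),   # pid taken from the strace file suffix
            "target": abs(target),
            "to_group": target < -1,      # kill(-N, ...) addresses process group N
            "signal": m["sig"],
            "ok": int(m["result"]) == 0,
        })
    return calls


if __name__ == "__main__":
    sample = [
        "STRACE.99696:kill(1, SIGINT) = 0",
        "STRACE.99733:kill(-99696, SIGINT) = 0",
        "STRACE.99733:kill(-99696, SIGHUP) = -1 ESRCH (No such process)",
    ]
    for call in parse_kill_calls(sample):
        print(call)
```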
@sagetrac-parisse
This is happening in https://github.com/geogebra/giac/blob/c2058a0c8921af8a762f6fbede1354b974bf5a70/src/giac/cpp/global.cc#L3761 (although we are still on GIAC 1.6). Somehow `child_id` is 1 and it ends up killing the whole process group with SIGINT.
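The kind of guard that avoids this can be sketched as follows. This is a Python illustration of the idea only, not the giac code (which is C++) and not the actual fix; `interrupt_child` is a hypothetical helper. The point is that `kill()` gives special meaning to pid 0, -1 and other negative values, and that pid 1 is never a child we started ourselves, so a signal handler should refuse all of them.

```python
import os
import signal


def interrupt_child(child_pid: int) -> bool:
    """Send SIGINT to a single child process, refusing suspicious pids.

    Returns True if the signal was sent, False otherwise.
    """
    if child_pid <= 1:
        # 0, -1 and other negative values address groups of processes, and
        # 1 is init (or the container's main process): never a real child.
        return False
    try:
        os.kill(child_pid, signal.SIGINT)
    except ProcessLookupError:
        return False  # the child has already exited
    return True


if __name__ == "__main__":
    print(interrupt_child(1))  # refused, nothing is signalled
```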
Replying to @mkoeppe:
Even simpler:
while true; do ./sage -c "print(giac.eval('1')); print(gp.eval('2'))"; done
For me, that loop never ends when giac is 1.7.0.47.
I'll try with the upgrade ticket #31563 on this platform. Unfortunately, the upgrade is stuck.
Replying to @mkoeppe:
Even simpler:
while true; do ./sage -c "print(giac.eval('1')); print(gp.eval('2'))"; done
No failure here, tested with giac 1.7.0-53 or 1.9.0-5.
New commits:
ecd3895 | build/pkgs/giac/patches/0001-src-global.cc-Do-not-send-SIGINT-to-process-1.patch: New |
Author: Matthias Koeppe
We won't be able to do the upgrade for Sage 9.6 because Cygwin support is unresolved. So here is a hotfix.
Reviewer: Volker Braun
Thanks!
Changed branch from u/mkoeppe/giac_kills_process_1_in_ctrl_c_signal_handler to ecd3895
This (unsurprisingly) still happens in 9.6.rc3 on systems where the system giac is used: `ubuntu-jammy-standard` (https://github.com/sagemath/sage/runs/6236167662, 1.7.0.39+dfsg2-1build2) and `debian-sid-standard`.
Follow-up = #33848
On `ubuntu-jammy-standard` (https://github.com/sagemath/sage/runs/5962279659)
Likewise on `gitpod-standard`, `debian-bookworm-standard`, `linuxmint-20.1-standard`.
On `linuxmint-19-standard` (https://github.com/sagemath/sage/runs/5962280173?check_suite_focus=true), the exit code is 2.
CC: @tobiasdiez @orlitzky @sagetrac-parisse @tornaria @vbraun @dimpase
Component: doctest framework
Author: Matthias Koeppe
Branch: ecd3895
Reviewer: Volker Braun
Issue created by migration from https://trac.sagemath.org/ticket/33706