sagemath / sage

Main repository of SageMath
https://www.sagemath.org

Runaway/Segfaulting ECL processes #14426

Closed jdemeyer closed 11 years ago

jdemeyer commented 11 years ago

On some systems, when executing

./sage -tp --long devel/sage/sage/interfaces/lisp.py

there are two ECL processes which do (strace log)

read(0, "(setq sage0 2)\n", 1024)       = 15
write(1, "\n", 1)                       = 1
write(1, "2", 1)                        = 1
write(1, "\n", 1)                       = 1
write(1, ">", 1)                        = 1
write(1, " ", 1)                        = 1
read(0, 0x7f2c263b1000, 1024)           = -1 EIO (Input/output error)
--- SIGHUP (Hangup) @ 0 (0) ---
--- SIGCONT (Continued) @ 0 (0) ---
select(1, [0], NULL, NULL, {0, 0})      = 1 (in [0], left {0, 0})
select(1, [0], NULL, NULL, {0, 0})      = 1 (in [0], left {0, 0})
read(0, "", 1024)                       = 0
write(2, "\n", 1)                       = -1 EIO (Input/output error)
write(2, "\n", 1)                       = -1 EIO (Input/output error)
write(2, "\n", 1)                       = -1 EIO (Input/output error)
write(2, "\n", 1)                       = -1 EIO (Input/output error)
[...]

after which they either segfault or keep running forever.

A different way to see this problem:

jdemeyer@boxen:/release/merger/sage-5.9.beta2$ ./sage --sh -c 'echo syntax error |ecl 2>/dev/full'
ECL (Embeddable Common-Lisp) 12.12.1 (git:UNKNOWN)
Copyright (C) 1984 Taiichi Yuasa and Masami Hagiya
Copyright (C) 1993 Giuseppe Attardi
Copyright (C) 2000 Juan J. Garcia-Ripoll
ECL is free software, and you are welcome to redistribute it
under certain conditions; see file 'Copyright' for details.
Type :h for Help.  
Top level.
> /bin/bash: line 1: 11264 Done                    echo syntax error
     11265 Segmentation fault      | ecl 2> /dev/full
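
The one-liner above can also be driven from a script, which makes it easier to tell a segfault from a process that runs forever. A minimal Python sketch, assuming a system where `/dev/full` exists and that `ecl` is on `PATH` (for example inside `./sage --sh`); `run_with_full_stderr` is a hypothetical helper name, not part of Sage or ECL:

```python
import os
import signal
import subprocess

def run_with_full_stderr(cmd, stdin_text, timeout=10):
    """Run cmd with stderr pointing at /dev/full and classify how it ended.

    Returns "missing" if /dev/full does not exist, "timeout" if the
    process had to be killed, "signal:<NAME>" if it died from a signal,
    and "exit:<code>" otherwise.
    """
    if not os.path.exists("/dev/full"):
        return "missing"
    # Unbuffered binary mode: Python itself never writes to this file.
    with open("/dev/full", "wb", buffering=0) as full:
        proc = subprocess.Popen(
            cmd,
            stdin=subprocess.PIPE,
            stdout=subprocess.DEVNULL,
            stderr=full,
        )
        try:
            proc.communicate(stdin_text.encode(), timeout=timeout)
        except subprocess.TimeoutExpired:
            # The "keeps running forever" case: kill and reap the child.
            proc.kill()
            proc.communicate()
            return "timeout"
    if proc.returncode < 0:
        # A negative return code means the child died from that signal.
        return "signal:" + signal.Signals(-proc.returncode).name
    return "exit:%d" % proc.returncode
```

On an affected build, `run_with_full_stderr(["ecl"], "syntax error\n")` would be expected to return `signal:SIGSEGV` or `timeout`, matching the two outcomes described above.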

upstream bug: https://gitlab.com/embeddable-common-lisp/ecl/issues/43

spkg: http://boxen.math.washington.edu/home/jdemeyer/spkg/ecl-12.12.1.p2.spkg (diff)

apply: attachment: 14426_doctest.patch

ecl-12.12.1.p2 (Jeroen Demeyer, 9 April 2013)

Upstream: Reported upstream. Developers acknowledge bug.

CC: @nexttime @jpflori

Component: packages: standard

Author: Jeroen Demeyer

Reviewer: Volker Braun, John Cremona

Merged: sage-5.9.rc0

Issue created by migration from https://trac.sagemath.org/ticket/14426

jdemeyer commented 11 years ago

Description changed:

--- 
+++ 
@@ -1 +1,27 @@
-Under some circumstances, after doctesting, there remain ECL processes consuming more and more memory.
+On some systems, when executing
+
+```
+./sage -tp --long devel/sage/sage/interfaces/lisp.py
+```
+there are two ECL processes which do (strace log)
+
+```
+read(0, "(setq sage0 2)\n", 1024)       = 15
+write(1, "\n", 1)                       = 1
+write(1, "2", 1)                        = 1
+write(1, "\n", 1)                       = 1
+write(1, ">", 1)                        = 1
+write(1, " ", 1)                        = 1
+read(0, 0x7f2c263b1000, 1024)           = -1 EIO (Input/output error)
+--- SIGHUP (Hangup) @ 0 (0) ---
+--- SIGCONT (Continued) @ 0 (0) ---
+select(1, [0], NULL, NULL, {0, 0})      = 1 (in [0], left {0, 0})
+select(1, [0], NULL, NULL, {0, 0})      = 1 (in [0], left {0, 0})
+read(0, "", 1024)                       = 0
+write(2, "\n", 1)                       = -1 EIO (Input/output error)
+write(2, "\n", 1)                       = -1 EIO (Input/output error)
+write(2, "\n", 1)                       = -1 EIO (Input/output error)
+write(2, "\n", 1)                       = -1 EIO (Input/output error)
+[...]
+```
+after which they either segfault or keep running forever.
jdemeyer commented 11 years ago

Description changed:

--- 
+++ 
@@ -25,3 +25,9 @@
 [...]

 after which they either segfault or keep running forever.
+
+A different way to see this problem:
+
+
+./sage --sh -c 'ecl < <(echo x) 2>/dev/full'
+

jdemeyer commented 11 years ago

Description changed:

--- 
+++ 
@@ -29,5 +29,14 @@
 A different way to see this problem:

-./sage --sh -c 'ecl < <(echo x) 2>/dev/full'
+jdemeyer@boxen:/release/merger/sage-5.9.beta4$ ./sage --sh -c 'ecl < <(echo x) 2>/dev/full'
+ECL (Embeddable Common-Lisp) 12.12.1 (git:UNKNOWN)
+Copyright (C) 1984 Taiichi Yuasa and Masami Hagiya
+Copyright (C) 1993 Giuseppe Attardi
+Copyright (C) 2000 Juan J. Garcia-Ripoll
+ECL is free software, and you are welcome to redistribute it
+under certain conditions; see file 'Copyright' for details.
+Type :h for Help.
+Top level.
+> /bin/bash: line 1: 17727 Segmentation fault ecl < <(echo x) 2> /dev/full

jdemeyer commented 11 years ago

Description changed:

--- 
+++ 
@@ -29,7 +29,7 @@
 A different way to see this problem:

-jdemeyer@boxen:/release/merger/sage-5.9.beta4$ ./sage --sh -c 'ecl < <(echo x) 2>/dev/full'
+jdemeyer@boxen:/release/merger/sage-5.9.beta2$ ./sage --sh -c 'echo syntax error |ecl 2>/dev/full'
 ECL (Embeddable Common-Lisp) 12.12.1 (git:UNKNOWN)
 Copyright (C) 1984 Taiichi Yuasa and Masami Hagiya
 Copyright (C) 1993 Giuseppe Attardi
@@ -38,5 +38,6 @@
 under certain conditions; see file 'Copyright' for details.
 Type :h for Help.
 Top level.
-> /bin/bash: line 1: 17727 Segmentation fault ecl < <(echo x) 2> /dev/full
+> /bin/bash: line 1: 11264 Done echo syntax error

jdemeyer commented 11 years ago

Author: Jeroen Demeyer

jdemeyer commented 11 years ago

Description changed:

--- 
+++ 
@@ -41,3 +41,5 @@
 > /bin/bash: line 1: 11264 Done                    echo syntax error
      11265 Segmentation fault      | ecl 2> /dev/full

+
+spkg: http://boxen.math.washington.edu/home/jdemeyer/spkg/ecl-12.12.1.p2.spkg (diff)

jdemeyer commented 11 years ago

Description changed:

--- 
+++ 
@@ -42,4 +42,6 @@
      11265 Segmentation fault      | ecl 2> /dev/full

+upstream: https://sourceforge.net/p/ecls/bugs/257/
+
 spkg: http://boxen.math.washington.edu/home/jdemeyer/spkg/ecl-12.12.1.p2.spkg (diff)

jdemeyer commented 11 years ago

Upstream: Reported upstream. No feedback yet.

jdemeyer commented 11 years ago

Description changed:

--- 
+++ 
@@ -45,3 +45,5 @@
 **upstream**: [https://sourceforge.net/p/ecls/bugs/257/](https://sourceforge.net/p/ecls/bugs/257/)

 **spkg**: [http://boxen.math.washington.edu/home/jdemeyer/spkg/ecl-12.12.1.p2.spkg](http://boxen.math.washington.edu/home/jdemeyer/spkg/ecl-12.12.1.p2.spkg) ([diff](https://github.com/sagemath/sage-prod/files/10657566/ecl-12.12.1.p2.diff.gz))
+
+**apply**: [attachment: 14426_doctest.patch](https://github.com/sagemath/sage-prod/files/10657565/14426_doctest.patch.gz)
JohnCremona commented 11 years ago
comment:8

I am testing this now, on a machine which showed the problem up to now. I expect it to work since it's the machine on which Jeroen diagnosed the problem, so anyone else who saw the problem should test it too.

83660e46-0051-498b-a8c1-f7a7bd232b5a commented 11 years ago
comment:10

Isn't the problem also that PExpect interfaces apparently do not properly get shut down?

The bug / patch should be (better) documented in the spkg; AFAICS there's not even a link to the upstream report there.

novoselt commented 11 years ago
comment:11

Fixes the problem for me!

JohnCremona commented 11 years ago
comment:12

I installed the spkg and patch and now almost no file can be doctested successfully. For example

      File "/home/jec/sage-5.9.beta4/local/lib/python2.7/site-packages/sage/interfaces/maxima_lib.py", line 80, in <module>
        ecl_eval("(require 'maxima)")
      File "ecl.pyx", line 1225, in sage.libs.ecl.ecl_eval (sage/libs/ecl.c:7102)
      File "ecl.pyx", line 1240, in sage.libs.ecl.ecl_eval (sage/libs/ecl.c:7039)
      File "ecl.pyx", line 246, in sage.libs.ecl.ecl_safe_eval (sage/libs/ecl.c:2901)
    RuntimeError: ECL says: Module error: Don't know how to REQUIRE MAXIMA.
83660e46-0051-498b-a8c1-f7a7bd232b5a commented 11 years ago
comment:13

Replying to @JohnCremona:

I installed the spkg and patch and now almost no file can be doctested successfully. For example

      File "/home/jec/sage-5.9.beta4/local/lib/python2.7/site-packages/sage/interfaces/maxima_lib.py", line 80, in <module>
        ecl_eval("(require 'maxima)")
      File "ecl.pyx", line 1225, in sage.libs.ecl.ecl_eval (sage/libs/ecl.c:7102)
      File "ecl.pyx", line 1240, in sage.libs.ecl.ecl_eval (sage/libs/ecl.c:7039)
      File "ecl.pyx", line 246, in sage.libs.ecl.ecl_safe_eval (sage/libs/ecl.c:2901)
    RuntimeError: ECL says: Module error: Don't know how to REQUIRE MAXIMA.

You of course have to rebuild the spkgs that depend on ECL as well, i.e., Maxima, and do sage -b afterwards.

83660e46-0051-498b-a8c1-f7a7bd232b5a commented 11 years ago
comment:14

FWIW, I think we met that "double-fault" problem with stderr before, quite a while ago, and IIRC discussed it with upstream, so it's a bit astonishing it's still in. (Although the circumstances were probably slightly different.)

83660e46-0051-498b-a8c1-f7a7bd232b5a commented 11 years ago
comment:15

SPKG.txt lacks a "Patches" section, and the following "Special Update/Build Instructions" should get corrected:

 * Note: the way we configure Sage, CXX and CXXFLAGS are unused.
 * Note: for the time being, ECL is built single threaded library as it
   seems to interact badly with the pexpect interface and Sage's signal
   handling when built multithreaded.

(Related to the first, printing the settings of CXX and CXXFLAGS in spkg-install then makes no sense.)

83660e46-0051-498b-a8c1-f7a7bd232b5a commented 11 years ago
comment:16

As expected, this solves the issues for me with ECL, (Ubuntu's) GNU Make 3.81 and the new doctesting framework on Ubuntu 10.04.4 LTS x86_64. (I haven't tested on x86 yet, but I assume it will fix the specific ECL issue there as well.)

Still, a working cleaner should have properly killed the processes, and it's not obvious what actually caused ECL running amok (i.e., why writing to stderr fails in the first place).
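
For illustration, the kind of cleanup meant here can be sketched as terminating a child and escalating if it does not exit. This is a minimal Python sketch and not the code of Sage's cleaner; `stop_child` and the `grace` parameter are hypothetical names:

```python
import subprocess

def stop_child(proc, grace=2.0):
    """Stop a subprocess.Popen child, escalating to SIGKILL if needed.

    Returns the child's return code (negative if it died from a signal).
    """
    if proc.poll() is not None:
        # Already exited; nothing to do.
        return proc.returncode
    proc.terminate()              # polite request (SIGTERM on POSIX)
    try:
        return proc.wait(timeout=grace)
    except subprocess.TimeoutExpired:
        proc.kill()               # cannot be caught or ignored (SIGKILL)
        return proc.wait()
```

A process stuck in a tight loop of failing writes, as in the strace log, may not react to the first signal, which is why the sketch falls back to `kill()`.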

jdemeyer commented 11 years ago
comment:17

I made some further small changes to the spkg-install file.

jdemeyer commented 11 years ago

Description changed:

--- 
+++ 
@@ -47,3 +47,13 @@
 **spkg**: [http://boxen.math.washington.edu/home/jdemeyer/spkg/ecl-12.12.1.p2.spkg](http://boxen.math.washington.edu/home/jdemeyer/spkg/ecl-12.12.1.p2.spkg) ([diff](https://github.com/sagemath/sage-prod/files/10657566/ecl-12.12.1.p2.diff.gz))

 **apply**: [attachment: 14426_doctest.patch](https://github.com/sagemath/sage-prod/files/10657565/14426_doctest.patch.gz)
+
+### ecl-12.12.1.p2 (Jeroen Demeyer, 9 April 2013)
+* #14426: write_error.patch: avoid an infinite loop when reporting
+  an error while writing to stderr.
+* Rename spkg-make to spkg-src.
+* Don't unset MAKEFLAGS (it was not clear why this was needed).
+* It seems no longer needed to disable Altivec.
+* Support ECL_CONFIGURE environment variable for options to
+  ./configure.
+
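
The failure mode named in the first changelog entry (an error while writing to stderr gets reported to stderr, which fails again, and so on) can be illustrated with a reentrancy guard. This is a Python sketch of the general idea only; it is not the contents of `write_error.patch`, and `GuardedErrorReporter` is a hypothetical name:

```python
import errno
import os

class GuardedErrorReporter:
    """Report errors to a file descriptor without looping if that fails."""

    def __init__(self, fd):
        self.fd = fd
        self.reporting = False   # True while a report is in progress
        self.dropped = 0         # reports abandoned because the stream is broken

    def report(self, message):
        if self.reporting:
            # We are already inside report(): the stream itself is failing,
            # so give up instead of recursing forever.
            self.dropped += 1
            return False
        self.reporting = True
        try:
            os.write(self.fd, message.encode() + b"\n")
            return True
        except OSError as exc:
            # Unguarded code would report exc on the same broken descriptor
            # and never terminate; the guard above stops it here.
            name = errno.errorcode.get(exc.errno, str(exc.errno))
            self.report("write failed: " + name)
            return False
        finally:
            self.reporting = False
```

With a descriptor that always fails, such as one opened on `/dev/full`, a single `report()` call returns `False` after one dropped nested report rather than looping.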
JohnCremona commented 11 years ago
comment:18

OK, it worked for me after both rebuilding maxima and also the whole Sage library (sage -ba) after applying the patch and new spkg.

vbraun commented 11 years ago

Reviewer: Volker Braun, John Cremona

vbraun commented 11 years ago
comment:19

Looks good to me

jdemeyer commented 11 years ago
comment:20

On various systems, this causes ECL-related doctest failures. I have no idea why...

jdemeyer commented 11 years ago
comment:21

Also: /dev/full doesn't exist on all systems.
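
A test that relies on `/dev/full` therefore needs a guard. A minimal Python sketch of such a check, assuming the Linux behaviour that writes to `/dev/full` fail with `ENOSPC`; `dev_full_usable` is a hypothetical helper and not necessarily what `14426_doctest.patch` does:

```python
import errno
import os

def dev_full_usable(path="/dev/full"):
    """Return True if writing to path fails with ENOSPC, as /dev/full does."""
    try:
        fd = os.open(path, os.O_WRONLY)
    except OSError:
        return False      # device missing or not writable
    try:
        os.write(fd, b"x")
    except OSError as exc:
        return exc.errno == errno.ENOSPC
    else:
        return False      # the write succeeded, so this is not a "full" device
    finally:
        os.close(fd)
```

A doctest could then be skipped when `dev_full_usable()` returns `False`.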

jdemeyer commented 11 years ago

Attachment: 14426_doctest.patch.gz

Attachment: ecl-12.12.1.p2.diff.gz

jdemeyer commented 11 years ago
comment:23

In particular, the doctest

sage: var('a,b,c') ## line 416 ##
(a, b, c)
sage: eqn = [a+b*c==1, b-a*c==0, a+b==5] ## line 418 ##
sage: s = solve(eqn, a,b,c); s ## line 419 ##

in devel/sage/doc/en/constructions/linear_algebra.rst seems problematic for ECL.

vbraun commented 11 years ago
comment:24

I guess /dev/full is linux only.

I don't get any doctest failures from linear_algebra.rst, for the record.

jdemeyer commented 11 years ago
comment:25

Replying to @vbraun:

I don't get any doctest failures from linear_algebra.rst, for the record.

Well, the error isn't reproducible. When it fails, it usually fails like

sage: var('a,b,c') ## line 416 ##
(a, b, c)
sage: eqn = [a+b*c==1, b-a*c==0, a+b==5] ## line 418 ##
sage: s = solve(eqn, a,b,c); s ## line 419 ##

;;; Unhandled lisp initialization error
;;; Message:
UNBOUND-VARIABLE
;;; Arguments:

Internal or unrecoverable error in:

Lisp initialization error.

  [2: No such file or directory]

;;; ECL C Backtrace
;;; /home/buildbot/build/sage/eno-1/eno_full/build/sage-5.9.rc0/local/lib/libecl.so.12.12(si_dump_c_backtrace+0x28) [0x7f6a15678208]
;;; /home/buildbot/build/sage/eno-1/eno_full/build/sage-5.9.rc0/local/lib/libecl.so.12.12(ecl_internal_error+0x3f) [0x7f6a156631df]
;;; /home/buildbot/build/sage/eno-1/eno_full/build/sage-5.9.rc0/local/lib/libecl.so.12.12(+0x124324) [0x7f6a15663324]
;;; /home/buildbot/build/sage/eno-1/eno_full/build/sage-5.9.rc0/local/lib/libecl.so.12.12(cl_funcall+0x70) [0x7f6a15646410]
;;; /home/buildbot/build/sage/eno-1/eno_full/build/sage-5.9.rc0/local/lib/libecl.so.12.12(cl_error+0xdb) [0x7f6a1566416b]
;;; /home/buildbot/build/sage/eno-1/eno_full/build/sage-5.9.rc0/local/lib/libecl.so.12.12(+0x1254b2) [0x7f6a156644b2]
;;; /home/buildbot/build/sage/eno-1/eno_full/build/sage-5.9.rc0/local/lib/libecl.so.12.12(FEwrong_type_argument+0x1e) [0x7f6a156644de]
;;; /home/buildbot/build/sage/eno-1/eno_full/build/sage-5.9.rc0/local/lib/libecl.so.12.12(stream_dispatch_table+0x17) [0x7f6a15656e47]
;;; /home/buildbot/build/sage/eno-1/eno_full/build/sage-5.9.rc0/local/lib/libecl.so.12.12(ecl_write_char+0x1b) [0x7f6a156576db]
;;; /home/buildbot/build/sage/eno-1/eno_full/build/sage-5.9.rc0/local/lib/libecl.so.12.12(+0x13769b) [0x7f6a1567669b]
;;; /home/buildbot/build/sage/eno-1/eno_full/build/sage-5.9.rc0/local/lib/libecl.so.12.12(_ecl_write_symbol+0x156) [0x7f6a15676bf6]
;;; /home/buildbot/build/sage/eno-1/eno_full/build/sage-5.9.rc0/local/lib/libecl.so.12.12(si_write_ugly_object+0x26) [0x7f6a15675cf6]
;;; /home/buildbot/build/sage/eno-1/eno_full/build/sage-5.9.rc0/local/lib/libecl.so.12.12(+0x12430b) [0x7f6a1566330b]
;;; /home/buildbot/build/sage/eno-1/eno_full/build/sage-5.9.rc0/local/lib/libecl.so.12.12(cl_funcall+0x70) [0x7f6a15646410]
;;; /home/buildbot/build/sage/eno-1/eno_full/build/sage-5.9.rc0/local/lib/libecl.so.12.12(cl_error+0xdb) [0x7f6a1566416b]
;;; /home/buildbot/build/sage/eno-1/eno_full/build/sage-5.9.rc0/local/lib/libecl.so.12.12(+0x125308) [0x7f6a15664308]
;;; /home/buildbot/build/sage/eno-1/eno_full/build/sage-5.9.rc0/local/lib/libecl.so.12.12(ecl_interpret+0x19cd) [0x7f6a1564869d]
;;; /home/buildbot/build/sage/eno-1/eno_full/build/sage-5.9.rc0/local/lib/libecl.so.12.12(+0x10e36f) [0x7f6a1564d36f]
;;; /home/buildbot/build/sage/eno-1/eno_full/build/sage-5.9.rc0/local/lib/libecl.so.12.12(si_eval_with_env+0x2eb) [0x7f6a1564ef2b]
;;; /home/buildbot/build/sage/eno-1/eno_full/build/sage-5.9.rc0/local/lib/libecl.so.12.12(si_signal_simple_error+0x26d) [0x7f6a15613e6d]
;;; /home/buildbot/build/sage/eno-1/eno_full/build/sage-5.9.rc0/local/lib/libecl.so.12.12(FEwrong_type_nth_arg+0x109) [0x7f6a15663d29]
;;; /home/buildbot/build/sage/eno-1/eno_full/build/sage-5.9.rc0/local/lib/libecl.so.12.12(_ecl_sethash+0) [0x7f6a156901a0]
;;; /home/buildbot/build/sage/eno-1/eno_full/build/sage-5.9.rc0/local/lib/libecl.so.12.12(+0x14df58) [0x7f6a1568cf58]
;;; /lib64/libpthread.so.0() [0x36e9e0f500]
;;; /home/buildbot/build/sage/eno-1/eno_full/build/sage-5.9.rc0/local/lib/libecl.so.12.12(+0xb621e) [0x7f6a155f521e]
;;; /home/buildbot/build/sage/eno-1/eno_full/build/sage-5.9.rc0/local/lib/libecl.so.12.12(+0xbce17) [0x7f6a155fbe17]
;;; /home/buildbot/build/sage/eno-1/eno_full/build/sage-5.9.rc0/local/lib/libecl.so.12.12(+0xbd368) [0x7f6a155fc368]
;;; /home/buildbot/build/sage/eno-1/eno_full/build/sage-5.9.rc0/local/lib/libecl.so.12.12(+0xbd882) [0x7f6a155fc882]
;;; /home/buildbot/build/sage/eno-1/eno_full/build/sage-5.9.rc0/local/lib/libecl.so.12.12(+0xbd8bd) [0x7f6a155fc8bd]
;;; /home/buildbot/build/sage/eno-1/eno_full/build/sage-5.9.rc0/local/lib/libecl.so.12.12(+0xbd8bd) [0x7f6a155fc8bd]
;;; /home/buildbot/build/sage/eno-1/eno_full/build/sage-5.9.rc0/local/lib/libecl.so.12.12(+0xbd8bd) [0x7f6a155fc8bd]
;;; /home/buildbot/build/sage/eno-1/eno_full/build/sage-5.9.rc0/local/lib/libecl.so.12.12(+0xbd8bd) [0x7f6a155fc8bd]

**********************************************************************
----------------------------------------------------------------------
sage -t --long devel/sage/doc/en/constructions/linear_algebra.rst  # Killed due to abort
----------------------------------------------------------------------
jdemeyer commented 11 years ago
comment:27

New version of the patch seems to work fine.

jpflori commented 11 years ago
comment:28

I guess you mean the version where you check /dev/full exists?

jdemeyer commented 11 years ago
comment:29

Replying to @jpflori:

I guess you mean the version where you check /dev/full exists?

And the new version of patches/write_error.patch inside the ECL spkg.

jpflori commented 11 years ago
comment:30

Could you post the old version so that I can spot the differences?

jdemeyer commented 11 years ago
comment:31

Replying to @jpflori:

Could you post the old version so that I can spot the differences?

I don't have the old version anymore. But that doesn't matter, could you perhaps review it as if there never was a previous version?

jpflori commented 11 years ago
comment:32

As the patch is quite simple, I was wondering what was failing before and caused the random failures, but of course I can pretend this previous version did not exist.

jdemeyer commented 11 years ago
comment:33

The previous version patched restartable_io_error() but that was called from different places, possibly causing the problems.

vbraun commented 11 years ago
comment:34

Looks good to me.

jdemeyer commented 11 years ago

Merged: sage-5.9.rc0

6bfcfeed-779b-4b16-82e6-63808dde0af0 commented 11 years ago
comment:36

Was the patch forwarded to upstream?

jdemeyer commented 11 years ago
comment:37

Replying to @SnarkBoojum:

Was the patch forwarded to upstream?

Yes.

6bfcfeed-779b-4b16-82e6-63808dde0af0 commented 11 years ago
comment:38

Replying to @jdemeyer:

Replying to @SnarkBoojum:

Was the patch forwarded to upstream?

Yes.

Perfect!

jhpalmieri commented 11 years ago
comment:39

Replying to @jdemeyer:

Replying to @vbraun:

I don't get any doctest failures from linear_algebra.rst, for the record.

Well, the error isn't reproducible. When it fails, it usually fails like

Are you sure this is related to this ticket? I only see this after applying the patches at #14055, and I see this whether I have applied the patches here or not. This is happening on both mark and taurus. (I also mentioned it on #14055.)

jdemeyer commented 11 years ago
comment:40

Replying to @SnarkBoojum:

Was the patch forwarded to upstream?

Yes, but upstream is totally ignoring it...

dimpase commented 9 years ago
comment:41

Replying to @jdemeyer:

Replying to @SnarkBoojum:

Was the patch forwarded to upstream?

Yes, but upstream is totally ignoring it...

Here is another try. Upstream points out, correctly, that the patch does not work if ECL is configured without disabling threads.

https://gitlab.com/embeddable-common-lisp/ecl/merge_requests/1 and https://gitlab.com/embeddable-common-lisp/ecl/issues/43

dimpase commented 9 years ago

Changed upstream from Reported upstream. No feedback yet. to Reported upstream. Developers acknowledge bug.

jdemeyer commented 9 years ago

Description changed:

--- 
+++ 
@@ -42,7 +42,7 @@
      11265 Segmentation fault      | ecl 2> /dev/full

-upstream: https://sourceforge.net/p/ecls/bugs/257/
+upstream bug: http://sourceforge.net/p/ecls/bugs/303/

spkg: http://boxen.math.washington.edu/home/jdemeyer/spkg/ecl-12.12.1.p2.spkg (diff)

jdemeyer commented 9 years ago

Description changed:

--- 
+++ 
@@ -42,7 +42,7 @@
      11265 Segmentation fault      | ecl 2> /dev/full

-upstream bug: http://sourceforge.net/p/ecls/bugs/303/
+upstream bug: https://gitlab.com/embeddable-common-lisp/ecl/issues/43

spkg: http://boxen.math.washington.edu/home/jdemeyer/spkg/ecl-12.12.1.p2.spkg (diff)