Closed jdemeyer closed 13 years ago
Description changed:
---
+++
@@ -1,9 +1,8 @@
-The following is a non-reproducible problem, but it happened at least twice for me:
+The following is a non-reproducible problem, but it happened at least three times for me:
when building Sage from source using
-export MAKE="make -j8" -$MAKE +make
it can happen that `make` simply hangs forever after the message
@@ -16,5 +15,3 @@
make: *** [doc-html] Interrupt
-
-I'm not sure whether the "-j8" option has anything to do with it.
Changed keywords from none to build Makefile
This might be due to NFS (I know it happened on sage.math.washington.edu but I can't remember whether it ever happened on other machines).
I am getting more and more convinced that this is an NFS issue, so proposal to close this ticket as invalid.
Just because NFS triggers it doesn't necessarily mean that its not a bug in Sage. I've never seen this issue, but then I never use high-latency filesystems. Can you run strace/gdb on the stuck process if you see it again? (use -p to attach to existing pid)
Description changed:
---
+++
@@ -1,5 +1,5 @@
-The following is a non-reproducible problem, but it happened at least three times for me:
-when building Sage from source using
+The following is a non-reproducible problem, but it happened several times.
+When building Sage from source using
make @@ -15,3 +15,4 @@
make: *** [doc-html] Interrupt
+The issue is a stuck sage-cleaner
process.
Should there be some timeout setting in sage-cleaner? If it has to wait too long, exit but print a warning message that there may be some stray processes which should be killed by hand?
No the sage cleaner needs to stick around for as long as Sage is running. There might be a computation running that, after a long while, does a big @
parallel loop and if nobody is cleaning up the filesystem will be very unhappy.
Does sage-cleaner
actually run when you do sage -docbuild ...
(which is called by make doc-html
)? It doesn't look like it to me. It seems that sage -docbuild all html
creates a bunch of temporary directories in DOT_SAGE/temp/HOSTNAME
, and those aren't deleted until the next time I run Sage, when sage-cleaner
actually runs. So maybe the issue on this ticket isn't related to sage-cleaner
.
Description changed:
---
+++
@@ -15,4 +15,28 @@
make: *** [doc-html] Interrupt
-The issue is a stuck `sage-cleaner` process.
+
+Here is a `ps` snapshot of a situation where it happened:
+
+```
+$ ps -u jdemeyer xfo pid,ppid,sess,tty,args
+ PID PPID SESS TT COMMAND
+26136 24879 24879 ? sshd: jdemeyer@pts/25
+26308 26136 26308 pts/25 \_ -bash
+26434 26308 26308 pts/25 \_ screen -r
+30777 1 23411 pts/6 python /mnt/usb1/scratch/jdemeyer/merger/sage-4.7.2.alpha4/local/bin/sage-cleaner
+21614 1 21614 ? SCREEN
+21615 21614 21615 pts/24 \_ /bin/bash
+21623 21615 21615 pts/24 | \_ top
+23411 21614 23411 pts/6 \_ /bin/bash
+ 6956 23411 23411 pts/6 | \_ bash -c source "/home/jdemeyer/merger3/main.sh"; source "./merge"
+ 6969 6956 23411 pts/6 | \_ bash -c source "/home/jdemeyer/merger3/main.sh"; source "./merge"
+ 6970 6969 23411 pts/6 | | \_ tee /home/release/sage-4.7.2.alpha4/logs/merger.log
+30248 6956 23411 pts/6 | \_ make -j6 doc-html
+30249 30248 23411 pts/6 | \_ bash -c source "/home/jdemeyer/merger3/main.sh"; source "./merge"
+30250 30249 23411 pts/6 | | \_ tee /home/release/sage-4.7.2.alpha4/logs/dochtml.log
+30747 30248 23411 pts/6 | \_ /bin/sh -c spkg/pipestatus "./sage -docbuild --no-pdf-links all html 2>&1" "tee -a dochtml.log"
+30748 30747 23411 pts/6 | \_ bash spkg/pipestatus ./sage -docbuild --no-pdf-links all html 2>&1 tee -a dochtml.log
+30750 30748 23411 pts/6 | \_ tee -a dochtml.log
+```
+Note the orphaned `sage-cleaner` process.
Description changed:
---
+++
@@ -19,24 +19,24 @@
Here is a `ps` snapshot of a situation where it happened:
-$ ps -u jdemeyer xfo pid,ppid,sess,tty,args
Note the orphaned `sage-cleaner` process.
Description changed:
---
+++
@@ -21,13 +21,8 @@
$ ps -u jdemeyer xfo pid,ppid,sess,tty,tpgid,args PID PPID SESS TT TPGID COMMAND -26136 24879 24879 ? -1 sshd: jdemeyer@pts/25 -26308 26136 26308 pts/25 26434 _ -bash -26434 26308 26308 pts/25 26434 _ screen -r 30777 1 23411 pts/6 6956 python /mnt/usb1/scratch/jdemeyer/merger/sage-4.7.2.alpha4/local/bin/sage-cleaner 21614 1 21614 ? -1 SCREEN -21615 21614 21615 pts/24 21623 _ /bin/bash -21623 21615 21615 pts/24 21623 | _ top 23411 21614 23411 pts/6 6956 _ /bin/bash 6956 23411 23411 pts/6 6956 | _ bash -c source "/home/jdemeyer/merger3/main.sh"; source "./merge" 6969 6956 23411 pts/6 6956 | _ bash -c source "/home/jdemeyer/merger3/main.sh"; source "./merge"
Can you also add the content of ~/.sage/temp/HOSTNAME/
to the bug report? The sage cleaner lives on until it is empty...
Description changed:
---
+++
@@ -19,7 +19,7 @@
Here is a `ps` snapshot of a situation where it happened:
-$ ps -u jdemeyer xfo pid,ppid,sess,tty,tpgid,args +$ ps -u jdemeyer xfo pid,ppid,sess,tty,tpgid,args # Irrelevant entries removed PID PPID SESS TT TPGID COMMAND 30777 1 23411 pts/6 6956 python /mnt/usb1/scratch/jdemeyer/merger/sage-4.7.2.alpha4/local/bin/sage-cleaner 21614 1 21614 ? -1 SCREEN @@ -33,5 +33,14 @@ 30747 30248 23411 pts/6 6956 | _ /bin/sh -c spkg/pipestatus "./sage -docbuild --no-pdf-links all html 2>&1" "tee -a dochtml.log" 30748 30747 23411 pts/6 6956 | _ bash spkg/pipestatus ./sage -docbuild --no-pdf-links all html 2>&1 tee -a dochtml.log 30750 30748 23411 pts/6 6956 | _ tee -a dochtml.log
Note the orphaned `sage-cleaner` process.
+ +$ ls -F ~/.sage/temp/sage.math.washington.edu/ +19769/ 20243/ 20643/ 21176/ 30653/ 3678/ 4272/ 4335/ 4385/ 4425/ 5168/ 5235/ 5522/ +19942/ 20359/ 20860/ 21288/ 30822/ 3828/ 4291/ 4351/ 4400/ 4445/ 5187/ 5307/ +20207/ 20517/ 20978/ 30444/ 30880/ 3966/ 4320/ 4370/ 4406/ 5149/ 5210/ 5454/ +
+Each of these directories seems to contain only an interface
directory, but there are no files.
Description changed:
---
+++
@@ -35,12 +35,3 @@
30750 30748 23411 pts/6 6956 | \_ tee -a dochtml.log
5454
sage-cleaner
process.- -$ ls -F ~/.sage/temp/sage.math.washington.edu/ -19769/ 20243/ 20643/ 21176/ 30653/ 3678/ 4272/ 4335/ 4385/ 4425/ 5168/ 5235/ 5522/ -19942/ 20359/ 20860/ 21288/ 30822/ 3828/ 4291/ 4351/ 4400/ 4445/ 5187/ 5307/ -20207/ 20517/ 20978/ 30444/ 30880/ 3966/ 4320/ 4370/ 4406/ 5149/ 5210/ 5454/ -
-Each of these directories seems to contain only an interface
directory, but there are no files.
It seems to have pulled itself through after a while. Don't know why...
Replying to @vbraun:
Can you also add the content of
~/.sage/temp/HOSTNAME/
to the bug report? The sage cleaner lives on until it is empty...
What do mean with "empty"? Does this mean that, if I have 10 copies of Sage running, the Sage cleaner will only exit if all those 10 copies of Sage exit?
I think so. Its purpose is to delete the ~/.sage/temp/HOSTNAME/PID
directory once the process with PID
has ended. Also, note that if you start Sage multiple times only one sage-cleaner process is started. Really, make doc
should not wait for the sage cleaner process to finish...
Replying to @vbraun:
I think so. Its purpose is to delete the
~/.sage/temp/HOSTNAME/PID
directory once the process withPID
has ended. Also, note that if you start Sage multiple times only one sage-cleaner process is started. Really,make doc
should not wait for the sage cleaner process to finish...
This is probably because of file descriptors being open.
I will do some more controlled tests on geom.math.washington.edu
, a machine where I normally don't run Sage.
I still don't know why sage-cleaner is running at all if you're doing make doc-html. That runs sage -docbuild, which runs this (from sage-sage):
if [ "$1" = "-docbuild" -o "$1" = "--docbuild" ]; then
cd "$CUR"
shift
python "$SAGE_ROOT"/devel/sage/doc/common/builder.py $@
exit $?
fi
It doesn't run sage-cleaner or sage_setup, so where does sage-cleaner get started?
Are you running another copy of Sage at the same time?
Replying to @jhpalmieri:
Are you running another copy of Sage at the same time?
Yes, that's why I said I will do more controlled tests. However, the fact that the session ID of the sage-cleaner
is the same as make doc-html
indicates that sage-cleaner
might have been started by the docbuilder. Are you sure that the docbuilder doesn't run Sage?
Description changed:
---
+++
@@ -35,3 +35,15 @@
30750 30748 23411 pts/6 6956 | \_ tee -a dochtml.log
5454
+
+If it matters:
+
+ +$ bash --version +GNU bash, version 4.1.0(1)-release (x86_64-unknown-linux-gnu) +Copyright (C) 2009 Free Software Foundation, Inc. +License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> + +This is free software; you are free to change and redistribute it. +There is NO WARRANTY, to the extent permitted by law. +
I'm not sure, but I don't see why sage-cleaner would start, and if I run "sage -docbuild html all" or "make doc-html" in one window, ps ux | grep clean
comes up empty while it's running.
Description changed:
---
+++
@@ -1,8 +1,8 @@
The following is a non-reproducible problem, but it happened several times.
-When building Sage from source using
+When building the Sage documentation using
-make +make doc-html
it can happen that `make` simply hangs forever after the message
@@ -33,8 +33,8 @@
30747 30248 23411 pts/6 6956 | \_ /bin/sh -c spkg/pipestatus "./sage -docbuild --no-pdf-links all html 2>&1" "tee -a dochtml.log"
30748 30747 23411 pts/6 6956 | \_ bash spkg/pipestatus ./sage -docbuild --no-pdf-links all html 2>&1 tee -a dochtml.log
30750 30748 23411 pts/6 6956 | \_ tee -a dochtml.log
- 5454
+Note the orphaned sage-cleaner
process with the same session ID as make doc-html
.
If it matters:
Note that it's still only a conjecture that sage-cleaner
has anything to do with it. I'm putting some hooks into sage-cleaner
and see what that gives. Another thing to consider is that maybe it only happens the very first time documentation is built.
sphinx-build
does start sage-cleaner
:
$ ps -u jdemeyer xfo pid,ppid,pgid,sess,tty,tpgid,stat,args
PID PPID PGID SESS TT TPGID STAT COMMAND
2811 1 2262 26064 pts/3 26064 T python /scratch/jdemeyer/sage-4.7.2.alpha3/local/bin/sage-cleaner
2830 24409 2829 24409 pts/6 2829 S+ | \_ grep -e pts/3 -e COMMAND
26064 6198 26064 26064 pts/3 26064 Ss+ \_ /bin/bash
2262 26064 2262 26064 pts/3 26064 T | \_ make doc-html
2781 2262 2262 26064 pts/3 26064 T | \_ /bin/sh -c spkg/pipestatus "./sage -docbuild --no-pdf-links all html 2>&1" "tee -a dochtml.log"
2782 2781 2262 26064 pts/3 26064 T | \_ bash spkg/pipestatus ./sage -docbuild --no-pdf-links all html 2>&1 tee -a dochtml.log
2783 2782 2262 26064 pts/3 26064 T | \_ bash ./sage -docbuild --no-pdf-links all html
2791 2783 2262 26064 pts/3 26064 T | | \_ bash /scratch/jdemeyer/sage-4.7.2.alpha3/local/bin/sage-sage -docbuild --no-pdf-links all html
2806 2791 2262 26064 pts/3 26064 T | | \_ python /scratch/jdemeyer/sage-4.7.2.alpha3/devel/sage/doc/common/builder.py --no-pdf-links all html
2807 2806 2262 26064 pts/3 26064 T | | \_ /bin/sh -c sphinx-build -b html -d /scratch/jdemeyer/sage-4.7.2.alpha3/devel/sage/doc/output/doctrees/de/tutorial -A hide_pdf_links=1 /scratch/jdemeyer/sage-4.7.2.alpha3/devel/sage/doc/de/tutorial /scratch/jdemeyer/sage-4.7.2.alpha3/devel/sage/doc/output/html/de/tutorial
2808 2807 2262 26064 pts/3 26064 T | | \_ python /scratch/jdemeyer/sage-4.7.2.alpha3/local/bin/sphinx-build -b html -d /scratch/jdemeyer/sage-4.7.2.alpha3/devel/sage/doc/output/doctrees/de/tutorial -A hide_pdf_links=1 /scratch/jdemeyer/sage-4.7.2.alpha3/devel/sage/doc/de/tutorial /scratch/jdemeyer/sage-4.7.2.alpha3/devel/sage/doc/output/html/de/tutorial
2784 2782 2262 26064 pts/3 26064 T | \_ tee -a dochtml.log
Frustrating that it is hard to reproduce: make doc-html
sometimes runs sage-cleaner
.
This is sphinx-build
:
#!/usr/bin/env python
# EASY-INSTALL-ENTRY-SCRIPT: 'Sphinx==1.0.4','console_scripts','sphinx-build'
__requires__ = 'Sphinx==1.0.4'
import sys
from pkg_resources import load_entry_point
import sage.all
sys.exit(
load_entry_point('Sphinx==1.0.4', 'console_scripts', 'sphinx-build')()
)
Note the line "import sage.all". Perhaps this can sometimes (indirectly) run sage-cleaner
?
I don't see how the import sage.all could start sage-cleaner. If you can reproduce the problem, how about you strace the docbuild process? Then it would be easy to find out which child process starts the cleaner.
Replying to @vbraun:
If you can reproduce the problem
That's precisely the problem, it must be some weird race-condition.
I found a way to reproduce it: when $HOME/.sage
does not exist, sage-cleaner
is always started when doing make doc-html
.
It is indeed the line import sage.all
which causes sage-cleaner
to be run.
Attachment: 10163_debug_sage_cleaner.patch.gz
STOP all processes when sage-cleaner is run, do not apply
gap
seems to be involved somehow. The last thing which happens before starting sage-cleaner
is starting gap
.
But why does make hang until the cleaner exits? The sage-cleaner process is already orphaned, so nobody should be waiting for it. Did you strace the hanging make process? I would tend to think that it is a good thing that the sage-cleaner is started.
This reproduces the problem:
$ ( export HOME=/tmp/sagehome; rm -rf $HOME; mkdir -p $HOME/.sage/temp/`hostname`/$$; make doc-html; )
What happens here is that we start from a clean $HOME
and create a bogus directory in $HOME/.sage/temp/$hostname
to make sure sage-cleaner
does not exit. Note that you can delete the last 3 lines from local/bin/sphinx-build
to speed up testing.
Attachment: 10163_sage_cleaner.patch.gz
Description changed:
---
+++
@@ -47,3 +47,5 @@
This is free software; you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
+ +Apply attachment: 10163_sage_cleaner.patch
Author: Jeroen Demeyer
The problem was dangling file descriptors. This patch fixes the problem for me.
Hmm maybe sage-cleaner should close its stdout/stderr instead of hoping that all instances that call it do. But then that would make it annoying to debug the cleaner... In any case, the patch looks good to me.
Reviewer: Volker Braun
Replying to @vbraun:
Hmm maybe sage-cleaner should close its stdout/stderr instead of hoping that all instances that call it do. But then that would make it annoying to debug the cleaner...
Exactly, that's what I was also thinking.
Changed keywords from build Makefile to build Makefile sage-cleaner
Changed keywords from build Makefile sage-cleaner to build Makefile sage-cleaner cleaner
We had similar problems with (e.g.) ptestlong
and (GAP?) orphans IIRC; while sage -tp ...
did terminate ("All tests passed!"), tee
did not because some file descriptor(s) were still open, so you didn't get the shell prompt back.
Killing the orphans the shell prompt returned.
Merged: sage-4.7.2.alpha4
The following is a non-reproducible problem, but it happened several times. When building the Sage documentation using
it can happen that
make
simply hangs forever after the messagePressing CTRL-C at this point gives
Here is a
ps
snapshot of a situation where it happened:Note the orphaned
sage-cleaner
process with the same session ID asmake doc-html
.If it matters:
Apply attachment: 10163_sage_cleaner.patch
Component: build
Keywords: build Makefile sage-cleaner cleaner
Author: Jeroen Demeyer
Reviewer: Volker Braun
Merged: sage-4.7.2.alpha4
Issue created by migration from https://trac.sagemath.org/ticket/10163