sagemath / sage

Main repository of SageMath
https://www.sagemath.org
Other
1.44k stars 481 forks source link

make doc-html sometimes hangs after having finished #10163

Closed jdemeyer closed 13 years ago

jdemeyer commented 14 years ago

The following is a non-reproducible problem, but it happened several times. When building the Sage documentation using

make doc-html

it can happen that make simply hangs forever after the message

Build finished.  The built documents can be found in /mnt/usb1/scratch/jdemeyer/merger/sage-4.6.1.alpha0/devel/sage/doc/output/html/fr/a_tour_of_sage

Pressing CTRL-C at this point gives

make: *** [doc-html] Interrupt

Here is a ps snapshot of a situation where it happened:

$ ps -u jdemeyer xfo pid,ppid,sess,tty,tpgid,args  # Irrelevant entries removed
  PID  PPID  SESS TT       TPGID COMMAND
30777     1 23411 pts/6     6956 python /mnt/usb1/scratch/jdemeyer/merger/sage-4.7.2.alpha4/local/bin/sage-cleaner
21614     1 21614 ?           -1 SCREEN
23411 21614 23411 pts/6     6956  \_ /bin/bash
 6956 23411 23411 pts/6     6956  |   \_ bash -c source "/home/jdemeyer/merger3/main.sh"; source "./merge"
 6969  6956 23411 pts/6     6956  |       \_ bash -c source "/home/jdemeyer/merger3/main.sh"; source "./merge"
 6970  6969 23411 pts/6     6956  |       |   \_ tee /home/release/sage-4.7.2.alpha4/logs/merger.log
30248  6956 23411 pts/6     6956  |       \_ make -j6 doc-html
30249 30248 23411 pts/6     6956  |           \_ bash -c source "/home/jdemeyer/merger3/main.sh"; source "./merge"
30250 30249 23411 pts/6     6956  |           |   \_ tee /home/release/sage-4.7.2.alpha4/logs/dochtml.log
30747 30248 23411 pts/6     6956  |           \_ /bin/sh -c spkg/pipestatus "./sage -docbuild --no-pdf-links all html  2>&1" "tee -a dochtml.log"
30748 30747 23411 pts/6     6956  |               \_ bash spkg/pipestatus ./sage -docbuild --no-pdf-links all html  2>&1 tee -a dochtml.log
30750 30748 23411 pts/6     6956  |                   \_ tee -a dochtml.log

Note the orphaned sage-cleaner process with the same session ID as make doc-html.

If it matters:

$ bash --version
GNU bash, version 4.1.0(1)-release (x86_64-unknown-linux-gnu)
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>

This is free software; you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Apply attachment: 10163_sage_cleaner.patch

Component: build

Keywords: build Makefile sage-cleaner cleaner

Author: Jeroen Demeyer

Reviewer: Volker Braun

Merged: sage-4.7.2.alpha4

Issue created by migration from https://trac.sagemath.org/ticket/10163

jdemeyer commented 14 years ago

Description changed:

--- 
+++ 
@@ -1,9 +1,8 @@
-The following is a non-reproducible problem, but it happened at least twice for me:
+The following is a non-reproducible problem, but it happened at least three times for me:
 when building Sage from source using

-export MAKE="make -j8" -$MAKE +make

 it can happen that `make` simply hangs forever after the message

@@ -16,5 +15,3 @@

make: *** [doc-html] Interrupt

-
-I'm not sure whether the "-j8" option has anything to do with it.
jdemeyer commented 13 years ago

Changed keywords from none to build Makefile

jdemeyer commented 13 years ago
comment:2

This might be due to NFS (I know it happened on sage.math.washington.edu but I can't remember whether it ever happened on other machines).

jdemeyer commented 13 years ago
comment:3

I am getting more and more convinced that this is an NFS issue, so proposal to close this ticket as invalid.

vbraun commented 13 years ago
comment:4

Just because NFS triggers it doesn't necessarily mean that its not a bug in Sage. I've never seen this issue, but then I never use high-latency filesystems. Can you run strace/gdb on the stuck process if you see it again? (use -p to attach to existing pid)

jdemeyer commented 13 years ago

Description changed:

--- 
+++ 
@@ -1,5 +1,5 @@
-The following is a non-reproducible problem, but it happened at least three times for me:
-when building Sage from source using
+The following is a non-reproducible problem, but it happened several times.
+When building Sage from source using

make @@ -15,3 +15,4 @@

 make: *** [doc-html] Interrupt

+The issue is a stuck sage-cleaner process.

jhpalmieri commented 13 years ago
comment:6

Should there be some timeout setting in sage-cleaner? If it has to wait too long, exit but print a warning message that there may be some stray processes which should be killed by hand?

vbraun commented 13 years ago
comment:7

No the sage cleaner needs to stick around for as long as Sage is running. There might be a computation running that, after a long while, does a big @parallel loop and if nobody is cleaning up the filesystem will be very unhappy.

jhpalmieri commented 13 years ago
comment:8

Does sage-cleaner actually run when you do sage -docbuild ... (which is called by make doc-html)? It doesn't look like it to me. It seems that sage -docbuild all html creates a bunch of temporary directories in DOT_SAGE/temp/HOSTNAME, and those aren't deleted until the next time I run Sage, when sage-cleaner actually runs. So maybe the issue on this ticket isn't related to sage-cleaner.

jdemeyer commented 13 years ago

Description changed:

--- 
+++ 
@@ -15,4 +15,28 @@

make: *** [doc-html] Interrupt

-The issue is a stuck `sage-cleaner` process.
+
+Here is a `ps` snapshot of a situation where it happened:
+
+```
+$ ps -u jdemeyer xfo pid,ppid,sess,tty,args
+  PID  PPID  SESS TT       COMMAND
+26136 24879 24879 ?        sshd: jdemeyer@pts/25
+26308 26136 26308 pts/25    \_ -bash
+26434 26308 26308 pts/25        \_ screen -r
+30777     1 23411 pts/6    python /mnt/usb1/scratch/jdemeyer/merger/sage-4.7.2.alpha4/local/bin/sage-cleaner
+21614     1 21614 ?        SCREEN
+21615 21614 21615 pts/24    \_ /bin/bash
+21623 21615 21615 pts/24    |   \_ top
+23411 21614 23411 pts/6     \_ /bin/bash
+ 6956 23411 23411 pts/6     |   \_ bash -c source "/home/jdemeyer/merger3/main.sh"; source "./merge"
+ 6969  6956 23411 pts/6     |       \_ bash -c source "/home/jdemeyer/merger3/main.sh"; source "./merge"
+ 6970  6969 23411 pts/6     |       |   \_ tee /home/release/sage-4.7.2.alpha4/logs/merger.log
+30248  6956 23411 pts/6     |       \_ make -j6 doc-html
+30249 30248 23411 pts/6     |           \_ bash -c source "/home/jdemeyer/merger3/main.sh"; source "./merge"
+30250 30249 23411 pts/6     |           |   \_ tee /home/release/sage-4.7.2.alpha4/logs/dochtml.log
+30747 30248 23411 pts/6     |           \_ /bin/sh -c spkg/pipestatus "./sage -docbuild --no-pdf-links all html  2>&1" "tee -a dochtml.log"
+30748 30747 23411 pts/6     |               \_ bash spkg/pipestatus ./sage -docbuild --no-pdf-links all html  2>&1 tee -a dochtml.log
+30750 30748 23411 pts/6     |                   \_ tee -a dochtml.log
+```
+Note the orphaned `sage-cleaner` process.
jdemeyer commented 13 years ago

Description changed:

--- 
+++ 
@@ -19,24 +19,24 @@
 Here is a `ps` snapshot of a situation where it happened:

-$ ps -u jdemeyer xfo pid,ppid,sess,tty,args

jdemeyer commented 13 years ago

Description changed:

--- 
+++ 
@@ -21,13 +21,8 @@

$ ps -u jdemeyer xfo pid,ppid,sess,tty,tpgid,args PID PPID SESS TT TPGID COMMAND -26136 24879 24879 ? -1 sshd: jdemeyer@pts/25 -26308 26136 26308 pts/25 26434 _ -bash -26434 26308 26308 pts/25 26434 _ screen -r 30777 1 23411 pts/6 6956 python /mnt/usb1/scratch/jdemeyer/merger/sage-4.7.2.alpha4/local/bin/sage-cleaner 21614 1 21614 ? -1 SCREEN -21615 21614 21615 pts/24 21623 _ /bin/bash -21623 21615 21615 pts/24 21623 | _ top 23411 21614 23411 pts/6 6956 _ /bin/bash 6956 23411 23411 pts/6 6956 | _ bash -c source "/home/jdemeyer/merger3/main.sh"; source "./merge" 6969 6956 23411 pts/6 6956 | _ bash -c source "/home/jdemeyer/merger3/main.sh"; source "./merge"

vbraun commented 13 years ago
comment:12

Can you also add the content of ~/.sage/temp/HOSTNAME/ to the bug report? The sage cleaner lives on until it is empty...

jdemeyer commented 13 years ago

Description changed:

--- 
+++ 
@@ -19,7 +19,7 @@
 Here is a `ps` snapshot of a situation where it happened:

-$ ps -u jdemeyer xfo pid,ppid,sess,tty,tpgid,args +$ ps -u jdemeyer xfo pid,ppid,sess,tty,tpgid,args # Irrelevant entries removed PID PPID SESS TT TPGID COMMAND 30777 1 23411 pts/6 6956 python /mnt/usb1/scratch/jdemeyer/merger/sage-4.7.2.alpha4/local/bin/sage-cleaner 21614 1 21614 ? -1 SCREEN @@ -33,5 +33,14 @@ 30747 30248 23411 pts/6 6956 | _ /bin/sh -c spkg/pipestatus "./sage -docbuild --no-pdf-links all html 2>&1" "tee -a dochtml.log" 30748 30747 23411 pts/6 6956 | _ bash spkg/pipestatus ./sage -docbuild --no-pdf-links all html 2>&1 tee -a dochtml.log 30750 30748 23411 pts/6 6956 | _ tee -a dochtml.log

jdemeyer commented 13 years ago

Description changed:

--- 
+++ 
@@ -35,12 +35,3 @@
 30750 30748 23411 pts/6     6956  |                   \_ tee -a dochtml.log
  5454

-Note the orphaned sage-cleaner process.

- -$ ls -F ~/.sage/temp/sage.math.washington.edu/ -19769/ 20243/ 20643/ 21176/ 30653/ 3678/ 4272/ 4335/ 4385/ 4425/ 5168/ 5235/ 5522/ -19942/ 20359/ 20860/ 21288/ 30822/ 3828/ 4291/ 4351/ 4400/ 4445/ 5187/ 5307/ -20207/ 20517/ 20978/ 30444/ 30880/ 3966/ 4320/ 4370/ 4406/ 5149/ 5210/ 5454/ - -Each of these directories seems to contain only an interface directory, but there are no files.

jdemeyer commented 13 years ago
comment:14

It seems to have pulled itself through after a while. Don't know why...

jdemeyer commented 13 years ago
comment:15

Replying to @vbraun:

Can you also add the content of ~/.sage/temp/HOSTNAME/ to the bug report? The sage cleaner lives on until it is empty...

What do mean with "empty"? Does this mean that, if I have 10 copies of Sage running, the Sage cleaner will only exit if all those 10 copies of Sage exit?

vbraun commented 13 years ago
comment:16

I think so. Its purpose is to delete the ~/.sage/temp/HOSTNAME/PID directory once the process with PID has ended. Also, note that if you start Sage multiple times only one sage-cleaner process is started. Really, make doc should not wait for the sage cleaner process to finish...

jdemeyer commented 13 years ago
comment:17

Replying to @vbraun:

I think so. Its purpose is to delete the ~/.sage/temp/HOSTNAME/PID directory once the process with PID has ended. Also, note that if you start Sage multiple times only one sage-cleaner process is started. Really, make doc should not wait for the sage cleaner process to finish...

This is probably because of file descriptors being open.

jdemeyer commented 13 years ago
comment:18

I will do some more controlled tests on geom.math.washington.edu, a machine where I normally don't run Sage.

jhpalmieri commented 13 years ago
comment:19

I still don't know why sage-cleaner is running at all if you're doing make doc-html. That runs sage -docbuild, which runs this (from sage-sage):

if [ "$1" = "-docbuild" -o "$1" = "--docbuild" ]; then
   cd "$CUR"
   shift
   python "$SAGE_ROOT"/devel/sage/doc/common/builder.py $@
   exit $?
fi

It doesn't run sage-cleaner or sage_setup, so where does sage-cleaner get started?

Are you running another copy of Sage at the same time?

jdemeyer commented 13 years ago
comment:20

Replying to @jhpalmieri:

Are you running another copy of Sage at the same time?

Yes, that's why I said I will do more controlled tests. However, the fact that the session ID of the sage-cleaner is the same as make doc-html indicates that sage-cleaner might have been started by the docbuilder. Are you sure that the docbuilder doesn't run Sage?

jdemeyer commented 13 years ago

Description changed:

--- 
+++ 
@@ -35,3 +35,15 @@
 30750 30748 23411 pts/6     6956  |                   \_ tee -a dochtml.log
  5454

+ +If it matters: + + +$ bash --version +GNU bash, version 4.1.0(1)-release (x86_64-unknown-linux-gnu) +Copyright (C) 2009 Free Software Foundation, Inc. +License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> + +This is free software; you are free to change and redistribute it. +There is NO WARRANTY, to the extent permitted by law. +

jhpalmieri commented 13 years ago
comment:22

I'm not sure, but I don't see why sage-cleaner would start, and if I run "sage -docbuild html all" or "make doc-html" in one window, ps ux | grep clean comes up empty while it's running.

jdemeyer commented 13 years ago

Description changed:

--- 
+++ 
@@ -1,8 +1,8 @@
 The following is a non-reproducible problem, but it happened several times.
-When building Sage from source using
+When building the Sage documentation using

-make +make doc-html

 it can happen that `make` simply hangs forever after the message

@@ -33,8 +33,8 @@
 30747 30248 23411 pts/6     6956  |           \_ /bin/sh -c spkg/pipestatus "./sage -docbuild --no-pdf-links all html  2>&1" "tee -a dochtml.log"
 30748 30747 23411 pts/6     6956  |               \_ bash spkg/pipestatus ./sage -docbuild --no-pdf-links all html  2>&1 tee -a dochtml.log
 30750 30748 23411 pts/6     6956  |                   \_ tee -a dochtml.log
- 5454

+Note the orphaned sage-cleaner process with the same session ID as make doc-html.

If it matters:

jdemeyer commented 13 years ago
comment:24

Note that it's still only a conjecture that sage-cleaner has anything to do with it. I'm putting some hooks into sage-cleaner and see what that gives. Another thing to consider is that maybe it only happens the very first time documentation is built.

jdemeyer commented 13 years ago
comment:25

sphinx-build does start sage-cleaner:

$ ps -u jdemeyer xfo pid,ppid,pgid,sess,tty,tpgid,stat,args
  PID  PPID  PGID  SESS TT       TPGID STAT COMMAND
 2811     1  2262 26064 pts/3    26064 T    python /scratch/jdemeyer/sage-4.7.2.alpha3/local/bin/sage-cleaner
 2830 24409  2829 24409 pts/6     2829 S+    |   \_ grep -e pts/3 -e COMMAND
26064  6198 26064 26064 pts/3    26064 Ss+   \_ /bin/bash
 2262 26064  2262 26064 pts/3    26064 T     |   \_ make doc-html
 2781  2262  2262 26064 pts/3    26064 T     |       \_ /bin/sh -c spkg/pipestatus "./sage -docbuild --no-pdf-links all html  2>&1" "tee -a dochtml.log"
 2782  2781  2262 26064 pts/3    26064 T     |           \_ bash spkg/pipestatus ./sage -docbuild --no-pdf-links all html  2>&1 tee -a dochtml.log
 2783  2782  2262 26064 pts/3    26064 T     |               \_ bash ./sage -docbuild --no-pdf-links all html
 2791  2783  2262 26064 pts/3    26064 T     |               |   \_ bash /scratch/jdemeyer/sage-4.7.2.alpha3/local/bin/sage-sage -docbuild --no-pdf-links all html
 2806  2791  2262 26064 pts/3    26064 T     |               |       \_ python /scratch/jdemeyer/sage-4.7.2.alpha3/devel/sage/doc/common/builder.py --no-pdf-links all html
 2807  2806  2262 26064 pts/3    26064 T     |               |           \_ /bin/sh -c sphinx-build -b html -d /scratch/jdemeyer/sage-4.7.2.alpha3/devel/sage/doc/output/doctrees/de/tutorial   -A hide_pdf_links=1 /scratch/jdemeyer/sage-4.7.2.alpha3/devel/sage/doc/de/tutorial /scratch/jdemeyer/sage-4.7.2.alpha3/devel/sage/doc/output/html/de/tutorial
 2808  2807  2262 26064 pts/3    26064 T     |               |               \_ python /scratch/jdemeyer/sage-4.7.2.alpha3/local/bin/sphinx-build -b html -d /scratch/jdemeyer/sage-4.7.2.alpha3/devel/sage/doc/output/doctrees/de/tutorial -A hide_pdf_links=1 /scratch/jdemeyer/sage-4.7.2.alpha3/devel/sage/doc/de/tutorial /scratch/jdemeyer/sage-4.7.2.alpha3/devel/sage/doc/output/html/de/tutorial
 2784  2782  2262 26064 pts/3    26064 T     |               \_ tee -a dochtml.log
jdemeyer commented 13 years ago
comment:26

Frustrating that it is hard to reproduce: make doc-html sometimes runs sage-cleaner.

jdemeyer commented 13 years ago
comment:27

This is sphinx-build:

#!/usr/bin/env python
# EASY-INSTALL-ENTRY-SCRIPT: 'Sphinx==1.0.4','console_scripts','sphinx-build'
__requires__ = 'Sphinx==1.0.4'
import sys
from pkg_resources import load_entry_point
import sage.all
sys.exit(
   load_entry_point('Sphinx==1.0.4', 'console_scripts', 'sphinx-build')()
)

Note the line "import sage.all". Perhaps this can sometimes (indirectly) run sage-cleaner?

vbraun commented 13 years ago
comment:28

I don't see how the import sage.all could start sage-cleaner. If you can reproduce the problem, how about you strace the docbuild process? Then it would be easy to find out which child process starts the cleaner.

jdemeyer commented 13 years ago
comment:29

Replying to @vbraun:

If you can reproduce the problem

That's precisely the problem, it must be some weird race-condition.

jdemeyer commented 13 years ago
comment:30

I found a way to reproduce it: when $HOME/.sage does not exist, sage-cleaner is always started when doing make doc-html.

jdemeyer commented 13 years ago
comment:31

It is indeed the line import sage.all which causes sage-cleaner to be run.

jdemeyer commented 13 years ago

Attachment: 10163_debug_sage_cleaner.patch.gz

STOP all processes when sage-cleaner is run, do not apply

jdemeyer commented 13 years ago
comment:32

gap seems to be involved somehow. The last thing which happens before starting sage-cleaner is starting gap.

vbraun commented 13 years ago
comment:33

But why does make hang until the cleaner exits? The sage-cleaner process is already orphaned, so nobody should be waiting for it. Did you strace the hanging make process? I would tend to think that it is a good thing that the sage-cleaner is started.

jdemeyer commented 13 years ago
comment:34

This reproduces the problem:

$ ( export HOME=/tmp/sagehome; rm -rf $HOME; mkdir -p $HOME/.sage/temp/`hostname`/$$; make doc-html; )

What happens here is that we start from a clean $HOME and create a bogus directory in $HOME/.sage/temp/$hostname to make sure sage-cleaner does not exit. Note that you can delete the last 3 lines from local/bin/sphinx-build to speed up testing.

jdemeyer commented 13 years ago

Attachment: 10163_sage_cleaner.patch.gz

jdemeyer commented 13 years ago

Description changed:

--- 
+++ 
@@ -47,3 +47,5 @@
 This is free software; you are free to change and redistribute it.
 There is NO WARRANTY, to the extent permitted by law.

+ +Apply attachment: 10163_sage_cleaner.patch

jdemeyer commented 13 years ago

Author: Jeroen Demeyer

jdemeyer commented 13 years ago
comment:36

The problem was dangling file descriptors. This patch fixes the problem for me.

vbraun commented 13 years ago
comment:37

Hmm maybe sage-cleaner should close its stdout/stderr instead of hoping that all instances that call it do. But then that would make it annoying to debug the cleaner... In any case, the patch looks good to me.

vbraun commented 13 years ago

Reviewer: Volker Braun

jdemeyer commented 13 years ago
comment:38

Replying to @vbraun:

Hmm maybe sage-cleaner should close its stdout/stderr instead of hoping that all instances that call it do. But then that would make it annoying to debug the cleaner...

Exactly, that's what I was also thinking.

jdemeyer commented 13 years ago

Changed keywords from build Makefile to build Makefile sage-cleaner

jdemeyer commented 13 years ago

Changed keywords from build Makefile sage-cleaner to build Makefile sage-cleaner cleaner

83660e46-0051-498b-a8c1-f7a7bd232b5a commented 13 years ago
comment:40

We had similar problems with (e.g.) ptestlong and (GAP?) orphans IIRC; while sage -tp ... did terminate ("All tests passed!"), tee did not because some file descriptor(s) were still open, so you didn't get the shell prompt back.

Killing the orphans the shell prompt returned.

jdemeyer commented 13 years ago

Merged: sage-4.7.2.alpha4