sagemath / sage

Main repository of SageMath
https://www.sagemath.org
Other
1.39k stars 473 forks source link

parallelism in Sage: just use value of 'MAKE' #12016

Closed jhpalmieri closed 12 years ago

jhpalmieri commented 12 years ago

The various parallel aspects of Sage should be controlled by setting the -j (possible also -l) flags in MAKE or MAKEFLAGS. That is, if MAKE='make -j16', then

Testing this ticket: you can set the environment variable SAGE_NUM_CORES to the number of cores you want to pretend to have. For example, running

SAGE_NUM_CORES=24 make ptestlong

should run 8 threads (see sage-num-threads.py; this is undocumented because the only purpose I see is for testing this ticket).

Notes: With the patches applied, building spkgs in parallel works well, except for race conditions in:

Apply:

  1. attachment: 12016-root.patch to the SAGE_ROOT repository.
  2. attachment: 12016-base.patch to spkg/base.
  3. attachment: 12016-scripts.patch and attachment: trac_12016-scripts-ref.patch to the SCRIPTS repository.
  4. attachment: 12016-sage.patch to the Sage library.

See also: #6495 to implement the same behavior for doc building.

Dependencies: sage-4.8.alpha4

CC: @jdemeyer @nexttime

Component: build

Author: John Palmieri, Jeroen Demeyer

Reviewer: John Palmieri, Jeroen Demeyer

Merged: sage-4.8.alpha5

Issue created by migration from https://trac.sagemath.org/ticket/12016

jdemeyer commented 12 years ago
comment:1

We should remove NUM_THREADS from the top-level Makefile.

jdemeyer commented 12 years ago

Description changed:

--- 
+++ 
@@ -6,3 +6,7 @@

 In #6495, we should implement the same behavior for doc building.

+**Apply**:
+1. [attachment: 12016-root.v2.patch](https://github.com/sagemath/sage/files/ticket12016/12016-root.v2.patch.gz) to the `SAGE_ROOT` repository.
+2. [attachment: trac_12016-scripts.patch](https://github.com/sagemath/sage/files/ticket12016/trac_12016-scripts.patch.gz) to the `SCRIPTS` repository.
+3. [attachment: 12016-sage.v2.patch](https://github.com/sagemath/sage/files/ticket12016/12016-sage.v2.patch.gz) to the Sage library.
jdemeyer commented 12 years ago

Changed author from John Palmieri to John Palmieri, Jeroen Demeyer

jdemeyer commented 12 years ago

Description changed:

--- 
+++ 
@@ -2,7 +2,9 @@

 - running `make` will build spkg's in parallel, using 16 processes (this was done in #11959)

-- running `make ptestlong` or `sage -tp 0 <files>` will doctest in parallel using 16 threads.  If the `-j` flag in `MAKE` is empty or not set, then determine the number of threads as before: `min(8, cpu_count())`.
+- running `make ptestlong` or `sage -tp 0 <files>` will doctest in parallel using 16 threads.  If the `-j` flag in `MAKE` is not set, then determine the number of threads as before: `min(8, cpu_count())`.
+
+- TODO: running `./sage -b` will build the Sage library using 16 threads. If the `-j` flag in `MAKE` is not set, then use only 1 thread.

 In #6495, we should implement the same behavior for doc building.
jdemeyer commented 12 years ago

Description changed:

--- 
+++ 
@@ -10,5 +10,5 @@

 **Apply**:
 1. [attachment: 12016-root.v2.patch](https://github.com/sagemath/sage/files/ticket12016/12016-root.v2.patch.gz) to the `SAGE_ROOT` repository.
-2. [attachment: trac_12016-scripts.patch](https://github.com/sagemath/sage/files/ticket12016/trac_12016-scripts.patch.gz) to the `SCRIPTS` repository.
+2. [attachment: 12016-scripts.v2.patch](https://github.com/sagemath/sage/files/ticket12016/12016-scripts.v2.patch.gz) to the `SCRIPTS` repository.
 3. [attachment: 12016-sage.v2.patch](https://github.com/sagemath/sage/files/ticket12016/12016-sage.v2.patch.gz) to the Sage library.
jdemeyer commented 12 years ago

Description changed:

--- 
+++ 
@@ -1,10 +1,10 @@
 With the attached patches, along with the changes from #11959, the various parallel aspects of Sage should be controlled by setting the `-j` flag in `MAKE`.  That is, if `MAKE='make -j16'`, then

-- running `make` will build spkg's in parallel, using 16 processes (this was done in #11959)
+- running `make` will build spkg's in parallel, using 16 processes (this was done in #11959).  *This is standard `make` behaviour, no patches are needed.*

 - running `make ptestlong` or `sage -tp 0 <files>` will doctest in parallel using 16 threads.  If the `-j` flag in `MAKE` is not set, then determine the number of threads as before: `min(8, cpu_count())`.

-- TODO: running `./sage -b` will build the Sage library using 16 threads. If the `-j` flag in `MAKE` is not set, then use only 1 thread.
+- **TODO**: running `./sage -b` will build the Sage library using 16 threads. If the `-j` flag in `MAKE` is not set, then use only 1 thread.

 In #6495, we should implement the same behavior for doc building.
jdemeyer commented 12 years ago

Description changed:

--- 
+++ 
@@ -8,6 +8,11 @@

 In #6495, we should implement the same behavior for doc building.

+**TODO:** We should also support:
+- `make -j16 ptestlong` (as opposed to `MAKE="make -j16" make ptestlong`).  Use `MAKEFLAGS`.
+- `make -j` (unlimited number of jobs, say set to 9999 if we really need a number).
+- `make -j37 -j1` (last option takes precedence).
+
 **Apply**:
 1. [attachment: 12016-root.v2.patch](https://github.com/sagemath/sage/files/ticket12016/12016-root.v2.patch.gz) to the `SAGE_ROOT` repository.
 2. [attachment: 12016-scripts.v2.patch](https://github.com/sagemath/sage/files/ticket12016/12016-scripts.v2.patch.gz) to the `SCRIPTS` repository.
jdemeyer commented 12 years ago
comment:6

John, with your solution there is a lot of code duplication (determining the number of threads is done in 3 places, potentially in 3 different ways). How about having code in sage-sage or sage-env to determine the number of threads and saving it in an environment variable SAGE_NUM_PROCESSES (which the user could set by hand; if not set, the value comes from MAKE or MAKEFLAGS; if no -j option is given, set to 1).

jhpalmieri commented 12 years ago
comment:7

Replying to @jdemeyer:

John, with your solution there is a lot of code duplication (determining the number of threads is done in 3 places, potentially in 3 different ways). How about having code in sage-sage or sage-env to determine the number of threads and saving it in an environment variable SAGE_NUM_PROCESSES

Sounds okay.

(which the user could set by hand; if not set, the value comes from MAKE or MAKEFLAGS; if no -j option is given, set to 1).

If you run "sage -tp ", should you use 1 process or more than 1? The "-tp" option means "parallel", so perhaps the default should be more than 1 in this case. In other cases (like docbuilding, for example), the default should be 1.

jhpalmieri commented 12 years ago
comment:8

For something like make -j16 ptestlong, how do we recover the number 16? If I execute this command (with MAKE unset), I see

MAKEFLAGS= --jobserver-fds=3,4 -j
MFLAGS=- --jobserver-fds=3,4 -j

but I don't see "16" anywhere in the listing of the environment variables.

jdemeyer commented 12 years ago
comment:9

Replying to @jhpalmieri:

Replying to @jdemeyer:

John, with your solution there is a lot of code duplication (determining the number of threads is done in 3 places, potentially in 3 different ways). How about having code in sage-sage or sage-env to determine the number of threads and saving it in an environment variable SAGE_NUM_PROCESSES

Sounds okay.

(which the user could set by hand; if not set, the value comes from MAKE or MAKEFLAGS; if no -j option is given, set to 1).

If you run "sage -tp ", should you use 1 process or more than 1? The "-tp" option means "parallel", so perhaps the default should be more than 1 in this case. In other cases (like docbuilding, for example), the default should be 1.

Sure, that is what I meant. We should compute the value once, but in sage -tp we can still decide to use the number of processes.

jdemeyer commented 12 years ago
comment:10

Replying to @jhpalmieri:

For something like make -j16 ptestlong, how do we recover the number 16? If I execute this command (with MAKE unset), I see

MAKEFLAGS= --jobserver-fds=3,4 -j
MFLAGS=- --jobserver-fds=3,4 -j

but I don't see "16" anywhere in the listing of the environment variables.

You are right. I had not tried this before. So let's scrap that idea.

jhpalmieri commented 12 years ago

Attachment: trac_12016-root.v2.patch.gz

Attachment: trac_12016-sage.v2.patch.gz

jhpalmieri commented 12 years ago

Description changed:

--- 
+++ 
@@ -4,16 +4,12 @@

 - running `make ptestlong` or `sage -tp 0 <files>` will doctest in parallel using 16 threads.  If the `-j` flag in `MAKE` is not set, then determine the number of threads as before: `min(8, cpu_count())`.

-- **TODO**: running `./sage -b` will build the Sage library using 16 threads. If the `-j` flag in `MAKE` is not set, then use only 1 thread.
+- running `./sage -b` will build the Sage library using 16 threads. If the `-j` flag in `MAKE` is not set, then use only 1 thread.

 In #6495, we should implement the same behavior for doc building.

-**TODO:** We should also support:
-- `make -j16 ptestlong` (as opposed to `MAKE="make -j16" make ptestlong`).  Use `MAKEFLAGS`.
-- `make -j` (unlimited number of jobs, say set to 9999 if we really need a number).
-- `make -j37 -j1` (last option takes precedence).
+**Apply**:
+1. [attachment: trac_12016-root.v2.patch](https://github.com/sagemath/sage-prod/files/10653998/trac_12016-root.v2.patch.gz) to the `SAGE_ROOT` repository.
+2. [attachment: trac_12016-scripts.v2.patch](https://github.com/sagemath/sage-prod/files/10654000/trac_12016-scripts.v2.patch.gz) to the `SCRIPTS` repository.
+3. [attachment: trac_12016-sage.v2.patch](https://github.com/sagemath/sage-prod/files/10653999/trac_12016-sage.v2.patch.gz) to the Sage library.

-**Apply**:
-1. [attachment: 12016-root.v2.patch](https://github.com/sagemath/sage/files/ticket12016/12016-root.v2.patch.gz) to the `SAGE_ROOT` repository.
-2. [attachment: 12016-scripts.v2.patch](https://github.com/sagemath/sage/files/ticket12016/12016-scripts.v2.patch.gz) to the `SCRIPTS` repository.
-3. [attachment: 12016-sage.v2.patch](https://github.com/sagemath/sage/files/ticket12016/12016-sage.v2.patch.gz) to the Sage library.
jhpalmieri commented 12 years ago
comment:11

Here are new patches. These use SAGE_NUM_THREADS if it is set, and otherwise try to extract a number from MAKE. (My method for doing this is probably not ideal, but the options This is done in sage-env. Running sage -b should use this setting now, also.

I don't know how to get the number of threads from

make -j16 ptestlong

so I removed that from the "to do" list in the ticket description.

In the file sage-ptest, I removed the "FIXME" comment in

    try:
        # FIXME: Nice, but <NUMTHREADS> should immediately follow '-tp' etc.,
        #        i.e., be the next argument. We might have file or directory
        #        names that properly convert to an int...
        numthreads = int(argv[1])
        infiles = argv[2:]
    except ValueError: # can't convert first arg to an integer: arg was probably omitted
        numthreads = 1

The script sage-ptest doesn't get a "tp" argument; it is instead called by sage-sage, and the way it is called, the first argument to sage-ptest is precisely what ever came after "-tp". So I don't think anything needs fixing. If we ever rewrite sage-sage (#21) to properly parse arguments, we can make sure that "-tp" has a default numerical argument of zero.

jdemeyer commented 12 years ago
comment:12

Attachment: trac_12016-scripts.v2.patch.gz

1) If you are going to use the string "auto" for automatic, you might as well use "infinite" for infinite, instead of zero.

1b) Alternatively: use 0 for automatic (as is sage -tp 0) and 999999 for unlimited. This would mean less special-case code, since a value like 999999 is more than what a user would normally specify (for the forseeable future).

2) In sage-ptest, unlimited really should be unlimited. Not max(8, # of cpus).

3) We should also do the following long-needed fix here: setting MAKE to make -j16 is very standard in Sage circles, but not actually the prefered way according to the GNU make folks. One really should use MAKEFLAGS instead (similar to the distinction between CC and CFLAGS). This is why you often see an error like "make -jN forced in sub-make. Disabling job server mode" (freely quoted from my mind). So, when MAKEFLAGS exists, assume that make understands the flags and do not pass flags in MAKE.

4) Why did you change

sage-build "$@" || exit $?

to

sage-build "$@"

in the sage_build() function in sage-sage?

5) You reverted a lot of changes that I made to doc/en/developer/doctesting.rst. Why? I actually tried all the examples in the documentation and pasted the exact output I got (on sage.math.washington.edu). Surely, this is better than keeping the outdated (and in many cases totally wrong) output.

I am planning to work further on this, so don't change any code yet. But please give your opinion.

jhpalmieri commented 12 years ago
comment:13

Replying to @jdemeyer:

1) If you are going to use the string "auto" for automatic, you might as well use "infinite" for infinite, instead of zero.

1b) Alternatively: use 0 for automatic (as is sage -tp 0) and 999999 for unlimited. This would mean less special-case code, since a value like 999999 is more than what a user would normally specify (for the forseeable future).

Sounds good to me.

2) In sage-ptest, unlimited really should be unlimited. Not max(8, # of cpus).

Okay.

3) We should also do the following long-needed fix here: setting MAKE to make -j16 is very standard in Sage circles, but not actually the prefered way according to the GNU make folks. One really should use MAKEFLAGS instead (similar to the distinction between CC and CFLAGS). This is why you often see an error like "make -jN forced in sub-make. Disabling job server mode" (freely quoted from my mind). So, when MAKEFLAGS exists, assume that make understands the flags and do not pass flags in MAKE.

I'm willing to try that, especially if you write the patch instead of me :)

4) Why did you change

sage-build "$@" || exit $?

to

sage-build "$@"

in the sage_build() function in sage-sage?

That was a mistake.

5) You reverted a lot of changes that I made to doc/en/developer/doctesting.rst. Why? I actually tried all the examples in the documentation and pasted the exact output I got (on sage.math.washington.edu). Surely, this is better than keeping the outdated (and in many cases totally wrong) output.

Some of them I disagreed with, like the complete removal of the section "Beyond the Sage library". So I started from scratch, at which point I just put in the changes that I felt were relevant to the ticket or easy for me to change. Probably I should have started with your patch and added the section (with modifications) back in.

It looks like #9739 broke doctesting of .sage files. We should fix that (not on this ticket).

jhpalmieri commented 12 years ago
comment:14

Replying to @jhpalmieri:

It looks like #9739 broke doctesting of .sage files. We should fix that (not on this ticket).

See #12069.

jdemeyer commented 12 years ago
comment:15

Replying to @jhpalmieri:

Some of them I disagreed with, like the complete removal of the section "Beyond the Sage library".

I removed that because it totally didn't work. But this is probably #12069. How about we leave the last section of the documentation alone in this ticket but then change the documentation in #12069?

jhpalmieri commented 12 years ago
comment:16

Replying to @jdemeyer:

Replying to @jhpalmieri:

Some of them I disagreed with, like the complete removal of the section "Beyond the Sage library".

I removed that because it totally didn't work. But this is probably #12069. How about we leave the last section of the documentation alone in this ticket but then change the documentation in #12069?

Okay, sounds fine to me.

jdemeyer commented 12 years ago

Description changed:

--- 
+++ 
@@ -9,7 +9,7 @@
 In #6495, we should implement the same behavior for doc building.

 **Apply**:
-1. [attachment: trac_12016-root.v2.patch](https://github.com/sagemath/sage-prod/files/10653998/trac_12016-root.v2.patch.gz) to the `SAGE_ROOT` repository.
-2. [attachment: trac_12016-scripts.v2.patch](https://github.com/sagemath/sage-prod/files/10654000/trac_12016-scripts.v2.patch.gz) to the `SCRIPTS` repository.
-3. [attachment: trac_12016-sage.v2.patch](https://github.com/sagemath/sage-prod/files/10653999/trac_12016-sage.v2.patch.gz) to the Sage library.
+1. [attachment: 12016-root.patch](https://github.com/sagemath/sage-prod/files/10654001/12016-root.patch.gz) to the `SAGE_ROOT` repository.
+2. [attachment: 12016-scripts.patch](https://github.com/sagemath/sage-prod/files/10654003/12016-scripts.patch.gz) to the `SCRIPTS` repository.
+3. [attachment: 12016-sage.patch](https://github.com/sagemath/sage-prod/files/10654002/12016-sage.patch.gz) to the Sage library.
jdemeyer commented 12 years ago
comment:17

The essence of the patch should be there. I'm not claiming it works, I still need to test many things.

jdemeyer commented 12 years ago

Changed dependencies from #11969 to #11969, #12096

jdemeyer commented 12 years ago

Changed dependencies from #11969, #12096 to #11969, #12096, #12098

jdemeyer commented 12 years ago

Description changed:

--- 
+++ 
@@ -13,3 +13,8 @@
 2. [attachment: 12016-scripts.patch](https://github.com/sagemath/sage-prod/files/10654003/12016-scripts.patch.gz) to the `SCRIPTS` repository.
 3. [attachment: 12016-sage.patch](https://github.com/sagemath/sage-prod/files/10654002/12016-sage.patch.gz) to the Sage library.

+**Notes**:
+With the patches applied, building spkgs in parallel works well, except for a "jobserver unavailable" warning in:
+* ntl
+* singular
+* rubiks
jdemeyer commented 12 years ago

Description changed:

--- 
+++ 
@@ -10,8 +10,9 @@

 **Apply**:
 1. [attachment: 12016-root.patch](https://github.com/sagemath/sage-prod/files/10654001/12016-root.patch.gz) to the `SAGE_ROOT` repository.
-2. [attachment: 12016-scripts.patch](https://github.com/sagemath/sage-prod/files/10654003/12016-scripts.patch.gz) to the `SCRIPTS` repository.
-3. [attachment: 12016-sage.patch](https://github.com/sagemath/sage-prod/files/10654002/12016-sage.patch.gz) to the Sage library.
+2. [attachment: 12016-base.patch](https://github.com/sagemath/sage-prod/files/10654004/12016-base.patch.gz) to `spkg/base`.
+3. [attachment: 12016-scripts.patch](https://github.com/sagemath/sage-prod/files/10654003/12016-scripts.patch.gz) to the `SCRIPTS` repository.
+4. [attachment: 12016-sage.patch](https://github.com/sagemath/sage-prod/files/10654002/12016-sage.patch.gz) to the Sage library.

 **Notes**:
 With the patches applied, building spkgs in parallel works well, except for a "jobserver unavailable" warning in:
jhpalmieri commented 12 years ago
comment:21

In spkg/standard/deps, why prefix most of the rules with "+"?

jdemeyer commented 12 years ago
comment:22

Replying to @jhpalmieri:

In spkg/standard/deps, why prefix most of the rules with "+"?

To mark the rule as "recursive", see http://www.gnu.org/software/make/manual/make.html#Recursion. Otherwise you get lots of warnings like

make[2]: warning: jobserver unavailable: using -j1.  Add `+' to parent make rule.
jdemeyer commented 12 years ago

Changed dependencies from #11969, #12096, #12098 to sage-4.8.alpha3 + #12096

jdemeyer commented 12 years ago

Description changed:

--- 
+++ 
@@ -1,6 +1,6 @@
 With the attached patches, along with the changes from #11959, the various parallel aspects of Sage should be controlled by setting the `-j` flag in `MAKE`.  That is, if `MAKE='make -j16'`, then

-- running `make` will build spkg's in parallel, using 16 processes (this was done in #11959).  *This is standard `make` behaviour, no patches are needed.*
+- running `make` will build spkg's in parallel, using 16 processes (this was done in #11959).  This is standard `make` behaviour, but we need to patch `spkg/standard/deps` to ensure that `make` recognizes that we are doing a recursive make.

 - running `make ptestlong` or `sage -tp 0 <files>` will doctest in parallel using 16 threads.  If the `-j` flag in `MAKE` is not set, then determine the number of threads as before: `min(8, cpu_count())`.
jdemeyer commented 12 years ago
comment:25

With these patches applied, I often (not always) get the following doctest error in sage0.py:

sage -t  -force_lib devel/sage/sage/interfaces/sage0.py
**********************************************************************
File "/mnt/usb1/scratch/jdemeyer/merger/sage-4.8.alpha3-make-jobs/devel/sage-main/sage/interfaces/sage0.py", line 448:
    sage: F == sage0(F)._sage_()
Exception raised:
    Traceback (most recent call last):
      File "/mnt/usb1/scratch/jdemeyer/merger/sage-4.8.alpha3-make-jobs/local/bin/ncadoctest.py", line 1231, in run_one_test
        self.run_one_example(test, example, filename, compileflags)
      File "/mnt/usb1/scratch/jdemeyer/merger/sage-4.8.alpha3-make-jobs/local/bin/sagedoctest.py", line 38, in run_one_example
        OrigDocTestRunner.run_one_example(self, test, example, filename, compileflags)
      File "/mnt/usb1/scratch/jdemeyer/merger/sage-4.8.alpha3-make-jobs/local/bin/ncadoctest.py", line 1172, in run_one_example
        compileflags, 1) in test.globs
      File "<doctest __main__.example_20[4]>", line 1, in <module>
        F == sage0(F)._sage_()###line 448:
    sage: F == sage0(F)._sage_()
      File "/mnt/usb1/scratch/jdemeyer/merger/sage-4.8.alpha3-make-jobs/local/lib/python/site-packages/sage/interfaces/sage0.py", line 458, in _sage_
        return load(P._local_tmpfile())
      File "sage_object.pyx", line 775, in sage.structure.sage_object.load (sage/structure/sage_object.c:7937)
    IOError: [Errno 2] No such file or directory: '/scratch/jdemeyer/merger/sage-4.8.alpha3-make-jobs/home/.sage//temp/sage.math.washington.edu/4956//interface//tmp5116.sobj'
**********************************************************************
[...]
**********************************************************************
File "/mnt/usb1/scratch/jdemeyer/merger/sage-4.8.alpha3-make-jobs/devel/sage-main/sage/interfaces/sage0.py", line 547:
    sage: sage0_version() == version()
Expected:
    True
Got:
    False
**********************************************************************
[...]
**********************************************************************
File "/mnt/usb1/scratch/jdemeyer/merger/sage-4.8.alpha3-make-jobs/devel/sage-main/sage/interfaces/sage0.py", line 176:
    sage: sage0('factor(2^157-1)')
Expected:
    852133201 * 60726444167 * 1654058017289 * 2134387368610417
Got:
    <BLANKLINE>
**********************************************************************
File "/mnt/usb1/scratch/jdemeyer/merger/sage-4.8.alpha3-make-jobs/devel/sage-main/sage/interfaces/sage0.py", line 178:
    sage: print "ignore this";  sage0.cputime()     # random output
Exception raised:
    Traceback (most recent call last):
      File "/mnt/usb1/scratch/jdemeyer/merger/sage-4.8.alpha3-make-jobs/local/bin/ncadoctest.py", line 1231, in run_one_test
        self.run_one_example(test, example, filename, compileflags)
      File "/mnt/usb1/scratch/jdemeyer/merger/sage-4.8.alpha3-make-jobs/local/bin/sagedoctest.py", line 38, in run_one_example
        OrigDocTestRunner.run_one_example(self, test, example, filename, compileflags)
      File "/mnt/usb1/scratch/jdemeyer/merger/sage-4.8.alpha3-make-jobs/local/bin/ncadoctest.py", line 1172, in run_one_example
        compileflags, 1) in test.globs
      File "<doctest __main__.example_3[4]>", line 1, in <module>
        print "ignore this";  sage0.cputime()     # random output###line 178:
    sage: print "ignore this";  sage0.cputime()     # random output
      File "/mnt/usb1/scratch/jdemeyer/merger/sage-4.8.alpha3-make-jobs/local/lib/python/site-packages/sage/interfaces/sage0.py", line 185, in cputime
        return float(s)
    ValueError: invalid literal for float(): 852133201 * 60726444167 * 1654058017289 * 2134387368610417
**********************************************************************
jdemeyer commented 12 years ago

Attachment: 12016-root.patch.gz

jhpalmieri commented 12 years ago
comment:26

I installed these patches on sage.math and set MAKE='make -j12'. Then I ran make ptestlong, and the log file for the testing says "Doctesting 2857 files doing 2 jobs in parallel". Running ./sage -tp --long ... uses 12 threads, as it should.

If I remove the redirection of stderr to null in sage-env, when calling sage-num-threads, I see this:

Traceback (most recent call last):
  File "/mnt/usb1/scratch/palmieri/12016/test1/sage-4.8.X/local/bin/sage-num-threads.py", line 144, in <module>
    print "%i %i %i"%num_threads()
  File "/mnt/usb1/scratch/palmieri/12016/test1/sage-4.8.X/local/bin/sage-num-threads.py", line 95, in num_threads
    if MAKEFLAGS[0] != '-':
IndexError: string index out of range

This seems to fix the problem for me:

diff --git a/sage-num-threads.py b/sage-num-threads.py
--- a/sage-num-threads.py
+++ b/sage-num-threads.py
@@ -92,7 +92,7 @@ def num_threads():
     try:
         # Prepend hyphen to MAKEFLAGS if it does not start with one
         MAKEFLAGS=os.environ["MAKEFLAGS"]
-        if MAKEFLAGS[0] != '-':
+        if len(MAKEFLAGS) > 0 and MAKEFLAGS[0] != '-':
             MAKEFLAGS = '-' + MAKEFLAGS
         # In MAKEFLAGS, "-j" does not mean unlimited.  It probably
         # means an inherited number of jobs, let us use 2 for safety.
jhpalmieri commented 12 years ago
comment:27

Another patch:


diff --git a/setup.py b/setup.py
--- a/setup.py
+++ b/setup.py
@@ -242,7 +242,8 @@ def execute_list_of_commands_in_parallel
     WARNING: commands are run roughly in order, but of course successive
     commands may be run at the same time.
     """
-    print "Execute %s commands (using %s threads)"%(len(command_list), min(len(command_list),nthreads))
+    nthreads = min(len(command_list),nthreads)
+    print "Execute %s commands (using %s threads)"%(len(command_list), nthreads)
     from multiprocessing import Pool
     import twisted.persisted.styles #doing this import will allow instancemethods to be pickable
     p = Pool(nthreads)

Without this one, if you do export MAKE='make -j' and then make ptestlong, although it says it's using 0 threads, it actually starts up 999999 threads (to do nothing, as far as I can tell).

I get lots of doctest failures and warnings using make -j, along the lines of OSError: [Errno 11] Resource temporarily unavailable and /bin/sh: Cannot fork, but I guess this is to be expected? Maybe we should put some sort of cap on the number of threads to use for doctesting?

jhpalmieri commented 12 years ago
comment:28

Replying to @jdemeyer:

With these patches applied, I often (not always) get the following doctest error in sage0.py:

How many threads were you using? I haven't seen this recently, but I think I've seen this sort of error before unrelated to this ticket. I'm not sure I would consider it an obstacle for this ticket. It would be nice to track down the problem, though.

jdemeyer commented 12 years ago
comment:29

Thanks for the feedback, these should be easy fixes.

jhpalmieri commented 12 years ago
comment:30

For this ticket or a follow-up: it would be nice to be able to set MAKE=make -j -l30, for example, and have tests pass. Building this way works, but I get lots of doctest failures as mentioned above. Maybe we could read the argument for -l for a cap on the threads for doctesting? Or use the number of cpus for this cap?

jdemeyer commented 12 years ago
comment:31

Replying to @jhpalmieri:

Maybe we could read the argument for -l for a cap on the threads for doctesting?

Should be possible, in sage-num-threads.py.

jdemeyer commented 12 years ago
comment:32

Since execute_list_of_commands_in_serial is essentially a special case of execute_list_of_commands_in_parallel, I see no reason to keep the former.

jdemeyer commented 12 years ago

Attachment: 12016-sage.patch.gz

jdemeyer commented 12 years ago
comment:33

New patch up, implements support for "-l" option to make, fixes and simplifies setup.py.

jhpalmieri commented 12 years ago
comment:34

This looks good to me. Is it ready for review? Am I allowed to review it since I wrote early drafts of some of the patches?

For a future ticket, it would be nice if you could set MAKE='make -j -lN', for some reasonable choice of N, and have it work. When I try this, I have problems with the following spkgs, and I'm not sure why:

(The Sage spkg used to fail before the latest round of patches using the -l setting for a cap on the number of threads, and the same goes for parallel doctesting. Setting MAKE='make -j' still causes these failures. Should we just regard this setting as too dangerous, or try to stop the failures by putting a cap on the number of processes if -j is present but set to "unlimited" and -l is missing?)

These failures are not related to this ticket; they fail with or without the patches. But the ticket makes it more appealing to just set MAKE as above.

jdemeyer commented 12 years ago

Description changed:

--- 
+++ 
@@ -7,6 +7,14 @@
 - running `./sage -b` will build the Sage library using 16 threads. If the `-j` flag in `MAKE` is not set, then use only 1 thread.

 In #6495, we should implement the same behavior for doc building.
+
+Concerning testing this ticket: you can set the environment variable `SAGE_NUM_CORES` to the number of cores you want to pretend to have.  For example, running
+
+```
+SAGE_NUM_CORES=24 make ptestlong
+```
+should run 8 threads (see `sage-num-threads.py`; this is undocumented because the only purpose I see is for testing this ticket).
+

 **Apply**:
 1. [attachment: 12016-root.patch](https://github.com/sagemath/sage-prod/files/10654001/12016-root.patch.gz) to the `SAGE_ROOT` repository.
jdemeyer commented 12 years ago

Reviewer: John Palmieri, Jeroen Demeyer

jdemeyer commented 12 years ago
comment:35

Replying to @jhpalmieri:

This looks good to me. Is it ready for review?

Yes, it is. I just didn't want to put "needs review" because I have not really tested it.

Am I allowed to review it since I wrote early drafts of some of the patches?

I would say yes, since I certainly looked at all your code. So consider all your code to be positively reviewed by me.

For a future ticket, it would be nice if you could set MAKE='make -j -lN', for some reasonable choice of N, and have it work. When I try this, I have problems with the following spkgs, and I'm not sure why:

It should work.

  • zlib on OS X (2 cores) fails most of the time with MAKE='make -j -l3'. Here's a log.
  • singular on sage.math fails all of the time, I think, with MAKE='make -j -l30'. Here's a log.

Question for both cases: does MAKE="make -jN" work for various values of N? Because I don't see a fundamental difference between "make -jN" and "make -j -lN".

(The Sage spkg used to fail before the latest round of patches using the -l setting for a cap on the number of threads, and the same goes for parallel doctesting. Setting MAKE='make -j' still causes these failures. Should we just regard this setting as too dangerous?

Yes. Since "make" will simply run as many threads as it can, I think Sage should do the same, no matter how stupid that is.

By the way, I would really like to merge this in the sage-4.8 release, because it cleans up some stuff which will also help future tickets like #11073 (which hopefully will be merged in sage-5.0).

jdemeyer commented 12 years ago
comment:36

Replying to @jhpalmieri:

  • zlib on OS X (2 cores) fails most of the time with MAKE='make -j -l3'. Here's a log.
  • singular on sage.math fails all of the time, I think, with MAKE='make -j -l30'. Here's a log.

Since my patches properly implement parallel building, it also means that more packages are actually being built in parallel. So I think we are simply triggering bugs in the various packages. For example, I never had problems with Python before, but I did have problems with this patch (fixed in #12096).

I cannot explain why "make -j -lN" would fail but "make -jN" would work.

jdemeyer commented 12 years ago
comment:37

Replying to @jhpalmieri:

  • singular on sage.math fails all of the time, I think, with MAKE='make -j -l30'. Here's a log.

Well, singular is in the list of fishy packages, see the ticket description.

jhpalmieri commented 12 years ago
comment:38

Just for fun, I modified deps so singular would build all by itself in the build process (I made it depend on linbox and scipy, so it was the last package to be built before the sage package). Then it built fine using make -j -l30.

jdemeyer commented 12 years ago
comment:39

I think it is truly a coincidence that "make -j -lN" fails. I managed to make singular fail with just "make -jN", hopefully fixed by #12138.

jdemeyer commented 12 years ago
comment:40

Replying to @jhpalmieri:

This looks good to me. Is it ready for review? Am I allowed to review it since I wrote early drafts of some of the patches?

For a future ticket, it would be nice if you could set MAKE='make -j -lN', for some reasonable choice of N, and have it work. When I try this, I have problems with the following spkgs, and I'm not sure why:

  • zlib on OS X (2 cores) fails most of the time with MAKE='make -j -l3'. Here's a log.
  • singular on sage.math fails all of the time, I think, with MAKE='make -j -l30'. Here's a log.

Are these about builds of the total Sage source in which these fail, or are these separate installs like sage -f ...?