sagemath / sage

Main repository of SageMath
https://www.sagemath.org
Other
1.44k stars 481 forks source link

Run doctests with OMP_NUM_THREADS=2 #23892

Closed jdemeyer closed 7 years ago

jdemeyer commented 7 years ago

The normaliz package uses OMP for threading, which can create many threads. In doctests, this is bad for two reasons:

  1. Doctests should not use an unexpectedly large number of system resources.

  2. When there are too many threads, the virtual memory limit from #23748 will be hit.

There is a solution: set the environment variable OMP_NUM_THREADS=2 while doctesting.

CC: @koffie

Component: packages: optional

Author: Jeroen Demeyer

Branch: 9f9f7b7

Reviewer: Maarten Derickx

Issue created by migration from https://trac.sagemath.org/ticket/23892

koffie commented 7 years ago
comment:1

Again??? :(

jdemeyer commented 7 years ago

Description changed:

--- 
+++ 
@@ -1 +1,33 @@
-Details to follow...
+
+```
+sage -t --long --warn-long 63.7 src/sage/graphs/generic_graph.py  # Killed due to segmentation fault
+sage -t --long --warn-long 63.7 src/sage/crypto/mq/sr.py  # Killed due to abort
+sage -t --long --warn-long 63.7 src/sage/combinat/rigged_configurations/tensor_product_kr_tableaux.py  # Bad exit: 1
+sage -t --long --warn-long 63.7 src/sage/combinat/rigged_configurations/rigged_configurations.py  # Bad exit: 127
+sage -t --long --warn-long 63.7 src/sage/rings/integer.pyx  # 1 doctest failed
+sage -t --long --warn-long 63.7 src/sage/combinat/crystals/kirillov_reshetikhin.py  # Bad exit: 1
+sage -t --long --warn-long 63.7 src/sage/structure/sage_object.pyx  # Killed due to abort
+sage -t --long --warn-long 63.7 src/sage/rings/polynomial/multi_polynomial_sequence.py  # Killed due to abort
+sage -t --long --warn-long 63.7 src/sage/rings/polynomial/pbori.pyx  # Killed due to segmentation fault
+sage -t --long --warn-long 63.7 src/sage/combinat/rigged_configurations/rigged_configuration_element.py  # Bad exit: 1
+sage -t --long --warn-long 63.7 src/sage/crypto/sbox.py  # Killed due to abort
+sage -t --long --warn-long 63.7 src/sage/graphs/base/graph_backends.pyx  # Killed due to segmentation fault
+sage -t --long --warn-long 63.7 src/sage/rings/polynomial/multi_polynomial_libsingular.pyx  # Killed due to segmentation fault
+sage -t --long --warn-long 63.7 src/sage/combinat/rigged_configurations/bij_type_B.py  # Bad exit: 1
+sage -t --long --warn-long 63.7 src/sage/combinat/rigged_configurations/tensor_product_kr_tableaux_element.py  # Bad exit: 2
+sage -t --long --warn-long 63.7 src/sage/crypto/boolean_function.pyx  # Killed due to abort
+sage -t --long --warn-long 63.7 src/sage/combinat/rigged_configurations/kleber_tree.py  # Bad exit: 127
+sage -t --long --warn-long 63.7 src/sage/combinat/crystals/affinization.py  # Bad exit: 1
+sage -t --long --warn-long 63.7 src/sage/structure/category_object.pyx  # Killed due to abort
+sage -t --long --warn-long 63.7 src/sage/combinat/crystals/tensor_product_element.pyx  # Bad exit: 127
+sage -t --long --warn-long 63.7 src/sage/combinat/rigged_configurations/bij_type_E67.py  # Bad exit: 1
+sage -t --long --warn-long 63.7 src/sage/geometry/polyhedron/backend_normaliz.py  # Bad exit: 127
+sage -t --long --warn-long 63.7 src/sage/sat/solvers/dimacs.py  # Killed due to abort
+sage -t --long --warn-long 63.7 src/doc/en/reference/sat/index.rst  # Killed due to abort
+sage -t --long --warn-long 63.7 src/sage/sat/boolean_polynomials.py  # Killed due to abort
+sage -t --long --warn-long 63.7 src/sage/rings/polynomial/polynomial_ring_constructor.py  # Killed due to abort
+sage -t --long --warn-long 63.7 src/sage/sat/converters/polybori.py  # Killed due to abort
+sage -t --long --warn-long 63.7 src/sage/ext/memory.pyx  # 1 doctest failed
+```
+
+The cause seems to be the same as #23879: `pynormaliz` requires a very large amount of memory (more than what #23748 allows).
jdemeyer commented 7 years ago

Description changed:

--- 
+++ 
@@ -4,7 +4,6 @@
 sage -t --long --warn-long 63.7 src/sage/crypto/mq/sr.py  # Killed due to abort
 sage -t --long --warn-long 63.7 src/sage/combinat/rigged_configurations/tensor_product_kr_tableaux.py  # Bad exit: 1
 sage -t --long --warn-long 63.7 src/sage/combinat/rigged_configurations/rigged_configurations.py  # Bad exit: 127
-sage -t --long --warn-long 63.7 src/sage/rings/integer.pyx  # 1 doctest failed
 sage -t --long --warn-long 63.7 src/sage/combinat/crystals/kirillov_reshetikhin.py  # Bad exit: 1
 sage -t --long --warn-long 63.7 src/sage/structure/sage_object.pyx  # Killed due to abort
 sage -t --long --warn-long 63.7 src/sage/rings/polynomial/multi_polynomial_sequence.py  # Killed due to abort
@@ -27,7 +26,6 @@
 sage -t --long --warn-long 63.7 src/sage/sat/boolean_polynomials.py  # Killed due to abort
 sage -t --long --warn-long 63.7 src/sage/rings/polynomial/polynomial_ring_constructor.py  # Killed due to abort
 sage -t --long --warn-long 63.7 src/sage/sat/converters/polybori.py  # Killed due to abort
-sage -t --long --warn-long 63.7 src/sage/ext/memory.pyx  # 1 doctest failed

The cause seems to be the same as #23879: pynormaliz requires a very large amount of memory (more than what #23748 allows).

jdemeyer commented 7 years ago

Author: Jeroen Demeyer

jdemeyer commented 7 years ago

Description changed:

--- 
+++ 
@@ -1,31 +1,7 @@
+The `normaliz` package uses OMP for threading, which can create many threads. In doctests, this is bad for two reasons:

-```
-sage -t --long --warn-long 63.7 src/sage/graphs/generic_graph.py  # Killed due to segmentation fault
-sage -t --long --warn-long 63.7 src/sage/crypto/mq/sr.py  # Killed due to abort
-sage -t --long --warn-long 63.7 src/sage/combinat/rigged_configurations/tensor_product_kr_tableaux.py  # Bad exit: 1
-sage -t --long --warn-long 63.7 src/sage/combinat/rigged_configurations/rigged_configurations.py  # Bad exit: 127
-sage -t --long --warn-long 63.7 src/sage/combinat/crystals/kirillov_reshetikhin.py  # Bad exit: 1
-sage -t --long --warn-long 63.7 src/sage/structure/sage_object.pyx  # Killed due to abort
-sage -t --long --warn-long 63.7 src/sage/rings/polynomial/multi_polynomial_sequence.py  # Killed due to abort
-sage -t --long --warn-long 63.7 src/sage/rings/polynomial/pbori.pyx  # Killed due to segmentation fault
-sage -t --long --warn-long 63.7 src/sage/combinat/rigged_configurations/rigged_configuration_element.py  # Bad exit: 1
-sage -t --long --warn-long 63.7 src/sage/crypto/sbox.py  # Killed due to abort
-sage -t --long --warn-long 63.7 src/sage/graphs/base/graph_backends.pyx  # Killed due to segmentation fault
-sage -t --long --warn-long 63.7 src/sage/rings/polynomial/multi_polynomial_libsingular.pyx  # Killed due to segmentation fault
-sage -t --long --warn-long 63.7 src/sage/combinat/rigged_configurations/bij_type_B.py  # Bad exit: 1
-sage -t --long --warn-long 63.7 src/sage/combinat/rigged_configurations/tensor_product_kr_tableaux_element.py  # Bad exit: 2
-sage -t --long --warn-long 63.7 src/sage/crypto/boolean_function.pyx  # Killed due to abort
-sage -t --long --warn-long 63.7 src/sage/combinat/rigged_configurations/kleber_tree.py  # Bad exit: 127
-sage -t --long --warn-long 63.7 src/sage/combinat/crystals/affinization.py  # Bad exit: 1
-sage -t --long --warn-long 63.7 src/sage/structure/category_object.pyx  # Killed due to abort
-sage -t --long --warn-long 63.7 src/sage/combinat/crystals/tensor_product_element.pyx  # Bad exit: 127
-sage -t --long --warn-long 63.7 src/sage/combinat/rigged_configurations/bij_type_E67.py  # Bad exit: 1
-sage -t --long --warn-long 63.7 src/sage/geometry/polyhedron/backend_normaliz.py  # Bad exit: 127
-sage -t --long --warn-long 63.7 src/sage/sat/solvers/dimacs.py  # Killed due to abort
-sage -t --long --warn-long 63.7 src/doc/en/reference/sat/index.rst  # Killed due to abort
-sage -t --long --warn-long 63.7 src/sage/sat/boolean_polynomials.py  # Killed due to abort
-sage -t --long --warn-long 63.7 src/sage/rings/polynomial/polynomial_ring_constructor.py  # Killed due to abort
-sage -t --long --warn-long 63.7 src/sage/sat/converters/polybori.py  # Killed due to abort
-```
+1. Doctests should not use an unexpectedly large number of system resources.

-The cause seems to be the same as #23879: `pynormaliz` requires a very large amount of memory (more than what #23748 allows).
+2. When there are too many threads, the virtual memory limit from #23748 will be hit.
+
+There is a solution: set the environment variable `OMP_NUM_THREADS=2` while doctesting.
jdemeyer commented 7 years ago

Branch: u/jdemeyer/run_doctests_with_omp_num_threads_2

jdemeyer commented 7 years ago

Commit: 9f9f7b7

jdemeyer commented 7 years ago

New commits:

9f9f7b7Run doctests with OMP_NUM_THREADS=2
koffie commented 7 years ago
comment:8

Hi Jeroen,

I want to review this, but before doing so I run into trouble, because on my machine all doctests pass in sage 8.1.beta5 with pynormaliz. Could you give pointers to which doctests failed for you and maybe help reproduce the failure so I can better understand wether this solution works. Also is there any particular reason for the integer 2? Why not 3 or 4? I agree it should not be 1 because certain bugs might go undetected in that way.

jdemeyer commented 7 years ago
comment:9

In particular, many tests in src/sage/combinat/rigged_configurations fail for me with pynormaliz.

Since the problem depends on the number of threads, which is the number of cores by default, it could very well be that this problem only occurs on systems with many cores. The system where I saw the failure has 24 cores. Maybe you could get the failure with OMP_NUM_THREADS=24?

jdemeyer commented 7 years ago
comment:10

Replying to @koffie:

Also is there any particular reason for the integer 2?

Yes, it is the smallest integer strictly larger than 1.

1 thread is too few, because it doesn't really test threading. With 2 threads, you do test threading. On the other hand, the system load will at most be a factor 200% too large, which is not too bad.

koffie commented 7 years ago
comment:11

Without the patch I indeed get 4 files with failing doctests if I do:

export OMP_NUM_THREADS=24
sage -t long src/sage/combinat/rigged_configurations

and it does not fail anymore with the patch. So it seems to be the right thing to do in order to fix it.

One thing that I don't like about the current patch is that it overwrites OMP_NUM_THREADS even if it is already explicitly set before running sage tests. This means that if someone for some reason wants to run the doctests with a different number of OMP_NUM_THREADS for debugging purposes that involve problems with threading then one has to modify the source code. So I think it would be better to only set OMP_NUM_THREADS=2 if nothing was set before, providing a sane default value, but still allowing a less sane default value if one really insists. What are your thoughts on this?

jdemeyer commented 7 years ago
comment:12

Replying to @koffie:

One thing that I don't like about the current patch is that it overwrites OMP_NUM_THREADS even if it is already explicitly set before running sage tests.

I consider that a feature. The point is that doctests should be reproducible and not depend too much on the external environment. If somebody has set an environment variable OMP_NUM_THREADS, the most likely reason is that he wants to use that number of threads for actual computations. It does not mean that he wants to use that many threads for doctests too.

This means that if someone for some reason wants to run the doctests with a different number of OMP_NUM_THREADS for debugging purposes that involve problems with threading then one has to modify the source code.

Alternatively, you can set os.environ['OMP_NUM_THREADS'] in a doctest too. That's easy to do and would fix the testing problem.

koffie commented 7 years ago
comment:13

Ok, I am now running all the doctest after export OMP_NUM_THREADS=100 since I think standard patchbot testing is not good enough. If this passes then I will give positive review.

Does your remark mean that it would also be better to fix #23612 (edit: sorry I meant #23613) by unsetting PYTHONPATH instead of making the doctest more admissible?

jdemeyer commented 7 years ago
comment:14

Replying to @koffie:

Does your remark mean that it would also be better to fix #23612 by unsetting PYTHONPATH instead of making the doctest more admissible?

Are you sure you mean #23612? It seems unrelated to PYTHONPATH or doctests.

koffie commented 7 years ago
comment:15

Sorry, I meant #23613.

koffie commented 7 years ago
comment:16

Ok looks good to me.

vbraun commented 7 years ago
comment:17

Reviewer name

koffie commented 7 years ago

Reviewer: Maarten Derickx

vbraun commented 7 years ago

Changed branch from u/jdemeyer/run_doctests_with_omp_num_threads_2 to 9f9f7b7

embray commented 7 years ago

Changed commit from 9f9f7b7 to none

embray commented 7 years ago
comment:20

I wonder if this and/or #23748 will fix the doc build problems I was having on Windows for the past several weeks (which caused me to have to take down the Window patchbot >_<). Fingers crossed...

embray commented 7 years ago
comment:21

Oh wait, this was just for the doctests, I misread. Maybe not then...