Closed: jdemeyer closed this issue 7 years ago.
Again??? :(
Description changed:
---
+++
@@ -1 +1,33 @@
-Details to follow...
+
+```
+sage -t --long --warn-long 63.7 src/sage/graphs/generic_graph.py # Killed due to segmentation fault
+sage -t --long --warn-long 63.7 src/sage/crypto/mq/sr.py # Killed due to abort
+sage -t --long --warn-long 63.7 src/sage/combinat/rigged_configurations/tensor_product_kr_tableaux.py # Bad exit: 1
+sage -t --long --warn-long 63.7 src/sage/combinat/rigged_configurations/rigged_configurations.py # Bad exit: 127
+sage -t --long --warn-long 63.7 src/sage/rings/integer.pyx # 1 doctest failed
+sage -t --long --warn-long 63.7 src/sage/combinat/crystals/kirillov_reshetikhin.py # Bad exit: 1
+sage -t --long --warn-long 63.7 src/sage/structure/sage_object.pyx # Killed due to abort
+sage -t --long --warn-long 63.7 src/sage/rings/polynomial/multi_polynomial_sequence.py # Killed due to abort
+sage -t --long --warn-long 63.7 src/sage/rings/polynomial/pbori.pyx # Killed due to segmentation fault
+sage -t --long --warn-long 63.7 src/sage/combinat/rigged_configurations/rigged_configuration_element.py # Bad exit: 1
+sage -t --long --warn-long 63.7 src/sage/crypto/sbox.py # Killed due to abort
+sage -t --long --warn-long 63.7 src/sage/graphs/base/graph_backends.pyx # Killed due to segmentation fault
+sage -t --long --warn-long 63.7 src/sage/rings/polynomial/multi_polynomial_libsingular.pyx # Killed due to segmentation fault
+sage -t --long --warn-long 63.7 src/sage/combinat/rigged_configurations/bij_type_B.py # Bad exit: 1
+sage -t --long --warn-long 63.7 src/sage/combinat/rigged_configurations/tensor_product_kr_tableaux_element.py # Bad exit: 2
+sage -t --long --warn-long 63.7 src/sage/crypto/boolean_function.pyx # Killed due to abort
+sage -t --long --warn-long 63.7 src/sage/combinat/rigged_configurations/kleber_tree.py # Bad exit: 127
+sage -t --long --warn-long 63.7 src/sage/combinat/crystals/affinization.py # Bad exit: 1
+sage -t --long --warn-long 63.7 src/sage/structure/category_object.pyx # Killed due to abort
+sage -t --long --warn-long 63.7 src/sage/combinat/crystals/tensor_product_element.pyx # Bad exit: 127
+sage -t --long --warn-long 63.7 src/sage/combinat/rigged_configurations/bij_type_E67.py # Bad exit: 1
+sage -t --long --warn-long 63.7 src/sage/geometry/polyhedron/backend_normaliz.py # Bad exit: 127
+sage -t --long --warn-long 63.7 src/sage/sat/solvers/dimacs.py # Killed due to abort
+sage -t --long --warn-long 63.7 src/doc/en/reference/sat/index.rst # Killed due to abort
+sage -t --long --warn-long 63.7 src/sage/sat/boolean_polynomials.py # Killed due to abort
+sage -t --long --warn-long 63.7 src/sage/rings/polynomial/polynomial_ring_constructor.py # Killed due to abort
+sage -t --long --warn-long 63.7 src/sage/sat/converters/polybori.py # Killed due to abort
+sage -t --long --warn-long 63.7 src/sage/ext/memory.pyx # 1 doctest failed
+```
+
+The cause seems to be the same as #23879: `pynormaliz` requires a very large amount of memory (more than what #23748 allows).
Description changed:
---
+++
@@ -4,7 +4,6 @@
sage -t --long --warn-long 63.7 src/sage/crypto/mq/sr.py # Killed due to abort
sage -t --long --warn-long 63.7 src/sage/combinat/rigged_configurations/tensor_product_kr_tableaux.py # Bad exit: 1
sage -t --long --warn-long 63.7 src/sage/combinat/rigged_configurations/rigged_configurations.py # Bad exit: 127
-sage -t --long --warn-long 63.7 src/sage/rings/integer.pyx # 1 doctest failed
sage -t --long --warn-long 63.7 src/sage/combinat/crystals/kirillov_reshetikhin.py # Bad exit: 1
sage -t --long --warn-long 63.7 src/sage/structure/sage_object.pyx # Killed due to abort
sage -t --long --warn-long 63.7 src/sage/rings/polynomial/multi_polynomial_sequence.py # Killed due to abort
@@ -27,7 +26,6 @@
sage -t --long --warn-long 63.7 src/sage/sat/boolean_polynomials.py # Killed due to abort
sage -t --long --warn-long 63.7 src/sage/rings/polynomial/polynomial_ring_constructor.py # Killed due to abort
sage -t --long --warn-long 63.7 src/sage/sat/converters/polybori.py # Killed due to abort
-sage -t --long --warn-long 63.7 src/sage/ext/memory.pyx # 1 doctest failed
The cause seems to be the same as #23879: `pynormaliz` requires a very large amount of memory (more than what #23748 allows).
Author: Jeroen Demeyer
Description changed:
---
+++
@@ -1,31 +1,7 @@
+The `normaliz` package uses OMP for threading, which can create many threads. In doctests, this is bad for two reasons:
-```
-sage -t --long --warn-long 63.7 src/sage/graphs/generic_graph.py # Killed due to segmentation fault
-sage -t --long --warn-long 63.7 src/sage/crypto/mq/sr.py # Killed due to abort
-sage -t --long --warn-long 63.7 src/sage/combinat/rigged_configurations/tensor_product_kr_tableaux.py # Bad exit: 1
-sage -t --long --warn-long 63.7 src/sage/combinat/rigged_configurations/rigged_configurations.py # Bad exit: 127
-sage -t --long --warn-long 63.7 src/sage/combinat/crystals/kirillov_reshetikhin.py # Bad exit: 1
-sage -t --long --warn-long 63.7 src/sage/structure/sage_object.pyx # Killed due to abort
-sage -t --long --warn-long 63.7 src/sage/rings/polynomial/multi_polynomial_sequence.py # Killed due to abort
-sage -t --long --warn-long 63.7 src/sage/rings/polynomial/pbori.pyx # Killed due to segmentation fault
-sage -t --long --warn-long 63.7 src/sage/combinat/rigged_configurations/rigged_configuration_element.py # Bad exit: 1
-sage -t --long --warn-long 63.7 src/sage/crypto/sbox.py # Killed due to abort
-sage -t --long --warn-long 63.7 src/sage/graphs/base/graph_backends.pyx # Killed due to segmentation fault
-sage -t --long --warn-long 63.7 src/sage/rings/polynomial/multi_polynomial_libsingular.pyx # Killed due to segmentation fault
-sage -t --long --warn-long 63.7 src/sage/combinat/rigged_configurations/bij_type_B.py # Bad exit: 1
-sage -t --long --warn-long 63.7 src/sage/combinat/rigged_configurations/tensor_product_kr_tableaux_element.py # Bad exit: 2
-sage -t --long --warn-long 63.7 src/sage/crypto/boolean_function.pyx # Killed due to abort
-sage -t --long --warn-long 63.7 src/sage/combinat/rigged_configurations/kleber_tree.py # Bad exit: 127
-sage -t --long --warn-long 63.7 src/sage/combinat/crystals/affinization.py # Bad exit: 1
-sage -t --long --warn-long 63.7 src/sage/structure/category_object.pyx # Killed due to abort
-sage -t --long --warn-long 63.7 src/sage/combinat/crystals/tensor_product_element.pyx # Bad exit: 127
-sage -t --long --warn-long 63.7 src/sage/combinat/rigged_configurations/bij_type_E67.py # Bad exit: 1
-sage -t --long --warn-long 63.7 src/sage/geometry/polyhedron/backend_normaliz.py # Bad exit: 127
-sage -t --long --warn-long 63.7 src/sage/sat/solvers/dimacs.py # Killed due to abort
-sage -t --long --warn-long 63.7 src/doc/en/reference/sat/index.rst # Killed due to abort
-sage -t --long --warn-long 63.7 src/sage/sat/boolean_polynomials.py # Killed due to abort
-sage -t --long --warn-long 63.7 src/sage/rings/polynomial/polynomial_ring_constructor.py # Killed due to abort
-sage -t --long --warn-long 63.7 src/sage/sat/converters/polybori.py # Killed due to abort
-```
+1. Doctests should not use an unexpectedly large number of system resources.
-The cause seems to be the same as #23879: `pynormaliz` requires a very large amount of memory (more than what #23748 allows).
+2. When there are too many threads, the virtual memory limit from #23748 will be hit.
+
+There is a solution: set the environment variable `OMP_NUM_THREADS=2` while doctesting.
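For concreteness, here is a minimal sketch of what the new description proposes, in simplified form; the function name `limit_openmp_threads` and the call site are illustrative assumptions, not the actual code on the ticket's branch:

```python
# Illustrative sketch (not the real branch code): force OpenMP-based
# libraries such as normaliz to use a small, fixed thread count during
# doctesting, so runs are reproducible and stay within the virtual memory
# limit from #23748.
import os

def limit_openmp_threads(num_threads=2):
    """Set OMP_NUM_THREADS before any OpenMP runtime spawns its threads."""
    os.environ["OMP_NUM_THREADS"] = str(num_threads)

limit_openmp_threads()  # a doctest runner would call this once at startup
```

Note that a sketch like this simply overwrites any existing value; whether that is desirable is discussed in the comments below.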
Hi Jeroen,
I want to review this, but before doing so I ran into trouble: on my machine all doctests pass in Sage 8.1.beta5 with `pynormaliz`. Could you give pointers to which doctests failed for you, and maybe help me reproduce the failure, so I can better understand whether this solution works? Also, is there any particular reason for the integer 2? Why not 3 or 4? I agree it should not be 1, because certain bugs might go undetected that way.
In particular, many tests in `src/sage/combinat/rigged_configurations` fail for me with `pynormaliz`.

Since the problem depends on the number of threads, which defaults to the number of cores, it could very well be that this problem only occurs on systems with many cores. The system where I saw the failure has 24 cores. Maybe you can get the failure with `OMP_NUM_THREADS=24`?
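One hedged way to attempt the reproduction suggested above (the value 24 and the path come from this comment; wrapping the `sage -t` call with Python's `subprocess` is just one possible approach):

```python
# Illustrative reproduction sketch: run the suspect doctests as if on a
# 24-core machine by forcing OMP_NUM_THREADS in the child's environment.
import os
import subprocess

env = dict(os.environ, OMP_NUM_THREADS="24")
subprocess.run(
    ["sage", "-t", "--long", "src/sage/combinat/rigged_configurations"],
    env=env,
    check=False,  # failures are the point here, so do not raise on exit code
)
```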
Replying to @koffie:

> Also, is there any particular reason for the integer 2?

Yes, it is the smallest integer strictly larger than 1.

One thread is too few, because it doesn't really test threading. With 2 threads, you do test threading, and on the other hand the system load will be at most a factor of 2 (200%) too large, which is not too bad.
Without the patch I indeed get 4 files with failing doctests if I do `export OMP_NUM_THREADS=24` followed by `sage -t --long src/sage/combinat/rigged_configurations`, and with the patch they do not fail anymore. So it seems to be the right thing to do in order to fix it.
One thing that I don't like about the current patch is that it overwrites `OMP_NUM_THREADS` even if it was already explicitly set before running the Sage tests. This means that if someone wants to run the doctests with a different value of `OMP_NUM_THREADS`, for instance to debug a problem that involves threading, they have to modify the source code. So I think it would be better to only set `OMP_NUM_THREADS=2` if nothing was set before, providing a sane default while still allowing a less sane value if one really insists. What are your thoughts on this? See the sketch after this comment for what I mean.
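A minimal sketch of the alternative proposed in this comment, assuming it would sit wherever the doctest runner prepares its environment (illustrative only, not the patch under review):

```python
# Respect a pre-set OMP_NUM_THREADS and only fall back to 2 when the user
# has not set anything themselves.
import os

os.environ.setdefault("OMP_NUM_THREADS", "2")
```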
Replying to @koffie:

> One thing that I don't like about the current patch is that it overwrites `OMP_NUM_THREADS` even if it was already explicitly set before running the Sage tests.

I consider that a feature. The point is that doctests should be reproducible and not depend too much on the external environment. If somebody has set the environment variable `OMP_NUM_THREADS`, the most likely reason is that he wants to use that number of threads for actual computations. It does not mean that he wants to use that many threads for doctests too.

> This means that if someone wants to run the doctests with a different value of `OMP_NUM_THREADS`, for instance to debug a problem that involves threading, they have to modify the source code.

Alternatively, you can set `os.environ['OMP_NUM_THREADS']` in a doctest too. That's easy to do and would fix the testing problem.
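A hedged illustration of that suggestion; the value 8 is made up for the example:

```python
# Override the thread count from within a debugging session or doctest;
# the new value is seen by code that reads the variable afterwards.
import os

os.environ["OMP_NUM_THREADS"] = "8"
```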
Ok, I am now running all the doctests after `export OMP_NUM_THREADS=100`, since I think the standard patchbot testing is not good enough here. If this passes then I will give positive review.

Does your remark mean that it would also be better to fix #23612 (edit: sorry, I meant #23613) by unsetting `PYTHONPATH` instead of making the doctests more permissive?
Replying to @koffie:

> Does your remark mean that it would also be better to fix #23612 by unsetting `PYTHONPATH` instead of making the doctests more permissive?

Are you sure you mean #23612? It seems unrelated to `PYTHONPATH` or doctests.
Sorry, I meant #23613.
Ok looks good to me.
Reviewer name
Reviewer: Maarten Derickx
Changed branch from u/jdemeyer/run_doctests_with_omp_num_threads_2 to 9f9f7b7
I wonder if this and/or #23748 will fix the doc build problems I was having on Windows for the past several weeks (which caused me to have to take down the Windows patchbot >_<). Fingers crossed...
Oh wait, this was just for the doctests, I misread. Maybe not then...
The `normaliz` package uses OMP for threading, which can create many threads. In doctests, this is bad for two reasons:

1. Doctests should not use an unexpectedly large number of system resources.
2. When there are too many threads, the virtual memory limit from #23748 will be hit.

There is a solution: set the environment variable `OMP_NUM_THREADS=2` while doctesting.

CC: @koffie
Component: packages: optional
Author: Jeroen Demeyer
Branch: `9f9f7b7`
Reviewer: Maarten Derickx
Issue created by migration from https://trac.sagemath.org/ticket/23892