Closed: oscarbenjamin closed this issue 5 years ago
I'm able to reproduce this on linux. Watching it, the process appears to take ~3GB of RAM before segfaulting:
$ pytest sympy/integrals/rubi/rubi_tests/tests/test_trinomials.py
============================= test session starts ==============================
platform linux -- Python 3.6.6, pytest-4.0.0, py-1.7.0, pluggy-0.8.0
architecture: 64-bit
cache: yes
ground types: python
rootdir: /tmp/sympy, inifile:
collecting ... Segmentation fault (core dumped)
Analyzing the core dump (without a debug binary; spinning up docker right now to use a debug build), it appears to be a stack overflow:
... (170k identical frames removed)
#173931 0x0000000000506ac3 in ?? ()
#173932 0x0000000000506ac3 in ?? ()
#173933 0x0000000000506ac3 in ?? ()
#173934 0x0000000000506ac3 in ?? ()
#173935 0x0000000000506ac3 in ?? ()
#173936 0x0000000000506ac3 in ?? ()
#173937 0x0000000000506ac3 in ?? ()
#173938 0x0000000000506ac3 in ?? ()
#173939 0x0000000000506ac3 in ?? ()
#173940 0x0000000000506ac3 in ?? ()
#173941 0x0000000000506ac3 in ?? ()
#173942 0x0000000000506ac3 in ?? ()
#173943 0x0000000000506ac3 in ?? ()
#173944 0x0000000000506ac3 in ?? ()
#173945 0x0000000000506ac3 in ?? ()
#173946 0x0000000000506ac3 in ?? ()
#173947 0x0000000000506ac3 in ?? ()
#173948 0x0000000000506ac3 in ?? ()
#173949 0x0000000000508c4f in ?? ()
#173950 0x000000000050d16d in ?? ()
#173951 0x000000000050ffaf in ?? ()
#173952 0x00000000005fac2b in PyAST_CompileObject ()
#173953 0x0000000000639ce5 in ?? ()
#173954 0x000000000051190a in ?? ()
#173955 0x00000000004f6070 in _PyEval_EvalFrameDefault ()
#173956 0x0000000000510c78 in ?? ()
#173957 0x00000000005119bd in ?? ()
#173958 0x00000000004f5277 in _PyEval_EvalFrameDefault ()
... uninteresting frames removed
This doesn't reproduce with --assert=plain, so it is definitely triggered / compounded by the assertion rewriting:
$ python -m pytest --assert=plain sympy/integrals/rubi/rubi_tests/tests/test_trinomials.py
============================= test session starts ==============================
platform linux -- Python 3.6.6, pytest-4.0.0, py-1.7.0, pluggy-0.8.0
architecture: 64-bit
cache: yes
ground types: python
rootdir: /tmp/sympy, inifile:
collected 5 items
sympy/integrals/rubi/rubi_tests/tests/test_trinomials.py sssss [100%]
========================== 5 skipped in 0.70 seconds ===========================
With python3-dbg I get a backtrace that looks like:
(gdb) bt
#0 0x000000000053a4d1 in dfs (c=c@entry=0x7fffffff7ec0, b=0x7fff1104ea40,
a=a@entry=0x7fffffff7d70) at ../Python/compile.c:4787
#1 0x000000000053a4f7 in dfs (c=c@entry=0x7fffffff7ec0, b=0x7fff1104e8b0,
a=a@entry=0x7fffffff7d70) at ../Python/compile.c:4791
#2 0x000000000053a4f7 in dfs (c=c@entry=0x7fffffff7ec0, b=0x7fff1104e900,
a=a@entry=0x7fffffff7d70) at ../Python/compile.c:4791
#3 0x000000000053a4f7 in dfs (c=c@entry=0x7fffffff7ec0, b=0x7fff1104e950,
a=a@entry=0x7fffffff7d70) at ../Python/compile.c:4791
#4 0x000000000053a4f7 in dfs (c=c@entry=0x7fffffff7ec0, b=0x7fff1104e7c0,
a=a@entry=0x7fffffff7d70) at ../Python/compile.c:4791
#5 0x000000000053a4f7 in dfs (c=c@entry=0x7fffffff7ec0, b=0x7fff1104e810,
a=a@entry=0x7fffffff7d70) at ../Python/compile.c:4791
#6 0x000000000053a4f7 in dfs (c=c@entry=0x7fffffff7ec0, b=0x7fff1104e860,
a=a@entry=0x7fffffff7d70) at ../Python/compile.c:4791
#7 0x000000000053a4f7 in dfs (c=c@entry=0x7fffffff7ec0, b=0x7fff1104e6d0,
a=a@entry=0x7fffffff7d70) at ../Python/compile.c:4791
#8 0x000000000053a4f7 in dfs (c=c@entry=0x7fffffff7ec0, b=0x7fff1104e720,
a=a@entry=0x7fffffff7d70) at ../Python/compile.c:4791
which points at this: https://github.com/python/cpython/blob/4cf1f54eb764f856fda5a394d8598c0c90d69d77/Python/compile.c#L4791
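That dfs() walks the compiler's basic-block graph recursively, so its depth grows with the number of blocks in a single code object. A hedged sketch (not from the issue; the helper name and thresholds are illustrative) of how that path can be exercised without pytest:

```python
def many_ifs(n):
    # Each `if` statement adds basic blocks to the compiled code object; on
    # CPython <= 3.6 the recursive dfs() in Python/compile.c then recurses to
    # a depth roughly proportional to n when compiling the module.
    return "\n".join("if flag:\n    x = %d" % i for i in range(n))

# A modest n compiles fine on any version:
code = compile(many_ifs(100), "<generated>", "exec")

# On 3.6, pushing n toward ~100000 could overflow the C stack and segfault,
# which is what the million-line rewritten module effectively did; 3.7.1
# reworked the traversal (bpo-31113).
```

This would also explain why --assert=plain avoids the crash: the un-rewritten module has far fewer basic blocks per code object.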
Okay, so what does that mean? Is this a bug in pytest, cpython or both?
If you can see a simple way to reproduce this without using pytest I'll take that as a bug report to cpython.
(Of course the test_trinomials.py file itself is questionable, but that doesn't let pytest/cpython off the hook for segfaulting.)
it's possibly an interaction issue where the ast we generate mismatches a potentially hidden/changed expectation of cpython -- more investigation is needed
The ~2891-line file expands to ~1029102 lines (yes, over a million) when assert-rewritten.
For example, this (relative to the rest of the file) small assert:
assert rubi_test(rubi_integrate(x**m*(b*x**S(2) + c*x**S(4)), x), x, b*x**(m + S(3))/(m + S(3)) + c*x**(m + S(5))/(m + S(5)), expand=True, _diff=True, _numerical=True)
is rewritten to this (ignore the invalid variable names; since pytest writes directly to the ast, the names don't actually matter -- the @py_ prefix is used to avoid clashing with user-space variables):
@py_assert4 = (x ** m)
@py_assert8 = 2
@py_assert10 = S(@py_assert8)
@py_assert12 = (x ** @py_assert10)
@py_assert13 = (b * @py_assert12)
@py_assert17 = 4
@py_assert19 = S(@py_assert17)
@py_assert21 = (x ** @py_assert19)
@py_assert22 = (c * @py_assert21)
@py_assert23 = (@py_assert13 + @py_assert22)
@py_assert24 = (@py_assert4 * @py_assert23)
@py_assert26 = rubi_integrate(@py_assert24, x)
@py_assert33 = 3
@py_assert35 = S(@py_assert33)
@py_assert37 = (m + @py_assert35)
@py_assert38 = (x ** @py_assert37)
@py_assert39 = (b * @py_assert38)
@py_assert42 = 3
@py_assert44 = S(@py_assert42)
@py_assert46 = (m + @py_assert44)
@py_assert47 = (@py_assert39 / @py_assert46)
@py_assert52 = 5
@py_assert54 = S(@py_assert52)
@py_assert56 = (m + @py_assert54)
@py_assert57 = (x ** @py_assert56)
@py_assert58 = (c * @py_assert57)
@py_assert61 = 5
@py_assert63 = S(@py_assert61)
@py_assert65 = (m + @py_assert63)
@py_assert66 = (@py_assert58 / @py_assert65)
@py_assert67 = (@py_assert47 + @py_assert66)
@py_assert68 = True
@py_assert70 = True
@py_assert72 = True
@py_assert74 = rubi_test(@py_assert26, x, @py_assert67, expand=@py_assert68, _diff=@py_assert70, _numerical=@py_assert72)
if (not @py_assert74):
@py_format76 = (('' + 'assert %(py75)s\n{%(py75)s = %(py0)s(%(py27)s\n{%(py27)s = %(py1)s(((%(py2)s ** %(py3)s) * ((%(py5)s * (%(py6)s ** %(py11)s\n{%(py11)s = %(py7)s(%(py9)s)\n})) + (%(py14)s * (%(py15)s ** %(py20)s\n{%(py20)s = %(py16)s(%(py18)s)\n})))), %(py25)s)\n}, %(py28)s, (((%(py29)s * (%(py30)s ** (%(py31)s + %(py36)s\n{%(py36)s = %(py32)s(%(py34)s)\n}))) / (%(py40)s + %(py45)s\n{%(py45)s = %(py41)s(%(py43)s)\n})) + ((%(py48)s * (%(py49)s ** (%(py50)s + %(py55)s\n{%(py55)s = %(py51)s(%(py53)s)\n}))) / (%(py59)s + %(py64)s\n{%(py64)s = %(py60)s(%(py62)s)\n}))), expand=%(py69)s, _diff=%(py71)s, _numerical=%(py73)s)\n}') % {
'py0': (@pytest_ar._saferepr(rubi_test) if (('rubi_test' in @py_builtins.locals()) or @pytest_ar._should_repr_global_name(rubi_test)) else 'rubi_test'),
'py1': (@pytest_ar._saferepr(rubi_integrate) if (('rubi_integrate' in @py_builtins.locals()) or @pytest_ar._should_repr_global_name(rubi_integrate)) else 'rubi_integrate'),
'py2': (@pytest_ar._saferepr(x) if (('x' in @py_builtins.locals()) or @pytest_ar._should_repr_global_name(x)) else 'x'),
'py3': (@pytest_ar._saferepr(m) if (('m' in @py_builtins.locals()) or @pytest_ar._should_repr_global_name(m)) else 'm'),
'py5': (@pytest_ar._saferepr(b) if (('b' in @py_builtins.locals()) or @pytest_ar._should_repr_global_name(b)) else 'b'),
'py6': (@pytest_ar._saferepr(x) if (('x' in @py_builtins.locals()) or @pytest_ar._should_repr_global_name(x)) else 'x'),
'py7': (@pytest_ar._saferepr(S) if (('S' in @py_builtins.locals()) or @pytest_ar._should_repr_global_name(S)) else 'S'),
'py9': @pytest_ar._saferepr(@py_assert8),
'py11': @pytest_ar._saferepr(@py_assert10),
'py14': (@pytest_ar._saferepr(c) if (('c' in @py_builtins.locals()) or @pytest_ar._should_repr_global_name(c)) else 'c'),
'py15': (@pytest_ar._saferepr(x) if (('x' in @py_builtins.locals()) or @pytest_ar._should_repr_global_name(x)) else 'x'),
'py16': (@pytest_ar._saferepr(S) if (('S' in @py_builtins.locals()) or @pytest_ar._should_repr_global_name(S)) else 'S'),
'py18': @pytest_ar._saferepr(@py_assert17),
'py20': @pytest_ar._saferepr(@py_assert19),
'py25': (@pytest_ar._saferepr(x) if (('x' in @py_builtins.locals()) or @pytest_ar._should_repr_global_name(x)) else 'x'),
'py27': @pytest_ar._saferepr(@py_assert26),
'py28': (@pytest_ar._saferepr(x) if (('x' in @py_builtins.locals()) or @pytest_ar._should_repr_global_name(x)) else 'x'),
'py29': (@pytest_ar._saferepr(b) if (('b' in @py_builtins.locals()) or @pytest_ar._should_repr_global_name(b)) else 'b'),
'py30': (@pytest_ar._saferepr(x) if (('x' in @py_builtins.locals()) or @pytest_ar._should_repr_global_name(x)) else 'x'),
'py31': (@pytest_ar._saferepr(m) if (('m' in @py_builtins.locals()) or @pytest_ar._should_repr_global_name(m)) else 'm'),
'py32': (@pytest_ar._saferepr(S) if (('S' in @py_builtins.locals()) or @pytest_ar._should_repr_global_name(S)) else 'S'),
'py34': @pytest_ar._saferepr(@py_assert33),
'py36': @pytest_ar._saferepr(@py_assert35),
'py40': (@pytest_ar._saferepr(m) if (('m' in @py_builtins.locals()) or @pytest_ar._should_repr_global_name(m)) else 'm'),
'py41': (@pytest_ar._saferepr(S) if (('S' in @py_builtins.locals()) or @pytest_ar._should_repr_global_name(S)) else 'S'),
'py43': @pytest_ar._saferepr(@py_assert42),
'py45': @pytest_ar._saferepr(@py_assert44),
'py48': (@pytest_ar._saferepr(c) if (('c' in @py_builtins.locals()) or @pytest_ar._should_repr_global_name(c)) else 'c'),
'py49': (@pytest_ar._saferepr(x) if (('x' in @py_builtins.locals()) or @pytest_ar._should_repr_global_name(x)) else 'x'),
'py50': (@pytest_ar._saferepr(m) if (('m' in @py_builtins.locals()) or @pytest_ar._should_repr_global_name(m)) else 'm'),
'py51': (@pytest_ar._saferepr(S) if (('S' in @py_builtins.locals()) or @pytest_ar._should_repr_global_name(S)) else 'S'),
'py53': @pytest_ar._saferepr(@py_assert52),
'py55': @pytest_ar._saferepr(@py_assert54),
'py59': (@pytest_ar._saferepr(m) if (('m' in @py_builtins.locals()) or @pytest_ar._should_repr_global_name(m)) else 'm'),
'py60': (@pytest_ar._saferepr(S) if (('S' in @py_builtins.locals()) or @pytest_ar._should_repr_global_name(S)) else 'S'),
'py62': @pytest_ar._saferepr(@py_assert61),
'py64': @pytest_ar._saferepr(@py_assert63),
'py69': @pytest_ar._saferepr(@py_assert68),
'py71': @pytest_ar._saferepr(@py_assert70),
'py73': @pytest_ar._saferepr(@py_assert72),
'py75': @pytest_ar._saferepr(@py_assert74),
})
raise AssertionError(@pytest_ar._format_explanation(@py_format76))
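To see why the output balloons like this, a toy transformer helps. The sketch below is not pytest's rewriter, just a minimal illustration of the idea: each operand of the assert is hoisted into a temporary so its value can be reported on failure, which is how a 20-operation assert fans out into dozens of assignments.

```python
import ast

class ToyAssertRewriter(ast.NodeTransformer):
    """Minimal sketch of the idea behind pytest's rewriter. The real one
    handles arbitrary expressions, boolean ops, saferepr, and much more."""

    def visit_Assert(self, node):
        test = node.test
        # Only handle the simple `assert <left> == <right>` shape here.
        if not (isinstance(test, ast.Compare) and len(test.ops) == 1
                and isinstance(test.ops[0], ast.Eq)):
            return node
        template = (
            "_py_left = __L__\n"
            "_py_right = __R__\n"
            "if not (_py_left == _py_right):\n"
            "    raise AssertionError('assert %r == %r' % (_py_left, _py_right))"
        )
        new = ast.parse(template).body
        new[0].value = test.left            # splice in the original operands
        new[1].value = test.comparators[0]
        for stmt in new:
            ast.copy_location(stmt, node)
            ast.fix_missing_locations(stmt)
        return new

tree = ToyAssertRewriter().visit(ast.parse("assert 1 + 1 == 3"))
code = compile(ast.fix_missing_locations(tree), "<rewritten>", "exec")
try:
    exec(code, {})
except AssertionError as exc:
    print(exc)  # -> assert 2 == 3
```

Every intermediate value becomes a named temporary plus a repr lookup in the failure message, so the rewritten size grows with the operation count of each assert, not just the number of asserts.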
Some of the assertions cause pytest to produce 3500+ variables for the assertion message:
$ grep '^ *@py_assert35[0-9][0-9]' f.py | tail -1
@py_assert3562 = rubi_test(@py_assert42, x, @py_assert3555, expand=@py_assert3556, _diff=@py_assert3558, _numerical=@py_assert3560)
I suspect that's just too much code for python to handle (especially in so few functions)
Issue 31113 for cpython looks similar to this. Apparently a fix went into Python 3.7.1. I've just tested it with Python 3.7.1 and it works:
$ pytest sympy/integrals/rubi/rubi_tests/tests/test_trinomials.py
==================================================================== test session starts =====================================================================
platform darwin -- Python 3.7.1, pytest-4.0.0, py-1.7.0, pluggy-0.8.0
architecture: 64-bit
cache: yes
ground types: python
rootdir: /Users/enojb/current/sympy/sympy, inifile:
collected 5 items
sympy/integrals/rubi/rubi_tests/tests/test_trinomials.py sssss [100%]
================================================================= 5 skipped in 99.38 seconds =================================================================
So it looks as if CPython has improved its ability to handle large files. It's still possible that pytest shouldn't be generating them though.
code generation is how pytest makes pretty assertions, I don't think there's a way out of that. Since this is fixed in cpython, I'm going to close this.
Thanks for the issue!
Well... when I say that it works, it does take a long time and consume a lot of memory just to collect these tests (and then skip them!).
I would expect other situations where this causes problems if pytest has no limits on the code it generates.
your example here is pretty exceptional: a test containing 3k lines of 20+ operation assertions 😆 -- I wouldn't expect anyone to hit this normally
we ought to consider proposing an opt-out for files with high-cost patterns like the one @oscarbenjamin has in the testsuite
an assertions-per-file threshold could help
@oscarbenjamin could you take a look at which parts of your file actually cause the issues (aka is it one special test with an exorbitantly complex assert statement, or is it the sheer number?)
based on this assessment we can work out something sensible
@nicoddemus @asottile i do wonder if it would be a sensible approach to switch from plain assertions to a metatool where only the unaltered code of the assertion is executed on top of a special namespace
this would reduce the complexity of the ast significantly, moving the cost to different places instead
> we ought to consider proposing an opt-out for files with high-cost patterns like the one @oscarbenjamin has in the testsuite
> an assertions-per-file threshold could help
Is the rewriting done per function or per module? Not rewriting skipped tests would solve the problem for this particular module.
> @oscarbenjamin could you take a look at which parts of your file actually cause the issues (aka is it one special test with an exorbitantly complex assert statement, or is it the sheer number?)
> based on this assessment we can work out something sensible
Yeah, I can take a look. How can I see the output that pytest would generate from assert rewriting?
It would be nice if there were a command-line flag for that:
$ pytest sympy/solvers/tests/test_ode.py -k test_classify_ode --show-assert-rewrite
> @nicoddemus @asottile i do wonder if it would be a sensible approach to switch from plain assertions to a metatool where only the unaltered code of the assertion is executed on top of a special namespace
> this would reduce the complexity of the ast significantly, moving the cost to different places instead
It's probably worth considering the cost of assertion rewriting in less pathological cases as well as the example in this issue. A more typical test run for me would be:
$ git clone https://github.com/sympy/sympy.git
$ cd sympy/
$ pytest -m 'not slow' sympy/solvers/tests/test_ode.py
With that I get:
$ pytest -m 'not slow' sympy/solvers/tests/test_ode.py --assert=plain # 329 seconds
$ pytest -m 'not slow' sympy/solvers/tests/test_ode.py # 334 seconds
The additional time is the same when running tests in parallel with pytest-xdist
$ pytest -m 'not slow' sympy/solvers/tests/test_ode.py -n 3 --assert=plain # 193 seconds
$ pytest -m 'not slow' sympy/solvers/tests/test_ode.py -n 3 # 206 seconds
So it looks as if assert-rewriting test_ode.py takes 15 seconds and can't be parallelised. Hopefully I will make the tested code more efficient, in which case that extra cost would become relatively more significant. I need to improve the test coverage first though :).
It's also worth noting that if assert-rewriting slows every tested module then when running a large test suite to see a (hopefully!) small number of fails it would be quicker to run without assert rewriting and then to rerun only the failed tests with rewriting at the end.
xdist likely compounds this as it discovers multiple times in each worker -- given your linear discovery overhead the rewrite seems to (only) take ~5 seconds. looking at that file it's not surprising why -- it has similar complex patterns to the problem-file from before (lots of assertion expressions with high numbers of locals and operations in the assertion)
assertion rewriting is done per module -- this is the most basic unit of the python import system (and the cacheable unit via pyc files)
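Concretely, the rewriting rides on the import machinery: pytest installs an import hook (AssertionRewritingHook) whose loader compiles each test module from a transformed AST, and the result is cached as an ordinary .pyc. A bare-bones sketch of that hook point (illustrative only; RewritingLoader is a made-up name):

```python
import ast
import importlib.machinery

class RewritingLoader(importlib.machinery.SourceFileLoader):
    """Sketch of the hook point pytest uses: compile the module from a
    (potentially transformed) AST; the resulting bytecode is cached as a
    normal .pyc by the surrounding SourceLoader machinery."""

    def source_to_code(self, data, path, *, _optimize=-1):
        tree = ast.parse(data)
        # A real rewriter (pytest's AssertionRewriter) transforms `tree` here.
        return compile(tree, path, "exec", dont_inherit=True, optimize=_optimize)
```

Because the unit of compilation (and of .pyc caching) is the module, rewriting cannot easily skip individual functions such as skipped tests.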
I displayed the code by applying a patch similar to this one
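For anyone wanting the same view without patching, a hedged sketch using pytest's private rewriter API (the function and its signature are internal and have changed across releases, so treat this as illustrative only):

```python
import ast
from _pytest.assertion.rewrite import rewrite_asserts  # private pytest API!

source = b"def test_demo():\n    assert 1 + 1 == 3\n"
tree = ast.parse(source)
try:
    rewrite_asserts(tree, source)  # newer pytest takes (mod, source, ...)
except TypeError:
    rewrite_asserts(tree)          # older releases took just the module
print(ast.unparse(tree))           # Python 3.9+; use the astor package before that
```

The printed output contains the @py_assert temporaries shown above (as pseudo-source, since @-prefixed names are not valid Python).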
I'm not convinced normal test suites see this extreme level of overhead.
On pytest's testsuite the total overhead is ~3 seconds (I took best-of-5 for both measurements below):
$ git clean -fxfd testing >& /dev/null && time pytest testing/ --collect-only >& /dev/null
real 0m4.965s
user 0m4.654s
sys 0m0.084s
$ git clean -fxfd testing >& /dev/null && time pytest testing/ --collect-only --assert=plain >& /dev/null
real 0m2.122s
user 0m2.020s
sys 0m0.087s
pytest has ~4000 assertions
On pre-commit's testsuite (the largest I "own": ~550 asserts) the overhead is .5s (1.5s vs 1.0s).
collection coordination for xdist is a desired feature simply because right now the import system costs are problematic for some types of project (i believe cryptography is affected as well CC @alex )
I reran my timings for
$ pytest -m 'not slow' sympy/solvers/tests/test_ode.py --assert=plain
$ pytest -m 'not slow' sympy/solvers/tests/test_ode.py
and after 5 runs of each I don't see a significant difference. Inter-run variability is bigger than the difference between using and not using --assert=plain. So I think the timings I referred to above are a red herring...
I will still take a look at the issue with test_trinomials.py in more detail later. I think that what is happening (apart from the file being large) is that particularly complex assert expressions produce large rewritten output, but the extra output is probably not useful past a certain point. I'll work through it a bit and open a separate issue since this one is resolved.
@oscarbenjamin note that I clean pyc files, which is why you're probably not seeing a difference now. I imagine the difference will come back if you clean those out.
If I switch between using / not using --assert=plain, do I need to delete the .pyc files for that setting to take effect?
The command-line parameter will take effect immediately, but you'll only see the rewrite overhead once, as it gets cached into pyc files.
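So when benchmarking with and without rewriting, clear the caches between runs; a small helper for that (not part of pytest; the name is made up):

```python
import pathlib
import shutil

def clear_pycache(root="."):
    """Delete __pycache__ directories under `root` so the next pytest run
    pays the assertion-rewrite cost again instead of loading cached .pyc."""
    removed = 0
    for d in pathlib.Path(root).rglob("__pycache__"):
        if d.is_dir():
            shutil.rmtree(d)
            removed += 1
    return removed
```

Equivalently, `git clean -fxfd testing` as used in the timings above removes the caches along with any other untracked files.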
@RonnyPfannschmidt as of the last time we checked (though it's been a while; cc: @reaperhulk), that's right -- despite having 100k tests, the cryptography test suite doesn't get a benefit from xdist because of this.
Our issues with pytest-xdist are more related to memory than anything else. Test collection (on an i7-8850H) for cryptography takes ~20s and without assert rewrite it's ~19s, but our test suite takes 5-10 minutes so parallelization would be a win even with the unfortunate overhead of duplicate collection. Unfortunately, Python uses nearly 700MB of RAM per process just collecting the tests (and at test completion each process is ~1GB when we do n=4). Our CI system can't afford that kind of memory consumption.
Do you get the same memory overhead with --assert=plain?
Yes, assert rewriting has negligible memory impact (at least in our specific case).
thanks for the inputs, we will have to open 2-3 targeted follow-up issues
I'm trying to use pytest with sympy but getting a segfault. Since both pytest and sympy are pure Python, I guess this is a bug in CPython. I'm finding it difficult to debug through pytest to isolate the underlying bug though.
I'm getting this segfault with Python 3.6 on OSX. I had something similar on Linux, but I've now isolated it more precisely on OSX and haven't tested these exact steps on Linux since.
The environment is
Now clone sympy and run pytest on test_trinomials.py. This file is 1.4MB and very repetitive. You can see it here. It takes some time before you see the segfault.