coverage of multiline generator/set/dict comprehensions is wrong when run with pytest's assertion rewriting

nedbat commented 8 years ago

Originally reported by Andy Freeland (Bitbucket: rouge8, GitHub: rouge8)

Code/config to reproduce available as a gist. Fails on Python 2.7 but not Python 3.5.

Essentially, given this test:

def test_foo():
    # covered!
    assert {i for i in range(10)} == {i for i in range(10)}

    # "didn't finish the set comprehension"
    assert {i for i in range(10)} == {
        i for i in range(10)
    }

    # covered!
    assert True

When run under pytest with assertion rewriting (the default), the multiline set comprehension is reported as partially covered, even though the comprehension on oneline is fully covered. I think this is a bug in coverage, not pytest's assertion rewriting, because this code passes:

import ast
from _pytest.assertion.rewrite import rewrite_asserts

oneline = """assert {i for i in range(10)} == {i for i in range(10)}"""
multiline = """assert {i for i in range(10)} == {
    i for i in range(10)
}"""

# Parse the expressions
oneline_tree = ast.parse(oneline)
multiline_tree = ast.parse(multiline)

# Dump the pre-assertion rewrite ASTs
multiline_dump_prerewrite = ast.dump(multiline_tree)
oneline_dump_prerewrite = ast.dump(oneline_tree)

# The ASTs should be the same
assert multiline_dump_prerewrite == oneline_dump_prerewrite

# Rewrite the asserts
rewrite_asserts(oneline_tree)
rewrite_asserts(multiline_tree)

# Dump the rewritten ASTs
oneline_dump_rewrite = ast.dump(oneline_tree)
multiline_dump_rewrite = ast.dump(multiline_tree)

# The ASTs should be the same
assert oneline_dump_rewrite == multiline_dump_rewrite

Bitbucket: https://bitbucket.org/ned/coveragepy/issue/515

nedbat commented 7 years ago

Original comment by Loic Dachary (Bitbucket: dachary, GitHub: dachary)

During coverage report the arcs found by analysing the AST tree of a source file should show one line per multiline statement because they are remapped to the first line of the statement.

However, the first_line function relies on the multiline data member that only remaps positive line numbers.

For example, the set comprehension at line 3 in

#!python

def test_foo():
    assert {i for i in range(10)} == {
        i for i in range(10)
    }

is added as two arcs as (3, -3) and (-3, 3) as shown by

#!bash

$ coverage run --branch --source=test_foo.py -m pytest test_foo.py
$ COVERAGE_ASTDUMP=1 COVERAGE_TRACK_ARCS=1 coverage report
...
Adding arc: (-3, 3): None, "didn't run the set comprehension on line 3"
Adding arc: (3, -3): None, "didn't finish the set comprehension on line 3"

The arcs are then renumbered by _analyze_ast and

#!python

[(-3, 3), (-2, 2), (-1, 1), (-1, 2), (1, -1), (2, -2), (2, -1), (3, -3)]

becomes

#!python

[(-3, 2), (-2, 2), (-1, 1), (-1, 2), (1, -1), (2, -2), (2, -1), (2, -3)]

As if the (-3, 2) and (2, -3) arcs existed, although they do not. When the arcs_missing function compares that to the actual arcs collected by coverage run which are stored in the .coverage file:

#!python

[(-2, 2), (-1, 1), (-1, 2), (1, -1), (2, -2), (2, -1), (2, 2)]

and concludes that the following arcs have not be executed

#!python

[(-3, 2), (2, -3)]

Note: this can be verified by printing the local variables of the arcs_missing function.

nedbat commented 7 years ago

Original comment by Loic Dachary (Bitbucket: dachary, GitHub: dachary)

@nedbat I think it happens because multiline re-numbering does not happen for negative line numbers and it could be fixed with

#!diff

--- a/coverage/parser.py    Mon Dec 12 15:07:48 2016 +0100
+++ b/coverage/parser.py    Wed Dec 14 16:31:35 2016 +0100
@@ -183,6 +183,7 @@
                     # so record a multi-line range.
                     for l in range(first_line, elineno+1):
                         self._multiline[l] = first_line
+                        self._multiline[-l] = -first_line                        
                 first_line = None
                 first_on_line = True

Without this patch the (3,-3) and (-3,3) arcs are renumbered (2,-3) and (-3,2) which is incorrect because there is no arc between line 2 and line 3. With this patch the (3,-3) and (-3,3) arcs are renumbered (2,-2) and (-2,2) and the code coverage is correct.

Do you think it could be the root cause of the problem ? If you're not sure I can try to explain why it accounts for all the symptoms.

nedbat commented 7 years ago

I'm not following all the details here, but I love that you are digging into it! When you have something for me to do, leave a clear message telling me what it is!

nedbat commented 7 years ago

Original comment by Loic Dachary (Bitbucket: dachary, GitHub: dachary)

@nicoddemus

#!python

def test_foo():
    assert func_call() == \
        func_call2()

works because it does not contain any branch and coverage.py does not care about the last line. It records the assert as a single line which, if not run, indicates the function never returned (see arc 8, -7 below and notice the absence of arc involving line 9 which is the call to func_call2):

#!python

$ coverage run --branch --source=test_bar.py -m pytest test_bar.py
$ COVERAGE_TRACK_ARCS=1 coverage report
..
Adding arc: (-1, 1): None, None
Adding arc: (1, 4): None, None
Adding arc: (4, 7): None, None
Adding arc: (7, -1): None, "didn't exit the module"
Adding arc: (-1, 2): None, None
Adding arc: (2, -1): None, "didn't return from function 'func_call'"
Adding arc: (-4, 5): None, None
Adding arc: (5, -4): None, "didn't return from function 'func_call2'"
Adding arc: (-7, 8): None, None
Adding arc: (8, -7): None, "didn't return from function 'test_foo'"
..

By contrast the assert from the original bug description contains arcs and coverage.py expects them to be run (see the arcs -3, 3 and 3, -3 which are about the second part of the assert):

#!python

$ coverage run --branch --source=test_foo.py -m pytest test_foo.py
$ COVERAGE_TRACK_ARCS=1 coverage report
..
Adding arc: (-1, 1): None, None
Adding arc: (1, -1): None, "didn't exit the module"
Adding arc: (-1, 2): None, None
Adding arc: (2, -1): None, "didn't return from function 'test_foo'"
Adding arc: (-2, 2): None, "didn't run the set comprehension on line 2"
Adding arc: (2, -2): None, "didn't finish the set comprehension on line 2"
Adding arc: (-3, 3): None, "didn't run the set comprehension on line 3"
Adding arc: (3, -3): None, "didn't finish the set comprehension on line 3"

nedbat commented 7 years ago

Original comment by Loic Dachary (Bitbucket: dachary, GitHub: dachary)

pytest indeed assigns the line number of the rewritten assertion to the first line number of the assertion. However, since coverage.py recognize the assertion as a multiline expression, it should not make a difference.

nedbat commented 7 years ago

Original comment by Loic Dachary (Bitbucket: dachary, GitHub: dachary)

The HTML message is associated with the line when the AST is analyzed (and as shown above, pytest does not modify the AST when rewriting). When the HTML report is created, this message is displayed because the line is reported to not be covered. Therefore it does not convey more information than the summary report stating the line is not covered.

nedbat commented 7 years ago

Original comment by Loic Dachary (Bitbucket: dachary, GitHub: dachary)

I should have paid more attention to the original description that mentions the HTML message "line N didn't finish the set comprehension on line N" and try to figure out why it happens

nedbat commented 7 years ago

Original comment by Loic Dachary (Bitbucket: dachary, GitHub: dachary)

But it does. Something else is going on :-)

nedbat commented 7 years ago

Original comment by Loic Dachary (Bitbucket: dachary, GitHub: dachary)

I think it may simply be because the coverage.py parser does not recognize

#!python

assert {i for i in range(10)} == {
        i for i in range(10)
    }

as a multiline expression as it should.

nedbat commented 7 years ago

Original comment by Loic Dachary (Bitbucket: dachary, GitHub: dachary)

Hum, you're right.

#!python

def func_call():
    pass

def func_call2():
    pass

def test_foo():
    assert func_call() == \
        func_call2()

does not cause any problem although line 9 does not show at all in the .coverage file.

#!bash

$ coverage run --branch --source=test_bar.py -m pytest test_bar.py
$ cat .coverage
...[[-4,5],[8,-7],[-1,1],[2,-1],[-1,2],[4,7],[1,4],[7,-1],[-7,8],[5,-4]]...
$ coverage report
Name          Stmts   Miss Branch BrPart  Cover
-----------------------------------------------
test_bar.py       6      0      0      0   100%
$ coverage run --branch --source=test_bar.py -m pytest --assert=plain test_bar.py
$ cat .coverage
...[[-4,5],[-1,1],[2,-1],[-1,2],[4,7],[1,4],[9,-7],[8,9],[7,-1],[-7,8],[5,-4]]...
$ coverage report
Name          Stmts   Miss Branch BrPart  Cover
-----------------------------------------------
test_bar.py       6      0      0      0   100%

nedbat commented 7 years ago

Original comment by Bruno Oliveira (Bitbucket: nicoddemus, GitHub: nicoddemus)

This could be explained because pytest rewrites the assert as a single line instead of multiline. The content of the .coverage file is created from line events sent via the sys.setttrace to the coverage.py trace function and reflect the code re-written by pytest. However, coverage report compiles the unmodified source and finds that some lines are not covered.

This makes sense, but wouldn't this type of assert also cause the same problem?

def func_call(): return 1
def func_call2(): return 1

def test_foo():
    assert func_call() == \
        func_call2()

?

Pytest will break each operand into its own statement in that case as well:

=========================== Rewritten AST as Python ===========================
import builtins as @py_builtins
import _pytest.assertion.rewrite as @pytest_ar

def func_call():
    return 1

def func_call2():
    return 1

def test_foo():
    @py_assert1 = func_call()
    @py_assert5 = func_call2()
    @py_assert3 = @py_assert1 == @py_assert5
    if not @py_assert3:
        @py_format7 = @pytest_ar._call_reprcompare(('==',), (@py_assert3,), ('''%(py2)s
{%(py2)s = %(py0)s()
} == %(py6)s
{%(py6)s = %(py4)s()
}''',), (@py_assert1, @py_assert5)) % {'py6': @pytest_ar._saferepr(@py_assert5), 'py4': @pytest_ar._saferepr(func_call2) if 'func_call2' in @py_builtins.locals() or @pytest_ar._should_repr_global_name(func_call2) else 'func_call2', 'py2': @pytest_ar._saferepr(@py_assert1), 'py0': @pytest_ar._saferepr(func_call) if 'func_call' in @py_builtins.locals() or @pytest_ar._should_repr_global_name(func_call) else 'func_call'}
        @py_format9 = ('' + 'assert %(py8)s') % {'py8': @py_format7}
        raise AssertionError(@pytest_ar._format_explanation(@py_format9))
    @py_assert1 = @py_assert3 = @py_assert5 = None

nedbat commented 7 years ago

Original comment by Loic Dachary (Bitbucket: dachary, GitHub: dachary)

The test_foo.py file is below and virtualenv is set with python-2.7.12 and pytest-2.8.7 and coverage.py at 24aff3d7bfd5 (bb)

#!python

def test_foo():
    assert {i for i in range(10)} == {
        i for i in range(10)
    }

#!bash

$ coverage run --branch --source=test_foo.py -m pytest test_foo.py
$ cat .coverage ; echo
...[[2,-1],[-1,1],[-1,2],[2,-2],[-2,2],[2,2],[1,-1]]..
$ coverage report
Name          Stmts   Miss Branch BrPart  Cover
-----------------------------------------------
test_foo.py       2      0      3      1    80%

This could be explained because pytest rewrites the assert as a single line instead of multiline. The content of the .coverage file is created from line events sent via sys.settrace to the coverage.py trace function and reflect the code re-written by pytest. However, coverage report compiles the unmodified source and finds that some lines are not covered.

nedbat commented 7 years ago

Original comment by Loic Dachary (Bitbucket: dachary, GitHub: dachary)

I did not know about pytest-ast-back-to-python, cool :-) My theory is that while the rewritten assert is correct it was transformed into a onliner instead of multiline. Could it be the root cause of this confusion ?

nedbat commented 7 years ago

Original comment by Bruno Oliveira (Bitbucket: nicoddemus, GitHub: nicoddemus)

Hi @dachary,

Here's the rewritten AST of your examplem, courtesy of pytest-ast-back-to-python:

============================= test session starts =============================
platform win32 -- Python 3.5.0, pytest-3.0.5, py-1.4.31, pluggy-0.4.0
rootdir: c:\pytest, inifile: tox.ini
plugins: ast-back-to-python-0.1.0
collected 1 items

c:\pytest\.tmp\cov_test.py .
=========================== Rewritten AST as Python ===========================
import builtins as @py_builtins
import _pytest.assertion.rewrite as @pytest_ar

def test_foo():
    @py_assert0 = {i for i in range(10)}
    @py_assert3 = {i for i in range(10)}
    @py_assert2 = @py_assert0 == @py_assert3
    if not @py_assert2:
        @py_format5 = @pytest_ar._call_reprcompare(('==',), (@py_assert2,), ('%(py1)s == %(py4)s',), (@py_assert0, @py_assert3)) % {'py1': @pytest_ar._saferepr(@py_assert0), 'py4': @pytest_ar._saferepr(@py_assert3)}
        @py_format7 = ('' + 'assert %(py6)s') % {'py6': @py_format5}
        raise AssertionError(@pytest_ar._format_explanation(@py_format7))
    @py_assert0 = @py_assert2 = @py_assert3 = None

========================== 1 passed in 0.01 seconds ===========================

(I have the same AST in Python 2.7 and Python 3.4)

(Also note that to reproduce this, you either have to use pytest~=2.8 or after use the branch from this PR).

It seems to me the rewritten expression is correct, not sure what could be the problem here.

nedbat commented 7 years ago

Original comment by Loic Dachary (Bitbucket: dachary, GitHub: dachary)

For the record, updated the pytest issue with a theory.

nedbat commented 7 years ago

Original comment by Loic Dachary (Bitbucket: dachary, GitHub: dachary)

Could it be that starting from python-3.5 sys.settrace claims that all event == line that belong to a multiline assert are from the first line of the assert ? If pytest rewrites assertions as if they were on a single line even if they are multiline, it would explain why it only breaks python versions before 3.5. It's just a theory at this point ;-)

nedbat commented 7 years ago

Original comment by Loic Dachary (Bitbucket: dachary, GitHub: dachary)

Here is a minimal reproducer with test_foo.py as

#!python

def test_foo():
    assert {i for i in range(10)} == {
        i for i in range(10)
    }

#!bash

coverage run --branch --source=test_foo.py -m pytest test_foo.py

using python-3.5 or python-2.7.12 both create a .coverage file containing

#!json

[[2,-1],[-1,1],[-1,2],[2,-2],[-2,2],[2,2],[1,-1]]

With --assert=plain the content of the .coverage file is different when using python-2.7.12 but remains the same with python-3.5

#!bash

coverage run --branch --source=test_foo.py -m pytest--assert=plain test_foo.py

#!json

[[-3,3],[-1,1],[-1,2],[3,3],[3,-3],[2,-2],[3,-1],[2,3],[-2,2],[2,2],[1,-1]]

nedbat commented 7 years ago

Original comment by Loic Dachary (Bitbucket: dachary, GitHub: dachary)

Added a bug report to pytest in case the problem can be fixed on this end.

nedbat commented 8 years ago

Original comment by Scott Colby (Bitbucket: scolby33, GitHub: scolby33)

I'd like to +1 this issue with what I think is a similar situation (Python 2.7.12 only, I haven't yet tried to reproduce with 3.5):

#!python

# in my module
data = {'a': {'a': 1, 'b': 2}, 'b': {'a': 1, 'b': None}}
if any([score is None for row in data.values() for score in row.values()]):
    raise ValueError

# in a test function
with pytest.raises(ValueError):
    function_that_was_in()

this reports a missed branch and "line 44 didn't run the list comprehension on line 44" or similar.

Replacing with a generator epxression:

#!python

if any(score is None for row in data.values() for score in row.values()):
    raise ValueError

reports full coverage.

Messing with --assert=plain didn't change anything, here. In this case, the comprehension/generator is in the code under test, not the test suite itself.

nedbat commented 8 years ago

Original comment by Andy Freeland (Bitbucket: rouge8, GitHub: rouge8)

One thing that is strange to me is that branch count is not zero

The branches here are the possibility of exit before the generator is evaluated. In Python 2, set/dict comprehensions are implemented as generators I believe, so won't necessarily be evaluated, vs. list comprehensions which always are. If you look at the HTML coverage, the partial lines are annotated "line N didn't finish the set comprehension on line N"

nedbat commented 8 years ago

Original comment by Artem Dayneko (Bitbucket: jamert, GitHub: jamert)

Or should it report 3 branches - one for each assert?

nedbat commented 8 years ago

Original comment by Artem Dayneko (Bitbucket: jamert, GitHub: jamert)

Issue is reproducible under 2.7.12, 3.3.6, 3.4.5 (and all supported pypy's - 2.4, 2.6, 3-2.4). Under 3.5.2 missing coverage is not observed.

With assert rewriting results looks like:

Name          Stmts   Miss Branch BrPart  Cover   Missing
---------------------------------------------------------
test_foo.py       4      0      5      1    89%   6->exit

Without assert rewriting (--assert=plain)it looks like

Name          Stmts   Miss Branch BrPart  Cover   Missing
---------------------------------------------------------
test_foo.py       4      0      5      0   100%

One thing that is strange to me is that branch count is not zero. test_foo.py does not have branching and if we change set comprehension to list comprehension then report correctly shows 0 branches in both cases:

Name          Stmts   Miss Branch BrPart  Cover   Missing
---------------------------------------------------------
test_foo.py       4      0      0      0   100%

FYI: under python 3.5.2 where issue is not observed, branch count is 4, not 5 (all covered)

nedbat commented 8 years ago

Original comment by Andy Freeland (Bitbucket: rouge8, GitHub: rouge8)

Included the output in the gist, but that test is fully covered if running pytest without assertion rewriting, tox -- --assert=plain. The snippet above though suggests that the two expressions are equivalent after the rewrite, so not sure what's going on.

rouge8 commented 5 years ago

Interestingly, list comprehensions behave in the opposite way:

def test_set_comprehension():
    # covered!
    assert {i for i in range(10)} == {i for i in range(10)}

    # "didn't finish the set comprehension"
    assert {i for i in range(10)} == {
        i for i in range(10)
    }

    # covered!
    assert True

def test_list_comprehension():
    assert [i for i in range(10)] == [i for i in range(10)]

    # "didn't finish the list comprehension"
    assert [i for i in range(10)] == [
        i for i in range(10)
    ]

    # covered!
    assert True

In test_set_comprehension, Python 2.7 will report the branch "didn't finish the set comprehension" was uncovered and Python 3.6/3.7 will report it is covered, but in test_list_comprehension, Python 2.7 sees the branch as covered and 3.6/3.7 see it as uncovered. 😩

rouge8 commented 5 years ago

This now shows up in even more cases with all() in the latest pytest, likely due to https://github.com/pytest-dev/pytest/pull/5360

nicoddemus commented 5 years ago

We are about to revert the unroll functionality for all() as we have found a bunch of corner cases. We should release 4.6.2 later today.

nedbat / coveragepy

coverage of multiline generator/set/dict comprehensions is wrong when run with pytest's assertion rewriting #515