pytest-dev / pytest

The pytest framework makes it easy to write small tests, yet scales to support complex functional testing
https://pytest.org
MIT License
12.11k stars 2.68k forks source link

`assert str1 == str2` takes forever with long strings that differ by a short prefix #8998

Open sevaseva opened 3 years ago

sevaseva commented 3 years ago

Environment and versions:

platform linux -- Python 3.9.5, pytest-6.2.4, py-1.10.0, pluggy-0.13.1 -- /usr/bin/python
cachedir: .pytest_cache
rootdir: /home/edited, configfile: pytest.ini
plugins: xdist-2.3.0, timeout-1.4.2, forked-1.3.0
timeout: 10.0s
timeout method: signal
timeout func_only: False

test-and-output.zip

To reproduce: download test.py and run pytest test.py --timeout 10 -vv

EXPECTED: both test methods (test1 and test2) fail quickly with assert False OBSERVED: test1 fails quickly and test2 timeouts after 10 seconds (or runs forever if no --timeout flag passed)

The difference between test1 and test2 is simple

The "long string" part of the literal strings is the same four times: twice in test1 and twice in test2. The two strings that are compared with == both times differ by one leading space character only. The actual strings in test.py are a bit longer (but not crazy long, some 159000 ANSI characters/bytes), but they differ just by one leading space still.

Full output attached as output.txt, that includes the interesting stack trace which hints at where the execution is happening at the moment when the execution is cancelled by the timeout. I will paste that stack trace here for convenience:

../../.local/lib/python3.9/site-packages/_pytest/assertion/rewrite.py:485: in _call_reprcompare
    custom = util._reprcompare(ops[i], each_obj[i], each_obj[i + 1])
../../.local/lib/python3.9/site-packages/_pytest/assertion/__init__.py:141: in callbinrepr
    hook_result = ihook.pytest_assertrepr_compare(
../../.local/lib/python3.9/site-packages/pluggy/hooks.py:286: in __call__
    return self._hookexec(self, self.get_hookimpls(), kwargs)
../../.local/lib/python3.9/site-packages/pluggy/manager.py:93: in _hookexec
    return self._inner_hookexec(hook, methods, kwargs)
../../.local/lib/python3.9/site-packages/pluggy/manager.py:84: in <lambda>
    self._inner_hookexec = lambda hook, methods, kwargs: hook.multicall(
../../.local/lib/python3.9/site-packages/_pytest/assertion/__init__.py:179: in pytest_assertrepr_compare
    return util.assertrepr_compare(config=config, op=op, left=left, right=right)
../../.local/lib/python3.9/site-packages/_pytest/assertion/util.py:153: in assertrepr_compare
    explanation = _compare_eq_any(left, right, verbose)
../../.local/lib/python3.9/site-packages/_pytest/assertion/util.py:176: in _compare_eq_any
    explanation = _diff_text(left, right, verbose)
../../.local/lib/python3.9/site-packages/_pytest/assertion/util.py:241: in _diff_text
    explanation += [
../../.local/lib/python3.9/site-packages/_pytest/assertion/util.py:241: in <listcomp>
    explanation += [
/usr/lib/python3.9/difflib.py:912: in compare
    yield from g
/usr/lib/python3.9/difflib.py:979: in _fancy_replace
    cruncher.ratio() > best_ratio:
/usr/lib/python3.9/difflib.py:651: in ratio
    matches = sum(triple[-1] for triple in self.get_matching_blocks())
/usr/lib/python3.9/difflib.py:486: in get_matching_blocks
    i, j, k = x = self.find_longest_match(alo, ahi, blo, bhi)

Thanks!

hugovk commented 3 years ago

For the first one, Python has already evaluated ' long string' == 'long string' as False, and then pytest just does assert False.

For the second one, it looks like pytest evaluates assert ' long string' == 'long string' as False quickly, but now it knows more about the input and tries to display a useful diff about what is different, and its use of difflib.ndiff is slow (in https://github.com/pytest-dev/pytest/blob/main/src/_pytest/assertion/util.py):

def _diff_text(left: str, right: str, verbose: int = 0) -> List[str]:
    """Return the explanation for the diff between text.

    Unless --verbose is used this will skip leading and trailing
    characters which are identical to keep the diff minimal.
    """

    ... [snip] ...

    explanation += [
        line.strip("\n")
        for line in ndiff(right.splitlines(keepends), left.splitlines(keepends))
    ]
    return explanation

As a demo, if you hack that assignment (say, change it to explanation = "bleep") it runs quickly.

hugovk commented 3 years ago

Actually, it's not difflib that's slow, it's the comprehension to strip newlines. This runs fast:

     explanation += [
-        line.strip("\n")
-        for line in ndiff(right.splitlines(keepends), left.splitlines(keepends))
+        ndiff(right.splitlines(keepends), left.splitlines(keepends))
     ]
     return explanation
nicoddemus commented 3 years ago

Hmm interesting. I would never bet that line.strip("\n") would make that much of a difference. How about this?

    def strip_newline(line):
        if line.endswith("\n"):
            line = line[:-1] 
        return line

    explanation += [
        strip_newline(line)
        for line in ndiff(right.splitlines(keepends), left.splitlines(keepends))
    ]
    return explanation

Hmm perhaps that is not even necessary if we use keepends=False:

    keepends = False
    explanation += ndiff(right.splitlines(keepends), left.splitlines(keepends))        
    return explanation
hugovk commented 3 years ago

I'd guess the latter is a bit faster as it's doing things in C, but both of those timeout.

nicoddemus commented 3 years ago

Ahh right, I see what happened, when you mentioned that this was fast:

     explanation += [
-        line.strip("\n")
-        for line in ndiff(right.splitlines(keepends), left.splitlines(keepends))
+        ndiff(right.splitlines(keepends), left.splitlines(keepends))
     ]
     return explanation

It is actually incorrect, it is adding the result of ndiff as an item to the list, instead of producing all lines and extending the list, and ndiff returns a generator so nothing is actually being done, that's why it is fast.

I bet then the problem is that ndiff will take a long time to produce a diff for very large texts (as expected).

hugovk commented 3 years ago

Ah yes, I should have made clear the things I posted weren't meant to be correct, just looking for the slow bits :)

nicoddemus commented 3 years ago

Sure thing!

Does anyone have any ideas on how we can optimize that?

Orenef11 commented 3 years ago

Sure thing!

Does anyone have any ideas on how we can optimize that?

I think I found a solution, it is much faster than the current code. In the current code, the test failed after 2 minutes and 12 seconds and after the fix, it failed after 26 seconds.

from cdifflib import CSequenceMatcher
import difflib
difflib.SequenceMatcher = CSequenceMatcher

original-soultion

sevaseva commented 3 years ago

Consider a fast heuristic "if at least one of the strings is longer that X: skip the whole diff calculation thing" ?

RonnyPfannschmidt commented 3 years ago

I believe having ndiff skip long lines may be a major win there

The-Compiler commented 3 years ago

Also see #8404/#7206

Orenef11 commented 3 years ago

I believe having ndiff skip long lines may be a major win there

Why not try to improve ndiff method? Can I open a PR that fixes it?

nicoddemus commented 3 years ago

ndiff is from the difflib standard library module, do you mean to reimplement it?

Orenef11 commented 3 years ago

ndiff is from the difflib standard library module, do you mean to reimplement it?

Yes, I know it's a standard library function. I was thinking of trying to improve the same parts that cause the slowness. Also, another solution is to use difflib.SequenceMatcher = CSequenceMatcher, is much faster than SequenceMatcher implementation.

nicoddemus commented 3 years ago

Ahh I see. Cool, please go ahead. 👍

RonnyPfannschmidt commented 3 years ago

as far as i understand the main cost comes from https://github.com/python/cpython/blob/ad0a8a9c629a7a0fa306fbdf019be63c701a8028/Lib/difflib.py#L902

which runs a inner sequence matcher

i suspect a quick hack that opts out of long items may be a quick relief

as ndiff is a thin wrapper, it may be possible/sensible to create a small subclass that hijacks fancy replace

RonnyPfannschmidt commented 3 years ago

@nicoddemus im wondering, if it would be enough to opt into unified diff for any large/binary sequence? - that way a little context is lost, but the largest catch goes away

nicoddemus commented 3 years ago

It is a possibility. However if it is something that is opt-in, the user can easily just move the comparison outside the assert:

assert s1 != s2

To:

equal = s1 == s2
assert equal, "s1 and s2 differ"

Or do you mean it will automatically skip unified diff if both strings exceed some hard-coded size?

RonnyPfannschmidt commented 3 years ago

i meant auto-skip for huge strings/lines

nicoddemus commented 3 years ago

I see.

How do you think we can allow that? Ideally this should be something configured per-test I guess?

Orenef11 commented 3 years ago

@RonnyPfannschmidt @nicoddemus The solution I thought was simple to implement is not that simple.

Why not create a new view for large objects? It can only be enabled if the user has configured a flag for it.

If this is the solution, what will the output look like, and how would the difference look?

nicoddemus commented 3 years ago

Not sure myself, I feel this should be somehow configured per-test, instead of globally through a command-line flag or configuration option.

Orenef11 commented 3 years ago

In my opinion, the best way is to create for ourselves the output we want to return to the user. Thus, we will not be dependent on packages with slow performance.

To implement it it is possible to use the deepdiff package and we can compare very big objects. WDYT?

RonnyPfannschmidt commented 3 years ago

I propose that we first implement a simple heusteric for pathological cases (aka time it and pick cutoffs)

Then we can choose less detailed diff by default for those and recommend people to use specific diff assertions for their use cases

Orenef11 commented 3 years ago

You're right. I will open PR later today, and we can have a discussion there about a basic solution.

iaroslav-naidon commented 2 years ago

A similar problem (it takes forever to get pytest results) happens when there are mismatches in two big enough (len >= 5000) lists of strings or other objects while running pytest either with -v option or when it detects that it is running on CI [1].

nikicat commented 1 year ago

I've stumbled upon this problem in a test that compare PNG image with a fixture. I've solved it by comparing image md5 and ASCII representation instead of raw data. It's even better because from ASCII diff you could actually get an idea what differs. May be it will be useful for someone.

from PIL import Image
import ascii_magick
from hashlib import md5

def test_image():
    image = Image.open(BytesIO(image_data))
    ascii_image = ascii_magic.from_image(image, width_ratio=1, mode=ascii_magic.Modes.ASCII)
    assert dict(md5=md5(image_data).digest(), image=ascii_image) == fixture_image_data