Open sevaseva opened 3 years ago
For the first one, Python has already evaluated `' long string' == 'long string'` as `False`, and then pytest just does `assert False`.
For the second one, it looks like pytest also evaluates `assert ' long string' == 'long string'` as `False` quickly, but now it knows more about the input and tries to display a useful diff of what differs, and its use of `difflib.ndiff` is slow (in https://github.com/pytest-dev/pytest/blob/main/src/_pytest/assertion/util.py):
```python
def _diff_text(left: str, right: str, verbose: int = 0) -> List[str]:
    """Return the explanation for the diff between text.

    Unless --verbose is used this will skip leading and trailing
    characters which are identical to keep the diff minimal.
    """
    ... [snip] ...
    explanation += [
        line.strip("\n")
        for line in ndiff(right.splitlines(keepends), left.splitlines(keepends))
    ]
    return explanation
```
As a demo, if you hack that assignment (say, change it to `explanation = "bleep"`) it runs quickly.
Actually, it's not `difflib` that's slow, it's the comprehension that strips newlines. This runs fast:

```diff
 explanation += [
-    line.strip("\n")
-    for line in ndiff(right.splitlines(keepends), left.splitlines(keepends))
+    ndiff(right.splitlines(keepends), left.splitlines(keepends))
 ]
 return explanation
```
Hmm, interesting. I would never have bet that `line.strip("\n")` would make that much of a difference. How about this?

```python
def strip_newline(line):
    if line.endswith("\n"):
        line = line[:-1]
    return line

explanation += [
    strip_newline(line)
    for line in ndiff(right.splitlines(keepends), left.splitlines(keepends))
]
return explanation
```
Hmm, perhaps that is not even necessary if we use `keepends=False`:

```python
keepends = False
explanation += ndiff(right.splitlines(keepends), left.splitlines(keepends))
return explanation
```
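As a standalone illustration (not pytest code): with `keepends=False`, `str.splitlines` already drops the trailing newlines, so nothing would be left to strip afterwards:

```python
text = "one\ntwo\n"

# keepends=True keeps the trailing "\n" on each line
print(text.splitlines(True))   # ['one\n', 'two\n']

# keepends=False (the default) drops it, so strip("\n") becomes unnecessary
print(text.splitlines(False))  # ['one', 'two']
```

Note that this also changes what `ndiff` receives, so the resulting diff lines would no longer carry newlines either.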
I'd guess the latter is a bit faster as it's doing things in C, but both of those time out.
Ahh right, I see what happened. When you mentioned that this was fast:

```diff
 explanation += [
-    line.strip("\n")
-    for line in ndiff(right.splitlines(keepends), left.splitlines(keepends))
+    ndiff(right.splitlines(keepends), left.splitlines(keepends))
 ]
 return explanation
```

It is actually incorrect: it adds the result of `ndiff` as a single item to the list instead of producing all the lines and extending the list, and since `ndiff` returns a generator, no work is actually being done. That's why it is fast.
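A standalone illustration of that point: appending the generator does no diffing work until something iterates it:

```python
import difflib

a = "one\ntwo\n".splitlines(keepends=True)
b = "one\nthree\n".splitlines(keepends=True)

explanation = []
explanation += [difflib.ndiff(a, b)]  # appends the lazy generator object itself
print(type(explanation[0]))           # still a generator; no diff computed yet

forced = list(difflib.ndiff(a, b))    # iterating is what actually runs the diff
print(forced)
```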
I bet then the problem is that `ndiff` will take a long time to produce a diff for very large texts (as expected).
Ah yes, I should have made clear the things I posted weren't meant to be correct, just looking for the slow bits :)
Sure thing!
Does anyone have any ideas on how we can optimize that?
I think I found a solution, and it is much faster than the current code: with the current code the test failed after 2 minutes and 12 seconds, and after the fix it failed after 26 seconds.

```python
from cdifflib import CSequenceMatcher
import difflib

difflib.SequenceMatcher = CSequenceMatcher
```
Consider a fast heuristic: "if at least one of the strings is longer than X, skip the whole diff calculation"?
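A minimal sketch of that heuristic (the cutoff value, placeholder message, and function name here are all invented for illustration):

```python
import difflib
from typing import List

MAX_DIFF_SIZE = 10_000  # hypothetical cutoff; a real fix would benchmark this

def diff_or_skip(left: str, right: str) -> List[str]:
    """Return an ndiff explanation, or a cheap placeholder for huge inputs."""
    if len(left) > MAX_DIFF_SIZE or len(right) > MAX_DIFF_SIZE:
        return ["(diff skipped: inputs too large)"]
    return [
        line.rstrip("\n")
        for line in difflib.ndiff(
            left.splitlines(keepends=True), right.splitlines(keepends=True)
        )
    ]
```

Small inputs still get the full diff; pathological ones return instantly.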
I believe having ndiff skip long lines may be a major win there
Also see #8404/#7206
Why not try to improve the `ndiff` function itself? Can I open a PR that fixes it?
`ndiff` is from the `difflib` standard library module, do you mean to reimplement it?
Yes, I know it's a standard library function.
I was thinking of trying to improve the same parts that cause the slowness. Another option is `difflib.SequenceMatcher = CSequenceMatcher`, which is much faster than the pure-Python `SequenceMatcher` implementation.
Ahh I see. Cool, please go ahead. 👍
as far as i understand the main cost comes from https://github.com/python/cpython/blob/ad0a8a9c629a7a0fa306fbdf019be63c701a8028/Lib/difflib.py#L902 which runs an inner sequence matcher
i suspect a quick hack that opts out for long items may be a quick relief
as ndiff is a thin wrapper, it may be possible/sensible to create a small subclass that hijacks `_fancy_replace`
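A rough sketch of that subclass idea. It relies on the private `Differ._fancy_replace`/`_plain_replace` hooks, which could change between Python versions, and the line-length cutoff is invented:

```python
import difflib

class FastDiffer(difflib.Differ):
    """Differ that skips the expensive intra-line matching for long lines."""

    MAX_FANCY_LINE = 200  # hypothetical cutoff

    def _fancy_replace(self, a, alo, ahi, b, blo, bhi):
        if any(len(line) > self.MAX_FANCY_LINE for line in a[alo:ahi] + b[blo:bhi]):
            # fall back to plain '-'/'+' output with no per-character '?' hints
            yield from self._plain_replace(a, alo, ahi, b, blo, bhi)
        else:
            yield from super()._fancy_replace(a, alo, ahi, b, blo, bhi)

# usage: list(FastDiffer().compare(left_lines, right_lines))
```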
@nicoddemus im wondering if it would be enough to opt into unified diff for any large/binary sequence? that way a little context is lost, but the largest cost goes away
It is a possibility. However, if it is something opt-in, the user can easily just move the comparison outside the `assert`, changing:

```python
assert s1 == s2
```

to:

```python
equal = s1 == s2
assert equal, "s1 and s2 differ"
```

Or do you mean it will automatically switch to a unified diff if both strings exceed some hard-coded size?
i meant auto-skip for huge strings/lines
I see.
How do you think we can allow that? Ideally this should be something configured per-test I guess?
@RonnyPfannschmidt @nicoddemus The solution I thought would be simple to implement is not that simple.
Why not create a new view for large objects? It could be enabled only if the user has configured a flag for it.
If this is the solution, what would the output look like, and how would the difference be shown?
Not sure myself, I feel this should be somehow configured per-test, instead of globally through a command-line flag or configuration option.
In my opinion, the best way is to build the output we want to return to the user ourselves. That way, we will not depend on packages with slow performance. To implement it, we could use the deepdiff package, so we can compare very big objects. WDYT?
I propose that we first implement a simple heuristic for pathological cases (i.e. time it and pick cutoffs).
Then we can choose a less detailed diff by default for those, and recommend that people use specific diff assertions for their use cases.
You're right. I will open a PR later today, and we can have a discussion there about a basic solution.
A similar problem (it takes forever to get pytest results) happens when there are mismatches between two big enough (`len >= 5000`) lists of strings or other objects while running pytest, either with the `-v` option or when it detects that it is running on CI [1].
I've stumbled upon this problem in a test that compares a PNG image with a fixture. I solved it by comparing the image's md5 and an ASCII representation instead of the raw data. It's even better because from the ASCII diff you can actually get an idea of what differs. Maybe it will be useful for someone.
```python
from io import BytesIO
from hashlib import md5

import ascii_magic  # note: the module name is ascii_magic, not ascii_magick
from PIL import Image

def test_image():
    # image_data (raw PNG bytes) and fixture_image_data come from fixtures
    image = Image.open(BytesIO(image_data))
    ascii_image = ascii_magic.from_image(image, width_ratio=1, mode=ascii_magic.Modes.ASCII)
    assert dict(md5=md5(image_data).digest(), image=ascii_image) == fixture_image_data
```
Environment and versions:
test-and-output.zip
To reproduce: download test.py and run `pytest test.py --timeout 10 -vv`.
EXPECTED: both test methods (test1 and test2) fail quickly with `assert False`.

OBSERVED: test1 fails quickly, and test2 times out after 10 seconds (or runs forever if no --timeout flag is passed).

The difference between test1 and test2 is simple: test1 does `x = ' long string' == 'long string'; assert x` and behaves as expected, but test2 does `assert ' long string' == 'long string'`, and pytest substitutes `==` with its own logic that, apparently, has the bug.

The "long string" part of the literal strings is the same all four times: twice in test1 and twice in test2. The two strings that are compared with `==` both times differ by one leading space character only. The actual strings in test.py are a bit longer (but not crazy long, some 159000 ANSI characters/bytes), but they still differ by just one leading space.

Full output is attached as output.txt, which includes the interesting stack trace hinting at where execution is at the moment it is cancelled by the timeout. I will paste that stack trace here for convenience:

Thanks!