Went with the simple approach of just checking from the first mismatch and giving a large enough buffer to catch meaningful differences.
Can be validated by setting the DIFF_LINE_COUNT_LIMIT to a high value e.g. 100000 and running tests, maybe a follow up can make this configurable;
(syrupy-py3.11) ➜ syrupy git:(fix-large-snapshot-recursion-error) ✗ inv test -t test_diff_large
====================================== test session starts ======================================
platform darwin -- Python 3.11.4, pytest-7.4.0, pluggy-1.0.0
benchmark: 4.0.0 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /Users/Emmanuel/Sources/github/tophat/syrupy
configfile: pyproject.toml
plugins: syrupy-4.0.6, xdist-3.3.1, benchmark-4.0.0
collected 256 items / 254 deselected / 2 selected
tests/syrupy/extensions/test_base.py FF [100%]
=========================================== FAILURES ============================================
____________________ TestSnapshotReporter.test_diff_large[SnapshotReporter] _____________________
self = <tests.syrupy.extensions.test_base.TestSnapshotReporter object at 0x1067e0c10>
Reporter = <class 'syrupy.extensions.base.SnapshotReporter'>
osenv = <function env_context at 0x105db7a60>
def test_diff_large(self, Reporter, osenv):
n_count = 3000
obj_a = [str(x) + ("a" * 20) for x in range(n_count)]
obj_b = [line_a + "b" for line_a in obj_a]
str_a = "\n".join(obj_a)
str_b = "\n".join(obj_b)
with osenv(NO_COLOR="true"):
> assert (
len(list(Reporter().diff_lines(str_a, str_b)))
<= DIFF_LINE_COUNT_LIMIT * 2
)
tests/syrupy/extensions/test_base.py:44:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
src/syrupy/extensions/base.py:271: in diff_lines
for line in self.__diff_lines(str(snapshot_data), str(serialized_data)):
src/syrupy/extensions/base.py:295: in __diff_lines
for line in self.__diffed_lines(a, b):
src/syrupy/extensions/base.py:314: in __diffed_lines
for line in qdiff(a.splitlines(keepends=True), b.splitlines(keepends=True)):
../../../../.pyenv/versions/3.11.4/lib/python3.11/difflib.py:872: in compare
yield from g
...
../../../../.pyenv/versions/3.11.4/lib/python3.11/difflib.py:997: in _fancy_helper
yield from g
../../../../.pyenv/versions/3.11.4/lib/python3.11/difflib.py:985: in _fancy_replace
yield from self._fancy_helper(a, best_i+1, ahi, b, best_j+1, bhi)
../../../../.pyenv/versions/3.11.4/lib/python3.11/difflib.py:997: in _fancy_helper
yield from g
../../../../.pyenv/versions/3.11.4/lib/python3.11/difflib.py:915: in _fancy_replace
cruncher = SequenceMatcher(self.charjunk)
../../../../.pyenv/versions/3.11.4/lib/python3.11/difflib.py:182: in __init__
self.set_seqs(a, b)
../../../../.pyenv/versions/3.11.4/lib/python3.11/difflib.py:194: in set_seqs
self.set_seq2(b)
../../../../.pyenv/versions/3.11.4/lib/python3.11/difflib.py:248: in set_seq2
self.__chain_b()
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <difflib.SequenceMatcher object at 0x107a71350>
def __chain_b(self):
# Because isjunk is a user-defined (not C) function, and we test
# for junk a LOT, it's important to minimize the number of calls.
# Before the tricks described here, __chain_b was by far the most
# time-consuming routine in the whole module! If anyone sees
# Jim Roskind, thank him again for profile.py -- I never would
# have guessed that.
# The first trick is to build b2j ignoring the possibility
# of junk. I.e., we don't call isjunk at all yet. Throwing
# out the junk later is much cheaper than building b2j "right"
# from the start.
b = self.b
self.b2j = b2j = {}
for i, elt in enumerate(b):
indices = b2j.setdefault(elt, [])
indices.append(i)
# Purge junk elements
self.bjunk = junk = set()
isjunk = self.isjunk
if isjunk:
> for elt in b2j.keys():
E RecursionError: maximum recursion depth exceeded while calling a Python object
../../../../.pyenv/versions/3.11.4/lib/python3.11/difflib.py:288: RecursionError
==================================== short test summary info ====================================
FAILED tests/syrupy/extensions/test_base.py::TestSnapshotReporter::test_diff_large[SnapshotReporter] - RecursionError: maximum recursion depth exceeded while calling a Python object
FAILED tests/syrupy/extensions/test_base.py::TestSnapshotReporter::test_diff_large[SnapshotReporterNoContext] - RecursionError: maximum recursion depth exceeded while calling a Python object
========================= 2 failed, 254 deselected in 136.72s (0:02:16) =========================
Description
Avoids the recursion error limit in
difflib.ndiff
by truncating the checked lines to a subsection of the diff.inv install
Related Issues
Fixes #577
Helps #581, #625
Checklist
Additional Comments
Went with the simple approach of just checking from the first mismatch and giving a large enough buffer to catch meaningful differences.
Can be validated by setting the
DIFF_LINE_COUNT_LIMIT
to a high value e.g.100000
and running tests, maybe a follow up can make this configurable;