tophat / syrupy

:pancakes: The sweeter pytest snapshot plugin
https://tophat.github.io/syrupy
Apache License 2.0
501 stars 33 forks source link

fix: large snapshot diff recursion error #776

Closed iamogbz closed 1 year ago

iamogbz commented 1 year ago

Description

Avoids the recursion error limit in difflib.ndiff by truncating the checked lines to a subsection of the diff.

Related Issues

Checklist

Additional Comments

Went with the simple approach of just checking from the first mismatch and giving a large enough buffer to catch meaningful differences.

Can be validated by setting the DIFF_LINE_COUNT_LIMIT to a high value e.g. 100000 and running tests, maybe a follow up can make this configurable;

(syrupy-py3.11) ➜  syrupy git:(fix-large-snapshot-recursion-error) ✗ inv test -t test_diff_large
====================================== test session starts ======================================
platform darwin -- Python 3.11.4, pytest-7.4.0, pluggy-1.0.0
benchmark: 4.0.0 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /Users/Emmanuel/Sources/github/tophat/syrupy
configfile: pyproject.toml
plugins: syrupy-4.0.6, xdist-3.3.1, benchmark-4.0.0
collected 256 items / 254 deselected / 2 selected                                               

tests/syrupy/extensions/test_base.py FF                                                   [100%]

=========================================== FAILURES ============================================
____________________ TestSnapshotReporter.test_diff_large[SnapshotReporter] _____________________

self = <tests.syrupy.extensions.test_base.TestSnapshotReporter object at 0x1067e0c10>
Reporter = <class 'syrupy.extensions.base.SnapshotReporter'>
osenv = <function env_context at 0x105db7a60>

    def test_diff_large(self, Reporter, osenv):
        n_count = 3000
        obj_a = [str(x) + ("a" * 20) for x in range(n_count)]
        obj_b = [line_a + "b" for line_a in obj_a]
        str_a = "\n".join(obj_a)
        str_b = "\n".join(obj_b)
        with osenv(NO_COLOR="true"):
>           assert (
                len(list(Reporter().diff_lines(str_a, str_b)))
                <= DIFF_LINE_COUNT_LIMIT * 2
            )

tests/syrupy/extensions/test_base.py:44: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
src/syrupy/extensions/base.py:271: in diff_lines
    for line in self.__diff_lines(str(snapshot_data), str(serialized_data)):
src/syrupy/extensions/base.py:295: in __diff_lines
    for line in self.__diffed_lines(a, b):
src/syrupy/extensions/base.py:314: in __diffed_lines
    for line in qdiff(a.splitlines(keepends=True), b.splitlines(keepends=True)):
../../../../.pyenv/versions/3.11.4/lib/python3.11/difflib.py:872: in compare
    yield from g
...
../../../../.pyenv/versions/3.11.4/lib/python3.11/difflib.py:997: in _fancy_helper
    yield from g
../../../../.pyenv/versions/3.11.4/lib/python3.11/difflib.py:985: in _fancy_replace
    yield from self._fancy_helper(a, best_i+1, ahi, b, best_j+1, bhi)
../../../../.pyenv/versions/3.11.4/lib/python3.11/difflib.py:997: in _fancy_helper
    yield from g
../../../../.pyenv/versions/3.11.4/lib/python3.11/difflib.py:915: in _fancy_replace
    cruncher = SequenceMatcher(self.charjunk)
../../../../.pyenv/versions/3.11.4/lib/python3.11/difflib.py:182: in __init__
    self.set_seqs(a, b)
../../../../.pyenv/versions/3.11.4/lib/python3.11/difflib.py:194: in set_seqs
    self.set_seq2(b)
../../../../.pyenv/versions/3.11.4/lib/python3.11/difflib.py:248: in set_seq2
    self.__chain_b()
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <difflib.SequenceMatcher object at 0x107a71350>

    def __chain_b(self):
        # Because isjunk is a user-defined (not C) function, and we test
        # for junk a LOT, it's important to minimize the number of calls.
        # Before the tricks described here, __chain_b was by far the most
        # time-consuming routine in the whole module!  If anyone sees
        # Jim Roskind, thank him again for profile.py -- I never would
        # have guessed that.
        # The first trick is to build b2j ignoring the possibility
        # of junk.  I.e., we don't call isjunk at all yet.  Throwing
        # out the junk later is much cheaper than building b2j "right"
        # from the start.
        b = self.b
        self.b2j = b2j = {}

        for i, elt in enumerate(b):
            indices = b2j.setdefault(elt, [])
            indices.append(i)

        # Purge junk elements
        self.bjunk = junk = set()
        isjunk = self.isjunk
        if isjunk:
>           for elt in b2j.keys():
E           RecursionError: maximum recursion depth exceeded while calling a Python object

../../../../.pyenv/versions/3.11.4/lib/python3.11/difflib.py:288: RecursionError
==================================== short test summary info ====================================
FAILED tests/syrupy/extensions/test_base.py::TestSnapshotReporter::test_diff_large[SnapshotReporter] - RecursionError: maximum recursion depth exceeded while calling a Python object
FAILED tests/syrupy/extensions/test_base.py::TestSnapshotReporter::test_diff_large[SnapshotReporterNoContext] - RecursionError: maximum recursion depth exceeded while calling a Python object
========================= 2 failed, 254 deselected in 136.72s (0:02:16) =========================
tophat-opensource-bot commented 1 year ago

:tada: This PR is included in version 4.0.7 :tada:

The release is available on:

Your semantic-release bot :package::rocket: