pytest-dev / pytest

The pytest framework makes it easy to write small tests, yet scales to support complex functional testing
https://pytest.org
MIT License

Running pytest on a certain code snippet OOMs, apparently in collection #10205

Closed MapleCCC closed 2 years ago

MapleCCC commented 2 years ago

Description

I tried to run pytest on a piece of code (shown later). Pytest started and printed "collecting" to the console.

After five to ten seconds, the whole computer froze completely. The screen looked like a still picture, and the computer stopped reacting to any input: the mouse could not be moved, and pressing keys on the keyboard had no effect. I also powered on a pair of Bluetooth earbuds to see whether the computer could still connect to them automatically, but to no avail.

The freeze lasted 15-20 minutes with no sign of recovery, so I had no choice but to force a shutdown by holding down the power button.

After restarting my computer, I ran pytest on the same piece of code again. Exactly the same thing happened: everything froze and became unresponsive, and I had to force a shutdown again.

Environment

Windows 10.0.19042.1526
Python 3.10.5
Pytest 7.1.2

Output of the `pip list` command:

atomicwrites      1.4.1
attrs             22.1.0
colorama          0.4.5
exceptiongroup    1.0.0rc8
hypothesis        6.54.2
iniconfig         1.1.1
numpy             1.23.0
packaging         21.3
pandas            1.4.3
pip               22.0.4
pluggy            1.0.0
py                1.11.0
pyparsing         3.0.9
pytest            7.1.2
pytest-instafail  0.4.2
pytest-sugar      0.9.5
python-dateutil   2.8.2
pytz              2022.1
setuptools        58.1.0
six               1.16.0
sortedcontainers  2.4.0
termcolor         1.1.0
tomli             2.0.1
ujson             5.3.0
wheel             0.37.1

Code to Reproduce

To reproduce, save the following code snippet to a file named xor_trick.py, then run pytest from the command line: pytest xor_trick.py.

Note that I did not try to reproduce it a third time, because my computer is about seven years old and forced shutdowns are bad for it.

Also a side note: the code snippet is trying to implement the XOR tricks mentioned in this blog post. Not sure if this matters.

import operator
from collections.abc import Iterable
from functools import reduce

def xor(nums: Iterable[int]) -> int:
    return reduce(operator.xor, nums, 0)

def solve_case_of_n_plus_one(nums: list[int]) -> int:
    N = len(nums) - 1
    return xor(range(1, N+1)) ^ xor(nums)

def solve_case_of_n_plus_two(nums: list[int]) -> tuple[int, int]:
    N = len(nums) - 2
    x = xor(range(1, N+1)) ^ xor(nums)
    if x == 0:
        raise NotImplementedError
    mask = x - (x & (x - 1))
    u = v = x
    for num in nums:
        if num & mask:
            u ^= num
        else:
            v ^= num
    return u, v

from hypothesis import given
from hypothesis.strategies import DrawFn, composite, integers, lists, permutations, sampled_from

@composite
def sample_data(draw: DrawFn) -> list[int]:
    N = draw(integers(1))
    nums = list(range(1, N+1))
    nums.append(draw(sampled_from(nums)))
    return draw(permutations(nums))

@given(sample_data())
def test_solve_case_of_n_plus_one(nums: list[int]) -> None:
    assert nums.count(solve_case_of_n_plus_one(nums)) == 2

@given(sample_data())
def test_solve_case_of_n_plus_two(nums: list[int]) -> None:
    u, v = solve_case_of_n_plus_two(nums)
    assert nums.count(u) == nums.count(v) == 2

Feel free to let me know if you need more details.

Zac-HD commented 2 years ago

Ah, got it: running this while watching memory usage established that it was an out-of-memory error, and then I quickly narrowed it down:

from hypothesis import given
from hypothesis.strategies import composite, integers

@composite
def sample_data(draw):
    # Here's the problem: the moment you pick some N larger than available memory,
    # you'll crash because of the list(range(N)) call below.  Nothing mysterious about 
    # that, though your computer really shouldn't need a force-restart on OOM!
    N = draw(integers(1))
    list(range(1, N + 1))

@given(sample_data())
def test(_):
    pass
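
For a rough sense of scale, here is a back-of-the-envelope sketch of how much memory `list(range(1, N + 1))` needs as N grows. It assumes 64-bit CPython and ignores small-int caching and list over-allocation, so treat the numbers as an order-of-magnitude estimate; the helper name `estimated_list_bytes` is made up for this illustration.

```python
import sys


def estimated_list_bytes(n: int) -> int:
    """Rough estimate, in bytes, of the memory used by list(range(1, n + 1)).

    Assumes 64-bit CPython: each list slot holds an 8-byte pointer, and
    each element is a separate int object. Small-int caching and list
    over-allocation are ignored.
    """
    pointer_size = 8  # assumption: 64-bit build
    int_size = sys.getsizeof(2**30)  # size of one typical int object
    return n * (pointer_size + int_size)


for n in (256, 10**6, 10**12):
    print(f"N = {n:>16,}: roughly {estimated_list_bytes(n):,} bytes")
```

Since `integers(1)` has no upper bound, Hypothesis is free to draw values far beyond the largest N shown here.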

The solution is to pick a reasonable maximum for N (I'd suggest 256 or so, which should be plenty), or otherwise rewrite your strategy to avoid requiring yottabytes of RAM. You can even sample from a range object directly!
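
A minimal sketch of what that could look like for the `sample_data` strategy from the original report. The `MAX_N` constant, the `max_n` parameter, and the sanity-check test are assumptions added for illustration; only the cap of 256 and sampling from a `range` come from the suggestion above.

```python
from hypothesis import given
from hypothesis.strategies import DrawFn, composite, integers, permutations, sampled_from

MAX_N = 256  # assumed cap, following the "256 or so" suggestion


@composite
def sample_data(draw: DrawFn, max_n: int = MAX_N) -> list[int]:
    # Bound N so that the list built below always stays small.
    n = draw(integers(min_value=1, max_value=max_n))
    nums = list(range(1, n + 1))
    # Sample the duplicated value straight from a range object;
    # no extra list is needed for this step.
    nums.append(draw(sampled_from(range(1, n + 1))))
    return draw(permutations(nums))


@given(sample_data())
def test_sample_data_is_bounded(nums: list[int]) -> None:
    # N + 1 elements in total, with 1 <= N <= MAX_N.
    assert 2 <= len(nums) <= MAX_N + 1
    # Every value from 1 to N appears at least once.
    assert sorted(set(nums)) == list(range(1, len(nums)))
```

With the bound in place, the largest list any example builds has 257 elements, so generation cost no longer depends on how large an integer Hypothesis happens to draw.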

MapleCCC commented 2 years ago

@Zac-HD Thank you very much for narrowing down the cause. Your detailed dissection is wonderful, and greatly appreciated.

This taught me that I should not draw an arbitrary integer from Hypothesis and then try to materialize a collection of that length. The convenience of property-based testing really helps me think at an abstract level, but I sometimes forget that there are also real-world constraints to consider.