python-trio / flake8-async

Highly opinionated linter for Trio code
https://flake8-async.readthedocs.io
MIT License

fuzz test fails due to unreliable timings #172

Closed jakkdl closed 1 year ago

jakkdl commented 1 year ago

Test run on main fails, though at least this time it's not obviously a configuration error.

Pulling out the multiline message:

raise Flaky(
hypothesis.errors.Flaky: Hypothesis test_does_not_crash_on_any_valid_code(self=<tests.test_flake8_trio.TestFuzz testMethod=test_does_not_crash_on_any_valid_code>, syntax_tree=<ast.Module object at 0x7f6cdd1faf80>) produces unreliable results: Falsified on the first call but did not on a subsequent one
Falsifying example: test_does_not_crash_on_any_valid_code(
    syntax_tree=parse('assert  {}\nclass A: pass\ndef A(): pass\n'),
    self=<tests.test_flake8_trio.TestFuzz testMethod=test_does_not_crash_on_any_valid_code>,
)
Unreliable test timings! On an initial run, this test took 608.31ms, which exceeded the deadline of 200.00ms, but on a subsequent run it took 6.86 ms, which did not. If you expect this sort of variability in your test timings, consider turning deadlines off for this test by setting deadline=None.

Is this due to xdist/CI servers being unreliable, something weird in hypothesis, or an actual bug in the underlying code? It would be great to figure out why the test sometimes takes 600ms. I probably won't investigate this on my own until you've given your opinion, since you're about a million times more experienced with hypothesmith/hypothesis.

=================================== FAILURES ===================================
________________ TestFuzz.test_does_not_crash_on_any_valid_code ________________
[gw1] linux -- Python 3.11.2 /home/runner/work/flake8-trio/flake8-trio/.tox/flake8_6/bin/python
Traceback (most recent call last):
  File "/home/runner/work/flake8-trio/flake8-trio/.tox/flake8_6/lib/python3.11/site-packages/hypothesis/core.py", line 878, in _execute_once_for_engine
    result = self.execute_once(data)
             ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/runner/work/flake8-trio/flake8-trio/.tox/flake8_6/lib/python3.11/site-packages/hypothesis/core.py", line 817, in execute_once
    result = self.test_runner(data, run)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/runner/work/flake8-trio/flake8-trio/.tox/flake8_6/lib/python3.11/site-packages/hypothesis/executors.py", line 47, in default_new_style_executor
    return function(data)
           ^^^^^^^^^^^^^^
  File "/home/runner/work/flake8-trio/flake8-trio/.tox/flake8_6/lib/python3.11/site-packages/hypothesis/core.py", line 813, in run
    return test(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/runner/work/flake8-trio/flake8-trio/tests/test_flake8_trio.py", line 647, in test_does_not_crash_on_any_valid_code
    @given((from_grammar() | from_node()).map(ast.parse))
               ~~~~~~~~~~~~^~~~~~~~
  File "/home/runner/work/flake8-trio/flake8-trio/.tox/flake8_6/lib/python3.11/site-packages/hypothesis/core.py", line 781, in test
    raise DeadlineExceeded(runtime, self.settings.deadline)
hypothesis.errors.DeadlineExceeded: Test took 608.31ms, which exceeds the deadline of 200.00ms

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/runner/work/flake8-trio/flake8-trio/tests/test_flake8_trio.py", line 647, in test_does_not_crash_on_any_valid_code
    @given((from_grammar() | from_node()).map(ast.parse))
               ^^^^^^^
  File "/home/runner/work/flake8-trio/flake8-trio/.tox/flake8_6/lib/python3.11/site-packages/hypothesis/core.py", line 1396, in wrapped_test
    raise the_error_hypothesis_found
  File "/home/runner/work/flake8-trio/flake8-trio/.tox/flake8_6/lib/python3.11/site-packages/hypothesis/core.py", line 842, in execute_once
    raise Flaky(
hypothesis.errors.Flaky: Hypothesis test_does_not_crash_on_any_valid_code(self=<tests.test_flake8_trio.TestFuzz testMethod=test_does_not_crash_on_any_valid_code>, syntax_tree=<ast.Module object at 0x7f6cdd1faf80>) produces unreliable results: Falsified on the first call but did not on a subsequent one
Falsifying example: test_does_not_crash_on_any_valid_code(
    syntax_tree=parse('assert  {}\nclass A: pass\ndef A(): pass\n'),
    self=<tests.test_flake8_trio.TestFuzz testMethod=test_does_not_crash_on_any_valid_code>,
)
Unreliable test timings! On an initial run, this test took 608.31ms, which exceeded the deadline of 200.00ms, but on a subsequent run it took 6.86 ms, which did not. If you expect this sort of variability in your test timings, consider turning deadlines off for this test by setting deadline=None.
Highest target scores:
              17  (label='(hypothesmith from_node) number of unique ast node types')
              21  (label='(hypothesmith) number of unique ast node types')
             112  (label='(hypothesmith) total number of ast nodes')
             181  (label='(hypothesmith) instructions in bytecode')
            1031  (label='(hypothesmith from_node) total number of ast nodes')
            2331  (label='(hypothesmith from_node) instructions in bytecode')

=========================== short test summary info ============================
FAILED tests/test_flake8_trio.py::TestFuzz::test_does_not_crash_on_any_valid_code
=================== 1 failed, 2 passed in 930.96s (0:15:30) ====================
flake8_6: exit 1 (931.58 seconds) /home/runner/work/flake8-trio/flake8-trio> pytest --onlyfuzz --no-cov -n auto pid=1800
.pkg: _exit> python /opt/hostedtoolcache/Python/3.11.2/x64/lib/python3.11/site-packages/pyproject_api/_backend.py True setuptools.build_meta __legacy__
  flake8_6: FAIL code 1 (937.07=setup[5.49]+cmd[931.58] seconds)
  evaluation failed :( (937.15 seconds)
Error: Process completed with exit code 1.

Zac-HD commented 1 year ago

I think this test is probably just slow for some inputs, and we could disable the deadline. The existence of a cache for something slow is sufficient to explain the variability part.
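
To illustrate the cache explanation, here is a minimal standard-library sketch (not the flake8-trio test itself). `expensive_setup` and `timed_check` are hypothetical names, and the 50 ms sleep is an assumed stand-in for whatever slow work gets cached in the real run. The same input is slow on the first call and fast on the second, which is the pattern the Flaky report describes.

```python
import functools
import time


@functools.lru_cache(maxsize=None)
def expensive_setup(key: str) -> str:
    """Hypothetical stand-in for slow work whose result is cached."""
    time.sleep(0.05)  # simulate roughly 50 ms of one-off work
    return key.upper()


def timed_check(source: str) -> float:
    """Run a fake test body once and return its duration in milliseconds."""
    start = time.perf_counter()
    expensive_setup("grammar")  # slow only while the cache is cold
    len(source)  # cheap per-input work
    return (time.perf_counter() - start) * 1000


example = "assert  {}\nclass A: pass\ndef A(): pass\n"
first_ms = timed_check(example)   # cold cache: pays for the sleep
second_ms = timed_check(example)  # warm cache: returns almost immediately
print(f"first run: {first_ms:.2f} ms, second run: {second_ms:.2f} ms")
```

If that is what is happening here, the first run of an input can exceed a 200 ms deadline while the replay does not. Hypothesis's own suggestion in the error message, setting `deadline=None` in the test's `settings`, would then be a reasonable fix.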