pschanely / CrossHair

An analysis tool for Python that blurs the line between testing and type systems.
Other
996 stars 47 forks source link

CrosshairInternal error on simple example #263

Closed eivindjahren closed 2 months ago

eivindjahren commented 2 months ago

Environment

> pip list | grep crosshair
crosshair-tool                0.0.55                                                                                                                                                                                                                                                                                                                                                                                              
hypothesis-crosshair          0.0.4  

With those installed versions, the following example throws a CrossHairInternal exception:

from hypothesis import given, settings
import hypothesis.strategies as st

magic_words = ["RUN", "JUMP", "PLAY", "SLEEP", "READ", "SAY"]

def parse_spells(output: str) -> dict[str, str]:
    spells = {}
    for line in output.splitlines():
        tokens = line.split("^")
        if len(tokens) == 2:
            name, spell = tokens
            if spell not in magic_words:
                continue
            spells[name] = spell
    return spells

@given(st.text())
@settings(backend="crosshair")
def test_parse_spells(text):
    assert isinstance(parse_spells(text), dict)
pschanely commented 2 months ago

Thanks for the report!

This issue is specific to the hypothesis-crosshair integration. I am not sure about resolution yet, but updating here anyway to record my initial diagnostics.

I think there were some caching-related changes in hypothesis 6.100.6 which could request the hash of a symbolic outside the spans where we expect symbolics to be accessed. My trace as of hypothesis 6.104.1:

../../../.pyenv/versions/3.10.10/lib/python3.10/site-packages/hypothesis/internal/conjecture/engine.py:731: in run
    self._run()
../../../.pyenv/versions/3.10.10/lib/python3.10/site-packages/hypothesis/internal/conjecture/engine.py:1193: in _run
    self.generate_new_examples()
../../../.pyenv/versions/3.10.10/lib/python3.10/site-packages/hypothesis/internal/conjecture/engine.py:876: in generate_new_examples
    zero_data = self.cached_test_function(bytes(BUFFER_SIZE))
../../../.pyenv/versions/3.10.10/lib/python3.10/site-packages/hypothesis/internal/conjecture/engine.py:1427: in cached_test_function
    self.test_function(data)
../../../.pyenv/versions/3.10.10/lib/python3.10/site-packages/hypothesis/internal/conjecture/engine.py:456: in test_function
    self._cache(data)
../../../.pyenv/versions/3.10.10/lib/python3.10/site-packages/hypothesis/internal/conjecture/engine.py:381: in _cache
    self.__data_cache_ir[key] = result
../../../.pyenv/versions/3.10.10/lib/python3.10/site-packages/hypothesis/internal/cache.py:111: in __setitem__
    i = self.keys_to_indices[key]
../../CrossHair/crosshair/libimpl/builtinslib.py:3118: in __hash__
    return hash(self.__str__())
tybug commented 2 months ago

ah yup. Previously we didn't interact with ir nodes when caching. Probably we need to move post_test_case_hook before any _cache calls occur; here's an untested patch:

diff --git a/hypothesis-python/src/hypothesis/internal/conjecture/engine.py b/hypothesis-python/src/hypothesis/internal/conjecture/engine.py
index 89a8c7829..d3b7f5a92 100644
--- a/hypothesis-python/src/hypothesis/internal/conjecture/engine.py
+++ b/hypothesis-python/src/hypothesis/internal/conjecture/engine.py
@@ -435,6 +435,14 @@ class ConjectureRunner:
                     ),
                 }
                 self.stats_per_test_case.append(call_stats)
+                for node in data.examples.ir_tree_nodes:
+                    value = data.provider.post_test_case_hook(node.value)
+                    # require providers to return something valid here.
+                    assert (
+                        value is not None
+                    ), "providers must return a non-null value from post_test_case_hook"
+                    node.value = value
+
                 self._cache(data)
                 if data.invalid_at is not None:  # pragma: no branch # coverage bug?
                     self.misaligned_count += 1
@@ -478,14 +486,6 @@ class ConjectureRunner:

         if data.status == Status.INTERESTING:
             if self.settings.backend != "hypothesis":
-                for node in data.examples.ir_tree_nodes:
-                    value = data.provider.post_test_case_hook(node.value)
-                    # require providers to return something valid here.
-                    assert (
-                        value is not None
-                    ), "providers must return a non-null value from post_test_case_hook"
-                    node.value = value
-
                 # drive the ir tree through the test function to convert it
                 # to a buffer
                 data = ConjectureData.for_ir_tree(data.examples.ir_tree_nodes)

I'll check this patch with crosshair later today.

tybug commented 2 months ago

That was it - https://github.com/HypothesisWorks/hypothesis/pull/4022 will fix this. (And hypothesis-crosshair does indeed find the desired bug!)

pschanely commented 2 months ago

@tybug you are a machine. Thank you!