runtimeverification / evm-semantics

K Semantics of the Ethereum Virtual Machine (EVM)
BSD 3-Clause "New" or "Revised" License

Output concrete counter-example for failed proofs #1927

Closed: ehildenb closed this issue 1 year ago

ehildenb commented 1 year ago

Related: https://github.com/runtimeverification/evm-semantics/issues/1700 and https://github.com/runtimeverification/evm-semantics/pull/1716

~Blocked on: https://github.com/runtimeverification/haskell-backend/issues/3616~

Following https://github.com/runtimeverification/evm-semantics/pull/1716, we now output the path-feasibility constraints on failing proofs. We want to improve on this by providing the user with a concrete assignment to the variables that demonstrates the failing test. This is now possible because we have implemented the get-model endpoint in the Haskell backend (https://github.com/runtimeverification/haskell-backend/commit/06c9515d802093b7e3267e2d3e7460f4a2cf8ede and https://github.com/runtimeverification/pyk/pull/509).
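
As a quick illustration of the new endpoint, here is a minimal sketch of querying it through pyk. The KoreClient.get_model method and the SatResult class are taken from the linked pyk PR, so the exact names and signatures below are assumptions and may differ in detail:

# Hypothetical sketch, not KEVM code: query the get-model endpoint via pyk.
# Assumes a kore-rpc server is already running and that pyk exposes
# KoreClient.get_model and SatResult as introduced in runtimeverification/pyk#509.
from pyk.kore.rpc import KoreClient, SatResult
from pyk.kore.syntax import Pattern

def try_get_model(client: KoreClient, path_condition: Pattern) -> Pattern | None:
    result = client.get_model(path_condition)
    if isinstance(result, SatResult):
        # the model is a Kore pattern encoding a satisfying assignment
        return result.model
    # UnsatResult or UnknownResult: no assignment to report
    return None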

We should expose this functionality in the KEVM foundry integration in these ways:

diff --git a/src/pyk/kcfg/explore.py b/src/pyk/kcfg/explore.py
index 4ad3e09a..e99dc4d7 100644
--- a/src/pyk/kcfg/explore.py
+++ b/src/pyk/kcfg/explore.py
@@ -168,6 +168,13 @@ class KCFGExplore(ContextManager['KCFGExplore']):
         kast_simplified = self.kprint.kore_to_kast(kore_simplified)
         return kast_simplified, logs

+    def cterm_get_model(self, cterm: CTerm) -> Subst | None:
+        # convert the cterm to Kore
+        # call get_model in the RPC server
+        # convert the Kore result back to Kast
+        # convert the kast to a substitution
+        # return the substitution
+
     def kast_simplify(self, kast: KInner) -> tuple[KInner, tuple[LogEntry, ...]]:
         _LOGGER.debug(f'Simplifying: {kast}')
         kore = self.kprint.kast_to_kore(kast, GENERATED_TOP_CELL)
diff --git a/src/tests/integration/kcfg/test_imp.py b/src/tests/integration/kcfg/test_imp.py
index 925b1ddf..cfe39e46 100644
--- a/src/tests/integration/kcfg/test_imp.py
+++ b/src/tests/integration/kcfg/test_imp.py
@@ -518,6 +518,22 @@ APRBMC_PROVE_TEST_DATA: Iterable[
     ),
 )

+GET_MODEL_TEST_DATA = (
+    (
+        'model-works',
+        'int $n;',
+        '$n |-> N',
+        ['N >=Int 3'],
+        Subst({'N': token(4)}),  # satisfiable
+    ),
+    (
+        'model-doesnt-work',
+        'int $n;',
+        '$n |-> N',
+        ['N >=Int 3', 'N <Int 2'],
+        None,  # unsatisfiable
+    ),
+)

 def leaf_number(proof: APRProof) -> int:
     non_target_leaves = [nd for nd in proof.kcfg.leaves if not proof.is_target(nd.id)]
@@ -847,3 +863,24 @@ class TestImpProof(KCFGExploreTest):

         assert proof.status == proof_status
         assert leaf_number(proof) == expected_leaf_number
+
+    @pytest.mark.parametrize(
+        'test_id,k_cell,state_cell,constraints,expected_model',
+        GET_MODEL_TEST_DATA,
+        ids=[test_id for test_id, *_ in GET_MODEL_TEST_DATA],
+    )
+    def test_get_model(
+        self,
+        kprove: KProve,
+        kcfg_explore: KCFGExplore,
+        test_id: str,
+        k_cell: str,
+        state_cell: str,
+        constraints: Iterable[str],
+        expected_model: Subst | None,
+    ) -> None:
+        cterm = config(k_cell, state_cell, *constraints)
+
+        model = kcfg_explore.cterm_get_model(cterm)
+
+        assert model == expected_model
diff --git a/kevm-pyk/src/kevm_pyk/foundry.py b/kevm-pyk/src/kevm_pyk/foundry.py
index 6a6173a1a..5c30f9c32 100644
--- a/kevm-pyk/src/kevm_pyk/foundry.py
+++ b/kevm-pyk/src/kevm_pyk/foundry.py
@@ -535,6 +535,9 @@ def foundry_prove(
             failure_log = None
             if not passed:
                 failure_log = print_failure_info(proof, kcfg_explore)
+                for node in proof.failing:
+                    model = kcfg_explore.cterm_get_model(node.cterm)
+                    failure_log.append(model)

             return passed, failure_log
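
For reference, a hedged sketch of how the cterm_get_model stub above could be filled in, written as a standalone function so it does not rely on KCFGExplore internals. Subst.from_pred and the RPC result classes are assumptions about the pyk API of the time and may need adjusting:

# Sketch only: the steps from the cterm_get_model stub, spelled out.
# KoreClient.get_model, SatResult, and Subst.from_pred are assumed from
# runtimeverification/pyk#509 and the pyk API of the time.
from pyk.cterm import CTerm
from pyk.kast.inner import Subst
from pyk.kore.rpc import KoreClient, SatResult
from pyk.ktool.kprint import KPrint
from pyk.prelude.k import GENERATED_TOP_CELL

def cterm_get_model(kprint: KPrint, kore_client: KoreClient, cterm: CTerm) -> Subst | None:
    # convert the cterm (configuration plus path constraints) to Kore
    kore = kprint.kast_to_kore(cterm.kast, GENERATED_TOP_CELL)
    # call get-model on the RPC server
    result = kore_client.get_model(kore)
    if not isinstance(result, SatResult) or result.model is None:
        # unsatisfiable or unknown: no concrete counterexample to report
        return None
    # convert the Kore model back to Kast and read it off as a substitution
    model_kast = kprint.kore_to_kast(result.model)
    return Subst.from_pred(model_kast)
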
ehildenb commented 1 year ago

Once we have the testing harness for IMP set up in the pyk repo, with at least 2 tests (one successful call to get-model, one failing call), we can merge it and then begin testing downstream with KEVM. If things do not work downstream, we'll add more tests to the test harness and start making adjustments to the get-model endpoint.

palinatolmach commented 1 year ago

The cterm_get_model function has been added in the cterm-get-model branch of pyk, with two passing tests (one successful and one unsuccessful get-model call). The KEVM integration in the foundry_prove function has been added in the foundry-counterexamples branch.

This issue is currently blocked on runtimeverification/haskell-backend#3616. While counterexamples are produced correctly for some simple asserts, predicates involving chop are not evaluated in the get-model request, which produces incorrect assignments in the counterexamples.
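
To make the chop problem concrete: in KEVM, chop wraps an integer into the 256-bit word range, so leaving it unevaluated in the get-model request lets the solver pick assignments that violate that meaning. A small illustration in plain Python (not KEVM code):

# Illustration only: KEVM's chop reduces an integer modulo 2^256.
# If get-model treats chop as an uninterpreted function instead of evaluating
# it, the solver can assign values that break this intended semantics.
POW256 = 2**256

def chop(x: int) -> int:
    # wrap into the EVM word range [0, 2^256)
    return x % POW256

# A path constraint like `chop(N +Int 1) ==Int 0` is intended to force
# N = 2^256 - 1 (overflow wrap-around); with chop uninterpreted, the solver
# may instead return an arbitrary N, yielding an incorrect counterexample.
assert chop(POW256 - 1 + 1) == 0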

palinatolmach commented 1 year ago

runtimeverification/haskell-backend#3616 has been fixed by adding an smt-hook for the chop function. Working on this issue, however, helped identify that the SMT solver has trouble reasoning about non-linear arithmetic expressions (e.g., 2 ^ 256), which might appear in other get-model requests.