runtimeverification / evm-semantics

K Semantics of the Ethereum Virtual Machine (EVM)
BSD 3-Clause "New" or "Revised" License

Output concrete counter-example for failed proofs #1927

Closed: ehildenb closed this issue 1 year ago

ehildenb commented 1 year ago

Related: https://github.com/runtimeverification/evm-semantics/issues/1700 and https://github.com/runtimeverification/evm-semantics/pull/1716

~Blocked on: https://github.com/runtimeverification/haskell-backend/issues/3616~

Following https://github.com/runtimeverification/evm-semantics/pull/1716, we now output the path-feasibility constraints on failing proofs. We want to improve on this by providing the user with a concrete assignment to the variables that demonstrates the failing test. This is now possible because we have implemented the get-model endpoint in the Haskell backend (https://github.com/runtimeverification/haskell-backend/commit/06c9515d802093b7e3267e2d3e7460f4a2cf8ede and https://github.com/runtimeverification/pyk/pull/509).
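
As a quick illustration of the new endpoint, here is a minimal sketch of querying it through pyk. The KoreClient.get_model method and the SatResult class are taken from the linked pyk PR, so the exact names and signatures below are assumptions and may differ in detail:

# Hypothetical sketch, not KEVM code: query the get-model endpoint via pyk.
# Assumes a kore-rpc server is already running and that pyk exposes
# KoreClient.get_model and SatResult as introduced in runtimeverification/pyk#509.
from pyk.kore.rpc import KoreClient, SatResult
from pyk.kore.syntax import Pattern

def try_get_model(client: KoreClient, path_condition: Pattern) -> Pattern | None:
    result = client.get_model(path_condition)
    if isinstance(result, SatResult):
        # the model is a Kore pattern encoding a satisfying assignment
        return result.model
    # UnsatResult or UnknownResult: no assignment to report
    return None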

We should expose this functionality in the KEVM foundry integration in these ways:

diff --git a/src/pyk/kcfg/explore.py b/src/pyk/kcfg/explore.py
index 4ad3e09a..e99dc4d7 100644
--- a/src/pyk/kcfg/explore.py
+++ b/src/pyk/kcfg/explore.py
@@ -168,6 +168,13 @@ class KCFGExplore(ContextManager['KCFGExplore']):
         kast_simplified = self.kprint.kore_to_kast(kore_simplified)
         return kast_simplified, logs

+    def cterm_get_model(self, cterm: CTerm) -> Subst | None:
+        # convert the cterm to Kore
+        # call get_model in the RPC server
+        # convert the Kore result back to Kast
+        # convert the kast to a substitution
+        # return the substitution
+
     def kast_simplify(self, kast: KInner) -> tuple[KInner, tuple[LogEntry, ...]]:
         _LOGGER.debug(f'Simplifying: {kast}')
         kore = self.kprint.kast_to_kore(kast, GENERATED_TOP_CELL)
diff --git a/src/tests/integration/kcfg/test_imp.py b/src/tests/integration/kcfg/test_imp.py
index 925b1ddf..cfe39e46 100644
--- a/src/tests/integration/kcfg/test_imp.py
+++ b/src/tests/integration/kcfg/test_imp.py
@@ -518,6 +518,22 @@ APRBMC_PROVE_TEST_DATA: Iterable[
     ),
 )

+GET_MODEL_TEST_DATA = (
+    (
+        'model-works',
+        'int $n;',
+        '$n |-> N',
+        ['N >=Int 3'],
+        Subst({'N': token(4)}),  # satisfiable
+    ),
+    (
+        'model-doesnt-work',
+        'int $n;',
+        '$n |-> N',
+        ['N >=Int 3', 'N <Int 2'],
+        None,  # unsatisfiable
+    ),
+)

 def leaf_number(proof: APRProof) -> int:
     non_target_leaves = [nd for nd in proof.kcfg.leaves if not proof.is_target(nd.id)]
@@ -847,3 +863,24 @@ class TestImpProof(KCFGExploreTest):

         assert proof.status == proof_status
         assert leaf_number(proof) == expected_leaf_number
+
+    @pytest.mark.parametrize(
+        'test_id,k_cell,state_cell,constraints,expected_model',
+        GET_MODEL_TEST_DATA,
+        ids=[test_id for test_id, *_ in GET_MODEL_TEST_DATA],
+    )
+    def test_get_model(
+        self,
+        kprove: KProve,
+        kcfg_explore: KCFGExplore,
+        test_id: str,
+        k_cell: str,
+        state_cell: str,
+        constraints: Iterable[str],
+        expected_model: Subst | None,
+    ) -> None:
+        cterm = config(k_cell, state_cell, *constraints)
+
+        model = kcfg_explore.cterm_get_model(cterm)
+
+        assert model == expected_model
diff --git a/kevm-pyk/src/kevm_pyk/foundry.py b/kevm-pyk/src/kevm_pyk/foundry.py
index 6a6173a1a..5c30f9c32 100644
--- a/kevm-pyk/src/kevm_pyk/foundry.py
+++ b/kevm-pyk/src/kevm_pyk/foundry.py
@@ -535,6 +535,9 @@ def foundry_prove(
             failure_log = None
             if not passed:
                 failure_log = print_failure_info(proof, kcfg_explore)
+                for node in proof.failing:
+                    model = kcfg_explore.cterm_get_model(node.cterm)
+                    failure_log.append(model)

             return passed, failure_log
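
For reference, a hedged sketch of how the cterm_get_model stub above could be filled in, written as a standalone function so it does not rely on KCFGExplore internals. Subst.from_pred and the RPC result classes are assumptions about the pyk API of the time and may need adjusting:

# Sketch only: the steps from the cterm_get_model stub, spelled out.
# KoreClient.get_model, SatResult, and Subst.from_pred are assumed from
# runtimeverification/pyk#509 and the pyk API of the time.
from pyk.cterm import CTerm
from pyk.kast.inner import Subst
from pyk.kore.rpc import KoreClient, SatResult
from pyk.ktool.kprint import KPrint
from pyk.prelude.k import GENERATED_TOP_CELL

def cterm_get_model(kprint: KPrint, kore_client: KoreClient, cterm: CTerm) -> Subst | None:
    # convert the cterm (configuration plus path constraints) to Kore
    kore = kprint.kast_to_kore(cterm.kast, GENERATED_TOP_CELL)
    # call get-model on the RPC server
    result = kore_client.get_model(kore)
    if not isinstance(result, SatResult) or result.model is None:
        # unsatisfiable or unknown: no concrete counterexample to report
        return None
    # convert the Kore model back to Kast and read it off as a substitution
    model_kast = kprint.kore_to_kast(result.model)
    return Subst.from_pred(model_kast)
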
ehildenb commented 1 year ago

Once we have the testing harness for IMP set up in the pyk repo, with at least 2 tests (one successful call to get-model, one failing call), we can merge it and then begin testing downstream with KEVM. If things do not work downstream, we'll add more tests to the test harness and start making adjustments to the get-model endpoint.

palinatolmach commented 1 year ago

The cterm_get_model function has been added in the cterm-get-model branch of pyk, with two passing tests (one successful and one unsuccessful get-model call). The KEVM integration in the foundry_prove function has been added in the foundry-counterexamples branch.

This issue is currently blocked on runtimeverification/haskell-backend#3616. While counterexamples are produced correctly for some simple asserts, predicates involving chop are not evaluated in the get-model request, which produces incorrect assignments in the counterexamples.
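
To make the chop problem concrete: in KEVM, chop wraps an integer into the 256-bit word range, so leaving it unevaluated in the get-model request lets the solver pick assignments that violate that meaning. A small illustration in plain Python (not KEVM code):

# Illustration only: KEVM's chop reduces an integer modulo 2^256.
# If get-model treats chop as an uninterpreted function instead of evaluating
# it, the solver can assign values that break this intended semantics.
POW256 = 2**256

def chop(x: int) -> int:
    # wrap into the EVM word range [0, 2^256)
    return x % POW256

# A path constraint like `chop(N +Int 1) ==Int 0` is intended to force
# N = 2^256 - 1 (overflow wrap-around); with chop uninterpreted, the solver
# may instead return an arbitrary N, yielding an incorrect counterexample.
assert chop(POW256 - 1 + 1) == 0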

palinatolmach commented 1 year ago

runtimeverification/haskell-backend#3616 has been fixed by adding an smt-hook for the chop function. Working on this issue, however, helped identify that the SMT solver has trouble reasoning about non-linear arithmetic expressions (e.g., 2 ^ 256), which might appear in other get-model requests.