runtimeverification / kontrol

BSD 3-Clause "New" or "Revised" License
56 stars 9 forks source link

Construct an appropriate `is_mergable` heuristic for merging nodes and composable verification. #703

Open Stevengre opened 4 months ago

Stevengre commented 4 months ago

Currently, we have implemented a language-agnostic merge_node (K#4425). However, applying it to practical scenarios and solving problems like Kontrol#448 still requires addressing the following challenges:

  1. From the verification user’s perspective, what are the understandable benefits of merged nodes? For example, loop-invariant generation or support for composable verification at the function level?
  2. From a requirements/theoretical perspective, what kind of nodes are mergable or are reasonable to be merged?
  3. From a technical perspective, how to distinguish them by a simple specification.
  4. Theoretical reliability analysis: Can these specifications theoretically meet our requirements?
  5. Implementation reliability analysis: Can our implementation achieve the theoretical results?

Some answers to the above issues:

  1. According to the current, users are easy to check the unexpected result of the auto-merged function spec. However, they cannot specify concrete behavior of that function. They can only specify the function under different status.

Here are some small steps/tasks that I plan to complete next:

Stevengre commented 4 months ago

Merging Effect: Reduce the number of generated rules through anti-unification and minimize the splits caused by these rules.

Intuitive Understanding and Explanation: Automatically obtain a specification for a function from an external call perspective, thereby omitting the detailed implementation steps. For this purpose, each generated rule should ideally encompass as much depth as possible. Merging can combine the split edges to increase the depth density of each generated rule.

Stevengre commented 4 months ago

Initial Heuristic provided by @ehildenb :

def is_mergable(ct1: CTerm, ct2: CTerm) -> bool:
    status_code_1 = ct1.cell('STATUSCODE_CELL')
    status_code_2 = ct2.cell('STATUSCODE_CELL')
    program_1 = ct1.cell('PROGRAM_CELL')
    program_2 = ct2.cell('PROGRAM_CELL')
    if type(status_code_1) is KApply and type(status_code_2) is KApply and type(program_1) is KToken and type(program_2) is KToken:
        return status_code_1 == status_code_2 and program_1 == program_2:
    raise ValueError(f'Attempted to merge nodes with non-concrete <statusCode> or <program>: {(ct1, ct2)}')
Stevengre commented 4 months ago

Heuristic Validation:

Case: ActivateNextStateTest.testActivateNextStateTermination

Validation Steps:

Further Exploration:

Stevengre commented 4 months ago

Check Heuristic:

Randomly select two branches (270 -> 67 and 271 -> 63) and examine the and of their respective source and target.

67:

<statusCode> EVMC_SUCCESS </statusCode>
<program> test%kontrol%ActivateNextStateTest </program>

63:

<statusCode> EVMC_SUCCESS </statusCode>
<program> test%kontrol%ActivateNextStateTest </program>

270

<statusCode> EVMC_SUCCESS </statusCode>
<program> test%kontrol%ActivateNextStateTest </program>
<program> contracts%DualGovernance </program>

271

<statusCode> EVMC_SUCCESS </statusCode>
<program> test%kontrol%ActivateNextStateTest </program>
<program> contracts%DualGovernance </program>
Stevengre commented 4 months ago

Check Result:

I do not believe these are the nodes we want to merge for the following reasons:

  1. The rules synthesized using this method have the on the rhs as test%kontrol%ActivateNextStateTest. This requires users to call this specific test to reuse the proof, which is clearly unreasonable.
  2. Similarly, the on the lhs of the rules synthesized using this method includes test%kontrol%ActivateNextStateTest. This prevents us from reusing this summary when verifying other contracts that call contracts%DualGovernance.

Additionally, the and for 270 and 271 are fixed values and identical. This further prevents the generated rules from being usable.

Stevengre commented 4 months ago

To make summarization and minimize_proof more useful, we need to ensure that the generated rules can be reused across contracts. This means that the generated rules should correspond to the functions of a contract, serving as either a mock or a specification for the contract. For example, a summarized kcfg should be either a summary of contracts%DualGovernance or a specific function within contracts%DualGovernance, rather than a summary of the test-suite.

For instance, for dualGovernance.activateNextState(), we should:

  1. Obtain a rule/edge from its input state to its output state, which can be highly abstract.
  2. When verifying functions using that function, use this rule as a mock to replace its original implementation (which may result in false positives).
  3. If verification fails and the specification & implementation are deemed correct, proceed with refinement:
    1. Manual refinement: Provide assumptions about the input state and assertions about the output state through comments or other means. These assumptions will be used as assertions when other functions call this one, ensuring that if the caller does not satisfy the assumptions, the generated rule will not match. The output state assertions will be verified under these assumptions before generating the rule. The current harness functions in t.sol could also serve as manual refinement specificiation, but they are not very intuitive and seem to be more complex to obtain expected rules. So far, I don't have any good ideas.
    2. Automatic refinement: Replace the mock with activateNextState()'s implementation while keeping the function calls in activateNextState() as mocks for automatic refinement.

The method to obtain the rule in step 1 is as follows:

  1. Construct a CTerm containing all the states required to run dualGovernance.activateNextState(). States not of interest should be arbitrary symbolic variables. The current state should be the state immediately after the function call, but before the function is executed. Therefore, the top of the stack should contain an equal number of symbolic variables as the states used by the first instruction in the function.
  2. Symbolically execute the CTerm until it cannot proceed further, with weak loop inv generation during the execution. Then, use minimize_kcfg and merge_node to obtain one or more edges from the constructed CTerm to the terminal state, thereby generating the corresponding rule.