runtimeverification / kontrol

BSD 3-Clause "New" or "Revised" License
49 stars 6 forks source link

Kontrol Loop Invariants knowledge #866

Open ehildenb opened 4 days ago

ehildenb commented 4 days ago

This issue attempts to bring together all of the insights on building loop invariants in Kontrol.

PetarMax commented 4 days ago

Here is the Optimism invariant as the example:

    rule [copy-memory-to-memory-summary]:
      <k> #execute ... </k>
      <useGas> false </useGas>
      <schedule> SHANGHAI </schedule>
      <jumpDests> JUMPDESTS </jumpDests>
      // The program and program counter are symbolic, focusing on the part we will be executing (CP)
      <program> PROGRAM </program>
      <pc> PCOUNT => PCOUNT +Int 53 </pc>
      // The word stack has the appropriate form, as per the compiled code
      <wordStack> LENGTH : _ : SRC : DEST : WS </wordStack>
      // The program copies LENGTH bytes of memory from SRC +Int 32 to DEST +Int OFFSET,
      // padded with 32 zeros in case LENGTH is not divisible by 32
      <localMem>
        LM => LM [ DEST +Int 32 := #range ( LM, SRC +Int 32, LENGTH ) +Bytes
                                   #buf ( ( ( notMaxUInt5 &Int ( LENGTH +Int maxUInt5 ) ) -Int LENGTH ) , 0 ) +Bytes
                                   #buf ( ( ( ( 32 -Int ( ( notMaxUInt5 &Int ( LENGTH +Int maxUInt5 ) ) -Int LENGTH ) ) ) modInt 32 ), 0 ) ]
      </localMem>
      requires
       // The current program we are executing differs from the original one only in the hardcoded jump addresses,
       // which are now relative to PCOUNT, and the hardcoded offset, which is now symbolic.
               #range(PROGRAM, PCOUNT, 53) ==K b"`\x00[\x81\x81\x10\x15b\x00\x81`W` \x81\x85\x01\x81\x01Q\x86\x83\x01\x82\x01R\x01b\x00\x81BV[\x81\x81\x11\x15b\x00\x81sW`\x00` \x83\x87\x01\x01R[P"
                                               [ 08 := #buf(3, PCOUNT +Int 32) ]
                                               [ 28 := #buf(3, PCOUNT +Int  2) ]
                                               [ 38 := #buf(3, PCOUNT +Int 51) ]

       // Various well-formedness constraints. In particular, the maxBytesLength-related ones are present to
       // remove various chops that would otherwise creep into the execution, and are reasonable since byte
       // arrays in actual programs would never reach that size.
       andBool 0 <=Int PCOUNT
       andBool 0 <=Int LENGTH andBool LENGTH <Int maxBytesLength
       andBool 0 <=Int SRC    andBool SRC    <Int maxBytesLength
       andBool 0 <=Int DEST   andBool DEST   <Int maxBytesLength
       andBool #sizeWordStack(WS) <=Int 1015
       andBool SRC +Int LENGTH <=Int DEST // No overlap between source and destination
       andBool DEST <=Int lengthBytes(LM) // Destination starts within current memory
       andBool PCOUNT +Int 51 <Int lengthBytes(JUMPDESTS) // We are not looking outside of the JUMPDESTs bytearray
       // All JUMPDESTs in the program are valid
       andBool JUMPDESTS[PCOUNT +Int 2] ==Int 1 andBool JUMPDESTS[PCOUNT +Int 32] ==Int 1 andBool JUMPDESTS[PCOUNT +Int 51] ==Int 1
       andBool PCOUNT +Int 51 <Int 2 ^Int 24  // and fit into three bytes
      [priority(30), concrete(JUMPDESTS, PROGRAM, PCOUNT), preserves-definedness]
PetarMax commented 4 days ago

Things that need to be understood:

  1. Where does the loop really start? The KCFG will reveal the branching point, but there are likely to be several prologue instructions before that point. For this loop, there are two such instructions, PUSH(1); 0, captured in the bytecode by the leading b"\x00"`.
  2. Where does the loop really end? The way to track this is to execute the proof with --break-every-step from the branching point and monitor the instructions. At some point, there will be a JUMP back to the branching point, telling us where the loop body ends, but beyond that, normally contiguously, there could be several epilogue instructions (for example, to restore the word stack format to the desired one, see 3. below).
  3. What is the general shape of the word stack and how does it relate to the loop parameters? This has to be understood by examining the word stack structure at several of the branching points, and it also helps one understand where the loop really starts and ends (see 1. and 2. above), because the effect of the loop is often seen on the word stack, and if not (as it the case for the Optimism loop above), there's a good chance that the word stacks will be the same before and after.
  4. What is the effect of the loop? Sometimes it is on the word stack, sometimes it is in memory, sometimes it is in storage. The loop body normally holds the answer; the should be, for example, MSTORE and SSTORE instructions if memory or storage is modified.
RaoulSchaffranek commented 3 days ago

Just dropping my blog post on loop invariants here: https://runtimeverification.com/blog/formally-verifying-loops-part-2

@PetarMax I'm impressed by the abstraction level of the Optimism invariant. It seems like this rule is general enough to match other loops outside the Optimism code base and I'm wondering if it matches ALL compiler-generated memory-to-memory copy loops. Would it make sense to include this as a general lemma into Kontrol? Or are there some bits specific to Optimism?

PetarMax commented 3 days ago

This invariant is very general and illustrates a general approach, but it is for a specific EVM-level loop that results from the copy_memory_to_memory Yul function in the context of Optimism. The compilation from Yul to EVM could yield different results pretty randomly (it has already), and so there's (sadly) no guarantee that it'd be useful anywhere elsewhere. Also, it is only useful if we have a bytearray of symbolic size, otherwise it will slow the execution down.