Further optimisation of valid jump destination computation

PetarMax commented 4 months ago

Currently, we are computing jump destinations AOT, with the help of pyk. This allows us to handle programs with symbolic parts (such as immutable variables), but is slightly slower than the original approach that was fully integrated with the semantics.

We also have a PR open in KEVM that enables JIT computation, and which is much faster, but incomplete in that it does not account for cases when a jump is made into PUSH data, which should result in a EVMC_BAD_JUMP_DESTINATION error. Extending that PR to account for these cases as well is likely to end up being slower, since what we would need to do is, given a check for a jump destination I:

take the 32 bytes of the program preceding I;
check, for each of these bytes, from left to right, if the corresponding opcode is a PUSH-N, such that I is included in the N following bytes;
make sure to skip all push data found in the process that does not include I.

In general, the assumption that no JUMP/JUMPI lands in PUSH data does not hold for arbitrary EVM code. However, it could be safe in the case of Kontrol, since we are working exclusively with EVM code coming from the compilation of Solidity files. Since the EVM execution never continues if such a jump occurs, I can't think of a use case for the compiler do be doing this deliberately. Perhaps we could ask the developers for their opinion.

If we think that this assumption will be safe, we could implement a layer in Kontrol only that always sets the jump destinations to .Set and only does a basic JIT check. This should give us a solid speed-up.

Alternatively, we could think of automatically generating K rules that capture valid jump destinations for each contract. These rules should be handled very efficiently by the LLVM backend.

ehildenb commented 4 months ago

Actually, checking the preceding 32 bytes is not enough. You have to also check, if you do find a PUSH in that preceding 32 bytes, is that PUSH actually just part of the push data of a prior push? So there isn't a way to know except scanning from the start of the bytecode.

PetarMax commented 4 months ago

Yes, that is correct.

palinatolmach commented 3 months ago

Other considerations from Slack:

@anvacaru: If set reasoning is the main bottleneck perhaps we could refactor the way we store the valid jumpdests. Instead of having the valid jumpdests as Set elements, we could store them in some kind of frequency array, a boolean array the same length as the bytecode. We should still be able to handle symbolic immutables and everything. It seems like this is the approach used by the rust evm implementation (link)
@ehildenb: We can also explore other mechanisms (like creating a function-table) for computing valid jump dests. That involves code-gen, which maybe we want to avoid, but at least could be correct for any EVM execution, not just Solidity ones,
Or, we could have a tool which scans the bytecode for JUMP and JUMPI instructions, and checks that they are all preceded with a PUSH(...) to a concrete value, then checks that all of those destinations are valid jumpdests (by scanning ahead of time in python). If we confirm that for every bytecode that we load, we can just turn off jump dest validity checks altogether. That would probably also work entirely at the bytecode level. While this approach seems easier and would require straightforward static analysis, it's complicated by the fact that JUMPs are not always preceded by a concrete PUSH, and the destination might be dynamic, e.g. https://ethereum.stackexchange.com/questions/70121/evm-bytecode-debugging-jumps.

In addition, there's a potential follow-up to a change in valid JUMPDESTs calculation implemented in https://github.com/runtimeverification/kontrol/pull/596. As @anvacaru noted, we can make the 1loadProgram` cut rule optional iff we're executing a constructor with symbolic arguments. We can probably add or pass an option enabling 'EVM.program.load' as a cut-point rule optionally based on the test signature and it being a constructor here.

runtimeverification / kontrol

Further optimisation of valid jump destination computation #602