A fast and accurate CFG generator for EVM bytecode using symbolic stack analysis. View extra graph visuals below.
First, make sure rust is installed:
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
Then:
git clone https://github.com/plotchy/evm-cfg
cd evm-cfg/
cargo install --path . --locked
evm-cfg <PATH_OR_BYTECODE> -o cfg.dot --open
For working with .dot files, I recommend using the below VSCode extension as it allows powerful graph searching and filtering:
Name: Graphviz Interactive Preview
Id: tintinweb.graphviz-interactive-preview
Description: Graphviz (dot) Interactive Preview
Version: 0.3.5
Publisher: tintinweb
VS Marketplace Link: https://marketplace.visualstudio.com/items?itemName=tintinweb.graphviz-interactive-preview
For minimal dependencies, you can convert the .dot file to a .svg picture for viewing:
evm-cfg ./examples/weth9.evm -o ./examples/weth9.svg --open
# which is equivalent to:
evm-cfg ./examples/weth9.evm -o ./examples/cfg_weth.dot;
dot -Tsvg ./examples/cfg_weth.dot -o examples/weth9.svg;
open examples/weth9.svg;
To generate bytecode from a given solidity file, you can use the following:
solc <path_to_file> --bin-runtime --no-cbor-metadata --optimize --via-ir
Using static analysis, we can split the bytecode up into a structure representable by a graph. The following is done before executing any opcodes through a vm:
Jumps and Jumpis with direct push values are connected to their jumpdests. False sides of Jumpis are connected to the next instruction.
Any nodes without incoming edges that also don't begin with a JUMPDEST are impossible to enter, and are removed. This is done to reduce clutter
Essentially, the only goal left is to connect more edges when jumps are "indirect". Depending on the path the traverser used when coming into the instruction block, it may jump to separate locations. (ie: one traverser may jump to a push it carried over from block A, while another traverser may jump to a push it carried over from block B)
Finding these indirect jumps is difficult. Normally, DFS traversers can prevent revisiting previously seen nodes by using a set, but in our case, revisiting a node may be required to carry over a separate push value that jumps to a new location. When we consider allowing revisiting nodes loops immediately become a problem.
The following methods are used to prevent loops from going infinite:
This method is definitely not bulletproof, and there are cases where it will fail to quit out of loops. For this implementation, I only track stack values to 128 depth (max is 1024), and only up to 10 values at once. Both of these metrics are easily adjusted, but I have not ran into any cases where it was necessary to increase them.
First, some timings. This tool is fast! | ||
---|---|---|
Contract | Time to Generate CFG | |
April CTF | 0.620 ms | |
WETH9 | 0.778 ms | |
Fun Reversing Challenge | 1.64 ms | |
UniswapV2Router | 10.85 ms |
Here are some examples of the graph visualizations:
WETH9 on mainnet:
Fun Reversing Challenge from paradigmctf2022:
April by @brockelmore:
Before opening pull requests consider formatting and linting your code according to:
cargo fmt -- --check
cargo +nightly clippy --all --all-features -- -D warnings