Open kirushik opened 6 years ago
Agreed :+1:
Without knowing the internals, libFuzzer (cargo-fuzz) gets code coverage by instrumenting code. ~Bet there are some really good ideas in there we could use to implement a coverage tool.~ There are a couple tools that use LLVM's coverage tooling with cargo builds (see comment below). Will start researching how libFuzzer achieves its coverage analysis.
Found this: https://clang.llvm.org/docs/SourceBasedCodeCoverage.html
libFuzzer uses SanitizerCoverage and gcov to instrument the binary. Both of those tools work on the AST of the instrumented program, so using @maciejhirsz's Lunarity tool will be super useful here.
Also, here is a really great blog post from @guidovranken on his work improving libFuzzer with different guiding techniques. Not directly applicable, but may prove helpful.
Found a couple more techniques / tools for generating code coverage on Rust:
Both of those tools come from recent comments on this Rust RFC issue about generating gcov coverage data: https://github.com/rust-lang/rfcs/issues/646
We may be able to use these tools for coverage data over Rust code generated by ethabi-derive.
Also from the RFC thread, this cool write-up by @whitequark on generating branch coverage reports: https://users.rust-lang.org/t/howto-generating-a-branch-coverage-report/8524
@lght is generating coverage at the generated-Rust level even worth it? It's enough to extract Solidity-level coverage by tracing bytecode execution at the VM level; we really don't need to mix our EthABI/EVM implementation details into that.
The way of outputting standards-compatible coverage data is indeed something we can borrow from those implementations.
> The way of outputting standards-compatible coverage data is indeed something we can borrow from those implementations.
This sounds like a good approach. After initial tests with gcov, and thinking about coverage in general, it makes more sense to follow your suggestion to restrict scope to Solidity coverage.
Will look into generating coverage reports from the VM traces.
my hope is that the parity EVM gives us enough tracing/introspection/statistics to achieve this. possibly through some slight modification. and in combination with source maps. if it works it would be a clean approach. also for https://github.com/paritytech/sol-rs/issues/11 (show failing line) and similar things. though i still have to look into all that
through a vm tracer one has access to the pc (program counter / instruction pointer):
https://github.com/paritytech/parity/blob/98b7c07171cd320f32877dfa5aa528f585dc9a72/ethcore/src/trace/mod.rs#L107
through a (non-vm) tracer one has access to the code address and code: https://github.com/paritytech/parity/blob/98b7c07171cd320f32877dfa5aa528f585dc9a72/ethcore/src/trace/mod.rs#L52
by combining a tracer and vm-tracer it's easy to sum the number of hits to the same (instruction pointer, code address) pair.
(both tracers would have to mutate some shared state to achieve this: the tracer detects that a call into specific code is being made, and since the vm-tracer events that follow in sequence belong to that call, we can associate their instruction pointers with that code)
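A minimal sketch of that shared-state idea, to make the counting concrete. `CallTracer`, `VmTracer`, `CoverageState` and their methods are hypothetical stand-ins invented for illustration, not parity's actual tracer traits; the only assumption taken from the discussion is that one tracer sees calls into code and the other sees the pc of each executed instruction.

```rust
use std::cell::RefCell;
use std::collections::HashMap;
use std::rc::Rc;

/// 20-byte address of the code being executed (hypothetical alias).
type CodeAddress = [u8; 20];

/// State mutated by both tracers.
#[derive(Default)]
struct CoverageState {
    /// Code addresses of the currently nested calls, innermost last.
    call_stack: Vec<CodeAddress>,
    /// Hit count per (code address, program counter) pair.
    hits: HashMap<(CodeAddress, usize), u64>,
}

/// Hypothetical stand-in for the call-level tracer.
struct CallTracer(Rc<RefCell<CoverageState>>);

/// Hypothetical stand-in for the vm tracer.
struct VmTracer(Rc<RefCell<CoverageState>>);

impl CallTracer {
    /// Record that execution entered the code at `code_address`.
    fn enter_call(&self, code_address: CodeAddress) {
        self.0.borrow_mut().call_stack.push(code_address);
    }

    /// Record that the innermost call returned.
    fn exit_call(&self) {
        self.0.borrow_mut().call_stack.pop();
    }
}

impl VmTracer {
    /// Count one executed instruction against the innermost active call.
    /// Instructions seen while no call is active are ignored.
    fn trace_instruction(&self, pc: usize) {
        let mut state = self.0.borrow_mut();
        let current = state.call_stack.last().copied();
        if let Some(address) = current {
            *state.hits.entry((address, pc)).or_insert(0) += 1;
        }
    }
}
```

Using a stack rather than a single "current address" keeps the attribution right when a contract calls another contract and execution later resumes in the caller.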
(i'll probably implement a `combine` tracers function that wraps two tracers so things like "code coverage", "last instruction", etc can run in parallel and independently)
(should be useful to use boxed tracers so they can be added/disabled dynamically. we could start with this as the default and then only if it's too slow in practice (which i doubt) add an option to set tracers statically)
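A rough sketch of what combining boxed tracers could look like. The `InstructionTracer` trait and the two example tracers are hypothetical and much smaller than parity's real tracer traits; the sketch also generalizes "wraps two tracers" to a list, which is an assumption rather than the plan stated above.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Hypothetical minimal tracer interface, for illustration only.
trait InstructionTracer {
    fn trace_instruction(&mut self, pc: usize);
}

/// Forwards every event to each boxed tracer it holds, in insertion order.
#[derive(Default)]
struct CombinedTracer {
    tracers: Vec<Box<dyn InstructionTracer>>,
}

impl CombinedTracer {
    /// Tracers can be added at runtime because they are boxed trait objects.
    fn add(&mut self, tracer: Box<dyn InstructionTracer>) {
        self.tracers.push(tracer);
    }
}

impl InstructionTracer for CombinedTracer {
    fn trace_instruction(&mut self, pc: usize) {
        for tracer in self.tracers.iter_mut() {
            tracer.trace_instruction(pc);
        }
    }
}

/// Example tracer: remembers the last pc that was executed.
struct LastInstruction(Rc<Cell<Option<usize>>>);

impl InstructionTracer for LastInstruction {
    fn trace_instruction(&mut self, pc: usize) {
        self.0.set(Some(pc));
    }
}

/// Example tracer: counts executed instructions.
struct InstructionCount(Rc<Cell<u64>>);

impl InstructionTracer for InstructionCount {
    fn trace_instruction(&mut self, _pc: usize) {
        self.0.set(self.0.get() + 1);
    }
}
```

Each tracer only touches its own state, so "last instruction" and "instruction count" stay independent even though they see the same event stream. Dynamic dispatch costs one virtual call per tracer per instruction, which is the overhead the static option would remove.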
having the `code address -> instruction pointer -> hit count` mapping and another mapping (unsure how to build this elegantly, requires access to source code whenever code is deployed) of `code address -> source`, as well as sourcemaps `source -> instruction pointer -> line number`, we can then do code coverage.
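To illustrate the last step, here is a sketch of joining the mappings for a single contract, assuming the `code address -> source` lookup has already picked the right source. The function name and the plain `HashMap` inputs are hypothetical, chosen only to show the shape of the join.

```rust
use std::collections::{BTreeMap, HashMap};

/// Combine "pc -> line" (derived from a source map) with "pc -> hit count"
/// (collected by the tracers) into "line -> hit count".
/// Lines whose instructions were never executed are reported with 0 hits.
fn line_hits(
    pc_to_line: &HashMap<usize, usize>,
    pc_hits: &HashMap<usize, u64>,
) -> BTreeMap<usize, u64> {
    let mut lines: BTreeMap<usize, u64> = BTreeMap::new();
    for (pc, line) in pc_to_line {
        let hits = pc_hits.get(pc).copied().unwrap_or(0);
        *lines.entry(*line).or_insert(0) += hits;
    }
    lines
}
```

Summing over all instructions of a line overstates how often the line itself ran, so a real report might prefer the maximum per line; the sum is only the simplest choice for a sketch.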
solc source maps unfortunately seem a bit broken in my early experiments (getting just one section that comprises the entire contract). i'll investigate further.
solc sourcemaps are documented here: https://github.com/ethereum/solidity/blob/develop/docs/miscellaneous.rst#source-mappings
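For reference, a sketch of decoding the compressed format described in those docs: entries are separated by `;`, fields within an entry by `:` in the order `s:l:f:j` (start offset, length, file index, jump type), and an empty or missing field repeats the value from the previous entry. The struct and function names are made up for this example, and any fields after `j` that newer compilers may emit are ignored here.

```rust
/// One decoded source map entry (hypothetical type for this sketch).
#[derive(Clone, Debug, PartialEq)]
struct SourceRange {
    start: i64,  // byte offset into the source file
    length: i64, // length of the range in bytes
    file: i64,   // source file index, may be -1
    jump: char,  // 'i', 'o' or '-'
}

/// Decode a compressed `s:l:f:j` source map into one entry per instruction.
fn parse_source_map(map: &str) -> Vec<SourceRange> {
    let mut entries = Vec::new();
    if map.is_empty() {
        return entries;
    }
    let mut prev = SourceRange { start: 0, length: 0, file: 0, jump: '-' };
    for entry in map.split(';') {
        let mut fields = entry.split(':');
        // Start from the previous entry and overwrite only the fields present.
        let mut cur = prev.clone();
        if let Some(v) = fields.next().and_then(|f| f.parse::<i64>().ok()) {
            cur.start = v;
        }
        if let Some(v) = fields.next().and_then(|f| f.parse::<i64>().ok()) {
            cur.length = v;
        }
        if let Some(v) = fields.next().and_then(|f| f.parse::<i64>().ok()) {
            cur.file = v;
        }
        if let Some(c) = fields.next().and_then(|f| f.chars().next()) {
            cur.jump = c;
        }
        entries.push(cur.clone());
        prev = cur;
    }
    entries
}
```

The entries are per instruction rather than per byte offset, so turning a pc into an entry index still requires walking the bytecode and skipping push data.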
every test could potentially output such coverage information. i guess it makes sense to write it all out into a folder and then in a separate step combine into some standard code coverage format
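A sketch of the combine step, using the lcov tracefile format as one candidate for the "standard code coverage format" (`SF:`, `DA:`, `LF:`, `LH:` and `end_of_record` are lcov record names; picking lcov at all is an assumption). Reading the per-test files from a folder is left out, and `LineHits`, `merge` and `to_lcov` are hypothetical names.

```rust
use std::collections::BTreeMap;

/// Line number -> hit count for one source file, as written by one test.
type LineHits = BTreeMap<usize, u64>;

/// Sum the per-test reports for the same source file.
fn merge(reports: &[LineHits]) -> LineHits {
    let mut total = LineHits::new();
    for report in reports {
        for (line, hits) in report {
            *total.entry(*line).or_insert(0) += *hits;
        }
    }
    total
}

/// Render one source file's merged line hits as an lcov record.
fn to_lcov(source_path: &str, lines: &LineHits) -> String {
    let mut out = format!("SF:{}\n", source_path);
    for (line, hits) in lines {
        out.push_str(&format!("DA:{},{}\n", line, hits));
    }
    // LF = instrumented lines found, LH = lines hit at least once.
    let lines_hit = lines.values().filter(|hits| **hits > 0).count();
    out.push_str(&format!("LF:{}\nLH:{}\nend_of_record\n", lines.len(), lines_hit));
    out
}
```

Keeping the per-test output as plain line counts and converting only at the end means the output format can be swapped later without touching the tracers.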
we don't need sourcemaps if we just want a percentage. that could be a first step
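A sketch of that first step: the share of instructions in a contract's bytecode that were executed at least once. It relies only on the EVM rule that PUSH1..PUSH32 (opcodes 0x60..=0x7f) are followed by 1 to 32 bytes of immediate data; the function names are hypothetical. One caveat to keep in mind: non-code bytes appended to the bytecode (such as compiler metadata) would be counted as instructions here and pull the percentage down.

```rust
use std::collections::HashSet;

/// Byte offsets at which instructions start, skipping PUSH immediates.
fn instruction_offsets(code: &[u8]) -> Vec<usize> {
    let mut offsets = Vec::new();
    let mut pc = 0;
    while pc < code.len() {
        offsets.push(pc);
        let op = code[pc];
        // PUSHn carries n bytes of data, where n = op - 0x5f.
        let immediate = if (0x60..=0x7f).contains(&op) {
            (op - 0x5f) as usize
        } else {
            0
        };
        pc += 1 + immediate;
    }
    offsets
}

/// Percentage of instructions whose pc appears in `hit_pcs`.
/// Returns 0.0 for empty code.
fn coverage_percent(code: &[u8], hit_pcs: &HashSet<usize>) -> f64 {
    let offsets = instruction_offsets(code);
    if offsets.is_empty() {
        return 0.0;
    }
    let covered = offsets.iter().filter(|pc| hit_pcs.contains(*pc)).count();
    100.0 * covered as f64 / offsets.len() as f64
}
```

For the bytecode `60 01 60 02 01 00` (PUSH1 1, PUSH1 2, ADD, STOP) the instruction offsets are 0, 2, 4 and 5, so hitting only the two pushes gives 50%.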
There's the https://github.com/sc-forks/solidity-coverage project, which inserts event emissions to track coverage. I think we can do significantly better (especially around pure functions, which can't emit events)!