onflow / cadence

Cadence, the resource-oriented smart contract programming language πŸƒβ€β™‚οΈ
https://developers.flow.com/cadence
Apache License 2.0
531 stars 139 forks source link

Investigate deeper fuzzing #975

Closed turbolent closed 2 years ago

turbolent commented 3 years ago

Issue To Be Solved

We currently support fuzzing the parser and checker using go-fuzz. This helped during the development of the new parser last year.

Extend the fuzzing support to get deeper coverage and also support fuzzing the interpreter.

Currently, input is random, so the coverage for the checker is small, as only few inputs are valid syntactically. By providing syntactically, and further also semantically correct inputs, the coverage for the checker and interpreter can be improved.

Suggested Solution

Stages

  1. Set up CI for the current fuzzing of the parser/checker
  2. Generate syntactically valid programs to get deeper coverage of the checker (parser won't reject)
  3. Generate semantically valid programs to get deeper coverage of the interpreter (checker won't reject)

Inspiration

bluesign commented 3 years ago

Also native fuzzing for golang[0] can be useful I guess

[0] https://blog.golang.org/fuzz-beta

fxamacker commented 3 years ago

Maybe we can use https://github.com/thepudds/fzgo until Go's proposed fuzzer is released. It's basically a prototype of Go's proposed fuzzer but it's already being used by some non-hobby projects. I might use it to stress test storage PoC.

I'd love to have custom mutators as an option in go-fuzz. :laughing: I'd also love for go-fuzz to display cover as a percent during fuzzing.

A slightly customized version of https://github.com/dvyukov/go-fuzz/commit/fca39067bc7270ea00dd1a7ce4443eba66ff58fe was used to test fxamacker/cbor v2.3.0 release last week and I wish go-fuzz had the missing features.

Curated Lists of Fuzzing Resources

turbolent commented 3 years ago

@bluesign wow, what are the odds! I had seen the GitHub issue tracking the development of that, and now they finally announced it just in time πŸ™‚ We should definitely check that out!

@fxamacker Oh I hadn't seen this, thanks for sharing! Yes, we should also check that out and see how we can get good mutation for out use-case πŸ˜„ πŸ’―

robert-e-davidson3 commented 2 years ago

Discussed intent and scope with @turbolent and @j1010001. Here's the summary:


The goal of fuzzing is to quickly increase test coverage. Concretely this means fuzzing can run for a long time (say, a week) without finding any new bugs.

Our original fuzzing was quick and dirty: feed some bytes to the parser and fail on panics. The search space of this approach is so broad that it cannot run fast enough to provide much test coverage - almost every input it gives is syntactically invalid. This code lives in onflow/cadence and can be ignored.

We needed a grammar-based fuzzer to cut down the search space to just syntactically-valid inputs. The first idea was to write Cadence's grammar into the EBNF format so we could leverage existing tools. Unfortunately Cadence doesn't fit into EBNF cleanly enough to satisfy those tools.

Joe had a solution: modify the Cadence parser so that, at any point, it can give a list of valid tokens that can follow. This "parser-based fuzzer" is an implicit grammar-based fuzzer. That code lives in onflow/fuzzer as an old fork of onflow-cadence and was run on FuzzBuzz.io.

The next step is to evaluate the status of the onflow/fuzzer code and how it integrates with FuzzBuzz.io. Once that's established, rebase the latest Cadence code from onflow/cadence:master to onflow/fuzzer:master and make adjustments so the fuzzer still works.

That might be sufficient. If it isn't then we need to investigate alternatives such as go-fuzz or the built-in fuzzing support in Go 1.18. The approach here is to use mainnet contracts and transactions as the seed corpus. That will require maintenance because as the Cadence language is developed and so diverges from the deployed contracts.

If that isn't sufficient then we need to investigate writing and integrating our own mutation algorithm. Most likely this would merge both prior methods, mutating existing contracts and transactions randomly but based on the existing contracts.

j1010001 commented 2 years ago

Hey team! Please add your planning poker estimate with ZenHub @dsainati1 @robert-e-davidson3 @SupunS @turbolent

robert-e-davidson3 commented 2 years ago

Deprecating this ticket in favor of epic https://github.com/onflow/cadence/issues/1309.