trailofbits / ruzzy

A coverage-guided fuzzer for pure Ruby code and Ruby C extensions
GNU Affero General Public License v3.0

Request: make hooking coverage events publicly available in Ruby's C API #9

Open mschwager opened 7 months ago

mschwager commented 7 months ago

Ruzzy implements the SanitizerCoverage interface that libFuzzer consumes, which lets it perform coverage-guided fuzzing of Ruby code.

It achieves this via three of SanitizerCoverage's features:

  1. Inline 8-bit counters
  2. PC-Table
  3. Tracing data flow
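As a rough sketch of what these three features look like from the C side, the code below allocates an array of 8-bit counters and a matching PC-Table, registers them with libFuzzer through the SanitizerCoverage callbacks `__sanitizer_cov_8bit_counters_init` and `__sanitizer_cov_pcs_init`, and forwards integer comparisons to `__sanitizer_cov_trace_cmp8` for data-flow tracing. The `sketch_*` functions, the table size, and the synthetic "PC" values are hypothetical and not taken from Ruzzy. The callbacks are declared weak (a GCC/Clang extension on ELF platforms such as Linux) so the sketch still links and runs when libFuzzer is absent.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed capacity; a real tool would size this to its needs. */
#define SKETCH_NUM_COUNTERS 4096

/* SanitizerCoverage callbacks implemented by libFuzzer. Declared weak so
 * this sketch links without libFuzzer; the symbols are then NULL. */
extern void __sanitizer_cov_8bit_counters_init(uint8_t *start, uint8_t *stop)
    __attribute__((weak));
extern void __sanitizer_cov_pcs_init(const uintptr_t *pcs_beg,
                                     const uintptr_t *pcs_end)
    __attribute__((weak));
extern void __sanitizer_cov_trace_cmp8(uint64_t arg1, uint64_t arg2)
    __attribute__((weak));

/* Inline 8-bit counters: one byte per coverage point. */
static uint8_t counters[SKETCH_NUM_COUNTERS];

/* PC-Table: one (pc, flags) pair per counter, in the same order.
 * Ruby code has no native PCs, so the "pc" is a synthetic index. */
static uintptr_t pc_table[SKETCH_NUM_COUNTERS * 2];

void sketch_register_coverage(void) {
    for (size_t i = 0; i < SKETCH_NUM_COUNTERS; i++) {
        pc_table[2 * i] = (uintptr_t)(i + 1); /* synthetic, non-zero "PC" */
        pc_table[2 * i + 1] = 0;              /* flags: bit 0 marks a function entry */
    }
    if (__sanitizer_cov_8bit_counters_init)
        __sanitizer_cov_8bit_counters_init(counters,
                                           counters + SKETCH_NUM_COUNTERS);
    if (__sanitizer_cov_pcs_init)
        __sanitizer_cov_pcs_init(pc_table, pc_table + 2 * SKETCH_NUM_COUNTERS);
}

/* Record a hit on a coverage point; the byte wraps around at 256. */
void sketch_hit(size_t index) {
    counters[index % SKETCH_NUM_COUNTERS]++;
}

/* Read back a counter (useful for inspection and testing). */
uint8_t sketch_counter(size_t index) {
    return counters[index % SKETCH_NUM_COUNTERS];
}

/* Data-flow tracing: report the operands of an integer comparison so
 * libFuzzer can learn values that flip the comparison. */
void sketch_trace_cmp(uint64_t lhs, uint64_t rhs) {
    if (__sanitizer_cov_trace_cmp8)
        __sanitizer_cov_trace_cmp8(lhs, rhs);
}
```

A Ruby-side event hook would call `sketch_hit` with an index derived from the current branch. Because the PC values are synthetic, they only need to be unique and listed in the same order as the counters.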

To implement the counters and the PC-Table, Ruzzy hooks coverage events much like Ruby's built-in Coverage module does. It's great that this functionality is built into Ruby: we don't have to modify Ruby bytecode during execution or use some other heavy-handed means of tracking coverage. However, coverage instrumentation is not currently part of Ruby's public C API, so to gather coverage information Ruzzy has to perform two hacky actions:

Hook the internal-only RUBY_EVENT_COVERAGE_BRANCH event:

https://github.com/trailofbits/ruzzy/blob/5e399440559ec397a008ae271c0c78edd628cef5/ext/cruzzy/cruzzy.c#L12-L16

Enable coverage tracking by calling the Coverage module's Ruby interface, because there is no public C API for it:

https://github.com/trailofbits/ruzzy/blob/5e399440559ec397a008ae271c0c78edd628cef5/ext/cruzzy/cruzzy.c#L195-L211

As the comment in the linked code mentions, we must call Coverage.setup, which calls rb_set_coverages. If it is not called, rb_get_coverages returns NULL in iseq.c and coverage tracking is never enabled.

To simplify Ruzzy's coverage tracking, I would request that both pieces of functionality be made part of Ruby's public C interface: the RUBY_EVENT_COVERAGE_BRANCH event, so we can hook coverage events with the standard rb_add_event_hook function, and a means of initializing Ruby's global coverage state, similar to rb_set_coverages, so a C extension can enable and gather coverage information. It would also be very helpful if coverage events such as lines and branches were added to the TracePoint API, for both Ruby code and C extensions.

An additional nice-to-have would be extending the public tracearg functionality to include more coverage information. Currently, tracearg offers information like rb_tracearg_lineno and rb_tracearg_path. It would be helpful if it also provided coverage details such as coverage.c's column information and a unique identifier for each branch. Today Ruzzy has to use (path, lineno) as a branch's unique identifier because that is all the public API offers, but more information would help uniquely identify branches.
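To illustrate the limitation, here is a hypothetical way to turn a (path, lineno) pair into a counter index. The FNV-1a hash and the `sketch_*` helper names are illustrative choices, not necessarily what Ruzzy does. Because only the path and line number feed the hash, two branch targets on the same line, such as both arms of `cond ? a : b`, always get the same identifier; column information or a per-branch identifier from tracearg would be the extra input that separates them.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 64-bit FNV-1a constants. */
#define FNV64_OFFSET 0xcbf29ce484222325ULL
#define FNV64_PRIME 0x100000001b3ULL

/* Fold bytes into an FNV-1a hash state. */
static uint64_t fnv1a_update(uint64_t h, const unsigned char *bytes, size_t len) {
    for (size_t i = 0; i < len; i++) {
        h ^= bytes[i];
        h *= FNV64_PRIME;
    }
    return h;
}

/* Hypothetical helper: derive a branch identifier from (path, lineno), the
 * only location data rb_tracearg_path/rb_tracearg_lineno expose. The line
 * number is fed in as 8 little-endian bytes, independent of host endianness. */
uint64_t sketch_branch_id(const char *path, long lineno) {
    uint64_t h = fnv1a_update(FNV64_OFFSET, (const unsigned char *)path,
                              strlen(path));
    uint64_t line = (uint64_t)lineno;
    for (int i = 0; i < 8; i++) {
        unsigned char b = (unsigned char)(line >> (8 * i));
        h = fnv1a_update(h, &b, 1);
    }
    return h;
}

/* Map an identifier onto a fixed-size counter array. */
size_t sketch_counter_index(uint64_t id, size_t num_counters) {
    return (size_t)(id % num_counters);
}
```

For example, `sketch_counter_index(sketch_branch_id("lib/foo.rb", 10), 4096)` lands in the same slot for every branch on line 10 of `lib/foo.rb`.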

mschwager commented 6 months ago

I think making this functionality public and officially supported would help too: https://github.com/ruby/ruby/blob/v3_3_0/include/ruby/debug.h#L774-L792.

mschwager commented 5 months ago

Process here: https://docs.ruby-lang.org/en/3.3/contributing/reporting_issues_md.html#label-Requesting+features

mschwager commented 5 months ago

Upstream feature request here: https://bugs.ruby-lang.org/issues/20448