Closed pd-fkie closed 3 years ago
Alright, after discussion, we are willing to drop support for Python 2.7! So, no need to try to maintain both implementations or anything.
Hi,
When testing this, I get the following errors on build:
atheris.cc:117:14: error: expected ';' after expression
py::module_ atheris = py::module_::import("sys").attr("modules")["atheris"];
^
;
atheris.cc:117:7: error: no member named 'module_' in namespace 'pybind11'
py::module_ atheris = py::module_::import("sys").attr("modules")["atheris"];
~~~~^
atheris.cc:117:15: error: unexpected namespace name 'atheris': expected expression
py::module_ atheris = py::module_::import("sys").attr("modules")["atheris"];
^
atheris.cc:117:29: error: no member named 'module_' in namespace 'pybind11'; did you mean 'module'?
py::module_ atheris = py::module_::import("sys").attr("modules")["atheris"];
~~~~^~~~~~~
module
/usr/local/google/home/ipudney/.local/lib/python3.8/site-packages/pybind11/include/pybind11/pybind11.h:789:7: note: 'module' declared here
class module : public object {
^
atheris.cc:118:7: error: no type named 'module_' in namespace 'pybind11'; did you mean 'module'?
py::module_ core;
~~~~^~~~~~~
module
/usr/local/google/home/ipudney/.local/lib/python3.8/site-packages/pybind11/include/pybind11/pybind11.h:789:7: note: 'module' declared here
class module : public object {
^
atheris.cc:121:16: error: no member named 'module_' in namespace 'pybind11'; did you mean 'module'?
core = py::module_::import("atheris.core_with_libfuzzer");
~~~~^~~~~~~
module
/usr/local/google/home/ipudney/.local/lib/python3.8/site-packages/pybind11/include/pybind11/pybind11.h:789:7: note: 'module' declared here
class module : public object {
^
atheris.cc:123:16: error: no member named 'module_' in namespace 'pybind11'; did you mean 'module'?
core = py::module_::import("atheris.core_without_libfuzzer");
~~~~^~~~~~~
module
/usr/local/google/home/ipudney/.local/lib/python3.8/site-packages/pybind11/include/pybind11/pybind11.h:789:7: note: 'module' declared here
class module : public object {
^
atheris.cc:126:3: error: unexpected namespace name 'atheris': expected expression
atheris.attr("_trace_cmp") = core.attr("_trace_cmp");
^
atheris.cc:127:3: error: unexpected namespace name 'atheris': expected expression
atheris.attr("_reserve_counters") = core.attr("_reserve_counters");
^
atheris.cc:128:3: error: unexpected namespace name 'atheris': expected expression
atheris.attr("_trace_branch") = core.attr("_trace_branch");
^
In file included from atheris.cc:16:
In file included from ./atheris.h:29:
In file included from /usr/local/google/home/ipudney/.local/lib/python3.8/site-packages/pybind11/include/pybind11/functional.h:12:
/usr/local/google/home/ipudney/.local/lib/python3.8/site-packages/pybind11/include/pybind11/pybind11.h:1093:9: warning: expression result unused [-Wunused-value]
PYBIND11_EXPAND_SIDE_EFFECTS(add_base<options>(record));
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/local/google/home/ipudney/.local/lib/python3.8/site-packages/pybind11/include/pybind11/detail/common.h:662:47: note: expanded from macro 'PYBIND11_EXPAND_SIDE_EFFECTS'
#define PYBIND11_EXPAND_SIDE_EFFECTS(PATTERN) pybind11::detail::expand_side_effects{ ((PATTERN), void(), false)..., false }
^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
atheris.cc:144:3: note: in instantiation of function template specialization 'pybind11::class_<atheris::FuzzedDataProvider>::class_<>' requested here
py::class_<FuzzedDataProvider>(m, "FuzzedDataProvider")
^
1 warning and 10 errors generated.
error: command 'clang' failed with exit status 1
----------------------------------------
ERROR: Failed building wheel for atheris
That's with clang, gcc produces similar but slightly different errors.
The exact command invoked by setup.py was:
clang -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -fPIC -DVERSION_INFO='1.0.11' -DATHERIS_MODULE_NAME=atheris -I/usr/local/google/home/ipudney/.local/lib/python3.8/site-packages/pybind11/include -I/usr/local/include/python3.8 -c atheris.cc -o build/temp.linux-x86_64-3.8/atheris.o -std=c++14
Should be fixed now.
Thanks! Still one error that can be fixed with a cast
:
atheris.cc: In function ‘void atheris::Fuzz()’:
atheris.cc:131:75: error: conversion from ‘pybind11::detail::item_accessor’ {aka ‘pybind11::detail::accessor<pybind11::detail::accessor_policies::generic_item>’} to non-scalar type ‘pybind11::module’ requested
131 | py::module atheris = py::module::import("sys").attr("modules")["atheris"];
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
Sorry but I could not reproduce the latest error. Did the commit fix it?
It did!
Everything seems to be working fine. Merging!
There's a couple small things, mostly to do with Google style, I want to change; but I'll avoid the back-and-forth and just make those changes myself. I'll make those today and then release the new version.
Hey, I came across a few questions while working on this.
Tell me about the reason you replace the 3 instrumentation functions, rather than just change their behavior, after fuzzing has started. This is leading to a few problems. The first is that inside of Google we link all extensions together, so this fails with our build system. That's something I can easily fix by giving the the before-fuzz and after-fuzz functions different names. But the second problem is that since coverage isn't being collected until Fuzz() is called, libFuzzer will get mad if the first couple fuzz attempts don't produce coverage. Do you think it would be reasonable to (a) only change the behavior of _reserve_counters
, and (b) do that via a global in C++, not overwriting the module member?
Unfortunately, some additional issues came up during integration into Google that require review. Should hopefully be done tomorrow.
Tell me about the reason you replace the 3 instrumentation functions
My reasoning was that if we want to support instrumentation of lazy imports some day then _reserve_counters
needs to call __sanitizer_cov_8bit_counters_init
and __sanitizer_cov_pcs_init
and thus needs to be in core.cc.
This also applies to _trace_branch
obviously.
Do you think it would be reasonable to (a) only change the behavior of _reserve_counters, and (b) do that via a global in C++, not overwriting the module member?
This is absolutely possible for _reserve_counters
and _trace_branch
. atheris.cc can manage the counter region just like core.cc and pass its address and size to start_fuzzing
. This way a switch between before-Fuzz and after-Fuzz isn't even necessary. But I'm afraid there is no way to switch behaviour inside of a single _trace_cmp
. It still has to be supplied by multiple submodules.
This drops the support for lazy imports as described above though.
since coverage isn't being collected until Fuzz() is called, libFuzzer will get mad if the first couple fuzz attempts don't produce coverage
I'm not quite sure what you mean. How can it make fuzz attempts before calling LLVMFuzzerRunDriver
?
I'm not quite sure what you mean. How can it make fuzz attempts before calling LLVMFuzzerRunDriver?
You can't, but you can still collect coverage. When libFuzzer starts, it runs a couple fuzz attempts with inputs like the empty string, and if those don't produce coverage, libFuzzer exits. This can happen if the fuzzer driver is a bit more complicated and doesn't call into the instrumented libraries on the first couple inputs.
If coverage was collected before fuzzing started, users could easily resolve this by e.g. calling my_lib.do_thing()
before the first run. But it might not be a big deal. I can leave coverage collection disabled before fuzzing starts for now.
Could you provide an example fuzzing target where libfuzzer complains about missing coverage? I would like to get a better understanding of this since it never occured in my fuzzing attempts.
Also to be completely honest this merge was a little bit premature.
There is still a lot of room for improvements:
I had plans to fix 1., 2. and 3. before you merged this (I was not aware of 4. though).
Do you plan to fix these now yourself? I would be perfectly fine with that.
Could you provide an example fuzzing target where libfuzzer complains about missing coverage?
Hmmm, it looks like I can't reproduce this in regular Python, only in Google's weird Python setup. So nevermind :)
There is still a lot of room for improvements: Do you plan to fix these now yourself? I would be perfectly fine with that.
Sure, I'll take on these. I don't know if #2 will be finished before we cut a release, but I'll get #1 and #3 done. Regarding the package structure: I moved all of the Python up to the root directory to maximize compatibility with existing fuzzers inside of Google, but I'll change it to a much more sensible structure soon.
I'm also writing tests. One huge advantage of this instrumentation: testing is way easier! Now, it's practical to test the tracing code.
I also made a couple changes to improve compatibility and clarity. import atheris_no_libfuzzer
now prints a warning (but is otherwise an alias for import atheris
). The atheris.instrument()
function is renamed to atheris.instrument_imports()
to make it more obvious that it doesn't work on individual functions, just whole modules. And finally, Atheris will now automatically determine whether to use the version with or without libFuzzer, the user doesn't need to specify.
This pull request adds functionality to instrument modules at runtime
for coverage collection and dataflow tracing.
It enables atheris to get rid of
sys.settrace
and its huge runtime overheadto double the execution speed of the fuzzer (see issue #16).
Changes from a user perspective
A new function was added to atheris called
atheris.Instrument()
that installsa temporary import hook. This hook instruments the underlying code objects of modules at import-time. The function has to be used as follows:
This will cause the
target_library
to get instrumented. Every other library not imported inside thewith
-block will not get instrumented.It is possible to filter which modules get instrumented by supplying a whitelist of module names to
atheris.Instrument()
like this:This may be necessary to stop instrumentation of modules in the python standard library
(e.g. if
target_library_a
importsstruct
, thenstruct
would also get instrumented).For the sake of simplicity this filter is optional.
Changes in the code
atheris
changed from being a single extension to being a package of the following structure:The same applies to
atheris_no_libfuzzer
.atheris/import_hook.py
atheris.Instrument()
and the import hook are defined hereatheris/instrument_bytecode()
provides the functionpatch_code(code_object)
and the classInstrumentor
, which does the heavy lifting of the instrumentation procedure.atheris/version_dependent.py
contains version-specific behaviour and data.Supported python versions
Support for version 3.5 was dropped.
About the bytecode instrumentation
Each python module consists of a hierachy of code objects. At the top level is the code object
for the module itself and below are code objects for classes, functions, lambdas, etc.
atheris
goes through each code object and builds a CFG of the bytecode.If a basic block has two outgoing edges, a function invocation of
atheris._loc()
getsinserted at both branch ends. The argument of
atheris._loc()
is an id of the branch that was taken.After all code objects have been processed the number of overall instrumented branches in the module is known and a call to
atheris._reg(num_instrumented)
gets inserted at the very beginning of the module.While all target modules get imported
atheris
collects all calls toatheris._reg()
and stores the overall number of counters needed for all modules.In
atheris.Fuzz()
a memory-region of an appropriate size is allocated and used as a region for the counters.atheris._loc()
tellsatheris
at which index to increment a counter in the counter region.In order to trace the dataflow each
COMPARE_OP
gets replaced by a call toatheris._cmp()
.