Open MarekKnapek opened 5 months ago
Thanks! What would you suggest as a way to do that? Set up a manually invoked GitHub Action or similar that can be invoked from time to time, which successively invokes cppfront with fuzzed inputs and at the end opens one issue containing the list of all inputs that caused crashes?
I have multiple ideas. In no particular order:
- Add an option to `cppfront` to suppress all on-screen output. The output could be quite noisy and, during fuzzing, it is not very useful. Something like `cppfront /quiet test.cpp2`. It could be `-q`, `--quiet`, `/quiet`, or something similar. Or it could be an environment variable or a compile-time option.
- Change the `cppfront` source code to make fuzzing easier. Most importantly, do not write anything to disk and do not read anything from disk. I believe this is better for the in-process fuzzing style that `libFuzzer` provides. The situation with `AFL` could be different, though. Basically, convert `cppfront` from an application to a library, then build two applications from this library: one is `cppfront` itself, the other is a fuzzer. The library would accept inputs and outputs as run-time or compile-time types. In `cppfront` mode, the inputs and outputs would be files on disk and command-line parameters. In fuzz mode, the inputs and outputs would be memory buffers containing the input sources, the options, and space for the output.
- `libFuzzer` has an option to accumulate something called a `corpus` over time. It would be nice not to lose this `corpus` and to maintain it over time. Maybe by force-committing it periodically to a separate branch?
- `libFuzzer` provides a random buffer of bytes, and it is up to the application what it does with it. I chose to shove it into `cppfront` as the input source file. It would be nice to identify the separate, independent components of `cppfront`, "deserialize" this random buffer into something meaningful for each component, execute that component with that input, and watch for undefined behavior, use after free, out-of-bounds access, asserts, and other bugs to trigger.
- Once `cppfront` is fuzz-tested, so that it contains no bugs when processing random or malformed input, the next stage would be verifying that `cppfront` produces valid output. Meaning that for any input, it not only triggers no UB, assert, or out-of-bounds access, but also produces either an error message or valid C++ output. It never produces invalid C++ as its output. This would be verified by running an already installed compiler on `cppfront`'s output and testing the compiler's exit code. But I believe this would be very slow without a custom mutator. Mutators are a whole separate can of worms. A mutator parses an input (from the corpus; here the corpus might not be random, but a series of valid cpp2 source files) into its own internal representation, somehow modifies this internal representation, writes it back out as bytes, and then exercises the fuzz target as usual. This method is more difficult for the fuzz test author, but yields better fuzzing speed and coverage than supplying random bytes as the fuzz target input.

Thanks for the ideas.
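To make the "library plus fuzz target" idea from the list above more concrete, here is a minimal sketch of an in-process `libFuzzer` target. `LLVMFuzzerTestOneInput` is the entry point libFuzzer expects; everything else is an assumption made for illustration: `compile_result` and `compile_cpp2` are hypothetical names that do not exist in cppfront, and the body of `compile_cpp2` is only a placeholder standing in for an in-memory compile.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>

// Hypothetical result type for an in-memory compile (not a cppfront type).
struct compile_result {
    bool ok = false;
    std::string cpp1_output;  // generated C++ on success
    std::string diagnostics;  // error text on failure
};

// Hypothetical library entry point (NOT an existing cppfront function).
// Placeholder body so the sketch compiles on its own; a real version would
// lex, parse, and lower `source` without touching the disk.
compile_result compile_cpp2(std::string_view source) {
    compile_result r;
    if (source.empty()) {
        r.diagnostics = "empty input";
        return r;
    }
    r.ok = true;
    r.cpp1_output = std::string(source);  // placeholder: echo the input
    return r;
}

// libFuzzer calls this once per generated input.
extern "C" int LLVMFuzzerTestOneInput(const std::uint8_t* data, std::size_t size) {
    // Guard the empty case so we never build a view from a null pointer.
    std::string_view source = size != 0
        ? std::string_view(reinterpret_cast<const char*>(data), size)
        : std::string_view();
    compile_result r = compile_cpp2(source);
    // Invariant from the list above: either success or an error message.
    assert(r.ok || !r.diagnostics.empty());
    return 0;  // libFuzzer expects 0 from the target
}
```

With clang, a target like this is typically built with `-fsanitize=fuzzer,address`, so that libFuzzer supplies `main` and ASAN reports memory errors.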
Re `/quiet`: This was added recently, with the semantics that only error output is printed. If cppfront crashes before the final stage of emitting errors, nothing will be emitted.
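On the custom mutator idea raised earlier in the thread, a rough sketch of the parse, modify, write-back cycle could look like the following. `LLVMFuzzerCustomMutator` is the hook libFuzzer looks for; `tokenize` and `mutate_tokens` are hypothetical helpers, and whitespace splitting is only a crude stand-in for a real Cpp2 lexer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <random>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// "Parse": split on whitespace (assumption: a real mutator would use a lexer).
std::vector<std::string> tokenize(const std::string& text) {
    std::istringstream in(text);
    std::vector<std::string> tokens;
    for (std::string t; in >> t;) tokens.push_back(t);
    return tokens;
}

// "Modify" and "write back": swap two randomly chosen tokens, then re-join
// the tokens with single spaces.
std::string mutate_tokens(const std::string& text, unsigned seed) {
    std::vector<std::string> tokens = tokenize(text);
    if (tokens.size() >= 2) {
        std::mt19937 rng(seed);
        std::uniform_int_distribution<std::size_t> pick(0, tokens.size() - 1);
        std::size_t a = pick(rng);
        std::size_t b = pick(rng);
        std::swap(tokens[a], tokens[b]);
    }
    std::string out;
    for (std::size_t i = 0; i < tokens.size(); ++i) {
        if (i != 0) out += ' ';
        out += tokens[i];
    }
    return out;
}

// Hook called by libFuzzer instead of its default byte-level mutation.
// Mutates `data` in place and returns the new size (at most max_size).
extern "C" std::size_t LLVMFuzzerCustomMutator(std::uint8_t* data, std::size_t size,
                                               std::size_t max_size, unsigned int seed) {
    std::string input = size != 0
        ? std::string(reinterpret_cast<const char*>(data), size)
        : std::string();
    std::string mutated = mutate_tokens(input, seed);
    std::size_t n = mutated.size() < max_size ? mutated.size() : max_size;
    if (n != 0) std::memcpy(data, mutated.data(), n);
    assert(n <= max_size);
    return n;
}
```

Swapping tokens keeps every input made of pieces that already lexed, which is the point of a structure-aware mutator; a more serious version would mutate a parse tree instead.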
From #1163, thanks @MarekKnapek !
Step 1. Find a spare computer that can be left running 24/7.
Step 2. Download my branch.
Step 3. Run the bash script from my branch.
Repeat steps 1-3 for as many CPUs as you have on your computer, or for as many computers as you have.
Step 4. Come back about once per day and check for crashes (ASAN detections).

Step 1 is the most difficult for me, and for a potential PR. I don't think GitHub Actions would let me run arbitrary code 24/7. That would be similar to crypto mining.
The branch is located here: https://github.com/MarekKnapek/cppfront/commits/fuzz3/. It contains three bash scripts, all of them essentially one-liners. The first is the "build script", which invokes the compiler with ASAN enabled. The second is "minimize corpus", which runs cppfront on each file in the corpus and deletes any input that only triggers branches already explored by previous inputs. The last is the "start fuzzing" one-liner, which runs the compiled binary and collects the corpus into the corpus directory.
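What the "minimize corpus" step achieves can be illustrated with a small greedy sketch. This is a conceptual model only, not how the script or libFuzzer implements it: `corpus_entry` and `minimize_corpus` are hypothetical names, and the branch IDs are assumed to come from some coverage instrumentation.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// One corpus file together with the branches it executed (both hypothetical).
struct corpus_entry {
    std::string name;
    std::set<int> branches;
};

// Greedy pass: keep an entry only if it reaches at least one branch
// that no previously kept entry has reached.
std::vector<corpus_entry> minimize_corpus(const std::vector<corpus_entry>& corpus) {
    std::set<int> seen;
    std::vector<corpus_entry> kept;
    for (const corpus_entry& e : corpus) {
        std::size_t before = seen.size();
        seen.insert(e.branches.begin(), e.branches.end());
        if (seen.size() > before) kept.push_back(e);  // new coverage: keep it
    }
    assert(kept.size() <= corpus.size());
    return kept;
}
```

Note that the result depends on the order in which inputs are visited, so a pass like this shrinks the corpus but does not necessarily find the smallest possible one.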
For step 1, I think there are some initiatives that provide support for setting up fuzzing for open source projects. Dunno if those could help; I was thinking along the lines of oss-fuzz and such. I have a spare Raspberry Pi 3B I could leave running 24/7, but I am not sure if it could be used, or if it would even be any good considering how "weak" it is. VPSs are also pretty cheap at like 5$ per month in some instances. There are plenty of options if you ask me!
> VPSs are also pretty cheap at like 5$ per month in some instances.
Yes, I'm running this 24/7 on a Hetzner machine with 2 CPUs and 4 GB of RAM; the cost is around 5.90 € per month including all taxes.
Issues found by fuzzing so far:
I'm using this code to fuzz: https://github.com/MarekKnapek/cppfront/commits/fuzz3/. It could be improved, but I don't know how.