Update Architecture: Generate Bugs on Demand

I think LAVA would be more usable if it worked as follows:

Using a config file, LAVA identifies ATPs and DUAs in a target for a given input file and CLI argument string. We store these in a database that maps each ATP and DUA to the input file(s) (identified by hash) and CLI arguments that reach it. Rerunning LAVA on the same input file does nothing at this step, since all of its DUAs/ATPs should already be recorded. Running on a new input file appends to the database.
When it's time to inject bugs, we combine ATPs with viable DUAs to generate potential bugs. Testing and pruning of potential bugs would continue as normal. An option would ensure that, if desired, all injected bugs can be triggered by the same CLI arguments.
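
As a rough illustration of the first step, here is a minimal sketch of a database keyed by input hash and CLI arguments. Everything in it is an assumption made for illustration: it uses SQLite rather than LAVA's actual database, the table names (`input_run`, `site`, `site_seen`) and the `record_run` helper are hypothetical, and `analyze` is a stand-in for the real taint analysis that yields `(kind, location)` pairs.

```python
import hashlib
import sqlite3

# Hypothetical schema: one row per (input hash, CLI args) run, one row per
# unique DUA/ATP, and a join table recording which runs reach which site.
SCHEMA = """
CREATE TABLE IF NOT EXISTS input_run (
    id INTEGER PRIMARY KEY,
    file_hash TEXT NOT NULL,
    cli_args TEXT NOT NULL,
    UNIQUE (file_hash, cli_args)
);
CREATE TABLE IF NOT EXISTS site (
    id INTEGER PRIMARY KEY,
    kind TEXT NOT NULL CHECK (kind IN ('DUA', 'ATP')),
    location TEXT NOT NULL,
    UNIQUE (kind, location)
);
CREATE TABLE IF NOT EXISTS site_seen (
    site_id INTEGER NOT NULL REFERENCES site(id),
    run_id INTEGER NOT NULL REFERENCES input_run(id),
    PRIMARY KEY (site_id, run_id)
);
"""


def open_db(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def record_run(conn, input_bytes, cli_args, analyze):
    """Analyze one input; return False if this input/args pair is already known."""
    file_hash = hashlib.sha256(input_bytes).hexdigest()
    known = conn.execute(
        "SELECT id FROM input_run WHERE file_hash = ? AND cli_args = ?",
        (file_hash, cli_args),
    ).fetchone()
    if known is not None:
        return False  # same input file and args: nothing to do

    run_id = conn.execute(
        "INSERT INTO input_run (file_hash, cli_args) VALUES (?, ?)",
        (file_hash, cli_args),
    ).lastrowid
    for kind, location in analyze(input_bytes, cli_args):
        # A site already found by an earlier input is reused, not duplicated.
        conn.execute(
            "INSERT OR IGNORE INTO site (kind, location) VALUES (?, ?)",
            (kind, location),
        )
        site_id = conn.execute(
            "SELECT id FROM site WHERE kind = ? AND location = ?",
            (kind, location),
        ).fetchone()[0]
        conn.execute(
            "INSERT OR IGNORE INTO site_seen (site_id, run_id) VALUES (?, ?)",
            (site_id, run_id),
        )
    conn.commit()
    return True
```

With a layout like this, adding a new input file only appends rows, and no bug rows are written during analysis at all.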

The benefits to this are:

A smaller database and major performance improvements, since bugs are no longer inserted during analysis.
Easier to update an existing database with new DUAs/ATPs as they are discovered from subsequent input files.
Enables more complex bug-generation logic, where we could try combining DUAs and ATPs from different input files (though it may be difficult or impossible to generate solutions for these).

This differs from the current architecture in the following ways:

Currently we create bugs in find_bug_injections.cpp as we identify ATPs and DUAs during analysis of a recording. Due to poor database design/configuration, the bulk of the analysis time is spent inserting bugs into the database.
While the current system supports conducting taint analysis on multiple input files, it's not easy to add a single new input file to an already-built database of DUAs and ATPs.

The major work required here will be to update the database schema and to split FBI's (find_bug_injections) logic into an ATP/DUA generation phase and then, later, a phase where bugs are "built" out of DUAs and ATPs.
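
To make the second phase concrete, here is a minimal sketch of building bugs on demand from already-stored DUAs and ATPs. It is not FBI's actual logic: the `Site` type, the `build_bugs` function, and the `require_cli` option are all hypothetical names, and the pairing rule (a DUA and an ATP must share at least one run) is a simplifying assumption that deliberately excludes cross-input combinations.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Site:
    """A stored DUA or ATP plus the (file_hash, cli_args) runs that reach it."""
    name: str
    runs: frozenset


def build_bugs(duas, atps, require_cli=None):
    """Pair each DUA with each ATP reachable by at least one common run.

    If require_cli is given, only runs with exactly those CLI args count,
    so every resulting bug can be triggered with the same arguments.
    """
    bugs = []
    for dua, atp in product(duas, atps):
        shared = dua.runs & atp.runs
        if require_cli is not None:
            shared = {run for run in shared if run[1] == require_cli}
        if shared:
            bugs.append((dua.name, atp.name, sorted(shared)))
    return bugs
```

Because candidates are computed from the stored sites rather than inserted during analysis, the set of potential bugs can be regenerated with different options without rerunning the taint analysis.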
