To facilitate reproducible research on testing and debugging, researchers use curated benchmarks of bugs:
The Siemens benchmark.
ManyBugs is a benchmark of 185 C bugs in nine open-source programs.
Defects4J is a benchmark of 341 Java bugs from 5 open-source projects. It contains the corresponding patches, which cover a variety of patch types.
BEARS is a benchmark of continuous integration build failures, focusing on test failures. It was created by monitoring builds of open-source projects on Travis CI.