EqBench is a collection of equivalent and non-equivalent Java and C program pairs, created to advance software engineering research. It unifies and extends benchmarks used in earlier work, contributing 147 equivalent and 125 non-equivalent cases. Each case is provided in both C and Java so that the dataset can be used with different equivalence-checking techniques.
If you use this dataset, please cite the paper below:
Badihi S, Li Y, Rubin J. "EqBench: A Dataset of Equivalent and Non-equivalent Program Pairs", IEEE/ACM 18th International Conference on Mining Software Repositories (MSR), data showcase, pp. 610-614, 2021.
The structure of the dataset is shown below:
The archive is organized into 18 top-level directories, one per benchmark. Inside each benchmark directory, there is one folder per program. Since we have both equivalent and non-equivalent pairs for each program, each program directory contains two sub-folders, Eq and Neq. Each of these sub-folders holds four source files: two written in Java (oldV.java and newV.java) and two written in C (oldV.c and newV.c).
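As a rough illustration of this layout, the following Python sketch walks an extracted copy of the archive and lists every program pair it finds. The names iter_pairs, missing_files, and ProgramPair are hypothetical helpers introduced here and are not part of the dataset; the sketch only assumes the directory and file names described above.

```python
from pathlib import Path
from typing import Iterator, List, NamedTuple

# Folder and file names as described in the text above.
VARIANTS = ("Eq", "Neq")
VERSION_FILES = ("oldV.java", "newV.java", "oldV.c", "newV.c")


class ProgramPair(NamedTuple):
    """One Eq or Neq sub-folder of a program (hypothetical helper type)."""
    benchmark: str
    program: str
    variant: str      # "Eq" or "Neq"
    directory: Path


def iter_pairs(root: Path) -> Iterator[ProgramPair]:
    """Yield every <benchmark>/<program>/<Eq|Neq> folder found under root."""
    for bench in sorted(p for p in root.iterdir() if p.is_dir()):
        for prog in sorted(p for p in bench.iterdir() if p.is_dir()):
            for variant in VARIANTS:
                pair_dir = prog / variant
                # Skip variants that are absent instead of failing.
                if pair_dir.is_dir():
                    yield ProgramPair(bench.name, prog.name, variant, pair_dir)


def missing_files(pair: ProgramPair) -> List[str]:
    """Return the expected source files that are not present in the pair."""
    return [name for name in VERSION_FILES
            if not (pair.directory / name).is_file()]
```

Calling list(iter_pairs(Path("path/to/extracted/archive"))) returns one entry per Eq or Neq folder, and missing_files can then be used to flag incomplete pairs before running an equivalence checker on them.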
For each Eq and Neq pair, we also provide two JSON files containing the meta-data that describes the program. The files are named C-Desc.json and J-Desc.json and correspond to the C and Java versions of the program, respectively.
The file is structured as follows.
The figure below shows an example of the meta-data for the sign program in the airy benchmark:
In addition, to ensure consistency of the dataset, we provide two schema templates, EqDescTemplate.json and NeqDescTemplate.json, as part of the dataset. They describe the required meta-data for each benchmark.
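One possible way to use the templates is to compare the top-level keys of a description file against those of the matching template. The sketch below does only that. It assumes that both files contain a JSON object and that the template's top-level keys are the required fields; this is an assumption about the template format, not something the dataset guarantees. The function names load_keys and compare_with_template are hypothetical.

```python
import json
from pathlib import Path
from typing import Set, Tuple


def load_keys(path: Path) -> Set[str]:
    """Return the top-level keys of a JSON file that holds an object."""
    with path.open(encoding="utf-8") as fh:
        data = json.load(fh)
    if not isinstance(data, dict):
        raise ValueError(f"{path} does not contain a JSON object")
    return set(data)


def compare_with_template(desc_path: Path,
                          template_path: Path) -> Tuple[Set[str], Set[str]]:
    """Compare a description file with a template (assumed key-based).

    Returns (missing, extra): keys present only in the template and keys
    present only in the description file.
    """
    desc_keys = load_keys(desc_path)
    template_keys = load_keys(template_path)
    return template_keys - desc_keys, desc_keys - template_keys
```

For an Eq pair, one would pass the pair's C-Desc.json or J-Desc.json together with EqDescTemplate.json; two empty sets mean the top-level keys match.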