mchalupa / dg

[LLVM Static Slicer] Various program analyses, construction of dependence graphs and program slicing of LLVM bitcode.
MIT License
474 stars 131 forks source link

Simple miss-sliced examples? #384

Open FerranAlet opened 3 years ago

FerranAlet commented 3 years ago

Hi,

As far as I understand, I'm getting some miss-sliced examples. I'm unsure whether it's a bug or I'm violating a pre-condition of dg:

Example 1

Code

#include<string>
using namespace std;

int f(int A, int B) {
    int D = 100;
    int E = D+100;
    int F = E-D;
    int v[3];
    string s;
    s += "hello";
    v[1] = 4;
    B += v[1];
    B += 1;
    A += B - 1;
    B += 100;
    return A;
}

int main() {
    int A = 0, B = 1000;
    int C = f(A,B);
}

Command: dg/tools/dgtool llvm-slicer -Xclang -g -c 21:C -cda ntscd -annotate slice hello.cpp -statistics -dump-dg && dg/tools/llvm-to-source hello.sliced hello.cpp Result:

4: int f(int A, int B) {
10:     s += "hello";
11:     v[1] = 4;
12:     B += v[1];
13:     B += 1;
14:     A += B - 1;
16:     return A;
17: }
19: int main() {
20:     int A = 0, B = 1000;
21:     int C = f(A,B);
22: }

Problem: The code wouldn't compile (s and v are not declared) and s isn't relevant to A.

Example 2:

Code

#include<vector>
#include<string>
using namespace std;

int f(int A, int B) {
    int D = 100;
    int E = D+100;
    int F = E-D;
    vector<int> v(3);
    v[1] = 4;
    B += v[1];
    B += 1;
    A += B - 1;
    B += 100;
    return A;
}

int main() {
    int A = 0, B = 1000;
    int C = f(A,B);
}

Command dg/tools/dgtool llvm-slicer -Xclang -g -c 20:C -cda ntscd -annotate slice hello.cpp -statistics -dump-dg && dg/tools/llvm-to-source hello.sliced hello.cpp

Result

5: int f(int A, int B) {
6:     int D = 100;
7:     int E = D+100;
8:     int F = E-D;
9:     vector<int> v(3);
10:     v[1] = 4;
11:     B += v[1];
12:     B += 1;
13:     A += B - 1;
15:     return A;
16: }
18: int main() {
19:     int A = 0, B = 1000;
20:     int C = f(A,B);
21: }

Problem: lines 6,7,8 are superfluous. It's missing the #include and the using namespace (that's fine)

If I'm doing something wrong or running into problems with dg, is there a work-around? Thanks!

mchalupa commented 3 years ago

Problem: The code wouldn't compile (s and v are not declared) and s isn't relevant to A.

The code generated by llvm-to-source is not meant to be compilable. This tool just writes out the lines of the original program that are present also in the slice (or the other bitcode that is passed as the argument). The tool is for "overview", really. llvm-slicer should produce executable LLVM bitcode, though.

Problem: lines 6,7,8 are superfluous. It's missing the #include and the using namespace (that's fine)

The superfluous lines are there because DG does not support C++ and thus it over-approximates the behavior of the program. This over-approximation leads to bigger slices.

FerranAlet commented 3 years ago

Thanks @mchalupa for the quick response! Two follow-ups:

Thanks!

mchalupa commented 3 years ago

Sorry for the dumb question, but is there any other type of instruction beyond declaration that will not be picked up by llvm-to-source?

Probably yes, but hard to tell... maybe some global initialization? However, if you need a compilable C code, you can use a tool like rellic, llvm2c, or llvm-cbe to transform the sliced LLVM to C (I do not know how is it with the support of C++ in these tools).

Does the fact that C++ is not supported only result in over-approximations or the program can just fail to run as well?

In the presence of exceptions in the program or some functions that may abort inside libstdc++, the program may also fail running. In other cases, it should be just over-approximation -- but it is not tested at all.