mchalupa / dg

[LLVM Static Slicer] Various program analyses, construction of dependence graphs and program slicing of LLVM bitcode.
MIT License
474 stars 131 forks source link

How can I slice an arbitrary function? #426

Open shouguoyang opened 2 years ago

shouguoyang commented 2 years ago

If the criterion locates a non-main function, e.g. fact function below:

#include <stdio.h>

long int fact(int x)
{
        long int r = x;
        while (--x >=2)
                r *= x;

        return r;
}

int main(void)
{
        int a, b, c = 7;

        while (scanf("%d", &a) > 0) {
                assert(a > 0);
                printf("fact: %lu\n", fact(a));
        }

        return 0;
}

Compile it with clang-10 -S -emit-llvm -g -o fact.bc fact.c or clang-10 -c -emit-llvm -g -o fact.bc fact.c. Slicing by ./llvm-slicer -entry fact -c 8:r fact.bc, the result shows

No reachable slicing criteria: '' '8:r'
[llvm-slicer] saving sliced module to: fact.sliced

How can I slice with the variable r at line 8?

mchalupa commented 2 years ago

Did you try to use -sc instead of -c? (-c is an older switch that is meant to get a name of a function). And I do not see any use of r at line 8. Did you mean line 9?

mchalupa commented 2 years ago

Ok, found the problem. There is a bug in the reachability slicer that runs before the main slicer. You can workaround it by using -cuttof-diverging=false. I.e., these commands should work: llvm-slicer -c '9:r' -cutoff-diverging=false -entry fact fact.ll or llvm-slicer -sc '9#r' -cutoff-diverging=false -entry fact fact.ll

shouguoyang commented 2 years ago

Thanks for your answer. I have some questions:

  1. why the results of forward slicing (e.g., ./llvm-slicer -sc "7#r" -cutoff-diverging=false -entry fact -forward fact.bc ) seems contain the code generated by backward slicing?
    • Forward slicing result:
      5: {
      6:      long int r = x;
      7:      while (--x >=2)
      8:              r *= x;
      10:     return r;
      11: }
  2. It appears that the results of backward slicing include code lines following the code line in criteria.
  3. If I specify a variable in criteria, let's say length at line N, at the same time variable start also at line N but not in criteria, the result code of slice seems to contain code lines that have data dependency on start. How can I just get the code that only have dependency on variable length precisely?

I would appreciate it if you could answer the questions above.

Btw, it would be more convenient if you could open a slack channel for this repo.

mchalupa commented 2 years ago
  1. why the results of forward slicing (e.g., ./llvm-slicer -sc "7#r" -cutoff-diverging=false -entry fact -forward fact.bc ) seems contain the code generated by backward slicing?

Because it does. It is to make the slice executable and LLVM code well-defined.

  1. It appears that the results of backward slicing include code lines following the code line in criteria.

Any example?

  1. If I specify a variable in criteria, let's say length at line N, at the same time variable start also at line N but not in criteria, the result code of slice seems to contain code lines that have data dependency on start. How can I just get the code that only have dependency on variable length precisely?

Again it would be nice to have a concrete example. Without it, I cannot tell where is the problem.

mchalupa commented 2 years ago

I moved the bug that started this conversation to #427 .

shouguoyang commented 2 years ago

Let's say we get the code file fact.c below:


#include <assert.h>
#include <stdio.h>
int arr[10];
long int fact(int s, int e)
{
int len = 10;
if (s-e > len){
return 0;
}
    if (s+e > 20){
            return arr[s] - arr[e];
    }

    return arr[len - s];

}

int main(void) { int a, b, c = 7;

    while (scanf("%d", &a) > 0) {
            assert(a > 0);
            printf("fact: %lu\n", fact(a, 1));
    }

    return 0;

}


I compiled the code by `clang -emit-llvm -S -g -fno-discard-value-names fact.c -o fact.c.ll`.
Then I did the slice by `llvm-slicer -entry fact -sc "7#len" -cutoff-diverging=false -forward fact.c.ll`.
I got the sliced code below:

5: { 6: int len = 10; 7: if (s-e > len){ 8: return 0; 9: } 11: if (s+e > 20){ 12: return arr[s] - arr[e]; 13: } 16: return arr[len - s]; 18: }



For the question 3, I specify the variable `len` in line 7 but get the code from lines 11 to 13, which has no control and data dependency with the criteria. So, could you explain why the code located at lines 11 to 13 is sliced? Is it possible to remove such code since it has no data or control dependency?

Regarding question 2, the slicer works well after specifying -cutoff-diverging=false* during the slicing process.

Thank you for your time.
mchalupa commented 2 years ago

For the question 3, I specify the variable len in line 7 but get the code from lines 11 to 13, which has no control and data dependency with the criteria. So, could you explain why the code located at lines 11 to 13 is sliced?

There is control dependence from line 7 to 11 and from 11 to 16. Line 11 (16, resp.) is not executed if the condition on line 7 (11, resp.) evaluates to true.

shouguoyang commented 2 years ago

Thanks.