mchalupa / dg

[LLVM Static Slicer] Various program analyses, construction of dependence graphs and program slicing of LLVM bitcode.
MIT License
474 stars 131 forks source link

Incorrect building sdg #424

Open jiachunpeng opened 2 years ago

jiachunpeng commented 2 years ago

envirment

clang version 10.0.0-4ubuntu1 
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
Found candidate GCC installation: /usr/bin/../lib/gcc/x86_64-linux-gnu/9
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/9
Selected GCC installation: /usr/bin/../lib/gcc/x86_64-linux-gnu/9
Candidate multilib: .;@m64
Selected multilib: .;@m64

command

clang -emit-llvm -g test.c -S
~/work/dg/build/tools/llvm-sdg-dump test.bc

code 1

#include <stdlib.h>
void *mymalloc(void *c, int size) {
  c = malloc(size);
  return malloc(size);
}
int main() {
  void *a;
  void *c;
  int b = 5;
  a = mymalloc(c, b);
}

image code 2

#include <stdlib.h>
void *mymalloc() {
  return malloc(1);
}
int main() {
  void *a;
  a = mymalloc();
}

image

mchalupa commented 2 years ago

Hi,

the SDG class is a replacement for DG class, but is not finished (nor used by any component of DG) yet. Is this problem present if you use llvm-dg-dump?

jiachunpeng commented 2 years ago

Hi,

the SDG class is a replacement for DG class, but is not finished (nor used by any component of DG) yet. Is this problem present if you use llvm-dg-dump?

I still have some questions.

#include <stdlib.h>

void *mymalloc() { 
  void *ret = malloc(5);
  return ret; 
}

int main() {
  void *a;
  a = mymalloc();
  free(a);
}

command

clang -emit-llvm -g test.c -S
~/work/dg/build/tools/llvm-dg-dump test.bc --no-cfg > test.dot

image

Should there has a data dependence edge from return ret to a = mymalloc() ? And why there has a control dependence edge form a = mymalloc() to *ret = malloc(5) ?

mchalupa commented 2 years ago

return ret does not modify anything, so it should have no data dependence edges. The control dep. edge between main:: call mymalloc and OUT ARG malloc(5) is there because the latter is an (output) actual argument of the former. In this case, these nodes are redundant as the memory allocated by malloc(5) is not written to, but they are being created for a simpler generation of the graph.

jiachunpeng commented 2 years ago

return ret does not modify anything, so it should have no data dependence edges. The control dep. edge between main:: call mymalloc and OUT ARG malloc(5) is there because the latter is an (output) actual argument of the former. In this case, these nodes are redundant as the memory allocated by malloc(5) is not written to, but they are being created for a simpler generation of the graph.

Thanks for reply!

#include <stdlib.h>

void *mymalloc() { 
  void *ret = malloc(5);
  return ret; 
}

int main() {
  void *a;
  a = mymalloc();
  free(a);
}

For the same code. the first is builded acording to Reps T . Program analysis via graph reachability[J]. Information & Software Technology, 1998, 40(11-12):701-726., the seconed is form llvm-dg-dump.

Is there any difference between the two methods? image

image

mchalupa commented 2 years ago

Is there any difference between the two methods?

Probably? I do not know.

is it correct?

Correct in what sense? W.r.t the reachability analysis and backward slicing algorithms? Then yes, it is correct but not needed. These nodes have no influence on backward slicing and can be prune away. Again, they are present only to ease the building of DG.

why formal out of mymalloc is %2 = malloc(5) instead of ret %3

Because the output nodes represent the memory written inside the call. ret %3 does not write to memory.

jiachunpeng commented 2 years ago

For this code

int getValue();
int doSomething(int i);

int fun() {
  int v = 5;
  int c = getValue();
  c += 4;
  v += 4;
  return c;
}

int main() {
  int a;
  a = fun();
  doSomething(a);
}

Using command

llvm-slicer test.bc -c doSomething --dump-dg-only

The result is image The result of slicing is correct. But why there is no dependency between c = getValue() and doSomething(a) in graph.

mchalupa commented 2 years ago

There is a transitive dependency, otherwise the slice would not contain c = getValue(). Or what kind of dependency you mean?

jiachunpeng commented 2 years ago

I mean the dependency edges in the graph dumped by the llvm-slicer. Or, the llvm-slicer does not dump some dependency edges using the default options ?

mchalupa commented 2 years ago

Looking at the graph, it seems that there is missing the edge from NODE0x55d4...720 to %2 = call i32 foo() if that is what you mean.

mchalupa commented 2 years ago

It is in the graph, but it is not displayed for some reason.