mchalupa / dg

[LLVM Static Slicer] Various program analyses, construction of dependence graphs and program slicing of LLVM bitcode.
MIT License
474 stars 131 forks source link

the results of slicing seems wrong #435

Open freexxxyyy opened 2 years ago

freexxxyyy commented 2 years ago

Here is the function I want to slice. And the slicing cmd is llvm-slicer -sc '68#src_channels' -cutoff-diverging=false -entry convert linear.bc Screenshot from 2022-04-11 21-38-23

the slicing result by line number is

60
61
62
63
64
65
66
67
68
71
74

I am very confused about the result. It seems that line 60,64,65 has no dependece relationship with the src_channels in line 68. Also, as it is a variable in if condition, so line68-83 should also depend on it, which should be in its slicing. Can you give me some gudiance? is there something wrong with my cmd?

mchalupa commented 2 years ago

What does llvm-slicer say about the "matched slicing criteria"? Indeed, lines 60, 64, and 65 do not seem to be dependencies of 68, but that would not be a problem, just an imprecision.

Also, as it is a variable in if condition, so line68-83 should also depend on it, which should be in its slicing.

llvm-slicer does backward slice, so it searches for dependencies of 68, not for lines that depend on 68.

freexxxyyy commented 2 years ago

What does llvm-slicer say about the "matched slicing criteria"?

Here are the output of the cmd "llvm-slicer -sc '68#src_channels' -cutoff-diverging=false -entry convert linear.bc".

IntToPtr with constant:   <badref> = inttoptr i64 -6148914691236517206 to %struct.linear_priv*
IntToPtr with constant:   <badref> = inttoptr i64 -6148914691236517206 to i8*
WARNING: matched due to a lack of information:   %bf.load = load i8, i8* %enabled, align 8, !dbg !5527
SC: Matched '68#src_channels' to: 
    %6 = load %struct.snd_pcm_plugin_channel*, %struct.snd_pcm_plugin_channel** %src_channels.addr, align 8, !dbg !5524
    %bf.load = load i8, i8* %enabled, align 8, !dbg !5527
[llvm-slicer] CPU time of pointer analysis: 7.540000e-04 s
[llvm-slicer] CPU time of data dependence analysis: 2.970000e-04 s
[llvm-slicer] CPU time of control dependence analysis: 7.500000e-05 s
[llvm-slicer] Finding dependent nodes took 0 sec 0 ms
[llvm-slicer] Slicing dependence graph took 0 sec 0 ms
[llvm-slicer] Sliced away 197 from 266 nodes in DG
[llvm-slicer] saving sliced module to: linear.sliced

would not be a problem, just an imprecision. Is this imprecision common in dg?

llvm-slicer does backward slice, Oh. so dg does not do forward slicing? Do you know any other tools which also do forward slicing?

mchalupa commented 2 years ago

would not be a problem, just an imprecision. Is this imprecision common in dg?

Depends on the program and the features it uses. Imprecision is common in any static analysis tool.

llvm-slicer does backward slice, Oh. so dg does not do forward slicing? Do you know any other tools which also do forward slicing?

DG can do forward slicing if you use -forward switch. However, it combines it with backward slice in order to make the result an executable and well-formed bitcode. If you need only the list of statements from the slice, you need to hack DG (but it should be rather straightforward).

I do not know about any other tool that does forward slicing.

freexxxyyy commented 2 years ago

okay. Thanks. So "-forward" print both the forward slcing and backward slicing. And, here is the result of "llvm-slicer -forward -sc '68#src_channels' -cutoff-diverging=false -entry convert linear.bc". it contains all lines for that funtion. the results seem wired.

45
46
48
49
50
51
52
60
61
62
63
64
65
66
67
68
69
70
71
72
74
75
76
77
78
79
80
81
82
83
85
mchalupa commented 2 years ago

okay. Thanks. So "-forward" print both the forward slcing and backward slicing.

Not exactly. -forward does forward slice and then it takes the instructions from the forward slice and uses them as criteria for the backward slice to make the result well-defined bitcode.

And, here is the result of "llvm-slicer -forward -sc '68#src_channels' -cutoff-diverging=false -entry convert linear.bc". it contains all lines for that funtion. the results seem wired.

It is not that weird -- DG cannot know if src_channels and dst_channels alias with each other or with any other memory and thus the result is safely approximated to this. To get better precision, you must analyze convert with more context (change the entry point). Or you can wrap convert with a function that allocates the arguments and analyze it from it, i.e.:

void wrapper() {
  void *plugin = ...
  void *src_channels = malloc(...);
  void *dst_channels = malloc(...);
  frame_t frame = ...

  convert(plugin, src_channels, dst_channels, frame);
}
// call llvm-slicer with '-entry wrapper'

Unfortunately, at this moment, there is no automatic way of telling DG that arguments do not alias.