mchalupa / dg

[LLVM Static Slicer] Various program analyses, construction of dependence graphs and program slicing of LLVM bitcode.
MIT License
474 stars 131 forks source link

Slice process is blocked #446

Open Clingto opened 1 year ago

Clingto commented 1 year ago

The command:

llvm-slicer -c malloc mjs_bin_slice.bc

And the running:

start ...
SC: Matched 'malloc()' to: 
    %call7 = call noalias i8* @malloc(i64 %add) #5, !dbg !927
    %call7 = call noalias i8* @malloc(i64 %14) #5, !dbg !950
    %call20 = call noalias i8* @malloc(i64 %conv19) #5, !dbg !986
    %call335 = call noalias i8* @malloc(i64 %conv334) #5, !dbg !1251
    %call354 = call noalias i8* @malloc(i64 %conv353) #5, !dbg !1275
    %call10 = call noalias i8* @malloc(i64 %add) #5, !dbg !950
    %call1 = call noalias i8* @malloc(i64 %add) #5, !dbg !923
    %call = call noalias i8* @malloc(i64 %conv) #5, !dbg !929
[llvm-slicer] cutoff 56 diverging blocks and 5 completely removed

get blocked(20+minutes...)

Could you please give me some advice? Thanks.

mchalupa commented 1 year ago

What is the bitcode? How big is the program?

Clingto commented 1 year ago

What is the bitcode? How big is the program?

Hi, The bitcode in my "$link/SRC_bin/mjs_bin_slice.bc". And the size is around 800kb.

I put my data of proof-of-concept to reproduce this bug in the public git repo.

And the link is: https://github.com/Clingto/POC/tree/master/POCs-DG/mjs

And it contains: 1、The source code of program is in the file directory "code"; 2、I built the ngiflib program with bash script ("build_mjs_bin.sh"); 3、I sliced it with dg/llvm-slicer tool with bash script ("slice_mjs_bin.sh") 4、My original ".bc" file is in the file directory "SRC_bin/mjs_bin_slice.bc"; 5、Environment: clang version 6.0.0 mchalupa/dg version is git ce1b0724f3

mchalupa commented 1 year ago

Seems that the bitcode is just too complicated for the analyses (points-to analysis), so it just computes very long. The points-to analysis in DG is very inefficient...

Clingto commented 1 year ago

Seems that the bitcode is just too complicated for the analyses (points-to analysis), so it just computes very long. The points-to analysis in DG is very inefficient...

Hi, I am doing a research work, and I don't know how to tell if a program is too complicated for DG.

As my goal doesn't need precise pointer analysis but only needs coarse-grained analysis,

I wonder whether DG can slice the instructions, which shows how the input can affect the "size" parameter of "call malloc(size)" instruction and affect the "pointer" parameter of "call free(pointer)" instruction.

Or slice the instructions, which show how to reach the "call malloc(size)" instruction no matter whether it is intra-function or inter-function?

Can I use DG to achieve it directly? or may I do some modification of DG to accomplish this and how?

Thank you very much.

mchalupa commented 1 year ago

It depends on what do you mean by "show how the input can affect XXX". DG can compute an interprocedural dependence graph and from that you can get the information about which instructions may be influenced by what instructions.

Clingto commented 1 year ago

It depends on what do you mean by "show how the input can affect XXX". DG can compute an interprocedural dependence graph and from that you can get the information about which instructions may be influenced by what instructions.

My "show how the input affects XXX" refers to the path from when the input file is loaded until execution to the affected XXX instruction. Now, I know my concerned destination instructions are "call malloc(size)" location sites, but how can I define my source instructions (i.e., input file loaded instructions) in DG so that I can slice the instructions between them?

mchalupa commented 1 year ago

By path you mean a path in the CFG of the program?

From the dependence graph, all the statements that are backward reachable from the ,,destination'' instructions may affect the destination instruction. That is how you get the slice.

Clingto commented 1 year ago

By path you mean a path in the CFG of the program?

From the dependence graph, all the statements that are backward reachable from the ,,destination'' instructions may affect the destination instruction. That is how you get the slice.

Yes. So the command "llvm-slicer -cutoff-diverging=0 -sc malloc test_slice.bc" is it?

mchalupa commented 1 year ago

The command llvm-slicer -sc malloc test_slice.bc (irrespective of -cutoff-diverging) creates an executable backward slice w.r.t every call of malloc or (possible) use of variable with name malloc -- so yes, that should probably be what you want. Maybe you just want to specify the slicing criterion better, e.g. -sc 123#malloc() (call of malloc on line 123).