mchalupa / dg

[LLVM Static Slicer] Various program analyses, construction of dependence graphs and program slicing of LLVM bitcode.
MIT License
474 stars 131 forks source link

How to slice the program accurately #441

Open hrshy0629 opened 2 years ago

hrshy0629 commented 2 years ago

I read and implemented DG, but I found that there were still several issues that failed to reach the target. 1) Slicing seems to be interprocedure. Can you set a parameter to control whether the output is the slice bitcode of the target function or the slice set bitcode of the interprocedure function? 2) Can you select more accurate variable location information? LLVM debug information contains row information and column information. Can you select variables by determining row and column information? Thank you for any suggestions.

mchalupa commented 2 years ago

Slicing seems to be interprocedure. Can you set a parameter to control whether the output is the slice bitcode of the target function or the slice set bitcode of the interprocedure function?

There is the option -entry to set the entry point of slicing (but it is still interprocedural). Most of the analyses should be able to work also intraproceduraly, but there is no switch to turn this on at this moment.

Can you select more accurate variable location information? LLVM debug information contains row information and column information. Can you select variables by determining row and column information?

You can select based on row, but not column at this moment. But it should be an easy hack at this place: https://github.com/mchalupa/dg/blob/master/tools/llvm-slicer-crit.cpp#L213 If you do not want to hack the code, you can modify the bitcode and insert a call of a dummy function at the instruction that you want to use as the slicing criterion.

hrshy0629 commented 2 years ago

@mchalupa Thanks for your comment. In response to my first question, I would like to modify the code to get specify the coded slice of a single function. Interprocedural information is useful for debugging, but I just want to cut out my own function information. Could you tell me the general file information in DG? For example, which files involve function slicing?

mchalupa commented 2 years ago

If you need to only see the result, you can try -annotate slice with llvm-slicer. It will produce a commented .ll file. Also, llvm-dg-dump has the switch -func that makes it to dump only a given function.

Slicing of LLVM DG is in this file: https://github.com/mchalupa/dg/blob/master/include/dg/llvm/LLVMSlicer.h but if you want to only dump the information, you probably want to just take the built dependence graph and dump some information about it. For that, you can find inspiration here: https://github.com/mchalupa/dg/blob/master/tools/llvm-dg-dump.cpp

In particular, you will want to get your function from this map: https://github.com/mchalupa/dg/blob/master/include/dg/llvm/LLVMDependenceGraph.h#L205 and then just iterate over the dependence graph and dump information about it.

hrshy0629 commented 2 years ago

@mchalupa I am not sure it is a bug. When I use llvm-slicer -sc LINE#err -entry FUNCNAME .bc, I find that it is good when the LINE is [132, 148], but it reports "Segmentation fault (core dumped)" when the LINE is [143, 145], which means that we do not choose the VAR that is in if/call_statement? Are there any other restrictions when using DG? `static int jread(struct buffer_head bhp, journal_t journal, unsigned int offset) { 132# int err; ... 143# err = jbd2_journal_bmap(journal, offset, &blocknr);

145# if (err) { printk(KERN_ERR "JBD2: bad block at offset %u\n", offset); 148# return err; }`

hrshy0629 commented 2 years ago

@mchalupa Hi, I find a bug. When the function instMatchesCrit(tools/llvm-slicer-crit.cpp) is matching instruction, it ignore the PHI instruction. The line in debug info of PHI is !6084 = !DILocation(line: 0, scope: !6073), you will get the line which is 0, which is error. The more reliable method is as follows:

`

        cleanup:
        %retval.0 = phi i32 [ -117, %if.then ], [ %call1, %if.then2 ], [ 0, %if.end22 ], [ -5, %if.then20 ], [ -12, %if.end4.cleanup_crit_edge ], !dbg !6084
        !6084 = !DILocation(**line: 0**, scope: !6073)
    %if.then:
        br label %cleanup, !dbg !6094
        !6094 = !DILocation(line: 140, column: 3, scope: !6093)
    %if.then2:
        br label %cleanup, !dbg !6102
        !6102 = !DILocation(line: 148, column: 3, scope: !6101)
    %if.end22
        br label %cleanup, !dbg !6144
        !6144 = !DILocation(line: 171, column: 2, scope: !6073)
     %if.then20
        br label %cleanup, !dbg !6142
        !6142 = !DILocation(line: 167, column: 3, scope: !6140)
    if.end4.cleanup_crit_edge
        br label %cleanup, !dbg !6109
            !6109 = !DILocation(line: 152, column: 6, scope: !6073)

`

And the true line is as follows: ` test

` I think we can fix this bug.

mchalupa commented 2 years ago

Thanks for looking into that. If you know how to fix this, can you prepare a pull request?

hrshy0629 commented 2 years ago

I need help! I debug the function instMatchesCrit. But I do not know why the function returns true when obj.empty() is true? If the obj is not NONE and we pass the line check, should we return true?

mchalupa commented 2 years ago

If obj.empty() is true, then obj is NONE. The rest of the code assumes, that there is some object we want to match and if there is no object that we want to specifically match, the instruction satisfies the criterion (it matches line and function). Or I do not understand the problem?

hrshy0629 commented 2 years ago

The overview of DG is: 1)match the input objs to Inst, but dg required the Insts of the objs can read or write the memory; 2) the class of LLVMSlicer initializes the different graphs based on the input bc file; 3) DG queries the Insts from the different graphs and output the results of slices. But the Insts which can not read or write the memory will be discarded, which will result in inaccurate results. Such as the example. The Inst br label %cleanup, !dbg !6102 has little to do with anything else, but it was part of our requirements.


%if.then2:
        br label %cleanup, !dbg !6102
        !6102 = !DILocation(line: 148, column: 3, scope: !6101)

Do I understand this correctly?

mchalupa commented 2 years ago

If an instruction is sliced away, it means none of slicing criterions depend on it, but it doesn't matter if it accesses memory or not. It is true, though, that matching slicing criteria is restricted to instructions that access memory or to function calls. But you can hack this part rather easily, I would say. Also, there is a switch 'slicing-criteria-are-next-inst' that makes the slicing criteria be the successors of the matched instructions. So what you can do is to insert calls of a dummy function before any instruction you want and use this switch to mark it as a slicing criterion.

hrshy0629 commented 2 years ago

Hi, I would like to ask whether the fault tolerance rate of DG is relatively small. I have tried many cases and encountered the problem of segment error.

` ../../llvm-slicer --sc mlxreg_fan_set_cur_state#379#err --cutoff-diverging=1 --cda ntscd --forward=1 --entry mlxreg_fan_set_cur_state 000cc5bc49aafc83a131bab2becf9922f5eec658_drivers@hwmon@mlxreg-fan.o.bc SC: Matched 'mlxreg_fan_set_cur_state#379#err' to: call void @asan_load8_noabort(i64 %34), !dbg !4009 %35 = load %struct.regmap*, %struct.regmap** %33, align 8, !dbg !4009 call void @asan_load4_noabort(i64 %37), !dbg !4010 %38 = load i32, i32 %36, align 4, !dbg !4010 %call95 = call i32 @regmap_write(%struct.regmap %35, i32 %38, i32 %div93.zext) #12, !dbg !4013 [llvm-slicer] cutoff 5 diverging blocks and 193 completely removed WARNING: matched due to a lack of information: %33 = load %struct.regmap*, %struct.regmap* %31, align 8, !dbg !4005 WARNING: matched due to a lack of information: %36 = load i32, i32 %34, align 4, !dbg !4006 WARNING: matched due to a lack of information: %call95 = call i32 @regmap_write(%struct.regmap %33, i32 %36, i32 %div93.zext) #13, !dbg !4009 SC: Matched 'mlxreg_fan_set_cur_state#379#err' to: %33 = load %struct.regmap, %struct.regmap* %31, align 8, !dbg !4005 %36 = load i32, i32 %34, align 4, !dbg !4006 %call95 = call i32 @regmap_write(%struct.regmap* %33, i32 %36, i32 %div93.zext) #13, !dbg !4009

0 0x00007fe36401285c llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/home/sunhy/slice_llvm/dg-master/build/tools/libdgllvmslicer.so+0x34d85c)

1 0x00007fe364010604 llvm::sys::RunSignalHandlers() (/home/sunhy/slice_llvm/dg-master/build/tools/libdgllvmslicer.so+0x34b604)

2 0x00007fe364010773 SignalHandler(int) (/home/sunhy/slice_llvm/dg-master/build/tools/libdgllvmslicer.so+0x34b773)

3 0x00007fe3621c3980 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x12980)

4 0x00000000004c6ad5 std::_Rb_tree<dg::LLVMNode, dg::LLVMNode, std::_Identity<dg::LLVMNode>, std::less<dg::LLVMNode>, std::allocator<dg::LLVMNode> >::_M_get_insert_unique_pos(dg::LLVMNode const&) /usr/lib/gcc/x86_64-linux-gnu/7.5.0/../../../../include/c++/7.5.0/bits/stl_tree.h:0:0

5 0x00000000004c6ad5 std::pair<std::_Rb_tree_iterator<dg::LLVMNode>, bool> std::_Rb_tree<dg::LLVMNode, dg::LLVMNode, std::_Identity<dg::LLVMNode>, std::less<dg::LLVMNode>, std::allocator<dg::LLVMNode> >::_M_insert_unique<dg::LLVMNode const&>(dg::LLVMNode const&) /usr/lib/gcc/x86_64-linux-gnu/7.5.0/../../../../include/c++/7.5.0/bits/stl_tree.h:2091:4

6 0x00007fe363ab9f4a std::set<dg::LLVMNode, std::less<dg::LLVMNode>, std::allocator<dg::LLVMNode> >::insert(dg::LLVMNode const&) /usr/lib/gcc/x86_64-linux-gnu/7.5.0/../../../../include/c++/7.5.0/bits/stl_set.h:0:9

7 0x00007fe363ab9f4a dg::DGContainer<dg::LLVMNode, 4u>::insert(dg::LLVMNode) /home/sunhy/slice_llvm/dg-master/include/dg/ADT/DGContainer.h:33:46

8 0x00007fe363ab9f4a dg::Node<dg::LLVMDependenceGraph, llvm::Value, dg::LLVMNode>::_addBidirectionalEdge(dg::LLVMNode, dg::LLVMNode*, dg::EdgesContainer<dg::LLVMNode, 4u>&, dg::EdgesContainer<dg::LLVMNode, 4u>&) /home/sunhy/slice_llvm/dg-master/include/dg/Node.h:356:24

9 0x00007fe363ab9f4a dg::Node<dg::LLVMDependenceGraph, llvm::Value, dg::LLVMNode>::addUseDependence(dg::LLVMNode) /home/sunhy/slice_llvm/dg-master/include/dg/Node.h:71:16

10 0x00007fe363ab9f4a dg::LLVMDependenceGraph::addDefUseEdges(bool) /home/sunhy/slice_llvm/dg-master/lib/llvm/LLVMDependenceGraph.cpp:1274:25

11 0x00000000004c3838 dg::llvmdg::LLVMDependenceGraphBuilder::_timerStart() /home/sunhy/slice_llvm/dg-master/include/dg/llvm/LLVMDependenceGraphBuilder.h:73:40

12 0x00000000004c3838 dg::llvmdg::LLVMDependenceGraphBuilder::_runControlDependenceAnalysis() /home/sunhy/slice_llvm/dg-master/include/dg/llvm/LLVMDependenceGraphBuilder.h:93:9

13 0x00000000004c3838 dg::llvmdg::LLVMDependenceGraphBuilder::computeDependencies(std::unique_ptr<dg::LLVMDependenceGraph, std::default_delete >&&) /home/sunhy/slice_llvm/dg-master/include/dg/llvm/LLVMDependenceGraphBuilder.h:235:9

14 0x00000000004b61f4 std::__uniq_ptr_impl<dg::LLVMDependenceGraph, std::default_delete >::_M_ptr() const /usr/lib/gcc/x86_64-linux-gnu/7.5.0/../../../../include/c++/7.5.0/bits/unique_ptr.h:147:42

15 0x00000000004b61f4 std::unique_ptr<dg::LLVMDependenceGraph, std::default_delete >::get() const /usr/lib/gcc/x86_64-linux-gnu/7.5.0/../../../../include/c++/7.5.0/bits/unique_ptr.h:332:21

16 0x00000000004b61f4 std::unique_ptr<dg::LLVMDependenceGraph, std::default_delete >::release() /usr/lib/gcc/x86_64-linux-gnu/7.5.0/../../../../include/c++/7.5.0/bits/unique_ptr.h:354:16

17 0x00000000004b61f4 std::unique_ptr<dg::LLVMDependenceGraph, std::default_delete >::operator=(std::unique_ptr<dg::LLVMDependenceGraph, std::default_delete >&&) /usr/lib/gcc/x86_64-linux-gnu/7.5.0/../../../../include/c++/7.5.0/bits/unique_ptr.h:278:12

18 0x00000000004b61f4 Slicer::computeDependencies() /home/sunhy/slice_llvm/dg-master/tools/include/dg/tools/llvm-slicer.h:105:13

19 0x00000000004b7821 std::_Rb_tree<dg::LLVMNode, dg::LLVMNode, std::_Identity<dg::LLVMNode>, std::less<dg::LLVMNode>, std::allocator<dg::LLVMNode> >::_Rb_tree_impl<std::less<dg::LLVMNode>, true>::_Rb_tree_impl() /usr/lib/gcc/x86_64-linux-gnu/7.5.0/../../../../include/c++/7.5.0/bits/stl_tree.h:688:28

20 0x00000000004b7821 std::_Rb_tree<dg::LLVMNode, dg::LLVMNode, std::_Identity<dg::LLVMNode>, std::less<dg::LLVMNode>, std::allocator<dg::LLVMNode*> >::_Rb_tree() /usr/lib/gcc/x86_64-linux-gnu/7.5.0/../../../../include/c++/7.5.0/bits/stl_tree.h:913:7

21 0x00000000004b7821 std::set<dg::LLVMNode, std::less<dg::LLVMNode>, std::allocator<dg::LLVMNode*> >::set() /usr/lib/gcc/x86_64-linux-gnu/7.5.0/../../../../include/c++/7.5.0/bits/stl_set.h:157:7

22 0x00000000004b7821 Slicer::mark(std::set<dg::LLVMNode, std::less<dg::LLVMNode>, std::allocator<dg::LLVMNode*> >&) /home/sunhy/slice_llvm/dg-master/tools/include/dg/tools/llvm-slicer.h:132:34

23 0x00000000004b5750 main /home/sunhy/slice_llvm/dg-master/tools/llvm-slicer.cpp:287:9

24 0x00007fe3614a2c87 __libc_start_main /build/glibc-uZu3wS/glibc-2.27/csu/../csu/libc-start.c:344:0

25 0x00000000004b4aba _start (/home/sunhy/slice_llvm/dg-master/build/tools/llvm-slicer+0x4b4aba)

PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace. Segmentation fault (core dumped)

`

Is there any way to avoid or solve this problem?

mchalupa commented 2 years ago

Hi, I would like to ask whether the fault tolerance rate of DG is relatively small.

What do you mean by fault tolerance?

Is there any way to avoid or solve this problem?

What happens if you do not use cutoff-diverging? This option is meant for backward slicing (it is not explicitly written in the help message, but its nature kind of implies it) and I see you use forward=1.