Open twang15 opened 8 years ago
Marek's reply
You mean LLVMSlicer? (https://github.com/jirislaby/LLVMSlicer). The major difference is that dg has more precise and sound analyses (which can lead to smaller slices). The slicing algorithm itself slices more or less the same, since I haven't implemented the two-pass algorithm of Horwitz nor any other more precise version (e.g. specialization slicing) yet. Actually, dg was designed as a replacement for LLVMSlicer in the tool Symbiotic (https://github.com/staticafi/symbiotic). You can find more detailed information about dg here: http://is.muni.cz/th/396236/fi_m/thesis.pdf
Yes, it is LLVMSlicer. LLVMSlicer is based on Mark Weiser's classic algorithm. As noted by others(Interprocedural Slicing Using Dependence Graphs), this CFG-based algorithm has "calling context" issue, but it is fixable (K. B. Gallagher. Some notes on interprocedural program slicing. In Source Code Analysis and Manipulation, 2004.) and the result would be equivalent to Horwitz's PDG/SDG-based algorithm.
When you say dg has more precise and sound analyses, it is in what sense ? Is the pointer analysis of dg more precise ?
Yeah, well our PDG based algorithm has the same calling context issue at this moment. Yes, the pointer analysis is more evolved.
Dg has two pointer analyses: flow-insensitive and flow-sensitive. The flow-sensitive is more precise by design, but it is still not well debugged (there's at least one bug I know about). The flow-insensitive is basically the same as in LLVMSlicer, with these differences:
Thanks for sharing these details. I have the following further questions.
As noted here http://pages.cs.wisc.edu/~fischer/cs701.f08/lectures/Lecture26.4up.pdf,
"Both context-sensitive points-to analysis and flow-sensitive points-to analysis give little improvement over Andersen’s analysis."
So, which pointer analysis algorithm are we using in dg ? Was there a research paper to systematically show the pointer analysis algorithm in dg doing better ?
On Fri, Nov 11, 2016 at 8:29 AM, Marek Chalupa notifications@github.com wrote:
Yeah, well our PDG based algorithm has the same calling context issue at this moment. Yes, the pointer analysis is more evolved.
Dg has two pointer analyses: flow-insensitive and flow-sensitive. The flow-sensitive is more precise by design, but it is still not well debugged (there's at least one bug I know about). The flow-insensitive is basically the same as in LLVMSlicer, with these differences:
it uses UNKNOWN_OFFSET, which allows it to handle for example this issue in LLVMSlicer: jirislaby/LLVMSlicer#6 https://github.com/jirislaby/LLVMSlicer/issues/6
the UNKNOWN_OFFSET makes it also faster in some cases (e.g when analyzing arrays), since we can use UNKNOWN_OFFSET to stand for all offsets from some range, which is a sound approximation. So where LLVMSlicer would generate that p can point to x + 0, x + 4, x + 8, ..., x + 64 (if the offset is > 64, then the analysis goes unsound), we just say p can point to x + UNKNOWN_OFFSET, staying sound and fast (since our points-to set is of
size 1 instead of some big number)
it handles correctly greater portion of LLVM than LLVMSlicer's one, for example, we have better support for memset and memcpy calls (copying
whole blocks of memory with pointers), where the LLVMSlicer can be unsound
we run on LLVM 3.4 - 3.9
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/mchalupa/dg/issues/107#issuecomment-259957391, or mute the thread https://github.com/notifications/unsubscribe-auth/APd91hP2zbgl5N4B16sPPbvsv3k9WvPlks5q9G3SgaJpZM4KvO1y .
On Fri, Nov 11, 2016 at 2:36 PM, Marek Chalupa notifications@github.com wrote:
-
Ad 1) In dg we use field-sensitive Andersen's analysis (the field sensitivity is configurable). There's no paper about measuring only the pointer analysis, but in the theses I linked above are measurements of differences between LLVMSlicer and dg (I suppose the numbers may have changed a bit, since some work was done on dg since the thesis).
Ad 2) Dg does not support slicing of concurrent programs. There may be some plans, but they are really no priority now as far as I can see (it may change in the future, though)
Ad 3) The calling context problem is because we slice with simple backward reachability. That is to say, we just compute the dependencies and than go backward over the edges as far as we can. Compared to the two-pass algorithm of Horwitz et. al., they compute additional summary edges and then perform two-passes over PDG to prevent the calling context problem. Our PDG is the same as the Horwitz's PDG (apart from that we do not compute the summary edges, since we would not use them atm.), but we just do only one pass over all dependecy edges in the graph, not two (over different subsets of edges). That lead to that our algorithm coincides with the CFG algorithm now. Reason why we use only simple backward reachability is simple: I just didn't get to implementing the two-pass algorithm yet (specialization slicing could be also a very good feature).
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/mchalupa/dg/issues/107#issuecomment-260037746, or mute the thread https://github.com/notifications/unsubscribe-auth/APd91uXeCv0HGa7Qqvs518TZBr2SFaODks5q9MPBgaJpZM4KvO1y .
Oh yeah, I meant there's no research paper comparing our pointer analyses.
Initial question:
I am using llvmslicer. Except llvmslicer is based on cfg and dg is based on pdg, what are the other major differences between dg and llvmslicer ? Does dg generate smaller slice ?