mchalupa / dg

[LLVM Static Slicer] Various program analyses, construction of dependence graphs and program slicing of LLVM bitcode.
MIT License
474 stars 131 forks source link

Question about generating multiple slices of single program #223

Open naegling opened 5 years ago

naegling commented 5 years ago

I'm currently working on a project that requires a large number, N, of slices of a single program. Each slice is from a single, distinct criteria. I'm currently just inserting a criteria callsite into N copies of the bc module and invoking llvm-slicer N times. This works, but is a bit slow since it has to repeat the (almost) identical pointer and reaching analysis N times.

Any thoughts on how to re-use a prior analysis?

mchalupa commented 5 years ago

Hi,

actually, this is one of the extensions that I would like to do in the future, I just do not know when I'll get to it. You need to clone the module inside dg and remap the nodes to new instructions. It is not hard in theory, just takes a bit of work to do that right.

mchalupa commented 5 years ago

Of course, if you would be interested in providing patches and just need a starting point, I can help you with that.

naegling commented 5 years ago

If my trial experiments look promising, I will have to implement a multi-slice. I would certainly share my modifications. :)

Which do you think would be easier: 1) analyze, clone module, remap analysis nodes, slice clone 2) analyze, clone module, use original analysis to mark clone through clone value map, slice clone 3) and now for something completely different

My principle concern with 1) is that the analysis seems tightly coupled to the module. Seems error-prone to locate every module dependence. Also seems fragile for future maintenance.

Is there a particular commit/release you would suggest I branch? Seems like some work is currently ongoing. Also, the pta is faulting on one of my trial benchmarks (an early version of grep)

CLI: my thought is to just extend your current slice criteria list. comma would continue to delimit criteria within a slice. colans would delimit slices.

Thoughts?

mchalupa commented 5 years ago

Although 1. would be more generic (you could continue working with the cloned module as you would have all the results mapped to this module), it is more complicated and you are right in your concerns. So, unless there appears a need to do 1., I think doing 2. completely is fine.

Is there a particular commit/release you would suggest I branch? Seems like some work is currently ongoing

I plan to change the structure of the project quite a lot (I hope people won't hate me for that :), so I would wait a few more days before your changes. Then you can branch from master.

Also, the pta is faulting on one of my trial benchmarks (an early version of grep)

File an issue, please. I can take a look at it.

CLI: my thought is to just extend your current slice criteria list. comma would continue to delimit criteria within a slice. colans would delimit slices.

Seems fine. Just need to solve cases like e.g. that you want to slice w.r.t all calls of foo function, but for each call, you want to have a separate slice. At this moment, we have two kinds of slicing criteria:

1) all call-sites of a given function 2) line:variable (:global-variable) for matching instructions that use the given variable on a given line.

We can probably generalize this to line:(variable/call-site) to specify the precise call site when needed. Then we can generate a separate slice for every call-site. However, listing manually all lines where the slicing criterion is called is usually not what you want to do in the above-mentioned case -- you want to say "generate a separate slice for each call of xxx". So maybe we can add some keywords instead of line (so if the "line" is number, it is interpreted as the precise line, if it is empty, it means all lines and if it is xxx, it means all lines but separate slices). Or we may do the colon thing and then add a switch that tells the slicer that it should separate slices. So the user can either list the criteria (-c c1;c2;...) or slice w.r.t all call-sites of a function, but separate the slices (-c foo -separate-slices).

What do you think? Which suits better your use-case?

naegling commented 5 years ago

CLI: I must have been looking at an old commit, 'cause I missed the whole line:variable specification. I only saw the callsites. As in your example, we could use semi-colons to separate slices, comma to separate criteria within a slice, and colons to separate (line, variable) pairs. If something like this is to be the CLI, then I would also need to be able to read the criteria in from a file. Several of my benchmarks have a few thousand discrete slicing criteria.

However, my use case may be a-typical and not a good model for a general purpose CLI. I have an llvm bc module with a set of N suspicious load/store/gep instructions. I want N slices of the module, each with respect to a single instruction and its associated memory pointer (target of load/store and base pointer of gep).