jstolarek / slicer

Companion code for paper "Imperative Functional Programs that Explain their Work", Wilmer Ricciotti, Jan Stolarek, Roly Perera and James Cheney, ICFP 2017, Oxford, UK
http://dl.acm.org/citation.cfm?id=3110258
GNU General Public License v3.0

Backward slicing has quadratic complexity #51

Closed · jstolarek closed this issue 7 years ago

jstolarek commented 7 years ago

The current implementation of backward slicing has quadratic complexity. This is because for every trace node we recompute the set of store labels that its entire subtrace writes to, so the work done across all nodes adds up to quadratic in the size of the trace:

bwdSliceM :: Outcome -> Trace -> SliceM (Env Value, Exp, Trace)
bwdSliceM outcome trace = do
  allStoreHoles <- allStoreHolesM (storeWrites trace)  -- <------- HERE
  case (outcome, trace) of

This is redundant: we could simply cache, at each trace node, the store labels that the trace writes to. So the idea is to annotate the trace with labels. This can stay local to the Slice module - no need to expose it anywhere else.
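For illustration, here is a minimal sketch of the caching idea. The types are simplified stand-ins: only Trace, StoreLabels, bwdSliceM and allStoreHolesM are names from the codebase; the constructors and helpers below are hypothetical.

import qualified Data.Set as Set

type StoreLabels = Set.Set Int

-- Stand-in for the real Trace; 'here' are the labels this node writes.
data Trace = TNode StoreLabels [Trace]

-- Annotated trace: each node also caches the union of labels written
-- anywhere in its subtrace, so the check becomes an O(1) field lookup.
data ATrace = ANode StoreLabels StoreLabels [ATrace]

cachedWrites :: ATrace -> StoreLabels
cachedWrites (ANode _ cached _) = cached

-- One bottom-up pass computes every cache.
annotate :: Trace -> ATrace
annotate (TNode here ts) = ANode here cached ats
  where
    ats    = map annotate ts
    cached = Set.unions (here : map cachedWrites ats)

With the cache in place, bwdSliceM could call allStoreHolesM (cachedWrites trace) instead of recomputing storeWrites trace at every node.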

Here's a plot showing how the runtime of the sum-slice-size example grows as the input list becomes larger:

[plot: bench-sum-slice-size]

jstolarek commented 7 years ago

I'm thinking about how to add annotations to Trace. Three ideas come to mind:

  1. Create a data type that is identical to Trace but has annotations.

  2. Create a data type representing a tree of annotations. This tree would be processed in parallel with the trace tree.

  3. Use the cofree comonad :-)

I'm leaning towards (2).
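For concreteness, here is a sketch of what (2) might look like, again with hypothetical stand-in types: the annotations live in a separate tree that mirrors the shape of the trace, and the slicer walks both structures in lockstep.

import qualified Data.Set as Set

type StoreLabels = Set.Set Int

data Trace   = TNode StoreLabels [Trace]    -- stand-in for the real Trace
data AnnTree = ANode StoreLabels [AnnTree]  -- mirrors the trace's shape

root :: AnnTree -> StoreLabels
root (ANode ls _) = ls

-- Build the annotation tree once, bottom-up.
annTree :: Trace -> AnnTree
annTree (TNode here ts) = ANode (Set.unions (here : map root anns)) anns
  where anns = map annTree ts

-- The slicer would then recurse on both trees at once, e.g.
--   bwdSliceM outcome trace (ANode cached anns) = do
--     allStoreHoles <- allStoreHolesM cached   -- no subtrace traversal
--     ...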

jamescheney commented 7 years ago

There's always

(4) write another paper about making it more convenient to program with datatypes with different kinds of annotations.

jstolarek commented 7 years ago

I implemented (2), with poor results. Firstly, the program's complexity is still quadratic. Here's a plot for 13ce100 (current references branch):

[plot: 13ce100]

And here's a plot for 8ccbcf5 (current js-T51 branch):

[plot: 8ccbcf5]

This by itself doesn't mean much: there are lots of other things that can have quadratic complexity (evaluation, tracing, or the program itself), and the new algorithm is certainly better than the old one. However, the new code is marginally slower. I suspect this might be caused by using sets to store labels. That seemed like a good idea, but now I think it makes little sense given that we usually store fewer than 3 labels. So I think it makes sense to revert StoreLabels to be internally represented as [Int] (and thus have multiset semantics). I don't have time to do this and measure the results now - punting until after the deadline.
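A minimal sketch of that representation change (only the name StoreLabels comes from the codebase; the operation names are hypothetical):

-- Back StoreLabels with a plain list (multiset semantics); for the
-- fewer-than-3 labels seen in practice, a list's constant factors
-- can beat Data.Set despite the worse asymptotic complexity.
newtype StoreLabels = StoreLabels [Int]

emptyLabels :: StoreLabels
emptyLabels = StoreLabels []

insertLabel :: Int -> StoreLabels -> StoreLabels
insertLabel l (StoreLabels ls) = StoreLabels (l : ls)  -- keeps duplicates

memberLabel :: Int -> StoreLabels -> Bool
memberLabel l (StoreLabels ls) = l `elem` ls           -- O(n), but n is tiny

unionLabels :: StoreLabels -> StoreLabels -> StoreLabels
unionLabels (StoreLabels xs) (StoreLabels ys) = StoreLabels (xs ++ ys)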

Attaching benchmarking results: 8ccbcf5.tar.gz, 13ce100.tar.gz

jstolarek commented 7 years ago

I've been running more benchmarks using sum-slice-size.tml and varying the size of the list l. I used d2adda4 (current references) as a baseline and tried two modifications:

  1. create a tree of annotations, where each node stores the set of labels modified by the corresponding fragment of the original syntax tree;

  2. the same as above, but with store labels represented as lists rather than sets.

(2) is definitely slower than the baseline. The comparison between (1) and the baseline is inconclusive. Possible ways forward include:

a. Spend more time debugging this to figure out what exactly is going on. This would quite likely cost 2-3 days. I'm not sure it's worth the effort, especially since there are other places in the source code that can be performance bottlenecks.

b. Stick with the existing implementation. The code is shorter and easier to read, and there is no clear gain from using my patch.

c. Use solution (1), i.e. an annotation tree using sets to represent store labels - this is the proper way of solving the problem.

I speculate that the reason this does not show up in the benchmarks is that checking for store holes does not happen very often, so there is little opportunity to reuse the precomputed results (i.e. we compute store holes just once under either approach). I would lean towards solution (c). Thoughts?

jstolarek commented 7 years ago

I went with (c).