haskell-streaming / streaming

An optimized general monad transformer for streaming applications, with a simple prelude of functions
BSD 3-Clause "New" or "Revised" License
156 stars 30 forks source link

Quadratic performance on left-associated binds #109

Open tomjaguarpaw opened 3 years ago

tomjaguarpaw commented 3 years ago

The problem

Consider walking over a tree using a streaming library to do something at every leaf:

data Tree = Branch Tree Tree | Leaf Int

walkTreeTransformer :: (Monad (m IO), MonadTrans m)
                    => (Int -> IO ()) -> Tree -> m IO ()
walkTreeTransformer doSomething = loop
  where loop = \case
          Leaf i -> lift (doSomething i)
          Branch t1 t2 -> do
            loop t1
            loop t2

On left-skewed trees, i.e. trees that will end up generating left-associated binds, for example those generated by leftSkewed

-- Fully evaluated left-skewed tree
leftSkewed :: Int -> Tree
leftSkewed 0 = Leaf 0
leftSkewed n = (Branch $! leftSkewed (n - 1)) (Leaf n)

the performance of your streaming library is quadratic. In fact performance is quadratic under

(Neither streamly nor conduit have quadratic performance)

The plot below demonstrates the claim. The code to generate the plot is available at https://github.com/tomjaguarpaw/streaming-benchmark. The compiler was GHC 8.10.7.

image

The solution

Firstly, before talking about a solution, do you actually consider this a bug? I do, and I think the risk of hitting unexpected quadratic performance makes this library unsuitable for production use as it is. On the other hand maybe you have a good reason that I shouldn't think that. If so I would be interested to hear it.

If you do think this is a bug then let's think about what can be done. I know of two straightforward options:

  1. Add an explicit constructor for binds and then reassociate binds during elimination
  2. Use Codensity (streamly essentially uses handwritten Codensity.)

In my benchmarks explicit bind comes out slightly ahead.

Perhaps there is another option more suitable for your library.

I welcome your thoughts.

harendra-kumar commented 3 years ago

This paper https://rubenpieters.github.io/assets/papers/JFP20-pipes.pdf presents similar results, it has some more benchmarks as well. @tomjaguarpaw its nice to see that you have presented a solution as well. Looks neat.

pranaysashank commented 3 years ago

@tomjaguarpaw Here's another data point. While working on streamly, I experimented with changing streaming' s representation to be more like streamly's (and vector's) internal representation which can be found at streaming-fusion. Here is that library added to the benchmark:

image

Edit:

Legend (from top-to-bottom):

Edit 2: Also here's the modified benchmark https://github.com/pranaysashank/streaming-benchmark

Edit 3: Updated the image to reflect that conduit has no issue

treeowl commented 3 years ago

@pranaysashank, I find the colors on that graph really hard to work out. Could you rearrange the key from top to bottom or something?

pranaysashank commented 3 years ago

@treeowl I don't know enough about gnuplot to modify the generated image. So I just updated the comment with the keys top to bottom

tomjaguarpaw commented 3 years ago

[N.B. conduit doesn't have quadratic performance. I made a mistake when setting up the benchmarks: https://github.com/tomjaguarpaw/streaming-benchmark/commit/2a965a790c23c759c70ac580b25c59fd22ae922c]

tomjaguarpaw commented 3 years ago

While working on streamly, I experimented with changing streaming' s representation to be more like streamly's

Yup, makes sense that the performance is similar to streaming when I wrapped it with Codensity. I find it interesting that streaming with explicit binds is faster still!

pranaysashank commented 3 years ago

The Bind implementation is nice, I changed the MonadTrans instance of streaming-fusion to be more like the streaming bind implementation and looks like it improves the performance quite a bit!!

image

Here's the relevant commit: https://github.com/pranaysashank/streaming-fusion/commit/9f56fe652ef2f79a9c7ee2c719406183bcdbd622

tomjaguarpaw commented 3 years ago

@pranaysashank Wow! That's impressive.