Open tomjaguarpaw opened 3 years ago
This paper https://rubenpieters.github.io/assets/papers/JFP20-pipes.pdf presents similar results and has some more benchmarks as well. @tomjaguarpaw it's nice to see that you have presented a solution as well. Looks neat.
@tomjaguarpaw Here's another data point. While working on streamly, I experimented with changing streaming's representation to be more like streamly's (and vector's) internal representation, which can be found at streaming-fusion. Here is that library added to the benchmark:
Edit:
Edit 2: Also here's the modified benchmark https://github.com/pranaysashank/streaming-benchmark
Edit 3: Updated the image to reflect that conduit has no issue
@pranaysashank, I find the colors on that graph really hard to work out. Could you rearrange the key from top to bottom or something?
@treeowl I don't know enough about gnuplot to modify the generated image. So I just updated the comment with the keys top to bottom
[N.B. conduit doesn't have quadratic performance. I made a mistake when setting up the benchmarks: https://github.com/tomjaguarpaw/streaming-benchmark/commit/2a965a790c23c759c70ac580b25c59fd22ae922c]
While working on streamly, I experimented with changing streaming's representation to be more like streamly's
Yup, makes sense that the performance is similar to streaming when I wrapped it with Codensity. I find it interesting that streaming with explicit binds is faster still!
The Bind implementation is nice, I changed the MonadTrans instance of streaming-fusion to be more like the streaming bind implementation and looks like it improves the performance quite a bit!!
Here's the relevant commit: https://github.com/pranaysashank/streaming-fusion/commit/9f56fe652ef2f79a9c7ee2c719406183bcdbd622
@pranaysashank Wow! That's impressive.
The problem
Consider walking over a tree using a streaming library to do something at every leaf:
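A minimal sketch of such a traversal, using only base. The `Stream` type here is a toy stand-in rather than any library's actual definition, and `Tree`, `walk`, `yield`, `toList` and this particular `leftSkewed` are hypothetical names chosen for illustration:

```haskell
import Data.Functor.Identity (Identity, runIdentity)

-- Toy stream of elements a over a base monad m, ending in a result r.
-- An illustrative stand-in, not the definition used by a real library.
data Stream a m r
  = Return r
  | Yield a (Stream a m r)
  | Effect (m (Stream a m r))

instance Functor m => Functor (Stream a m) where
  fmap f (Return r)  = Return (f r)
  fmap f (Yield a s) = Yield a (fmap f s)
  fmap f (Effect m)  = Effect (fmap (fmap f) m)

instance Functor m => Applicative (Stream a m) where
  pure = Return
  sf <*> sx = sf >>= \f -> fmap f sx

instance Functor m => Monad (Stream a m) where
  Return r  >>= k = k r
  Yield a s >>= k = Yield a (s >>= k)       -- walks through every element of s
  Effect m  >>= k = Effect (fmap (>>= k) m)

-- Emit a single element.
yield :: a -> Stream a m ()
yield a = Yield a (Return ())

-- Hypothetical tree type with an Int at every leaf.
data Tree = Leaf Int | Branch Tree Tree

-- Do something (here: yield the value) at every leaf, left to right.
walk :: Functor m => Tree -> Stream Int m ()
walk (Leaf n)     = yield n
walk (Branch l r) = walk l >> walk r

-- A tree that only grows to the left, so `walk` produces binds nested as
-- ((a >> b) >> c) >> d.
leftSkewed :: Int -> Tree
leftSkewed 0 = Leaf 0
leftSkewed n = Branch (leftSkewed (n - 1)) (Leaf n)

-- Collect the elements of a stream that has no real effects.
toList :: Stream a Identity r -> [a]
toList (Return _)  = []
toList (Yield a s) = a : toList s
toList (Effect m)  = toList (runIdentity m)
```

In GHCi, `toList (walk (leftSkewed 3))` evaluates to `[0,1,2,3]`.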
On left-skewed trees, i.e. trees that will end up generating left-associated binds, for example those generated by leftSkewed, the performance of your streaming library is quadratic. In fact performance is quadratic under
(Neither streamly nor conduit have quadratic performance)
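To see where the quadratic cost comes from, here is a rough cost model, not a measurement of any real library. It assumes a simplified, strict, effect-free stream and counts how many elements a bind has to pass over. All names are hypothetical:

```haskell
import Data.List (foldl')

-- Simplified stream with no effects and a unit result.
data Stream a = Done | Yield a (Stream a)

-- Sequencing two streams (what >> does on the simplified type), also
-- returning how many Yield nodes of the left stream were traversed.
appendCounting :: Stream a -> Stream a -> (Stream a, Int)
appendCounting Done t = (t, 0)
appendCounting (Yield a s) t =
  let (s', n) = appendCounting s t
  in (Yield a s', n + 1)

-- ((y0 >> y1) >> y2) >> ... >> yn: each step re-walks everything so far.
leftNestedCost :: Int -> Int
leftNestedCost n = snd (foldl' step (Yield 0 Done, 0) [1 .. n])
  where
    step (acc, cost) i =
      let (acc', c) = appendCounting acc (Yield i Done)
      in (acc', cost + c)

-- y0 >> (y1 >> (... >> yn)): each step walks a single element.
rightNestedCost :: Int -> Int
rightNestedCost n = snd (foldr step (Yield n Done, 0) [0 .. n - 1])
  where
    step i (acc, cost) =
      let (acc', c) = appendCounting (Yield i Done) acc
      in (acc', cost + c)
```

Under this model `leftNestedCost n` is n(n+1)/2 (5050 for n = 100), while `rightNestedCost n` is just n.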
The plot below demonstrates the claim. The code to generate the plot is available at https://github.com/tomjaguarpaw/streaming-benchmark. The compiler was GHC 8.10.7.
The solution
Firstly, before talking about a solution, do you actually consider this a bug? I do, and I think the risk of hitting unexpected quadratic performance makes this library unsuitable for production use as it is. On the other hand maybe you have a good reason that I shouldn't think that. If so I would be interested to hear it.
If you do think this is a bug then let's think about what can be done. I know of two straightforward options:
Codensity (streamly essentially uses handwritten Codensity.)
Explicit bind
In my benchmarks explicit bind comes out slightly ahead.
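For concreteness, here is a rough sketch of what Codensity and an explicit bind could look like on a toy, effect-free stream. It is not the code from the benchmark repository or from any library: `Codensity` is written out by hand instead of being imported, and `Stream`, `StreamB`, `liftC`, `lowerC`, `walkC`, `walkB`, `Tree` and `leftSkewed` are hypothetical names.

```haskell
{-# LANGUAGE GADTs, RankNTypes #-}

-- Hypothetical tree type and left-skewed generator, for illustration only.
data Tree = Leaf Int | Branch Tree Tree

leftSkewed :: Int -> Tree
leftSkewed 0 = Leaf 0
leftSkewed n = Branch (leftSkewed (n - 1)) (Leaf n)

-- Toy pure stream whose bind re-walks its left argument.
data Stream a r = Return r | Yield a (Stream a r)

instance Functor (Stream a) where
  fmap f (Return r)  = Return (f r)
  fmap f (Yield a s) = Yield a (fmap f s)

instance Applicative (Stream a) where
  pure = Return
  sf <*> sx = sf >>= \f -> fmap f sx

instance Monad (Stream a) where
  Return r  >>= k = k r
  Yield a s >>= k = Yield a (s >>= k)

toList :: Stream a r -> [a]
toList (Return _)  = []
toList (Yield a s) = a : toList s

-- Option 1: Codensity. Binds become continuation composition, so the
-- underlying stream only ever sees right-nested binds.
newtype Codensity m a = Codensity { runCodensity :: forall b. (a -> m b) -> m b }

instance Functor (Codensity m) where
  fmap f (Codensity g) = Codensity (\k -> g (k . f))

instance Applicative (Codensity m) where
  pure a = Codensity (\k -> k a)
  Codensity f <*> Codensity g = Codensity (\k -> f (\h -> g (k . h)))

instance Monad (Codensity m) where
  Codensity g >>= f = Codensity (\k -> g (\a -> runCodensity (f a) k))

liftC :: Monad m => m a -> Codensity m a
liftC m = Codensity (m >>=)

lowerC :: Monad m => Codensity m a -> m a
lowerC (Codensity g) = g return

walkC :: Tree -> Codensity (Stream Int) ()
walkC (Leaf n)     = liftC (Yield n (Return ()))
walkC (Branch l r) = walkC l >> walkC r

-- Option 2: explicit bind. Bind is a constructor, and the consumer
-- reassociates left-nested binds to the right before running them.
data StreamB a r where
  ReturnB :: r -> StreamB a r
  YieldB  :: a -> StreamB a r -> StreamB a r
  Bind    :: StreamB a x -> (x -> StreamB a r) -> StreamB a r

instance Functor (StreamB a) where
  fmap f s = Bind s (ReturnB . f)

instance Applicative (StreamB a) where
  pure = ReturnB
  sf <*> sx = Bind sf (\f -> fmap f sx)

instance Monad (StreamB a) where
  (>>=) = Bind

toListB :: StreamB a r -> [a]
toListB (ReturnB _)  = []
toListB (YieldB a s) = a : toListB s
toListB (Bind s k)   = case s of
  ReturnB x   -> toListB (k x)
  YieldB a s' -> a : toListB (Bind s' k)
  Bind s' j   -> toListB (Bind s' (\x -> Bind (j x) k))  -- reassociate

walkB :: Tree -> StreamB Int ()
walkB (Leaf n)     = YieldB n (ReturnB ())
walkB (Branch l r) = walkB l >> walkB r
```

Both `toList (lowerC (walkC (leftSkewed 3)))` and `toListB (walkB (leftSkewed 3))` evaluate to `[0,1,2,3]`. The difference from the naive bind is only in how much work is done to get there.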
Perhaps there is another option more suitable for your library.
I welcome your thoughts.