Open harendra-kumar opened 6 years ago
I think there is exactly the same problem with `mapM` as well, and that also explains why the `mapM` benchmark result for conduit is significantly worse than for other libraries.
Thanks for the report. I'm mostly AFK this week, so haven't had a chance to look at this. These kinds of optimizations are highly sensitive to GHC inlining rules. Can you clarify exactly which `stack.yaml` file in your repo, and which commit, you're seeing these figures on?
The latest commit and `stack.yaml` should reproduce this.

```
./run.sh -- -m pattern conduit
```

should run all the conduit benchmarks. You can change this file, e.g. replace `sinkNull` with `mapM_`, and run the benchmarks again, this time with the `--append` option:

```
./run.sh --append -- -m pattern conduit
```

This will generate a comparison graph between the baseline and your new changes in the `charts` directory.
Alright, I've looked into this, and have a few comments:
`transformation/map/conduit` test.

In sum: it's nice to get conduit faster via rewrite rules, but that is not the intended use case of the library and can't be relied upon. We know how to get super fast, reliable streaming code (direct stream fusion, as I laid out in my blog post), but the convenience of higher-level libraries wins out in almost all cases, which is why there doesn't seem to be pressure to make that happen.
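As a rough illustration of what "direct stream fusion" refers to here, below is a minimal sketch of the `Stream`/`Step` encoding used by libraries such as `stream-fusion` and `vector`. The names (`mapS`, `enumFromToS`, `sumS`) are illustrative only, not any library's actual API:

```haskell
{-# LANGUAGE BangPatterns #-}
{-# LANGUAGE ExistentialQuantification #-}
module Main (main) where

-- One step of a stream: produce an element, skip, or finish.
data Step s a = Yield a s | Skip s | Done

-- A stream is a step function plus a hidden state.
data Stream a = forall s. Stream (s -> Step s a) s

-- 'mapS' only rewrites the step function, so composed maps
-- compile down to a single non-allocating loop.
mapS :: (a -> b) -> Stream a -> Stream b
mapS f (Stream step s0) = Stream step' s0
  where
    step' s = case step s of
      Yield a s' -> Yield (f a) s'
      Skip s'    -> Skip s'
      Done       -> Done

enumFromToS :: Int -> Int -> Stream Int
enumFromToS lo hi = Stream step lo
  where
    step i | i > hi    = Done
           | otherwise = Yield i (i + 1)

sumS :: Stream Int -> Int
sumS (Stream step s0) = go 0 s0
  where
    go !acc s = case step s of
      Yield a s' -> go (acc + a) s'
      Skip s'    -> go acc s'
      Done       -> acc

main :: IO ()
main = print (sumS (mapS (+ 1) (enumFromToS 1 10)))  -- prints 65
```

Because every combinator is just a step-function transformer, GHC can collapse a whole pipeline into one tight loop without relying on fragile rewrite rules.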
```diff
diff --git a/Benchmarks/Conduit.hs b/Benchmarks/Conduit.hs
index 5baf369..cda392f 100644
--- a/Benchmarks/Conduit.hs
+++ b/Benchmarks/Conduit.hs
@@ -86,11 +86,13 @@ last = eliminate $ S.last
 {-# INLINE transform #-}
 transform :: Monad m => Pipe m Int Int -> Source m () Int -> m ()
 -- mapM_ is much more costly compared to sinkNull
+transform t src = S.runConduit (src S..| t S..| S.mapM_ (\_ -> return ()))
 --transform t = runStream (t S..| S.mapM_ (\_ -> return ()))
-transform t = runStream (t S..| S.sinkNull)
+--transform t = runStream (t S..| S.sinkNull)
 scan = transform $ S.scanl (+) 0
-map = transform $ S.map (+1)
+--map = transform $ S.map (+1)
+map = \src -> S.runConduit $ src S..| S.map (+1) S..| S.mapM_ (\_ -> return ())
 mapM = transform $ S.mapM return
 filterEven = transform $ S.filter even
 filterAllOut = transform $ S.filter (> maxValue)
diff --git a/stack.yaml b/stack.yaml
index 05d93fb..d0b2af2 100644
--- a/stack.yaml
+++ b/stack.yaml
@@ -18,4 +18,10 @@ extra-deps:
 - lens-4.15.4
 - free-4.12.4
 - drinkery-0.3
+
+ - git: https://github.com/snoyberg/conduit
+ commit: b134241f3ab545949d2cf5748de3ef012a0266b9
+ subdirs:
+ - conduit
+
 rebuild-ghc-options: true
```
I agree with your observations, and that's why I do not make any recommendations for streaming libraries based on these performance numbers. They are to be used with caution. In fact, I have this caveat in the README to warn against the tendency to misuse micro-benchmarks such as these:
> When choosing a streaming library to use we should not be over obsessed about the performance numbers as long as the performance is within reasonable bounds. Whether the absolute performance or the differential among various libraries matters or not may depend on your workload. If the cost of processing the data is significantly higher compared to the streaming overhead then the difference may not matter at all; unless you are performing huge number of tiny operations, performance difference may not be significant.
The goal of these benchmarks was to explicitly measure just the cost of operations, discounting fusion, so as not to complicate the comparison and explanation of results. In fact, I designed them such that fusion is avoided. So it's good to know that it's not working. I was also considering designing separate benchmarks to show the fusion capabilities of libraries, but that's not done yet.
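For illustration, one common way to keep rewrite rules from fusing a pipeline in a micro-benchmark is to hide one stage behind a `NOINLINE` binding. This is a sketch of the general technique, not necessarily what the streaming-benchmarks repo actually does; `drain` is a hypothetical name:

```haskell
import qualified Data.Conduit as C
import qualified Data.Conduit.List as CL

-- NOINLINE keeps GHC from inlining the sink into its call site,
-- so the ".|" rewrite rules never see source and sink together
-- and cannot fuse them away.
{-# NOINLINE drain #-}
drain :: Monad m => C.ConduitT Int o m ()
drain = CL.mapM_ (\_ -> return ())

main :: IO ()
main = C.runConduit (CL.enumFromTo 1 (1000 :: Int) C..| drain)
```

The benchmark then measures the per-element overhead of the unfused pipeline, which is what these benchmarks set out to compare.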
My main point here was that since `sinkNull` and `mapM_` can be used interchangeably for the same job, a significant performance difference can be introduced inadvertently; how much it matters depends on the use case, of course. If it is possible to fix it, that would be a nice fix to have. Exactly the same problem is seen with pipes as well; there it is `discard` vs `mapM_`.
I encountered a situation in practice where

```haskell
someConduit .| awaitForever (\_ -> return ())
```

performs substantially worse than

```haskell
someConduit .| sinkNull
```

I guess this is related to what you discussed above, as `sinkNull` is implemented under the hood exactly as `awaitForever (\_ -> return ())`, apart from inlining differences.
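A micro-benchmark along these lines makes the gap easy to measure. This is a hypothetical sketch using the criterion library; `source` is an assumed stand-in for `someConduit`:

```haskell
import Criterion.Main (bench, defaultMain, whnfIO)
import qualified Data.Conduit as C
import qualified Data.Conduit.List as CL

-- Stand-in for "someConduit": a plain enumeration source.
source :: Monad m => C.ConduitT () Int m ()
source = CL.enumFromTo 1 100000

main :: IO ()
main = defaultMain
  [ -- sinkNull: the discarding function is written inline in the library
    bench "sinkNull" $ whnfIO $
      C.runConduit (source C..| CL.sinkNull)
    -- awaitForever with a user-supplied lambda: semantically the same,
    -- but inlining may behave differently
  , bench "awaitForever" $ whnfIO $
      C.runConduit (source C..| C.awaitForever (\_ -> return ()))
  ]
```

Both sinks drain the stream and return `()`; any timing difference between the two benches is purely down to how GHC optimizes each pipeline.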
Have there been any further developments on this matter in the past year?
I have been struggling with why conduit performance results are out of whack in these benchmarks: https://github.com/composewell/streaming-benchmarks. I have root-caused the issue to the use of `mapM_` to discard the stream. I think this usage was picked up from the benchmarks in the machines package, and I retained it. `mapM_` is also used in other packages like pipes and streaming for the same purpose, so I did not find anything wrong with using it, and that made it all the harder to root-cause this issue.

However, there is a drastic improvement in performance when `sinkNull` is used instead of `mapM_`. `mapM_` is defined in terms of `awaitForever`, lifting the supplied action with `lift`, and it is used like `mapM_ (\_ -> return ())` to discard the stream; `sinkNull` is defined the same way, except that the discarding function is written inline. There is not much difference between the two. Perhaps the polymorphism in `mapM_` is problematic, or the `lift`, or is it something else?
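For reference, the two definitions being compared were roughly as follows. This is a paraphrase from the thread's own description, not verbatim conduit source; the exact pragmas and module layout are unverified:

```haskell
import Prelude hiding (mapM_)
import Control.Monad.Trans.Class (lift)
import Data.Conduit (ConduitT, awaitForever)

-- Takes an arbitrary monadic action; note the 'lift' and the
-- polymorphic callback the issue speculates about.
mapM_ :: Monad m => (a -> m ()) -> ConduitT a o m ()
mapM_ f = awaitForever (lift . f)

-- Same shape, but the discarding function is written inline,
-- so GHC has a concrete body to optimize at every use site.
sinkNull :: Monad m => ConduitT a o m ()
sinkNull = awaitForever (\_ -> return ())
```

Used as `mapM_ (\_ -> return ())`, the first form should be equivalent to `sinkNull`, which is why the observed performance gap points at inlining rather than semantics.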