Closed SethTisue closed 9 years ago
some good discussion on this starting at https://gitter.im/scala/scala?at=55d6a99d0d29ce8f44b73c49
Request for status update for Sept 10th SLIP meeting. @jsuereth, @SethTisue any more info to include over the next few days?
I'll give Josh's draft of views another runthrough.
@jsuereth - Okay, fixed my benchmarking. (Turned out it was too much GC for the robust statistical estimators to handle.)
So, there is some good news and bad news. The good news is that with the current framework, views generally improve a bit in performance, and the new framework is less subject to random breakages as it doesn't have a pile of stuff manually wired in via subclasses.
The bad news is that it doesn't have any subclasses, which means that performance optimizations made for those special cases (e.g. when you have indexing) are lost, resulting in performance that can be arbitrarily bad.
First, the setup for benchmarks, using the exploratory Viewducers library Josh wrote:
val th = new ichi.bench.Thyme
import com.jsuereth.collections.View._
val source = Array.range(0, 1024)
def v = source.view
def sv = source.stagedView
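For anyone who wants to play with the lazy-vs-eager distinction without pulling in Thyme or Viewducers, here is a standalone sketch using only the standard library. It demonstrates the property both old and new views share: stages are deferred until a terminal operation forces a single traversal. (`ViewSketch` and the `evaluations` counter are illustrative names, not part of either library.)

```scala
// Standalone sketch (standard library only; Thyme and Viewducers are
// external). A view defers work: mapping evaluates nothing until a
// terminal operation like sum forces one full traversal.
object ViewSketch extends App {
  val source = Array.range(0, 1024)

  var evaluations = 0
  val staged = source.view.map { x => evaluations += 1; x * 3 }

  assert(evaluations == 0)             // nothing computed yet
  val total = staged.sum               // forces one full traversal
  assert(evaluations == source.length) // each element visited exactly once
  assert(total == source.map(_ * 3).sum)
  println(total)
}
```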
Here are some examples of head-to-head benchmarks of typical operations, showing that the new views are modestly better in most cases:
// sum (must traverse entire collection in both cases)
scala> th.pbenchOff(){ v.sum }{ sv.sum }
Benchmark comparison (in 1.349 s)
Significantly different (p ~= 7.836e-04)
Time ratio: 0.97425 95% CI 0.95996 - 0.98855 (n=30)
First 6.950 us 95% CI 6.879 us - 7.021 us
Second 6.771 us 95% CI 6.700 us - 6.842 us
// Standard map
scala> th.pbenchOff(){ v.map(_ * 3).sum }{ sv.map(_ * 3).sum }
Benchmark comparison (in 561.0 ms)
Significantly different (p ~= 0)
Time ratio: 0.84680 95% CI 0.83313 - 0.86047 (n=20)
First 11.86 us 95% CI 11.74 us - 11.99 us
Second 10.05 us 95% CI 9.922 us - 10.17 us
// Slicing a tiny fraction of the collection
scala> th.pbenchOff(){ v.drop(1).sum }{ sv.drop(1).sum }
Benchmark comparison (in 784.1 ms)
Not significantly different (p ~= 0.2723)
Time ratio: 1.00513 95% CI 0.99584 - 1.01442 (n=20)
First 7.855 us 95% CI 7.803 us - 7.906 us
Second 7.895 us 95% CI 7.844 us - 7.947 us
// Slicing a tiny fraction with a condition
scala> th.pbenchOff(){ v.dropWhile(_ < 1).sum }{ sv.dropWhile(_ < 1).sum }
Benchmark comparison (in 855.9 ms)
Significantly different (p ~= 4.877e-09)
Time ratio: 1.09285 95% CI 1.06685 - 1.11885 (n=20)
First 8.089 us 95% CI 7.947 us - 8.231 us
Second 8.841 us 95% CI 8.699 us - 8.983 us
// Map/filter combo
scala> th.pbenchOff(){ v.filter(_ % 3 != 0).takeWhile(_ < 40).sum }{ sv.filter(_ % 3 != 0).takeWhile(_ < 40).sum }
Benchmark comparison (in 561.6 ms)
Significantly different (p ~= 0)
Time ratio: 0.80790 95% CI 0.80002 - 0.81579 (n=20)
First 12.96 us 95% CI 12.88 us - 13.04 us
Second 10.47 us 95% CI 10.39 us - 10.55 us
// Complex map/filter--old version wins, but not by much
scala> th.pbenchOff(){ v.filter(_ % 3 != 0).map(_ + 2).filter(_ % 5 != 0).sum }{ sv.filter(_ % 3 != 0).map(_ + 2).filter(_ % 5 != 0).sum }
Benchmark comparison (in 646.1 ms)
Significantly different (p ~= 0)
Time ratio: 1.26793 95% CI 1.24608 - 1.28979 (n=20)
First 16.08 us 95% CI 15.93 us - 16.23 us
Second 20.39 us 95% CI 20.09 us - 20.69 us
However, we lose performance to an almost arbitrary degree (often O(n) instead of O(1)) when slicing, otherwise jumping through, or merely computing the size of collections:
// Bare size--fast in both cases, but old views have lower overhead
scala> th.pbenchOff(){ v.size }{ sv.size }
Benchmark comparison (in 254.9 ms)
Significantly different (p ~= 0)
Time ratio: 2.07589 95% CI 1.93849 - 2.21329 (n=150)
First 6.054 ns 95% CI 5.693 ns - 6.415 ns
Second 12.57 ns 95% CI 12.21 ns - 12.93 ns
// Taking a small fraction of the collection--maybe fixable with improved early exit?
// Note change of units, old version is 34 times faster!
scala> th.pbenchOff(){ v.takeWhile(_ < 10).sum }{ sv.takeWhile(_ < 10).sum }
Benchmark comparison (in 382.6 ms)
Significantly different (p ~= 0)
Time ratio: 34.08428 95% CI 33.35303 - 34.81552 (n=20)
First 156.9 ns 95% CI 153.8 ns - 159.9 ns
Second 5.347 us 95% CI 5.299 us - 5.396 us
// Size on a collection trivially modified in known ways
// Note change of units, over 300x slower!!
scala> th.pbenchOff(){ v.dropWhile(_ < 1).size }{ sv.dropWhile(_ < 1).size }
Benchmark comparison (in 391.9 ms)
Significantly different (p ~= 0)
Time ratio: 301.38928 95% CI 271.63507 - 331.14349 (n=20)
First 24.99 ns 95% CI 22.52 ns - 27.45 ns
Second 7.531 us 95% CI 7.492 us - 7.571 us
// Even more trivial change!
// Note change of units--over nine hundred times slower!!!
scala> th.pbenchOff(){ v.drop(1).size }{ sv.drop(1).size }
Benchmark comparison (in 441.9 ms)
Significantly different (p ~= 0)
Time ratio: 909.92073 95% CI 453.72843 - 1366.11302 (n=20)
First 8.617 ns 95% CI 4.297 ns - 12.94 ns
Second 7.841 us 95% CI 7.771 us - 7.910 us
// Sum of small slice--cannot be accelerated without late entry as well as early exit
// Note change of units--old version is 33x faster!
scala> th.pbenchOff(){ v.slice(500,510).sum }{ sv.slice(500, 510).sum }
Benchmark comparison (in 370.2 ms)
Significantly different (p ~= 0)
Time ratio: 32.92864 95% CI 31.93851 - 33.91878 (n=20)
First 89.19 ns 95% CI 86.59 ns - 91.80 ns
Second 2.937 us 95% CI 2.916 us - 2.958 us
The new views have really fast filtering and collection creation, both in simple and compound cases:
// Bare filter
scala> th.pbenchOff(){ v.filter(_ % 3 != 0).sum }{ sv.filter(_ % 3 != 0).sum }
Benchmark comparison (in 633.1 ms)
Significantly different (p ~= 0)
Time ratio: 0.60164 95% CI 0.58747 - 0.61581 (n=20)
First 18.25 us 95% CI 17.92 us - 18.58 us
Second 10.98 us 95% CI 10.82 us - 11.15 us
// Compound filter
scala> th.pbenchOff(){ v.filter(_ % 3 != 0).filter(_ % 5 != 0).filter(_ % 2 != 0).sum }{ sv.filter(_ % 3 != 0).filter(_ % 5 != 0).filter(_ % 2 != 0).sum }
Benchmark comparison (in 496.4 ms)
Significantly different (p ~= 0)
Time ratio: 0.51228 95% CI 0.48514 - 0.53942 (n=20)
First 39.86 us 95% CI 38.34 us - 41.37 us
Second 20.42 us 95% CI 19.66 us - 21.17 us
// List creation
scala> th.pbenchOff(){ v.to[List] }{ sv.to[List] }
Benchmark comparison (in 665.2 ms)
Significantly different (p ~= 0)
Time ratio: 0.40543 95% CI 0.38705 - 0.42380 (n=20)
First 21.06 us 95% CI 20.25 us - 21.88 us
Second 8.540 us 95% CI 8.337 us - 8.743 us
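The filter wins are plausibly explained by stage fusion: each staged operation rewrites the downstream fold function, so a chain of filters collapses into a single pass with no intermediate view wrappers. Here is a hedged sketch of that idea using a bare fold-transformer type; `Fold`, `filterStage`, and `FusionSketch` are made-up names for illustration and are not the real Viewducers API.

```scala
// Sketch of the staging idea behind the filter wins: each stage wraps
// the downstream fold function, so three chained filters fuse into one
// pass with no intermediate allocations. Not the real Viewducers API.
object FusionSketch extends App {
  type Fold[A, B] = (B, A) => B

  // A filter stage forwards to the downstream fold only when p holds.
  def filterStage[A, B](p: A => Boolean)(down: Fold[A, B]): Fold[A, B] =
    (acc, a) => if (p(a)) down(acc, a) else acc

  val source = Array.range(0, 1024)
  val sumFold: Fold[Int, Int] = (acc, a) => acc + a

  // Three filters composed into a single fold function.
  val fused: Fold[Int, Int] =
    filterStage[Int, Int](_ % 3 != 0)(
      filterStage[Int, Int](_ % 5 != 0)(
        filterStage[Int, Int](_ % 2 != 0)(sumFold)))

  val result = source.foldLeft(0)(fused)
  assert(result ==
    source.filter(_ % 3 != 0).filter(_ % 5 != 0).filter(_ % 2 != 0).sum)
  println(result)
}
```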
So overall, it's definitely mixed. It is not for naught that old views were subclassed; it does make a massive difference in some cases. (Note that there is no particular reason you can't create a staged view of a slice, so only cases where you don't know in advance how much to slice will be most seriously affected.)
If we're willing to give up the shortcuts and speedups we get with direct indexing, the new version has fewer bugs and in some cases better performance (especially with filter). But the old version as a TraversableView only would have fewer bugs also. It was always the subclassing that introduced the most bugs!
Overall, I can't really see a highly compelling raw performance benefit. It's probably better for GC (I haven't measured carefully). If these views are to replace the old ones, I think the justification must mostly come from aspects other than performance. (Of course, if you can take advantage of the wins and aren't hurt by the drawbacks, it will seem like quite a nice upgrade.)
For everyone following along, here is the link to the "new views": https://github.com/jsuereth/viewducers
@Ichoran Do you have the benchmarks you were running online? I was interested in comparing viewducers to relatively naive uses of `.iterator`. It might be nice to add them to the project.
@non - The above is just a transcript of my REPL session! You just need Thyme and Viewducers. (Just drop the Thyme jar in lib, I guess, and use `sbt console`.) All you'd need to do differently is to add `def it = source.iterator`, I guess. Oh, and I had `-Xmx4G` or something; there was too much GC churn at the default to get stable results.
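For reference, the iterator-based setup suggested above might look like the following (standard library only, with Thyme/Viewducers omitted). The one subtlety worth noting: `it` must be a `def`, because iterators are single-use and each benchmark run needs a fresh one.

```scala
// Sketch of the naive-iterator comparison setup: `it` is a def because
// an Iterator is exhausted after one traversal. Standard library only.
object IteratorSetup extends App {
  val source = Array.range(0, 1024)
  def it = source.iterator

  val viaIterator = it.map(_ * 3).sum          // single lazy pass
  val viaView     = source.view.map(_ * 3).sum // same result via a view
  assert(viaIterator == viaView)
  println(viaIterator)
}
```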
@jsuereth has now opened a PR at https://github.com/scala/slip/pull/17, so, closing this. (of course, that doesn't mean other proposals aren't welcome also)
@Ichoran Yeah, I understand about the random improvements in subclasses. I don't think we have to drop all that perf, but I would like to FIRST drop all the bugs, then see if we can gain back the performance.
So far, I haven't been able to make subclass-specific optimisations work in the viewducers prototype without sacrificing the performance gains in the detection. Ideas welcome there.
Also, @Ichoran you should try operations in the `slice` examples that DO NOT preserve the shape of the collection (e.g. flatMap, dropWhile, etc.).
@jsuereth - `flatMap` sometimes shows an improvement, sometimes not, depending on what I do afterwards. `collect` is often a little better. `dropWhile` does preserve the shape and is dramatically faster the old way if you're not dropping much. E.g.
scala> th.pbenchOff(){ v.flatMap(i => Seq(i, -i)).size }{ sv.flatMap(i => Seq(i, -i)).size }
Benchmark comparison (in 678.1 ms)
Significantly different (p ~= 1.007e-10)
Time ratio: 1.03904 95% CI 1.02996 - 1.04812 (n=20)
First 108.2 us 95% CI 107.6 us - 108.9 us
Second 112.5 us 95% CI 111.8 us - 113.2 us
scala> th.pbenchOff(){ v.flatMap(i => Seq(i, -i)).sum }{ sv.flatMap(i => Seq(i, -i)).sum }
Benchmark comparison (in 709.6 ms)
Significantly different (p ~= 1.081e-04)
Time ratio: 0.97033 95% CI 0.95673 - 0.98394 (n=20)
First 121.2 us 95% CI 120.0 us - 122.3 us
Second 117.6 us 95% CI 116.4 us - 118.7 us
scala> th.pbenchOff(){ v.collect{ case x if x % 2 == 0 => x >> 1}.sum }{ sv.collect{ case x if x % 2 == 0 => x >> 1 }.sum }
Benchmark comparison (in 658.6 ms)
Significantly different (p ~= 4.857e-11)
Time ratio: 0.95801 95% CI 0.94889 - 0.96713 (n=20)
First 13.47 us 95% CI 13.38 us - 13.56 us
Second 12.90 us 95% CI 12.82 us - 12.99 us
scala> th.pbenchOff(){ v.dropWhile(_ < 20).size }{ sv.dropWhile(_ < 20).size }
Benchmark comparison (in 747.2 ms)
Significantly different (p ~= 0)
Time ratio: 73.39225 95% CI 70.64271 - 76.14179 (n=30)
First 101.0 ns 95% CI 97.26 ns - 104.6 ns
Second 7.409 us 95% CI 7.350 us - 7.468 us
I finally found where the really big wins come from: using non-indexable collections like List:
scala> th.pbenchOff(){ v.dropWhile(_ < 20).sum }{ sv.dropWhile(_ < 20).sum }
Benchmark comparison (in 647.5 ms)
Significantly different (p ~= 0)
Time ratio: 0.49890 95% CI 0.49411 - 0.50368 (n=20)
First 13.77 us 95% CI 13.68 us - 13.86 us
Second 6.869 us 95% CI 6.823 us - 6.916 us
scala> th.pbenchOff(){ v.flatMap(i => Seq(i, -i)).size }{ sv.flatMap(i => Seq(i, -i)).size }
Benchmark comparison (in 634.7 ms)
Significantly different (p ~= 0)
Time ratio: 0.13020 95% CI 0.12904 - 0.13137 (n=20)
First 868.4 us 95% CI 862.8 us - 874.0 us
Second 113.1 us 95% CI 112.4 us - 113.8 us
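The List result is consistent with the structural story: a `List` offers no O(1) indexed shortcuts for old-style views to exploit, and an eager `flatMap` materializes the full intermediate list before anything else can happen, whereas a fused single pass allocates nothing extra. A standalone sketch of that contrast (illustrative code, not either library):

```scala
// Why staging wins big on List: eager flatMap builds the whole
// intermediate list before counting, while a fused single pass counts
// without materializing anything. Standard library only.
object ListSketch extends App {
  val source = List.range(0, 1024)

  // Eager: allocates the full 2048-element intermediate List.
  val eagerSize = source.flatMap(i => Seq(i, -i)).size

  // Fused single pass: accumulate the count directly.
  val fusedSize = source.foldLeft(0)((acc, i) => acc + Seq(i, -i).size)

  assert(eagerSize == fusedSize)
  println(fusedSize)
}
```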
In light of this, I think the performance tale is a little more compelling than I was thinking before.
Shall I just leave comments here? They're not actually on the SIP page.
there is no concrete SLIP yet, but there has been a lot of discussion lately (involving @Ichoran, @jsuereth, @odersky, and others) about doing something about views. possibilities include:
1) fix what we have 2) deprecate what we have 3) replace what we have with something new (& deprecate the old)