scala / slip

obsolete — archival use only

revise or replace "views" in collection library #9

Closed SethTisue closed 9 years ago

SethTisue commented 9 years ago

there is no concrete SLIP yet, but there has been a lot of discussion lately (involving @Ichoran, @jsuereth, @odersky, and others) about doing something about views. possibilities include:

1) fix what we have 2) deprecate what we have 3) replace what we have with something new (& deprecate the old)

SethTisue commented 9 years ago

some good discussion on this starting at https://gitter.im/scala/scala?at=55d6a99d0d29ce8f44b73c49

dickwall commented 9 years ago

Request for status update for Sept 10th SLIP meeting. @jsuereth, @SethTisue any more info to include over the next few days?

Ichoran commented 9 years ago

I'll give Josh's draft of views another runthrough.

Ichoran commented 9 years ago

@jsuereth - Okay, fixed my benchmarking. (Turned out it was too much GC for the robust statistical estimators to handle.)

So, there is some good news and bad news. The good news is that with the current framework, views generally improve a bit in performance, and the new framework is less subject to random breakages as it doesn't have a pile of stuff manually wired in in subclasses.

The bad news is that it doesn't have any subclasses, which means that performance optimizations made for those special cases (e.g. when you have indexing) are lost, resulting in performance that can be arbitrarily bad.
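To make the indexing point concrete, here is a hypothetical sketch (plain stdlib code, not the actual Viewducers API): an indexed view can answer something like `drop(n).size` by arithmetic on the known length, while a purely staged pipeline has to visit elements one by one.

```scala
// Hypothetical illustration of why losing the indexed subclasses hurts.
object DropCost {
  val source = Array.range(0, 1024)

  // Indexed shortcut: O(1), just subtraction on the known length.
  def indexedDropSize(n: Int): Int = math.max(source.length - n, 0)

  // Staged/iterator style: O(n), every surviving element is visited.
  def stagedDropSize(n: Int): Int = {
    val it = source.iterator.drop(n)
    var count = 0
    while (it.hasNext) { it.next(); count += 1 }
    count
  }
}
```

Both give the same answer, but the second does roughly a thousand steps where the first does one subtraction; that is the O(1) vs O(n) gap the benchmarks below quantify.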

First, the setup for benchmarks, using the exploratory Viewducers library Josh wrote:

val th = new ichi.bench.Thyme          // benchmarking harness
import com.jsuereth.collections.View._
val source = Array.range(0, 1024)
def v  = source.view                   // old (current library) view
def sv = source.stagedView             // new staged view (Viewducers)

Here's some examples of head-to-head benchmarks of typical operations, showing that the new views are modestly better in most cases:

// sum (must traverse entire collection in both cases)
scala> th.pbenchOff(){ v.sum }{ sv.sum }
Benchmark comparison (in 1.349 s)
Significantly different (p ~= 7.836e-04)
  Time ratio:    0.97425   95% CI 0.95996 - 0.98855   (n=30)
    First     6.950 us   95% CI 6.879 us - 7.021 us
    Second    6.771 us   95% CI 6.700 us - 6.842 us

// Standard map
scala> th.pbenchOff(){ v.map(_ * 3).sum }{ sv.map(_ * 3).sum }
Benchmark comparison (in 561.0 ms)
Significantly different (p ~= 0)
  Time ratio:    0.84680   95% CI 0.83313 - 0.86047   (n=20)
    First     11.86 us   95% CI 11.74 us - 11.99 us
    Second    10.05 us   95% CI 9.922 us - 10.17 us

// Slicing a tiny fraction of the collection
scala> th.pbenchOff(){ v.drop(1).sum }{ sv.drop(1).sum }
Benchmark comparison (in 784.1 ms)
Not significantly different (p ~= 0.2723)
  Time ratio:    1.00513   95% CI 0.99584 - 1.01442   (n=20)
    First     7.855 us   95% CI 7.803 us - 7.906 us
    Second    7.895 us   95% CI 7.844 us - 7.947 us

// Slicing a tiny fraction with a condition
scala> th.pbenchOff(){ v.dropWhile(_ < 1).sum }{ sv.dropWhile(_ < 1).sum }
Benchmark comparison (in 855.9 ms)
Significantly different (p ~= 4.877e-09)
  Time ratio:    1.09285   95% CI 1.06685 - 1.11885   (n=20)
    First     8.089 us   95% CI 7.947 us - 8.231 us
    Second    8.841 us   95% CI 8.699 us - 8.983 us

// Map/filter combo
scala> th.pbenchOff(){ v.filter(_ % 3 != 0).takeWhile(_ < 40).sum }{ sv.filter(_ % 3 != 0).takeWhile(_ < 40).sum }
Benchmark comparison (in 561.6 ms)
Significantly different (p ~= 0)
  Time ratio:    0.80790   95% CI 0.80002 - 0.81579   (n=20)
    First     12.96 us   95% CI 12.88 us - 13.04 us
    Second    10.47 us   95% CI 10.39 us - 10.55 us

// Complex map/filter--old version wins, but not by much
scala> th.pbenchOff(){ v.filter(_ % 3 != 0).map(_ + 2).filter(_ % 5 != 0).sum }{ sv.filter(_ % 3 != 0).map(_ + 2).filter(_ % 5 != 0).sum }
Benchmark comparison (in 646.1 ms)
Significantly different (p ~= 0)
  Time ratio:    1.26793   95% CI 1.24608 - 1.28979   (n=20)
    First     16.08 us   95% CI 15.93 us - 16.23 us
    Second    20.39 us   95% CI 20.09 us - 20.69 us

However, we lose performance to an almost arbitrary degree (often O(1) vs O(n)) when slicing, jumping through, or merely computing the size of collections:

// Bare size--fast in both cases, but old views have lower overhead
scala> th.pbenchOff(){ v.size }{ sv.size }
Benchmark comparison (in 254.9 ms)
Significantly different (p ~= 0)
  Time ratio:    2.07589   95% CI 1.93849 - 2.21329   (n=150)
    First     6.054 ns   95% CI 5.693 ns - 6.415 ns
    Second    12.57 ns   95% CI 12.21 ns - 12.93 ns

// Taking a small fraction of the collection--maybe fixable with improved early exit?
// Note change of units, old version is 34 times faster!
scala> th.pbenchOff(){ v.takeWhile(_ < 10).sum }{ sv.takeWhile(_ < 10).sum }
Benchmark comparison (in 382.6 ms)
Significantly different (p ~= 0)
  Time ratio:    34.08428   95% CI 33.35303 - 34.81552   (n=20)
    First     156.9 ns   95% CI 153.8 ns - 159.9 ns
    Second    5.347 us   95% CI 5.299 us - 5.396 us

// Size on a collection trivially modified in known ways
// Note change of units, over 300x slower!!
scala> th.pbenchOff(){ v.dropWhile(_ < 1).size }{ sv.dropWhile(_ < 1).size }
Benchmark comparison (in 391.9 ms)
Significantly different (p ~= 0)
  Time ratio:    301.38928   95% CI 271.63507 - 331.14349   (n=20)
    First     24.99 ns   95% CI 22.52 ns - 27.45 ns
    Second    7.531 us   95% CI 7.492 us - 7.571 us

// Even more trivial change!
// Note change of units--over nine hundred times slower!!!
scala> th.pbenchOff(){ v.drop(1).size }{ sv.drop(1).size }
Benchmark comparison (in 441.9 ms)
Significantly different (p ~= 0)
  Time ratio:    909.92073   95% CI 453.72843 - 1366.11302   (n=20)
    First     8.617 ns   95% CI 4.297 ns - 12.94 ns
    Second    7.841 us   95% CI 7.771 us - 7.910 us

// Sum of small slice--cannot be accelerated without late entry as well as early exit
// Note change of units--old version is 33x faster!
scala> th.pbenchOff(){ v.slice(500,510).sum }{ sv.slice(500, 510).sum }
Benchmark comparison (in 370.2 ms)
Significantly different (p ~= 0)
  Time ratio:    32.92864   95% CI 31.93851 - 33.91878   (n=20)
    First     89.19 ns   95% CI 86.59 ns - 91.80 ns
    Second    2.937 us   95% CI 2.916 us - 2.958 us

The new views have really fast filtering and collection creation, both in simple and compound cases:

// Bare filter
scala> th.pbenchOff(){ v.filter(_ % 3 != 0).sum }{ sv.filter(_ % 3 != 0).sum }
Benchmark comparison (in 633.1 ms)
Significantly different (p ~= 0)
  Time ratio:    0.60164   95% CI 0.58747 - 0.61581   (n=20)
    First     18.25 us   95% CI 17.92 us - 18.58 us
    Second    10.98 us   95% CI 10.82 us - 11.15 us

// Compound filter
scala> th.pbenchOff(){ v.filter(_ % 3 != 0).filter(_ % 5 != 0).filter(_ % 2 != 0).sum }{ sv.filter(_ % 3 != 0).filter(_ % 5 != 0).filter(_ % 2 != 0).sum }
Benchmark comparison (in 496.4 ms)
Significantly different (p ~= 0)
  Time ratio:    0.51228   95% CI 0.48514 - 0.53942   (n=20)
    First     39.86 us   95% CI 38.34 us - 41.37 us
    Second    20.42 us   95% CI 19.66 us - 21.17 us
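A rough intuition for the filter win (my reading, not taken from the Viewducers internals): a staged view can fuse all the stages into a single pass over the source, instead of layering a wrapper per stage. A hand-fused equivalent of the triple filter looks like:

```scala
// Hand-fused version of the compound-filter pipeline above:
// one fold, no intermediate view wrappers per stage.
object FusedFilter {
  val source = Array.range(0, 1024)

  def fused: Int =
    source.foldLeft(0) { (acc, x) =>
      if (x % 3 != 0 && x % 5 != 0 && x % 2 != 0) acc + x else acc
    }
}
```

The staged view builds something morally equivalent to this fold from the separate `filter` calls, which is why stacking more filters widens its lead.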

// List creation
scala> th.pbenchOff(){ v.to[List] }{ sv.to[List] }
Benchmark comparison (in 665.2 ms)
Significantly different (p ~= 0)
  Time ratio:    0.40543   95% CI 0.38705 - 0.42380   (n=20)
    First     21.06 us   95% CI 20.25 us - 21.88 us
    Second    8.540 us   95% CI 8.337 us - 8.743 us

So overall, it's definitely mixed. It is not for naught that old views were subclassed; it does make a massive difference in some cases. (Note that there is no particular reason you can't create a staged view of a slice, so only cases where you don't know in advance how much to slice will be most seriously affected.)
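A sketch of that workaround, in plain stdlib calls rather than the Viewducers `stagedView`, to keep it self-contained: when the slice bounds are known up front, slice the indexable source first and build the view over the result, so only the window is ever traversed.

```scala
// Slice the indexable source first, then view the window.
object SliceFirst {
  val source = Array.range(0, 1024)

  // The indexed slice jumps straight to elements 500-509; the view
  // pipeline then traverses 10 elements instead of walking 500.
  def summed: Int = source.slice(500, 510).view.sum
}
```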

If we're willing to give up the shortcuts and speedups we get with direct indexing, the new version has fewer bugs and in some cases better performance (especially with filter). But the old version, if restricted to TraversableView alone, would also have fewer bugs. It was always the subclassing that introduced the most bugs!

Overall, I can't really see a highly compelling raw performance benefit. It's probably better for GC (I haven't measured carefully). If these views are to replace the old ones, I think the justification must mostly come from aspects other than performance. (Of course, if you can take advantage of the wins and aren't hurt by the drawbacks, it will seem like quite a nice upgrade.)

non commented 9 years ago

For everyone following along, here is the link to the "new views": https://github.com/jsuereth/viewducers

non commented 9 years ago

@Ichoran Do you have the benchmarks you were running online? I was interested in comparing viewducers to relatively naive uses of .iterator. It might be nice to add them to the project.

Ichoran commented 9 years ago

@non - The above is just a transcript of my REPL session! You just need Thyme and Viewducers. (Just drop the Thyme jar in lib, I guess, and use sbt console.) All you'd need to do differently is to add def it = source.iterator, I guess. Oh, and I had -Xmx4G or something; there was too much GC churn at the default to get stable results.
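Following that suggestion, the naive-iterator baseline @non asked about would just add one definition next to `v` and `sv` from the setup (`th`, `v`, and `sv` are from the transcript above; only `it` is new here):

```scala
// Naive-iterator baseline alongside the two views.
val source = Array.range(0, 1024)
def it = source.iterator

// e.g. the map benchmark from above would become:
//   th.pbenchOff(){ v.map(_ * 3).sum }{ it.map(_ * 3).sum }
val mapped = it.map(_ * 3).sum
```

Note `it` must be a `def`, not a `val`, since each benchmark iteration needs a fresh iterator.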

SethTisue commented 9 years ago

@jsuereth has now opened a PR at https://github.com/scala/slip/pull/17, so, closing this. (of course, that doesn't mean other proposals aren't welcome also)

jsuereth commented 9 years ago

@Ichoran Yeah, I understand about the random improvements in subclasses. I don't think we have to drop all that perf, but I would like to FIRST drop all the bugs, then see if we can gain back the performance.

So far, I haven't been able to make subclass-specific optimisations work in the viewducers prototype without sacrificing the performance gains in the detection. Ideas welcome there.

jsuereth commented 9 years ago

Also, @Ichoran you should try operations in the slice examples that DO NOT preserve the shape of the collection (e.g. flatMap, dropWhile, etc.).

Ichoran commented 9 years ago

@jsuereth - flatMap sometimes shows an improvement, sometimes not, depending on what I do afterwards. collect is often a little better. dropWhile does preserve the shape and is dramatically faster the old way if you're not dropping much. E.g.

scala> th.pbenchOff(){ v.flatMap(i => Seq(i, -i)).size }{ sv.flatMap(i => Seq(i, -i)).size }
Benchmark comparison (in 678.1 ms)
Significantly different (p ~= 1.007e-10)
  Time ratio:    1.03904   95% CI 1.02996 - 1.04812   (n=20)
    First     108.2 us   95% CI 107.6 us - 108.9 us
    Second    112.5 us   95% CI 111.8 us - 113.2 us

scala> th.pbenchOff(){ v.flatMap(i => Seq(i, -i)).sum }{ sv.flatMap(i => Seq(i, -i)).sum }
Benchmark comparison (in 709.6 ms)
Significantly different (p ~= 1.081e-04)
  Time ratio:    0.97033   95% CI 0.95673 - 0.98394   (n=20)
    First     121.2 us   95% CI 120.0 us - 122.3 us
    Second    117.6 us   95% CI 116.4 us - 118.7 us

scala> th.pbenchOff(){ v.collect{ case x if x % 2 == 0 => x >> 1}.sum }{ sv.collect{ case x if x % 2 == 0 => x >> 1 }.sum }
Benchmark comparison (in 658.6 ms)
Significantly different (p ~= 4.857e-11)
  Time ratio:    0.95801   95% CI 0.94889 - 0.96713   (n=20)
    First     13.47 us   95% CI 13.38 us - 13.56 us
    Second    12.90 us   95% CI 12.82 us - 12.99 us

scala> th.pbenchOff(){ v.dropWhile(_ < 20).size }{ sv.dropWhile(_ < 20).size }
Benchmark comparison (in 747.2 ms)
Significantly different (p ~= 0)
  Time ratio:    73.39225   95% CI 70.64271 - 76.14179   (n=30)
    First     101.0 ns   95% CI 97.26 ns - 104.6 ns
    Second    7.409 us   95% CI 7.350 us - 7.468 us

I finally found where the really big wins come from: using non-indexable collections like List:

scala> th.pbenchOff(){ v.dropWhile(_ < 20).sum }{ sv.dropWhile(_ < 20).sum }
Benchmark comparison (in 647.5 ms)
Significantly different (p ~= 0)
  Time ratio:    0.49890   95% CI 0.49411 - 0.50368   (n=20)
    First     13.77 us   95% CI 13.68 us - 13.86 us
    Second    6.869 us   95% CI 6.823 us - 6.916 us

scala> th.pbenchOff(){ v.flatMap(i => Seq(i, -i)).size }{ sv.flatMap(i => Seq(i, -i)).size }
Benchmark comparison (in 634.7 ms)
Significantly different (p ~= 0)
  Time ratio:    0.13020   95% CI 0.12904 - 0.13137   (n=20)
    First     868.4 us   95% CI 862.8 us - 874.0 us
    Second    113.1 us   95% CI 112.4 us - 113.8 us

In light of this, I think the performance tale is a little more compelling than I was thinking before.

Shall I just leave comments here? They're not actually on the SIP page.