rorygraves / scalac_perf

The Scala programming language
http://www.scala-lang.org/
16 stars 3 forks source link

List should use eq Nil instead of isEmpty for end of list detection #52

Open mkeskells opened 6 years ago

mkeskells commented 6 years ago
    var these = this
    while (!these.isEmpty) {
      f(these.head)
      these = these.tail
    }
  }

could be

    var these = this
    while (these ne Nil) {
      f(these.head)
      these = these.tail
    }
  }
mkeskells commented 6 years ago

also MurmurHash3


  final def listHash(xs: scala.collection.immutable.List[_], seed: Int): Int = {
    var n = 0
    var h = seed
    var elems = xs
    while (!elems.isEmpty) {
      val head = elems.head
      val tail = elems.tail
      h = mix(h, head.##)
      n += 1
      elems = tail
    }
    finalizeHash(h, n)
  }
feyman2016 commented 6 years ago

@mkeskells I wrote a micro benchmark to compare the performance of list eq Nil and list.isEmpty , PR here , and the benchmark results is just as follows: There seems no obvious difference between their performance, I will further investigate their bytecode to understand the mechanism underneath. 😄


scala.collection.immutable.List.length
Eq Nil

[info] Benchmark                  (size)  Mode  Cnt     Score    Error  Units
[info] ListBenchmark.call_length       0  avgt   20     3.953 ±  0.221  ns/op
[info] ListBenchmark.call_length       1  avgt   20     6.838 ±  1.113  ns/op
[info] ListBenchmark.call_length      10  avgt   20    17.357 ±  1.756  ns/op
[info] ListBenchmark.call_length     100  avgt   20   216.360 ±  2.863  ns/op
[info] ListBenchmark.call_length    1000  avgt   20  2432.341 ± 60.765  ns/op

.isEmpty

[info] Benchmark                  (size)  Mode  Cnt     Score    Error  Units
[info] ListBenchmark.call_length       0  avgt   20     3.568 ±  0.114  ns/op
[info] ListBenchmark.call_length       1  avgt   20     4.301 ±  0.482  ns/op
[info] ListBenchmark.call_length      10  avgt   20    14.973 ±  0.353  ns/op
[info] ListBenchmark.call_length     100  avgt   20   208.602 ±  2.478  ns/op
[info] ListBenchmark.call_length    1000  avgt   20  2310.960 ± 24.554  ns/op

scala.collection.immutable.List.lengthCompare
eq Nil
[info] Benchmark                         (size)  Mode  Cnt     Score    Error  Units
[info] ListBenchmark.call_lengthCompare       0  avgt   20     5.678 ±  3.099  ns/op
[info] ListBenchmark.call_lengthCompare       1  avgt   20     4.057 ±  0.207  ns/op
[info] ListBenchmark.call_lengthCompare      10  avgt   20     8.789 ±  0.324  ns/op
[info] ListBenchmark.call_lengthCompare     100  avgt   20    97.562 ±  8.604  ns/op
[info] ListBenchmark.call_lengthCompare    1000  avgt   20  1012.257 ± 20.556  ns/op

.isEmpty
[info] Benchmark                         (size)  Mode  Cnt    Score    Error  Units
[info] ListBenchmark.call_lengthCompare       0  avgt   20    4.135 ±  0.111  ns/op
[info] ListBenchmark.call_lengthCompare       1  avgt   20    4.038 ±  0.088  ns/op
[info] ListBenchmark.call_lengthCompare      10  avgt   20    9.278 ±  2.062  ns/op
[info] ListBenchmark.call_lengthCompare     100  avgt   20   90.333 ±  2.914  ns/op
[info] ListBenchmark.call_lengthCompare    1000  avgt   20  999.997 ± 20.171  ns/op

scala.collection.immutable.List.dropwhile
Eq Nil
[info] Benchmark                     (size)  Mode  Cnt     Score     Error  Units
[info] ListBenchmark.call_dropWhile       0  avgt   20     3.567 ±   0.298  ns/op
[info] ListBenchmark.call_dropWhile       1  avgt   20     5.949 ±   0.970  ns/op
[info] ListBenchmark.call_dropWhile      10  avgt   20    21.641 ±   0.801  ns/op
[info] ListBenchmark.call_dropWhile     100  avgt   20   248.440 ±   6.461  ns/op
[info] ListBenchmark.call_dropWhile    1000  avgt   20  2805.942 ± 500.897  ns/op

.isEmpty
[info] Benchmark                     (size)  Mode  Cnt     Score     Error  Units
[info] ListBenchmark.call_dropWhile       0  avgt   20     3.620 ±   0.359  ns/op
[info] ListBenchmark.call_dropWhile       1  avgt   20     4.638 ±   0.049  ns/op
[info] ListBenchmark.call_dropWhile      10  avgt   20    24.545 ±  14.729  ns/op
[info] ListBenchmark.call_dropWhile     100  avgt   20   340.365 ± 138.329  ns/op
[info] ListBenchmark.call_dropWhile    1000  avgt   20  2704.003 ± 406.652  ns/op

scala.util.hashing.MurmurHash3.listHash

Eq Nil
[info] Benchmark                           (size)  Mode  Cnt      Score     Error  Units
[info] MurmurHash3Benchmark.call_listHash       0  avgt   20      6.428 ±   0.798  ns/op
[info] MurmurHash3Benchmark.call_listHash       1  avgt   20     19.277 ±   2.439  ns/op
[info] MurmurHash3Benchmark.call_listHash      10  avgt   20    125.612 ±  13.692  ns/op
[info] MurmurHash3Benchmark.call_listHash     100  avgt   20   1152.770 ± 107.546  ns/op
[info] MurmurHash3Benchmark.call_listHash    1000  avgt   20  13605.814 ± 608.183  ns/op

.isEmpty
[info] Benchmark                           (size)  Mode  Cnt      Score     Error  Units
[info] MurmurHash3Benchmark.call_listHash       0  avgt   20      6.052 ±   0.208  ns/op
[info] MurmurHash3Benchmark.call_listHash       1  avgt   20     11.668 ±   0.139  ns/op
[info] MurmurHash3Benchmark.call_listHash      10  avgt   20    112.381 ±   7.618  ns/op
[info] MurmurHash3Benchmark.call_listHash     100  avgt   20   1056.814 ±  32.572  ns/op
[info] MurmurHash3Benchmark.call_listHash    1000  avgt   20  12452.243 ± 459.762  ns/op
rorygraves commented 6 years ago

@feyman2016 I agree - this is suprising - I would be very interested into finding out what the bytecode looks like.

feyman2016 commented 6 years ago

@rorygraves @mkeskells I investigated the scala.util.hashing.MurmurHash3#listHash as follows:

final def listHash(xs: scala.collection.immutable.List[_], seed: Int): Int = {
  var n = 0
  var h = seed
  var elems = xs
  while (!elems.isEmpty) {
    val head = elems.head
    val tail = elems.tail
    h = mix(h, head.##)
    n += 1
    elems = tail
  }
  finalizeHash(h, n)
}

If we use while (!elems.isEmpty) { , we wil get the bytecode like:

   L6
    LINENUMBER 166 L6
   FRAME APPEND [I I scala/collection/immutable/List]
    ALOAD 5
    INVOKEVIRTUAL scala/collection/immutable/List.isEmpty ()Z
    IFNE L7

or if we use while (elems ne Nil) {, we will get :

   L6
    LINENUMBER 166 L6
   FRAME APPEND [I I scala/collection/immutable/List]
    ALOAD 5
    GETSTATIC scala/collection/immutable/Nil$.MODULE$ : Lscala/collection/immutable/Nil$;
    IF_ACMPEQ L7

These bytecode of both are just like what I thought they would be, which makes benchmark results even more difficult to explain.

mkeskells commented 6 years ago

Is it that the jit cannot determine that Nil is a constant

Can you try with a java class

Class holder{ final static List NIL = Nil }
mkeskells commented 6 years ago

What you can't see from the bytecode is what the jit does with this and the information that it deduces. If the jit knows it's a constant then it may get inlined. If the git cant do that then it's a static field access To complicate further java 9 seems to do a better job at this sometimes and particularly for invokedynamic stuff (which we don't have here I think)

mkeskells commented 6 years ago

That's why I suggest comparing to idiomatic java code

feyman2016 commented 6 years ago

I tried to find some clue from the machine code of scala.collection.immutable.List.length, results as below(line 339 is where we use eq Nil or isEmpty):

Eq Nil (bench = 100000)

   0x000000010c5deece: movabs $0x76b6701e0,%r10  ;   {oop(a 'java/lang/Class' = 'scala/collection/immutable/Nil$')}
[info]   0x000000010c5deed8: mov    0x68(%r10),%r11d   ;*getstatic MODULE$
[info]                                                 ; - scala.collection.immutable.List::length@5 (line 339)
[info] 
[info]   0x000000010c5deedc: mov    %r11,%r10
[info]   0x000000010c5deedf: shl    $0x3,%r10
[info]   0x000000010c5deee3: cmp    %r10,%rsi
[info]   0x000000010c5deee6: je     0x000000010c5def15  ;*if_acmpeq
[info]                                                 ; - scala.collection.immutable.List::length@8 (line 339)

[info]   0x000000010fb38387: movabs $0x76b6701e8,%r10  ;   {oop(a 'java/lang/Class' = 'scala/collection/immutable/Nil$')}
[info]   0x000000010fb38391: jmp    0x000000010fb383ae
[info]   0x000000010fb38393: nopw   0x0(%rax,%rax,1)
[info]   0x000000010fb3839c: data16 data16 xchg %ax,%ax

[info]   0x000000010fb389a3: movabs $0x76b6701e8,%rbx  ;   {oop(a 'java/lang/Class' = 'scala/collection/immutable/Nil$')}
[info]   0x000000010fb389ad: mov    0x68(%rbx),%ebx
[info]   0x000000010fb389b0: shl    $0x3,%rbx          ;*getstatic MODULE$
[info]                                                 ; - scala.collection.immutable.List::length@5 (line 339)
[info] 
[info]   0x000000010fb389b4: cmp    %rbx,%rsi
[info]   0x000000010fb389b7: movabs $0x10b70eaf0,%rbx  ;   {metadata(method data for {method} {0x000000010b675b88} 'length' '()I' in 'scala/collection/immutable/List')}
[info]   0x000000010fb389c1: movabs $0x108,%rax
[info]   0x000000010fb389cb: je     0x000000010fb389db
[info]   0x000000010fb389d1: movabs $0x118,%rax
[info]   0x000000010fb389db: mov    (%rbx,%rax,1),%rdx
[info]   0x000000010fb389df: lea    0x1(%rdx),%rdx
[info]   0x000000010fb389e3: mov    %rdx,(%rbx,%rax,1)
[info]   0x000000010fb389e7: jne    0x000000010fb387c8  ;*if_acmpeq
[info]                                                 ; - scala.collection.immutable.List::length@8 (line 339)

.isEmpty (bench = 100000)

[info]   0x0000000109c45075: mov    0x10(%rsi),%r8d    ; OopMap{r8=NarrowOop off=121}
[info]                                                 ;*goto
[info]                                                 ; - scala.collection.immutable.List::length@23 (line 339)
[info] 
[info]   0x0000000109c45079: test   %eax,-0x22c007f(%rip)        # 0x0000000107985000
[info]                                                 ;*goto
[info]                                                 ; - scala.collection.immutable.List::length@23 (line 339)
[info]                                                 ;   {poll}
[info]   0x0000000109c4507f: mov    0x8(%r12,%r8,8),%r9d  ; implicit exception: dispatches to 0x0000000109c450f1
[info]   0x0000000109c45084: cmp    $0xf801ba85,%r9d   ;   {metadata('scala/collection/immutable/$colon$colon')}
[info]   0x0000000109c4508b: je     0x0000000109c45060
[info]   0x0000000109c4508d: cmp    $0xf801903c,%r9d   ;   {metadata('scala/collection/immutable/Nil$')}
[info]   0x0000000109c45094: jne    0x0000000109c450c6  ;*ifne
[info]                                                 ; - scala.collection.immutable.List::length@8 (line 339)

[info]   0x0000000109c45060: lea    (%r12,%r8,8),%rsi  ;*invokevirtual isEmpty
[info]                                                 ; - scala.collection.immutable.List::length@5 (line 339)
...
...
[info]   0x0000000109c45060: lea    (%r12,%r8,8),%rsi  ;*invokevirtual isEmpty
[info]                                                 ; - scala.collection.immutable.List::length@5 (line 339)

...
...
[info]   0x0000000109c450c6: mov    $0xffffffc6,%esi
[info]   0x0000000109c450cb: mov    %eax,(%rsp)
[info]   0x0000000109c450ce: mov    %r8d,0x4(%rsp)
[info]   0x0000000109c450d3: callq  0x0000000109a051a0  ; OopMap{[4]=NarrowOop off=216}
[info]                                                 ;*invokevirtual isEmpty
[info]                                                 ; - scala.collection.immutable.List::length@5 (line 339)
[info]                                                 ;   {runtime_call}

Is it because .isEmpty is override in both Nil and :: , so calling .isEmpty would be simply equal to comparison the list to Nil , instead of call .isEmpty in the super type ? @mkeskells

mkeskells commented 6 years ago

I would look at a little benchmark - 3 method that does

  1. isEmpty
  2. eq Nil
  3. java based == a java final static field containing Nil
  4. maybe try isInstanceOf[Nil]
  5. .getClass eq classOf[Nil]
  6. anything else that you thin may be faster

test for performance, and look at the code generated. We are looking for the fastest!

szeiger commented 6 years ago

I wonder why the MODULE$ fields aren't static final in the first place. Shouldn't it be OK to make it final when it's initialized in a static initializer?

mkeskells commented 6 years ago

@szeiger https://github.com/rorygraves/scalac_perf/issues/19 relates - @retronym did some cases in https://github.com/scala/scala/pull/6523, but I don't think that can be applied to deeper hierarchies without more work. There is some discussion in that ticket

The problem is that a parent ctor can refer to the MODULE$ directly or indirectly, and it cant be null, or we would endlessly recurse