For ArrayBuffer#min and max, Scala 2.13 is slower than 2.12

LuciferYang commented 2 years ago

Reproduction steps

Scala version: 2.13.8

The test program is as follows

private def min(numIters: Int, bufferSize: Int, loops: Int): Unit = {
    val benchmark = new Benchmark("Array Buffer", loops, output = output)
    val buffer = new mutable.ArrayBuffer[Int]()
    (0 until bufferSize).foreach(i => buffer += i)
    benchmark.addCase(s"buffer min with $bufferSize items", numIters) { _ =>
      (0 until loops).foreach(_ => buffer.min)
    }
    benchmark.addCase(s"buffer max with $bufferSize items", numIters) { _ =>
      (0 until loops).foreach(_ => buffer.max)
    }
    benchmark.run()
  }

  override def runBenchmarkSuite(mainArgs: Array[String]): Unit = {
    val numIters = 3
    val loops = 20000
    runBenchmark("ArrayBuffer min and max") {
      Seq(1000, 10000, 100000).foreach { bufferSize =>
        min(numIters, bufferSize, loops)
      }
    }
  }

The Benchmark of spark is used to record time, which can also be changed to manual timing.

In above cases, I test 2 operations: ArrayBuffer.min, ArrayBuffer.max, with different buffer size: 1000, 10000, 100000.

Scala 2.12.16 result:

OpenJDK 64-Bit Server VM 1.8.0_332-b09 on Linux 5.13.0-1031-azure
Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
Array Buffer:                             Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
buffer min with 1000 items                          121            121           0          0.2        6043.9       1.0X
buffer max with 1000 items                          134            134           0          0.1        6705.5       0.9X

OpenJDK 64-Bit Server VM 1.8.0_332-b09 on Linux 5.13.0-1031-azure
Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
Array Buffer:                             Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
buffer min with 10000 items                        1272           1272           0          0.0       63601.9       1.0X
buffer max with 10000 items                        1339           1340           0          0.0       66972.2       0.9X

OpenJDK 64-Bit Server VM 1.8.0_332-b09 on Linux 5.13.0-1031-azure
Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
Array Buffer:                             Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
buffer min with 100000 items                      12785          12788           3          0.0      639232.4       1.0X
buffer max with 100000 items                      13433          13434           1          0.0      671654.2       1.0X

Scala 2.13.8 result:

OpenJDK 64-Bit Server VM 1.8.0_332-b09 on Linux 5.13.0-1031-azure
Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
Array Buffer:                             Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
buffer min with 1000 items                          273            273           0          0.1       13646.5       1.0X
buffer max with 1000 items                          162            162           0          0.1        8076.9       1.7X

OpenJDK 64-Bit Server VM 1.8.0_332-b09 on Linux 5.13.0-1031-azure
Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
Array Buffer:                             Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
buffer min with 10000 items                        1676           1676           0          0.0       83782.1       1.0X
buffer max with 10000 items                        1610           1610           0          0.0       80485.5       1.0X

OpenJDK 64-Bit Server VM 1.8.0_332-b09 on Linux 5.13.0-1031-azure
Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
Array Buffer:                             Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
buffer min with 100000 items                      16849          16856          10          0.0      842469.3       1.0X
buffer max with 100000 items                      15513          15515           4          0.0      775641.0       1.1X

Problem

From the test results, Scala 2.13.8 is slower than Scala 2.12.16 for ArrayBuffer.min and ArrayBuffer.max. Should they have the same performance? Or should I use other data structures in Scala 2.13?

LuciferYang commented 2 years ago

cc @SethTisue for help ~

Jasper-M commented 2 years ago

Looks like 2.12 has an optimized implementation of reduceLeft for indexed seqs.

LuciferYang commented 2 years ago

Could we migrate the relevant optimization to 2.13.x ?

SethTisue commented 2 years ago

I believe a PR on that would be accepted.

fyi @scala/collections

som-snytt commented 2 years ago

IndexedSeq has a foldRight that goes through the reverseIterator view (which uses indexed access and counts down).

There is nothing comparable to the 2.12 IndexedSeqOptimized which has "tight loops" for these operations.

I don't mind spending some time with this issue as a learning experience, but if someone better equipped has time because of a July 4 holiday, or a July 14 holiday, etc, that would be fine.

som-snytt commented 2 years ago

Before and after overriding reduceLeft in ArrayBuffer itself, I don't know why the before isn't linear in the small sizes

[info] Benchmark                                                 (size)  Mode  Cnt      Score     Error  Units
[info] ArrayBufferBenchmark.min$minusmax$u0020is$u0020reduction      10  avgt   30     26.010 ±   1.332  ns/op
[info] ArrayBufferBenchmark.min$minusmax$u0020is$u0020reduction     100  avgt   30    141.075 ±   2.798  ns/op
[info] ArrayBufferBenchmark.min$minusmax$u0020is$u0020reduction    1000  avgt   30   6955.457 ±  32.765  ns/op
[info] ArrayBufferBenchmark.min$minusmax$u0020is$u0020reduction   10000  avgt   30  69772.641 ± 136.585  ns/op

[info] Benchmark                                                 (size)  Mode  Cnt      Score     Error  Units
[info] ArrayBufferBenchmark.min$minusmax$u0020is$u0020reduction      10  avgt   30     39.154 ±   0.459  ns/op
[info] ArrayBufferBenchmark.min$minusmax$u0020is$u0020reduction     100  avgt   30    420.462 ±   3.671  ns/op
[info] ArrayBufferBenchmark.min$minusmax$u0020is$u0020reduction    1000  avgt   30   3971.257 ±   2.580  ns/op
[info] ArrayBufferBenchmark.min$minusmax$u0020is$u0020reduction   10000  avgt   30  39887.655 ± 176.183  ns/op

Presumably there will be smaller gains after moving the override to a trait. (30% here vs 20% in OP.)

A comment suggests at small size, it witnesses whether JIT kicked in.

som-snytt commented 2 years ago

immutable.ArraySeq has overrides, or foldLeft but not reduceLeft. Also, ArrayBuffer benefits an extra few % by direct array access, so it's worth it. People keep saying don't use array, use ArrayBuffer.

Oh, the reason my local 2.13.2 scaladoc doesn't show the override in ArraySeq is that japgolly added it during pandemic. Golly.

scala / bug

For ArrayBuffer#min and max, Scala 2.13 is slower than 2.12 #12614

Reproduction steps

Problem