Closed mouse0w0 closed 4 months ago
Oh well. I try to cache locally whenever possible. If you see places where I didn't, it is because I forgot it. A PR would be welcome!
Additionally, I have noticed that iterating over the array in the forward direction is faster and more readable compared to iterating in reverse order. I believe this could be considered as an optimization, and I am willing to submit a pull request to modify the reverse iteration in fastutil if necessary. Looking forward to your guidance.
fastutil has been written > 20y ago, and at that time a series of compilation optimizations made reserve loop faster. I think things change and it's possible current JVM have better compilation. Do you have any benchmarks for, say, Arrays.fill()? And, what is java.util.Arrays.fill() doing today?
In the benchmark of filling array, reverse filling is twice as fast as forward filling in Java 8, with no significant differences observed in Java 17.
Benchmark code:
@Fork(1)
@State(Scope.Thread)
@Warmup(iterations = 3)
@Measurement(iterations = 10)
@OutputTimeUnit(TimeUnit.SECONDS)
public class ArrayFillBenchmark {
private int[] a;
@Setup
public void setup() {
a = new int[1000000];
}
@Benchmark
public void fill() {
Arrays.fill(a, 1);
}
@Benchmark
public void forward() {
int[] a = this.a;
for (int i = 0, len = a.length; i < len; i++) {
a[i] = 1;
}
}
@Benchmark
public void reverse() {
int[] a = this.a;
for (int i = a.length - 1; i >= 0; i--) {
a[i] = 1;
}
}
@Benchmark
public void reverse2() {
int[] a = this.a;
for (int i = a.length; i-- != 0;) {
a[i] = 1;
}
}
}
Benchmark result:
JDK 1.8.0_311
Benchmark Mode Cnt Score Error Units
ArrayFillBenchmark.fill thrpt 10 7719.026 ± 57.793 ops/s
ArrayFillBenchmark.forward thrpt 10 7446.591 ± 355.939 ops/s
ArrayFillBenchmark.reverse thrpt 10 11386.325 ± 416.348 ops/s
ArrayFillBenchmark.reverse2 thrpt 10 11604.816 ± 151.888 ops/s
JDK 17.0.7
Benchmark Mode Cnt Score Error Units
ArrayFillBenchmark.fill thrpt 10 11792.673 ± 191.212 ops/s
ArrayFillBenchmark.forward thrpt 10 11510.398 ± 449.012 ops/s
ArrayFillBenchmark.reverse thrpt 10 11493.141 ± 243.214 ops/s
ArrayFillBenchmark.reverse2 thrpt 10 11593.859 ± 150.707 ops/s
fastutil has been written > 20y ago, and at that time a series of compilation optimizations made reserve loop faster. I think things change and it's possible current JVM have better compilation. Do you have any benchmarks for, say, Arrays.fill()? And, what is java.util.Arrays.fill() doing today?
Arrays.fill, Arrays.copyOf, System.arraycopy etc. are AFAIK (a tiny bit) faster than their "naive" counterparts. Concretely System.arraycopy
is a (more or less) direct native call I think and Arrays.copyOf delegates to that. Plus these methods can use vectorized instructions.
Based on the conclusion in pull request #316, running benchmark on JDK 17.0.10
. Benchmark result:
# JMH version: 1.37
# VM version: JDK 17.0.10, Java HotSpot(TM) 64-Bit Server VM, 17.0.10+11-LTS-240
# VM invoker: C:\Program Files\jdk-17\bin\java.exe
# VM options: <none>
# Blackhole mode: compiler (auto-detected, use -Djmh.blackhole.autoDetect=false to disable)
# Warmup: 2 iterations, 10 s each
# Measurement: 2 iterations, 10 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Throughput, ops/time
Benchmark Mode Cnt Score Error Units
ArrayIterationBenchmark.foreach thrpt 2 7329.832 ops/s
ArrayIterationBenchmark.forward thrpt 2 7329.824 ops/s
ArrayIterationBenchmark.forwardWithStoreArray thrpt 2 7322.617 ops/s
ArrayIterationBenchmark.forwardWithStoreLength thrpt 2 7263.240 ops/s
ArrayIterationBenchmark.lambda thrpt 2 7233.075 ops/s
ArrayIterationBenchmark.reversed thrpt 2 6325.382 ops/s
ArrayIterationBenchmark.reversed2 thrpt 2 6531.927 ops/s
ArrayIterationBenchmark.reversed2WithStoreArray thrpt 2 6507.797 ops/s
ArrayIterationBenchmark.reversedWithStoreArray thrpt 2 6535.968 ops/s
According to the benchmark results, iterating arrays in forward yields a 10% higher throughput compared to iterating arrays in reverse.
Whether caching fields or not, it will not affect performance.
In a benchmark focused on array iteration, I observed a substantial performance improvement of nearly four times when using foreach or storing the array in a local variable. I noticed that in certain array iteration scenarios within fastutil, arrays are not stored in local variables (e.g.,
ArrayList.drv
,ArrayMap.drv
). I'm uncertain whether this should be considered as an issue. Below is my benchmark code:Benchmark result:
I am seeking guidance on whether the decision not to store arrays in local variables is intentional or if it might be worth considering as an optimization opportunity. Your insights into this matter would be greatly appreciated.