LifeIsStrange closed this issue 5 months ago.
@LifeIsStrange, thanks for reporting the issue. @dougxc, please have a look and advise.
We will look into this. Thanks for pointing it out. Note that the default GC changed between 8 and 11 which may explain some of the difference. There are also of course many other changes between 8 and 11 (e.g. runtime, library changes) which may explain the differences so I think characterizing this as a strong performance regression is not quite right.
@plokhotnyuk, would it be possible to take one or two of the more serious regressions and see if they reproduce when using -XX:+UseSerialGC on 11? That would at least help rule out or pinpoint one obvious candidate for the difference.
That graph presentation is very busy without a lot of controls for display. If you clone those pages with wget you can tweak the underlying data to get pairwise comparisons which look much better. I used
wget --mirror --convert-links --page-requisites --no-parent -P . https://plokhotnyuk.github.io/jsoniter-scala/
Then, if you edit plokhotnyuk.github.io/jsoniter-scala/provided.js, you can change the value of providedBenchmarks to reorder or restrict which JDKs you see. Specifying only two results gives you a nicer bar chart that shows relative performance, like this for example:
You can also get these nice pairwise summaries. Within the same release, EE looks very good relative to Corretto.
There are some cases of major regressions that might deserve some investigation. Comparing Corretto 8 and 11 shows a fair number of regressions, suggesting that general JDK changes are having a negative impact on the benchmarks.
Doing the same comparison within EE suggests that a lot of cases where we used to win in 8 are no longer so good with 11.
It seems like looking at major regressions between EE 8 and 11 that aren't also Corretto regressions might be the place to start. A little scripting could probably extract a more readable table that shows the potentially interesting ones.
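As a starting point for that scripting, here is a minimal Java sketch. It assumes the scores for each JDK have already been extracted from the benchmark data into a map from benchmark name to throughput (higher is better); the class name, the method name and the threshold parameter are hypothetical choices, not part of the published benchmark pages.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical helper: each map is assumed to hold benchmark name -> throughput
// (higher is better) for one JDK, extracted from the benchmark results.
class RegressionFilter {

    // Returns benchmarks where EE 11 dropped by more than `threshold`
    // (e.g. 0.10 = 10%) relative to EE 8, while Corretto 11 did NOT drop
    // by more than the same threshold relative to Corretto 8.
    static List<String> eeOnlyRegressions(Map<String, Double> ee8,
                                          Map<String, Double> ee11,
                                          Map<String, Double> corretto8,
                                          Map<String, Double> corretto11,
                                          double threshold) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Double> e : ee8.entrySet()) {
            String name = e.getKey();
            Double eeNew = ee11.get(name);
            Double cOld = corretto8.get(name);
            Double cNew = corretto11.get(name);
            if (eeNew == null || cOld == null || cNew == null) {
                continue; // skip benchmarks missing from any of the four runs
            }
            double eeRatio = eeNew / e.getValue();  // < 1.0 means EE 11 is slower
            double correttoRatio = cNew / cOld;     // < 1.0 means Corretto 11 is slower
            if (eeRatio < 1.0 - threshold && correttoRatio >= 1.0 - threshold) {
                result.add(name);
            }
        }
        Collections.sort(result); // alphabetical order for a readable table
        return result;
    }
}
```

Printing the returned names together with both ratios would give the kind of compact table suggested above; the threshold is a judgment call and would need tuning against run-to-run noise.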
The performance difference between Java 8 and Java 11 is likely due to the String.getBytes call at com.github.plokhotnyuk.jsoniter_scala.core.JsonWriter.writeNonEscapedAsciiKey (JsonWriter.scala:144) [bci:107].
On Java 8, String.getBytes copies the bytes using a while loop:
while (i < n) {
dst[j++] = (byte)val[i++];
}
which will be fully unrolled given that this call in the application code always asks for 4 bytes.
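To make the Java 8 behaviour concrete, here is a standalone sketch that mirrors the loop quoted above. It assumes the call site uses the deprecated four-argument String.getBytes(int, int, byte[], int) overload, whose Java 8 body contains that loop; the class name Java8StyleCopy is hypothetical, and the char[] parameter stands in for the String's private value field.

```java
// Hypothetical sketch mirroring the Java 8 String.getBytes(int, int, byte[], int)
// loop. On Java 8 a String stores its characters in a char[]; each char is
// narrowed to its low 8 bits, which is lossless for ASCII keys.
class Java8StyleCopy {

    static void getBytes(char[] val, int srcBegin, int srcEnd, byte[] dst, int dstBegin) {
        int j = dstBegin;
        int n = srcEnd;
        int i = srcBegin;
        // Per the analysis above, when the length is always 4 the JIT can
        // fully unroll this loop into a handful of byte stores.
        while (i < n) {
            dst[j++] = (byte) val[i++];
        }
    }
}
```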
On Java 11, String.getBytes redirects to StringLatin1.getBytes, which instead uses System.arraycopy and calls into a runtime stub, and is consequently much slower.
Great find! Is there a way to fix that in OpenJDK/GraalVM? Maybe @cl4es would be interested in taking a look, as he was recently working on optimizing OpenJDK string/charset performance (cf. https://cl4es.github.io/2021/02/23/Faster-Charset-Decoding.html).
Regarding the question about reproducing with -XX:+UseSerialGC on 11: on all JVMs, the -XX:+UseParallelGC option was turned on. The full list of JVM options that are set for all JVMs is here.
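To confirm which collector a given run actually used (for example when comparing -XX:+UseParallelGC against -XX:+UseSerialGC), a small sketch using the standard java.lang.management API can be run inside the benchmark JVM. The class name ActiveGc is hypothetical. On HotSpot, Parallel GC typically reports the collectors "PS Scavenge" and "PS MarkSweep", and Serial GC typically reports "Copy" and "MarkSweepCompact"; these names are implementation-specific and may differ on other JVMs.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: returns the names of the garbage collectors that the
// running JVM exposes through its GarbageCollectorMXBeans.
class ActiveGc {

    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }
}
```

Printing ActiveGc.collectorNames() from a benchmark's setup code is one way to double-check that the intended GC flag was picked up.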
@mur47x111 Hi, I was wondering if you could take another look at this. Your String.getBytes hypothesis is a great one, but hasn't it been intrinsified with AVX in JDK 18 (https://cl4es.github.io/2021/10/17/Faster-Charset-Encoding.html), and already somewhat optimized in JDK 17? If so, that's strange: JDK 17 is almost as fast as JDK 8, but had the JDK 8 to 11 performance regression been entirely fixed, JDK performance would probably not be so stagnant, considering the many optimizations that went into those releases. So I theorize that either there was another regression introduced after JDK 8 that still impacts JDK 17, or the .getBytes regression is only sufficiently fixed in JDK 18 (we don't have updated jsoniter-scala benchmarks for that yet).
Note, by the way, that even arraycopy might be intrinsified nowadays (https://github.com/openjdk/jdk/pull/61), but it's unclear to me whether there is a 256-bit AVX version or whether it's only for AVX-512 CPUs.
Thanks for the info. I will take a look and update later.
@LifeIsStrange I have merged a bunch of intrinsics, including some charset-related ones (https://github.com/oracle/graal/commit/47d1fb1556360c68907324e8b9f06b1651f5d1f1). Could you please test whether they help address this regression?
I can't, but maybe @plokhotnyuk can.
Please re-open if the performance regression is still observable.
As can be seen in the benchmarks of the fastest serialization library on the JVM: https://plokhotnyuk.github.io/jsoniter-scala/