Closed franz1981 closed 5 months ago
Looks great!
@Sanne I've prepared a microbenchmark for this
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Param;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Warmup;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.TimeUnit;
@State(Scope.Thread)
@Warmup(iterations = 10, time = 400, timeUnit = TimeUnit.MILLISECONDS)
@Measurement(iterations = 5, time = 200, timeUnit = TimeUnit.MILLISECONDS)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
@Fork(2)
public class IndentityLookup {
private SingleWriterCopyOnWriteArrayIdentityMap<ClassLoader, Object> arrayMap;
private Map<ClassLoader, Object> hashMap;
@Param({ "2", "10"})
public int size;
private ClassLoader missingClassLoader;
private ClassLoader firstClassLoader;
private ClassLoader lastClassLoader;
@Setup
public void init() {
arrayMap = new SingleWriterCopyOnWriteArrayIdentityMap<>();
hashMap = new HashMap<>();
// create N different instances of ClassLoader
missingClassLoader = new ClassLoader() { };
for (int i = 0; i < size; i++) {
var classLoader = new ClassLoader() { };
if (i == 0) {
firstClassLoader = classLoader;
}
if (i == size - 1) {
lastClassLoader = classLoader;
}
arrayMap.put(classLoader, new Object());
hashMap.put(classLoader, new Object());
}
}
@Benchmark
public Object arrayMapGetMissing() {
return arrayMap.get(missingClassLoader);
}
@Benchmark
public Object hashMapGetMissing() {
return hashMap.get(missingClassLoader);
}
@Benchmark
public Object arrayMapGetFirst() {
return arrayMap.get(firstClassLoader);
}
@Benchmark
public Object hashMapGetFirst() {
return hashMap.get(firstClassLoader);
}
@Benchmark
public Object arrayMapGetLast() {
return arrayMap.get(lastClassLoader);
}
@Benchmark
public Object hashMapGetLast() {
return hashMap.get(lastClassLoader);
}
}
These are the results
Benchmark (size) Mode Cnt Score Error Units
IndentityLookup.arrayMapGetFirst 2 thrpt 10 1436.097 ± 33.765 ops/us
IndentityLookup.arrayMapGetFirst 10 thrpt 10 1440.117 ± 37.064 ops/us
IndentityLookup.arrayMapGetLast 2 thrpt 10 666.662 ± 3.000 ops/us
IndentityLookup.arrayMapGetLast 10 thrpt 10 332.156 ± 5.896 ops/us
IndentityLookup.arrayMapGetMissing 2 thrpt 10 640.518 ± 7.431 ops/us
IndentityLookup.arrayMapGetMissing 10 thrpt 10 301.235 ± 2.458 ops/us
IndentityLookup.hashMapGetFirst 2 thrpt 10 651.625 ± 5.688 ops/us
IndentityLookup.hashMapGetFirst 10 thrpt 10 647.957 ± 6.822 ops/us
IndentityLookup.hashMapGetLast 2 thrpt 10 652.398 ± 7.165 ops/us
IndentityLookup.hashMapGetLast 10 thrpt 10 652.113 ± 1.399 ops/us
IndentityLookup.hashMapGetMissing 2 thrpt 10 865.340 ± 8.298 ops/us
IndentityLookup.hashMapGetMissing 10 thrpt 10 870.814 ± 6.873 ops/us
The worst case scenario (if we ignore, the missing case, which shouldn't happen, right @FroMage ?) is
*GetLast
and for such, with 2 class loaders, the performance vs HashMap
is actually the same.
With 10 class loaders, instead, the array map is even slower (!!!).
I've yet to analyze the results, but as it is now I'm putting this in draft
mmm I'm not very happy, still, because my benchmark seems to not trigger the same behavior as the real code
which is very different from the real one
that's hitting https://github.com/openjdk/jdk/blob/jdk-21%2B35/src/hotspot/share/prims/jvm.cpp#L611
Still investigating what's going on. -prof perfasm
seems to agree that the hash code used is different: it seems related the HashMap
s hash
method which maybe have its type polluted; let's try differently...
In short, it could:
@Sanne This is FUN
look at the benchmark at: https://github.com/franz1981/java-puzzles/blob/main/src/main/java/red/hat/puzzles/concurrent/IndentityLookup.java
these are the numbers, now fixed to match the actual behavior in the benchmark (and MAYBE is what you can observe in hibernate too (!!!!) - eg )
Benchmark (disableHashCodeIntrinsics) (size) Mode Cnt Score Error Units
IndentityLookup.arrayMapGetFirst true 2 thrpt 10 1453.559 ± 15.492 ops/us
IndentityLookup.arrayMapGetFirst true 10 thrpt 10 1485.773 ± 18.988 ops/us
IndentityLookup.arrayMapGetLast true 2 thrpt 10 687.806 ± 29.120 ops/us
IndentityLookup.arrayMapGetLast true 10 thrpt 10 337.043 ± 5.881 ops/us
IndentityLookup.arrayMapGetMissing true 2 thrpt 10 652.220 ± 7.062 ops/us
IndentityLookup.arrayMapGetMissing true 10 thrpt 10 342.773 ± 2.803 ops/us
IndentityLookup.classLoaderHashCode true 2 thrpt 10 109.970 ± 12.456 ops/us
IndentityLookup.classLoaderHashCode true 10 thrpt 10 108.434 ± 14.299 ops/us
IndentityLookup.hashMapGetFirst true 2 thrpt 10 104.966 ± 7.004 ops/us
IndentityLookup.hashMapGetFirst true 10 thrpt 10 107.667 ± 0.808 ops/us
IndentityLookup.hashMapGetLast true 2 thrpt 10 108.373 ± 0.914 ops/us
IndentityLookup.hashMapGetLast true 10 thrpt 10 108.948 ± 1.086 ops/us
IndentityLookup.hashMapGetMissing true 2 thrpt 10 110.337 ± 0.735 ops/us
IndentityLookup.hashMapGetMissing true 10 thrpt 10 109.889 ± 1.061 ops/us
for example now correctly produce
which matches what we observed in the real code.
I can undraft, but now ...
because if we won't, there is no need of this collection...
For hibernate too:
given that identity hash map piggyback the same intrinsics
why we have synchronized vs ClassLoader(s) instances in Quarkus???
I don't think this is needed in Quarkus, honestly. This is for WildFly and other application servers with one CL per application.
I don't understand your comment @FroMage : My point is that the performance issue I've discovered in Quarkus happen because some of the Classloader(s) stored in the map are used as objects to perform synchronization...and it seems weird to me.
I'm saying that supporting one set-up per CL was added for Application Servers, not for Quarkus. In the case of Quarkus, we have a single ClassLoader
and a single SmallRyeContextManager
, so this entire map is not even useful for Quarkus
In the case of Quarkus, we have a single ClassLoader and a single SmallRyeContextManager, so this entire map is not even useful for Quarkus
I would say the opposite actually: look at the numbers, I can show what happen with a single one too.
This is with size = 1
:
Benchmark (disableHashCodeIntrinsics) (size) Mode Cnt Score Error Units
IndentityLookup.arrayMapGetFirst true 1 thrpt 10 1474.012 ± 19.736 ops/us
IndentityLookup.arrayMapGetLast true 1 thrpt 10 1477.383 ± 18.811 ops/us
IndentityLookup.arrayMapGetMissing true 1 thrpt 10 636.047 ± 5.564 ops/us
IndentityLookup.hashMapGetFirst true 1 thrpt 10 108.808 ± 1.781 ops/us
IndentityLookup.hashMapGetLast true 1 thrpt 10 109.350 ± 0.794 ops/us
IndentityLookup.hashMapGetMissing true 1 thrpt 10 110.651 ± 2.243 ops/us
IndentityLookup.classLoaderHashCode true 1 thrpt 10 112.808 ± 12.129 ops/us
The hashMap*
variants are ~14X at worst slower throughput (we usually don't have the missing case), and fixing it will make such lookup a no cost.
I am not questioning your finding that this is expensive. I'm saying in Quarkus we don't even need to do this expensive thing, so we should just avoid doing it.
@FroMage we do actually have multiple classloaders - but yes since they are very simple, that's in fact why I had suggested in the linked discussion that we could do something "simpler"
@FroMage we do actually have multiple classloaders
Yeah, but not in the way it's intended for Application Servers where each app has one CL and they have different config for CP because they're all different applications.
@FroMage looks at the data.. there's evidence that this is a problem, I've seen these same flames sparkling myself as well :)
Lol aren't we all just divining shamans..
I am not questioning your finding that this is expensive. I'm saying in Quarkus we don't even need to do this expensive thing, so we should just avoid doing it.
Avoiding the lookup would be even better of course. But do you have a solution?
I'm now checking if i can retrieve this information using some JDK event or by modifying the JDK myself...
Right now is not nice I cannot see MonitorExit...so maybe is not capturing what I need:
Avoiding the lookup would be even better of course. But do you have a solution?
I might, let me reopen this project after several years… Also, not what I had planned for this afternoon.
I am swapping back memory of this from tape storage, let me tell you :-/
Can you try this PR, @franz1981 ? https://github.com/quarkusio/quarkus/compare/main...FroMage:quarkus:cp-perf-manager?expand=1
Note that I'm wondering what sort of application calls these methods, but whatever.
@FroMage hooo that looks so nice.. but you could have given this hint a year ago when we spotted this problem! :dagger: :cupid:
@FroMage these are the best solutions to me :) (removing code win too) Soooo...I've sent a run in a benchmrk, will take some, and hope will compile and I let you know, but if it does what I think is doing...it's done :)
no it didn't work...https://github.com/FroMage/quarkus/actions/runs/8600808107/job/23566549448
Yeah, silly me, forgetting formatting :) Fixed now.
It seems that it has run somehow - sadly i am using a CI job for performance testing. If it was on my laptop I would have take care of doing it myself, while on the job is a PITA, because I have to update the scripts just for this run back/forth (and I don't like Jenkins, nor it likes me)
I have to double check but I can see 2 different ways to make it happens https://github.com/openjdk/jdk/blob/747582484ce89e16661ef917a89adb52f5adc2e6/src/hotspot/share/opto/library_call.cpp#L4577-L4592
Tests did not pass. I'm looking into why
It appears some code might be using it before it's set up. Which might be revealing of a problem we've always had.
Mutiny, and FT, specifically.
You mean your change @FroMage ? Because I like it so much that I cannot see anymore a reason to have this pr opened; it is just a doc why we end up in another place, no?
Yeah, my Quarkus fix. It doesn't run CI. But I should be able to fix that.
Let me close this than, so can resume any conversation elsewhere (if needed).
It replaces https://github.com/smallrye/smallrye-context-propagation/pull/399
Quoting the @Sanne comments at https://github.com/smallrye/smallrye-context-propagation/pull/399#issuecomment-1370764330
If this approach is pushing too far the assumption about being very few ...