Closed Ladicek closed 2 years ago
/cc @manovotn, @matejvasek, @mkouba, @patriot1burke
I implemented this locally and here are the results.
I measured performance impact using the JMH benchmarks from: https://github.com/mkouba/arc-benchmarks/
I measured RSS impact using this one-off tool: https://github.com/Ladicek/arc-crazybeans
All measurements were done on my otherwise-idle desktop machine (running Ubuntu 22.10 with kernel Linux 5.19.0-23-generic; hardware-wise it's Ryzen 5950X with 64 GB of RAM) with the following tuning:
# select the "performance" CPU scaling governor
sudo cpupower frequency-set -g performance
# disable hyperthreading
echo off | sudo tee /sys/devices/system/cpu/smt/control
# disable turbo boost
echo 0 | sudo tee /sys/devices/system/cpu/cpufreq/boost
(I know more tuning could/should be done, but I'm no expert...)
main branch (821984a884c75775b95333a54359f3fd54702b96):
Benchmark Mode Cnt Score Error Units
InterceptorBenchmark.complex thrpt 25 7900.025 ± 30.325 ops/s
InterceptorBenchmark.simple thrpt 25 7686.898 ± 272.104 ops/s
SubclassInstantiationBenchmark.complexSubclass thrpt 25 2174.085 ± 25.565 ops/s
SubclassInstantiationBenchmark.simpleSubclass thrpt 25 9375.078 ± 538.066 ops/s
my branch (3708d490a96550231f91d26e19f1dd4a5f8a71c9):
Benchmark Mode Cnt Score Error Units
InterceptorBenchmark.complex thrpt 25 7798.061 ± 35.600 ops/s
InterceptorBenchmark.simple thrpt 25 7646.186 ± 192.110 ops/s
SubclassInstantiationBenchmark.complexSubclass thrpt 25 2134.797 ± 32.218 ops/s
SubclassInstantiationBenchmark.simpleSubclass thrpt 25 8948.386 ± 634.214 ops/s
main branch: 106333.200 ± 683.611 kB (median 106148 kB, p99 108232 kB)
my branch: 115900.200 ± 760.293 kB (median 115772 kB, p99 117680 kB)
main branch: 33580.440 ± 10.635 kB (median 33576 kB, p99 33624 kB)
my branch: 33727.600 ± 12.668 kB (median 33724 kB, p99 33772 kB)
main branch: 35710712 B
my branch: 35739384 B
There's an unwritten rule in Quarkus that runtime code should not contain lambdas because they are memory-hungry. This experiment confirms that, especially on a regular JVM. Overall, the existing strategy is better: with lambdas, mean throughput is equal or slightly lower in every benchmark, and memory use is higher in every measurement.
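To put the reported means on a common scale, the relative change between the two branches can be computed directly. This is only a small arithmetic sketch over the numbers quoted above; the class and method names are made up for illustration, and the error margins are ignored.

```java
public class RelativeChange {
    // Percentage change of candidate relative to baseline (positive = candidate is larger).
    static double percentChange(double baseline, double candidate) {
        return (candidate - baseline) / baseline * 100.0;
    }

    public static void main(String[] args) {
        // Throughput means (ops/s), main branch vs. my branch, copied from the JMH output above.
        System.out.printf("InterceptorBenchmark.complex: %+.2f%%%n",
                percentChange(7900.025, 7798.061));
        System.out.printf("SubclassInstantiationBenchmark.simpleSubclass: %+.2f%%%n",
                percentChange(9375.078, 8948.386));

        // Memory means (kB) and the size in bytes, in the order they are listed above.
        System.out.printf("first memory measurement: %+.2f%%%n",
                percentChange(106333.200, 115900.200));
        System.out.printf("second memory measurement: %+.2f%%%n",
                percentChange(33580.440, 33727.600));
        System.out.printf("size in bytes: %+.2f%%%n",
                percentChange(35710712, 35739384));
    }
}
```

The first memory measurement grows by about 9%, while the other two differences stay well below 1%.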
Description
Currently, to implement InvocationContext.proceed() for interceptors, ArC generates for each intercepted method one forwarding method in the _Subclass and one anonymous class implementing Function<InvocationContext, Object>. That anonymous class obtains the argument values from InvocationContext.getParameters() and calls the forwarding method.
It might be beneficial to use lambdas instead. That would allow getting rid of the forwarding method (a lambda can directly invoke the superclass method) and an extra class (the lambda would implement Function itself).
Implementation ideas
This requires adding support for creating lambdas to Gizmo. That's relatively straightforward when support for capturing variables is not required, which is the case here.
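For readers who want to see the two shapes from the Description side by side, here is a hand-written sketch in plain Java. It is not ArC's generated code: Ctx is a hypothetical stand-in for InvocationContext (only getParameters() is modelled), and Greeter, greetForward and the subclass names are invented for illustration. The interceptor chain itself is left out.

```java
import java.util.function.Function;

public class ProceedSketch {
    // Hypothetical stand-in for InvocationContext; only the method used here.
    interface Ctx {
        Object[] getParameters();
    }

    // The intercepted bean class.
    static class Greeter {
        String greet(String name) {
            return "Hello " + name;
        }
    }

    // Shape 1: forwarding method plus anonymous Function class.
    static class SubclassWithForwarder extends Greeter {
        // Exists only so that the anonymous class can reach the superclass method.
        String greetForward(String name) {
            return super.greet(name);
        }

        final Function<Ctx, Object> forward = new Function<Ctx, Object>() {
            @Override
            public Object apply(Ctx ctx) {
                Object[] args = ctx.getParameters();
                return greetForward((String) args[0]);
            }
        };
    }

    // Shape 2: a lambda that calls the superclass method directly.
    // It uses no local variables, only the receiver needed for the super call.
    static class SubclassWithLambda extends Greeter {
        final Function<Ctx, Object> forward =
                ctx -> super.greet((String) ctx.getParameters()[0]);
    }

    // Builds a context carrying a single argument.
    static Ctx contextOf(Object... params) {
        return () -> params;
    }

    static String viaAnonymousClass(String name) {
        return (String) new SubclassWithForwarder().forward.apply(contextOf(name));
    }

    static String viaLambda(String name) {
        return (String) new SubclassWithLambda().forward.apply(contextOf(name));
    }

    public static void main(String[] args) {
        System.out.println(viaAnonymousClass("ArC"));
        System.out.println(viaLambda("ArC"));
    }
}
```

Both variants return the same result. The second one has neither the forwarding method nor the extra named class, which is the saving the Description is after.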