quarkusio / quarkus

Quarkus: Supersonic Subatomic Java.
https://quarkus.io
Apache License 2.0

Improve context propagation performance #9269

Open emmanuelbernard opened 4 years ago

emmanuelbernard commented 4 years ago

Context Propagation is used more and more, but its cost is noticeable. We need to improve it in the short to medium term.

emmanuelbernard commented 4 years ago

CC @cescoffier, @stuartwdouglas and @FroMage. AFAIK Clement had some ideas and work in progress.

FroMage commented 4 years ago

My only idea in this area was to replace the N per-framework thread-locals with a single thread-local provided by MP-CP that has storage room for each framework.
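A minimal sketch of what that single thread-local could look like, assuming each framework is handed a fixed slot index at startup. `SharedContextStorage`, `registerSlot`, `captureAll` and `restoreAll` are hypothetical names for illustration, not MP-CP or SmallRye CP API.

```java
// Hypothetical sketch: one ThreadLocal holding an array of slots shared by all
// frameworks, instead of one ThreadLocal per framework.
final class SharedContextStorage {
    private static final int MAX_SLOTS = 16; // assumed upper bound on frameworks
    private static int nextSlot = 0;         // guarded by the class lock

    private static final ThreadLocal<Object[]> SLOTS =
            ThreadLocal.withInitial(() -> new Object[MAX_SLOTS]);

    // Assumed to be called once per framework at startup.
    static synchronized int registerSlot() {
        if (nextSlot == MAX_SLOTS) {
            throw new IllegalStateException("no free context slots");
        }
        return nextSlot++;
    }

    static Object get(int slot) { return SLOTS.get()[slot]; }

    static void set(int slot, Object value) { SLOTS.get()[slot] = value; }

    // Capturing every framework's context costs one ThreadLocal lookup plus an array copy.
    static Object[] captureAll() { return SLOTS.get().clone(); }

    // Installs a copy of the snapshot (so later set() calls don't mutate it)
    // and returns the previous array so the caller can put it back afterwards.
    static Object[] restoreAll(Object[] captured) {
        Object[] previous = SLOTS.get();
        SLOTS.set(captured.clone());
        return previous;
    }
}
```

The catch, as noted below, is that each framework has to agree to keep its state in a slot rather than in its own thread-local.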

But that requires each framework to allow external storage of its thread-local state, and it may not address the biggest slowdown, which seems to come from nested stages that each apply context, piling up frames on the stack.

Those cases are probably better served by trampolines…
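For illustration, a minimal queue-based trampoline, assuming every stage completion goes through a hypothetical `Trampoline.schedule` call instead of invoking the next stage directly. Nested completions are queued and drained by one loop, so the stack stays flat and context would only need to be applied once around the loop. The class and counter are hypothetical, not Mutiny or SmallRye API.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical queue-based trampoline.
final class Trampoline {
    // Counts how often context would be applied (once per outermost drain loop).
    static int contextApplications = 0;

    // Non-null only while a drain loop is running on this thread.
    private static final ThreadLocal<Deque<Runnable>> PENDING = new ThreadLocal<>();

    static void schedule(Runnable task) {
        Deque<Runnable> pending = PENDING.get();
        if (pending != null) {
            pending.add(task); // already draining: defer instead of nesting a stack frame
            return;
        }
        Deque<Runnable> queue = new ArrayDeque<>();
        PENDING.set(queue);
        contextApplications++; // a real implementation would apply context here, once
        try {
            queue.add(task);
            Runnable next;
            while ((next = queue.poll()) != null) {
                next.run(); // tasks scheduled by this one land in the queue
            }
        } finally {
            PENDING.remove(); // a real implementation would also restore context here
        }
    }
}
```

A chain of a million stages that each schedule the next runs without a `StackOverflowError`, which a directly recursive chain would not.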

cescoffier commented 4 years ago

My approach so far is based on fusing stages that can be fused. It reduces the number of Uni/Multi creations and subscriptions (which, as a consequence, reduces the number of interceptions). I've done some tests, and it seems to work. However, it's going to take quite some time before it's ready. It will also be Mutiny-centric, so CompletionStage (CS) and other APIs won't see these benefits.
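A toy illustration of why fusing helps, assuming context propagation wraps each stage's callback once (a capture) and restores context on every invocation of that callback. `ContextCounter` and `Fusion` are hypothetical stand-ins; this is not how Mutiny's operators are actually implemented.

```java
import java.util.List;
import java.util.function.Function;

// Hypothetical stand-in for a context-propagating wrapper.
final class ContextCounter {
    static int captures = 0; // one per wrapped callback
    static int restores = 0; // one per invocation of a wrapped callback

    static <A, B> Function<A, B> wrap(Function<A, B> f) {
        captures++;              // stand-in for snapshotting the current context
        return a -> {
            restores++;          // stand-in for restoring the snapshot around the call
            return f.apply(a);
        };
    }
}

final class Fusion {
    // Unfused: every map step is its own stage, so each one is wrapped.
    static <A> Function<A, A> buildUnfused(List<Function<A, A>> steps) {
        Function<A, A> result = Function.identity();
        for (Function<A, A> step : steps) {
            result = result.andThen(ContextCounter.wrap(step));
        }
        return result;
    }

    // Fused: consecutive map steps are composed first, then wrapped once.
    static <A> Function<A, A> buildFused(List<Function<A, A>> steps) {
        Function<A, A> fused = Function.identity();
        for (Function<A, A> step : steps) {
            fused = fused.andThen(step);
        }
        return ContextCounter.wrap(fused);
    }
}
```

With three `map` steps, the unfused chain pays three captures and three restores per item; the fused chain pays one of each, for the same result.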

stuartwdouglas commented 4 years ago

I have an idea around context gates, which basically replaces multiple thread-locals with a single cached thread-local (TL) access in most cases.

I need to experiment to make sure it will work, but the basic idea is that anywhere a context might change (e.g. the transaction interceptor), we add a gate that clears a ThreadLocal.

When a context is captured for the first time, the result is stored in this TL; subsequent captures just reuse that value. When restoring, the captured data is compared by identity with the current value of the TL; if it is the same, there is no need to restore.

This should work in theory and provide significant gains, as long as we can reliably provide these context-change gates.
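A minimal sketch of the context-gate idea as described above, assuming only two propagated thread-locals and that every context change goes through a gate. `ContextGates`, `Snapshot`, `enterTransaction` and the counter are hypothetical names; undoing a restore afterwards is omitted for brevity.

```java
// Hypothetical sketch of "context gates" with a single cached snapshot TL.
final class ContextGates {
    // Two example per-framework thread-locals that we want to propagate.
    static final ThreadLocal<String> TX = new ThreadLocal<>();
    static final ThreadLocal<String> REQUEST = new ThreadLocal<>();

    // The single cached TL: last captured snapshot, valid until a gate clears it.
    private static final ThreadLocal<Snapshot> CACHED = new ThreadLocal<>();

    static int fullCaptures = 0; // how often every TL actually had to be read

    static final class Snapshot {
        final String tx;
        final String request;
        Snapshot(String tx, String request) { this.tx = tx; this.request = request; }
    }

    // The gate: placed wherever context may change.
    static void gate() { CACHED.remove(); }

    // Example of a context change going through a gate (e.g. a transaction interceptor).
    static void enterTransaction(String tx) {
        TX.set(tx);
        gate();
    }

    static Snapshot capture() {
        Snapshot cached = CACHED.get();
        if (cached != null) {
            return cached;            // fast path: a single TL read
        }
        fullCaptures++;               // slow path: read every propagated TL
        Snapshot fresh = new Snapshot(TX.get(), REQUEST.get());
        CACHED.set(fresh);
        return fresh;
    }

    // Returns true only if a real restore was needed.
    static boolean restore(Snapshot captured) {
        if (CACHED.get() == captured) {
            return false;             // identity match: context is already current
        }
        TX.set(captured.tx);
        REQUEST.set(captured.request);
        CACHED.set(captured);
        return true;
    }
}
```

The identity check only stays correct if no code path changes a propagated thread-local without passing through a gate, which is exactly the caveat about providing the gates effectively.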

n1hility commented 4 years ago

@stuartwdouglas @FroMage Is this work mostly complete? Do you think it will make 1.8, or should we split it into a 1.9 follow-up task?

cescoffier commented 4 years ago

There is still work to do. The fusing approach directly in Mutiny didn't produce the expected results, and the performance gains are negligible.

Sanne commented 3 years ago

@FroMage should we assign this to you?

cescoffier commented 3 years ago

I believe it has been done. At least, the Mutiny part is done.

Sanne commented 3 years ago

Right, Mutiny is very efficient in this regard now. But @FroMage had some further ideas for SmallRye CP; maybe he'd prefer to track that in a new issue.

The overall cost of CP still dominates when profiling, so I suppose we could keep going with this one, but I have no preference myself.

FroMage commented 3 years ago

Assign it to me, but I don't have free cycles ATM. I still need to make progress on the storage branches. Lately, Gavin had the idea of stuffing all the thread-locals into the Vert.x context, though this is probably not going to be fast. At least it would simplify some things. So further experimentation is required :(