Open emmanuelbernard opened 4 years ago
CC @cescoffier @stuartwdouglas and @FroMage AFAIK Clement had some ideas and work in progress.
My only idea in this area was to replace the N thread-locals used by every framework with a single thread-local provided by MP-CP which had room for storage for each framework.
But that requires each framework to allow external storage of its thread-local, and it perhaps won't address the biggest slowdown, which seems to come from nested stages that each apply context, piling up frames on the stack.
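To make the single thread-local idea concrete, here is a minimal sketch under stated assumptions. `SharedContextStorage`, `registerSlot`, and the fixed `MAX_SLOTS` bound are hypothetical names invented for illustration; nothing here is an existing MP-CP or SmallRye API. Each framework registers a slot once, then reads and writes its state through the shared array, so capturing everything costs one thread-local lookup instead of N.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch: one ThreadLocal holding one slot per framework. */
final class SharedContextStorage {
    // Assumption: a small fixed upper bound on participating frameworks.
    private static final int MAX_SLOTS = 16;
    private static final AtomicInteger NEXT_SLOT = new AtomicInteger();

    // The single thread-local shared by every framework.
    private static final ThreadLocal<Object[]> SLOTS =
            ThreadLocal.withInitial(() -> new Object[MAX_SLOTS]);

    private SharedContextStorage() {}

    /** Called once per framework (e.g. at startup) to reserve a slot index. */
    static int registerSlot() {
        int slot = NEXT_SLOT.getAndIncrement();
        if (slot >= MAX_SLOTS) {
            throw new IllegalStateException("No free context slots");
        }
        return slot;
    }

    static Object get(int slot) {
        return SLOTS.get()[slot];
    }

    static void set(int slot, Object value) {
        SLOTS.get()[slot] = value;
    }

    /** Captures every framework's state with a single thread-local read. */
    static Object[] capture() {
        return SLOTS.get().clone();
    }

    /** Installs a captured snapshot and returns the state it replaced. */
    static Object[] restore(Object[] captured) {
        Object[] previous = SLOTS.get();
        SLOTS.set(captured.clone());
        return previous;
    }
}
```

The catch mentioned above still applies: every framework would have to agree to keep its state in an externally owned slot rather than in its own `ThreadLocal`.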
Those cases are probably better served by trampolines…
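For readers unfamiliar with the term, a trampoline runs nested work from a loop instead of from nested calls, so the stack stays flat. The following is only a sketch of the general technique; the `Trampoline` class is hypothetical and is not how Mutiny or SmallRye CP implement anything.

```java
import java.util.ArrayDeque;

/** Hypothetical sketch of a per-thread trampoline. */
final class Trampoline {
    // Non-null only while a drain loop is active on the current thread.
    private static final ThreadLocal<ArrayDeque<Runnable>> QUEUE = new ThreadLocal<>();

    private Trampoline() {}

    static void execute(Runnable task) {
        ArrayDeque<Runnable> queue = QUEUE.get();
        if (queue != null) {
            // Called from inside a running task: enqueue instead of nesting.
            queue.add(task);
            return;
        }
        queue = new ArrayDeque<>();
        QUEUE.set(queue);
        try {
            queue.add(task);
            Runnable next;
            // Drain iteratively; tasks scheduled by tasks run from this loop.
            while ((next = queue.poll()) != null) {
                next.run();
            }
        } finally {
            // If a task throws, any tasks still queued are dropped.
            QUEUE.remove();
        }
    }
}
```

With this shape, a stage that schedules its successor from within its own callback returns before the successor starts, so stack depth stays constant however long the chain is.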
My approach so far is to fuse stages that can be fused. It reduces the number of Uni/Multi creations and subscriptions (which, as a consequence, reduces the number of interceptions). I've done some tests, and it seems to work. However, it's going to take quite some time to get there. It will also be Mutiny-centric, so CompletionStage and other APIs won't see these benefits.
I have an idea around context gates, which basically replaces multiple thread locals with a single cached TL access in most cases.
I need to experiment to make sure it will work, but the basic idea is that anywhere a context might change (e.g. a transaction interceptor) we add a gate that clears a ThreadLocal.
When capturing a context for the first time, the result is stored in this TL; subsequent captures just use the value of this TL. When restoring, the captured data is compared by identity with the current value of the TL; if it is the same, there is no need to restore.
This should work in theory, and provide significant gains, as long as we can effectively provide these context change gates.
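The capture/restore/gate logic described above can be sketched roughly as follows. This is an illustration under assumptions, not the actual design: `ContextGate`, `contextChanged`, and the `slowCapture`/`slowRestore` callbacks are hypothetical names standing in for the existing per-framework capture and restore work.

```java
import java.util.Objects;
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Hypothetical sketch of the context-gate idea. */
final class ContextGate<S> {
    // Caches the snapshot that matches the thread's current context, if known.
    private final ThreadLocal<S> cached = new ThreadLocal<>();
    private final Supplier<S> slowCapture;   // full capture across all providers
    private final Consumer<S> slowRestore;   // full restore across all providers

    ContextGate(Supplier<S> slowCapture, Consumer<S> slowRestore) {
        this.slowCapture = Objects.requireNonNull(slowCapture);
        this.slowRestore = Objects.requireNonNull(slowRestore);
    }

    /** The first capture pays the full cost; later captures reuse the cached snapshot. */
    S capture() {
        S snapshot = cached.get();
        if (snapshot == null) {
            snapshot = Objects.requireNonNull(slowCapture.get());
            cached.set(snapshot);
        }
        return snapshot;
    }

    /** Skips the restore when the snapshot is already current (identity check). */
    void restore(S snapshot) {
        Objects.requireNonNull(snapshot);
        if (cached.get() == snapshot) {
            return;
        }
        slowRestore.accept(snapshot);
        cached.set(snapshot);
    }

    /** The gate: call wherever the context may change, e.g. in an interceptor. */
    void contextChanged() {
        cached.remove();
    }
}
```

Correctness depends entirely on `contextChanged()` being called at every point where the context can change; a missed gate would let a stale snapshot be reused.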
@stuartwdouglas @FroMage Is this work mostly complete? Do you think it will make 1.8, or should we split it into a 1.9 followup task?
There is still work to do. The fusing approach directly in Mutiny didn't deliver the expected results, and the performance gains are negligible.
@FroMage should we assign this to you?
I believe it has been done. At least, the mutiny part is done.
Right, Mutiny is very efficient in this regard now. But @FroMage had some further ideas for SmallRye CP... maybe he prefers tracking that in a new issue.
Overall cost for CP is still very dominant when profiling, so I suppose we could keep going with this one - but have no preference myself.
Assign it to me, but I don't have free cycles ATM. I still need to make a move on the storage branches. But lately Gavin had the idea of stuffing all the thread-locals into the Vert.x context, though this is probably not going to be fast. At least it would simplify some things. So further experiments are required :(
Context Propagation is used more and more but its cost is noticeable. We need to improve it in the short to medium term.