Open Stephan202 opened 1 month ago
Thanks for the PR. I will have a closer look. However, bear in mind this change won't make it into 3.7.0-RC1 but would target directly 3.7.0. I think that's ok as it's not an API change. Together with #3900 it is a behaviour change, but unless there were side effects in code they would also not be observed aside from the performance gains. I think it's worth noting some sort of warning for the stepName
lazy evaluation when we do release notes.
Thanks for the feedback @chemicL! I cherry-picked the commit from #3900 into this branch and rewrote the PR title and summary.
@chemicL I just realized that the impact of this PR is likely over-stated, as in practice the input stacktrace appears to always be generated by a CallSiteSupplierFactory
implementation, both of which output at most two lines. (In my defense: the unit tests of the modified code also seem to indicate that more lines may be expected.)
So perhaps we can optimize the code further by skipping the intermediate single-string representation. I might have time for a closer look into that this weekend.
as in practice the input stacktrace appears to always be generated by a CallSiteSupplierFactory implementation, both of which output at most two lines.
There's one other code path: CallSiteInfoAddingMethodVisitor
passes a manually constructed two-line stack trace to Hooks#addCallSiteInfo
.
So perhaps we can optimize the code further by skipping the intermediate single-string representation. I might have time for a closer look into that this weekend.
I now have a POC for this locally. Against a (ReactorDebugAgent-using) benchmark of local code it isn't yet faster that the current PR (surprisingly; requires more investigation), but for the cleanest and likely fastest code, it'd be better if we can have CallSiteInfoAddingMethodVisitor
pass the two constructed stack frames separately. Doing this requires adding (or modifying) a public Hooks
method, which causes :reactor-core:japicmp
to report an API compatibility failure. Is that acceptable, and if so, how can I make that change without failing the build?
Hey, @Stephan202. I've been a bit busy lately but would like to revisit your PRs. Can you give an update on the above considerations? How impactful is this change or an alternative change from your PoC? I assume we would be able to alter the Hooks
methods that are marked as deprecated with a huge warning they're for internal use. You only need to find japicmp
configuration in build.gradle and add an entry to the methodExcludes = []
array.
Thanks for the ping on this PR @chemicL; I meant to report back here. :shame:
I did try the alternative approach mentioned (including the Hooks
customization), but testing it against an internal benchmark (one that contains some Picnic-specific code, but mostly causes Reactor logic to be executed, with Reactor Debug Agent enabled), I consistently found the alternative approach be perform slightly worse. I lost quite some time over that, as it really defied (and still defies) my intuitions.
I can look into polishing that code and pushing it to an alternative branch for a second opinion; will try to find some time. That said, based on the above, my tentative suggestion would be to proceed with this PR as-is. (Except perhaps for trimming the JMH benchmark inputs, because as mentioned, in practice the code will generally parse at most two stack frames, rather than 1000.) If the alternative approach can be made more performant after all, that can be tackled in a follow-up PR.
Rebased branch; applied 100% cleanly.
The experiments I tried are on this messy branch. If desired I can clean it up, though it's a bit TBD when I'll have time to dive back into this topic.
Performance is improved in two ways:
Scannable#stepName()
only whenAttr.NAME
is not explicitly set.Traces#extractOperatorAssemblyInformationParts
, to which severalScannable#stepName()
implementations delegate.The
Scannable#name()
logic may be executed many times, e.g. if a hot code path uses{Mono,Flux}#log
or Micrometer instrumentation. The added benchmark shows that for large stack traces, the newTraces
implementation is several orders of magnitude more efficient in terms of compute and memory resource utilization.Deferral of invocation of
Scannable#stepName()
assumes that said method does not have side-effects. This is true for all built-in implementations.While there, improve two existing benchmarks by utilizing the black hole to which benchmark method return values are implicitly sent.