Optimize `Scannable#name()` and related logic

Stephan202 commented 1 month ago

Performance is improved in two ways:

By invoking Scannable#stepName() only when Attr.NAME is not explicitly set.
By optimizing Traces#extractOperatorAssemblyInformationParts, to which several Scannable#stepName() implementations delegate.

The Scannable#name() logic may be executed many times, e.g. if a hot code path uses {Mono,Flux}#log or Micrometer instrumentation. The added benchmark shows that for large stack traces, the new Traces implementation is several orders of magnitude more efficient in terms of compute and memory resource utilization.

Deferral of invocation of Scannable#stepName() assumes that said method does not have side-effects. This is true for all built-in implementations.

While there, improve two existing benchmarks by utilizing the black hole to which benchmark method return values are implicitly sent.

chemicL commented 1 month ago

Thanks for the PR. I will have a closer look. However, bear in mind this change won't make it into 3.7.0-RC1 but would target directly 3.7.0. I think that's ok as it's not an API change. Together with #3900 it is a behaviour change, but unless there were side effects in code they would also not be observed aside from the performance gains. I think it's worth noting some sort of warning for the stepName lazy evaluation when we do release notes.

Stephan202 commented 1 month ago

Thanks for the feedback @chemicL! I cherry-picked the commit from #3900 into this branch and rewrote the PR title and summary.

Stephan202 commented 1 month ago

@chemicL I just realized that the impact of this PR is likely over-stated, as in practice the input stacktrace appears to always be generated by a CallSiteSupplierFactory implementation, both of which output at most two lines. (In my defense: the unit tests of the modified code also seem to indicate that more lines may be expected.)

So perhaps we can optimize the code further by skipping the intermediate single-string representation. I might have time for a closer look into that this weekend.

Stephan202 commented 1 month ago

as in practice the input stacktrace appears to always be generated by a CallSiteSupplierFactory implementation, both of which output at most two lines.

There's one other code path: CallSiteInfoAddingMethodVisitor passes a manually constructed two-line stack trace to Hooks#addCallSiteInfo.

So perhaps we can optimize the code further by skipping the intermediate single-string representation. I might have time for a closer look into that this weekend.

I now have a POC for this locally. Against a (ReactorDebugAgent-using) benchmark of local code it isn't yet faster that the current PR (surprisingly; requires more investigation), but for the cleanest and likely fastest code, it'd be better if we can have CallSiteInfoAddingMethodVisitor pass the two constructed stack frames separately. Doing this requires adding (or modifying) a public Hooks method, which causes :reactor-core:japicmp to report an API compatibility failure. Is that acceptable, and if so, how can I make that change without failing the build?

chemicL commented 1 day ago

Hey, @Stephan202. I've been a bit busy lately but would like to revisit your PRs. Can you give an update on the above considerations? How impactful is this change or an alternative change from your PoC? I assume we would be able to alter the Hooks methods that are marked as deprecated with a huge warning they're for internal use. You only need to find japicmp configuration in build.gradle and add an entry to the methodExcludes = [] array.

Stephan202 commented 1 day ago

Thanks for the ping on this PR @chemicL; I meant to report back here. :shame:

I did try the alternative approach mentioned (including the Hooks customization), but testing it against an internal benchmark (one that contains some Picnic-specific code, but mostly causes Reactor logic to be executed, with Reactor Debug Agent enabled), I consistently found the alternative approach be perform slightly worse. I lost quite some time over that, as it really defied (and still defies) my intuitions.

I can look into polishing that code and pushing it to an alternative branch for a second opinion; will try to find some time. That said, based on the above, my tentative suggestion would be to proceed with this PR as-is. (Except perhaps for trimming the JMH benchmark inputs, because as mentioned, in practice the code will generally parse at most two stack frames, rather than 1000.) If the alternative approach can be made more performant after all, that can be tackled in a follow-up PR.

Stephan202 commented 1 day ago

Rebased branch; applied 100% cleanly.

Stephan202 commented 1 day ago

The experiments I tried are on this messy branch. If desired I can clean it up, though it's a bit TBD when I'll have time to dive back into this topic.

reactor / reactor-core

Optimize `Scannable#name()` and related logic #3901