Closed codefromthecrypt closed 5 months ago
In implementing this, let's not change the class name being used to do async work, as it causes drift in @openzipkin/armeria https://github.com/line/armeria/blob/4ca7e898201dd929970e1210899dd94e08023df2/brave/brave6/src/main/java/com/linecorp/armeria/common/brave/BraveBlockHoundIntegration.java#L28
Feature
When reporter was first created, it had a rare feature that allows you to limit both the count (
queuedMaxSpans
) and total size of spans (queuedMaxBytes
) in the backlog. This isn't perfectly correct with regards to memory size, but can be adjusted to limit memory, nevertheless.Some sites will not be terribly concerned about the max pending bytes and by avoiding that can save time in reporting.
Let's allow a setting
queuedMaxBytes=0
to disable guards inAsyncReporter
andAsyncSpanHandler
. When this is in-place, aByteBoundedQueue
is no longer used.Instead, a normal bounded queue is, that drops when the count is higher than
queuedMaxSpans
. All other work is done on the AsyncSpanReporter thread, including dropping if the size of a queued span is larger thanmessageMaxBytes
Rationale
This should solve https://github.com/openzipkin/brave/issues/1426 by reducing synchronized time when
end
is called in brave to negligible.Now, a large span will take microsecond to size (in json or proto). This size function was very carefully done to be performant.
More time would be in edge cases, or if someone is reporting 100pct (despite that being very much unadvised), or otherwise creating a large amount of data at a high rate.
The
spanhandler.end
function is under a synchronized lock in brave, someone using virtual threads and-Djdk.tracePinnedThreads=full
may see a warning. It is unknown what amount of time triggers this.Even if when this log has no affect on real-world performance, the topic is distracting and costs maintainer time to answer. Effort spent to minimize this kind of time, is also worth it.
Solely looking at a count backlog isn't as useful in memory limiting, but it is very cheap (faster to check). Also, I don't think any other reporter does memory bounds anyway, so it doesn't limit competitiveness of the reporter library.
The main thing is turning off size guards results in far less time under a lock, hopefully so minuscule pinned thread watchers can't detect it even under load.
Example Scenario
Mainly, this is all about not distracting those people experimenting or using VirtualThreads. Brave has synchronized code for a good reason, and it isn't harmful, except possibly due to work done when using async reporting here.
Related