Closed kvcache closed 3 years ago
moving this over to the opentelemetry-java project
Hi @kvcache . The purpose of the Bound*
instruments is precisely to not allow record-time attributes. That code path can be highly optimized and essentially "zero allocation" if we don't allow additional attributes to be passed in. If you need to pass in attributes on the hot path, you'll need to use the regular instruments, and pass the full attribute set in with each recording.
I think maybe that's missing the point? When I implemented literally this a few years ago I expressed it as allowing a zero-allocation attribute set - rather than exclusively forcing it. It's an arbitrary requirement, not driven by technical necessity, and ought to be called out in docs but allowed. Sometimes you need to do both.
Maybe the right compromise here would be to have a means to tell an Instrument or a Meter
to always add the Resource attributes on the attributes' way out for export. That was the zero-allocation compromise I made & it was never found to be costly or restrictive. If you are consuming Resource Attributes you know they won't differ and you don't need to consider them in aggregation uniqueness keying; just toss them in the output body on their way out the door.
Consider this silly scope timer, which cannot disambiguate by any dynamic tags, but at least a user doesn't have to pass in Resource attributes at every single report site:
class Timer(meter: Meter, name: String, description: String, labels: List<Label> = listOf()) {
val latencyRecorder: BoundLongHistogram = bindResourceLabels(
meter.histogramBuilder(name)
.ofLongs()
.setDescription(description)
.setUnit(StandardUnit.Microseconds.toString())
.build(),
labels
)
private fun bindResourceLabels(histogram: LongHistogram, bonusLabels: List<Label>): BoundLongHistogram {
val labelsBuilder = Attributes.builder().withResourceLabels()
bonusLabels.forEach { (name, value) ->
labelsBuilder.put(name, value)
}
return histogram.bind(labelsBuilder.build())
}
// Records the time spent in the block
inline fun <T> record(block: () -> T): T {
val start = System.nanoTime()
try {
return block()
} finally {
val end = System.nanoTime()
val latency = kotlin.math.max(0, (end - start) / 1000)
latencyRecorder.record(latency)
}
}
}
Now, if the Meter could be set to report its Resource Attributes, this would no longer need to be bound and I could pass in optional Timer Attributes to make this way more ergonomic. Also that would be a performance boost as Resource Attributes can be considered immutable and invariant, whereas the bound attribute set is not invariant across instances. (as a kludge I make multiple bound timers with the same name and literal same description with a different value for a single "operation" tag to time the same thing by different facets)
@jsuereth I wonder if we should remove bound instruments altogether at this point, since they're not specced that I could see in the metrics API spec.
We could just remove them. Bound instruments are designed (right now) to avoid a single lock within the metrics SDK and can improve performance of writing metrics by 2x. What you're suggesting is to remove that performance benefit.
Specifically bound instruments ONLY exist as a performance optimisation when attributes are known at allocation time. I can dig out the benchmarking code if you'd like to see the boost.
This feature would basically negate any benefit of bound instruments, instead you should just allocate Attributes and then make append a quick thing (if baseAttributes.toBuilder().put(...).build()
isn't elegant enough).
Dropping Bound instruments and just building a whole new Attributes map for every single measurement. Terrible for performance and GC overhead (measured).
What code where you using here? How much of this is lack of efficient attribute-append operation?
I can give you an example but not from my actual service:
// I have to make a timer per api because I can't both bind Resource attributes and provide dynamic attributes.
val listFriendsTimer = Timer(
myEnvironmentMeterWithResourceAttributesThatNeedToBeInMyAttributeSet,
"request",
"an api request. could be any request",
listOf(Label("api"), "list_friends")
)
val describeFriendTimer = Timer(
myEnvironmentMeterWithResourceAttributesThatNeedToBeInMyAttributeSet,
"request",
"an api request. could be any request",
listOf(Label("api"), "describe_friend")
)
val addFriendTimer = Timer(
myEnvironmentMeterWithResourceAttributesThatNeedToBeInMyAttributeSet,
"request",
"an api request. could be any request",
listOf(Label("api"), "add_friend")
)
// I have to make sure I use the separate, correct timer instance for each api instead of instantiating inline with the
// obviously correct attributes. (have to scroll up and down to see if I did it right)
fun listFriends(selfUserId: Integer) = listFriendsTimer.record {
// I can't record the user ID as a timer attribute because I had to bind the instrument
// to get Resource attributes.
}
fun describeFriend(selfUserId: Integer, friendUserId: Integer) = describeFriendTimer.record {
// I can't record the user ID of the requester or the target user as a timer attribute because I
// had to bind the instrument to get Resource attributes.
}
fun addFriend(selfUserId: Integer, friendUserId: Integer) = describeFriendTimer.record {
// I pasted this declaration and fell victim to the trivial footgun of having the declaration and the use
// of the timers separated.
}
An efficient append attribute operation would be helpful and would make recording userId
attributes much more tractable. That's 1/2 of it. I also need to report those static, unchanging Resource Attributes.
static-unchanging-resource attributes should be done as part of Resource
in otel, not passed in with measurements.
Your example here looks like it's not the OTEL API but some other API. Did you check the performance of the OTEL library?
Additionally, it looks like your instantiating a list of your labels in every call. If you pre-allocate the label list, would that be more efficient?
In otel our API takes Attributes
not a list of key-value pairs, so I'm not sure we have the same performance issue.
static-unchanging-resource attributes should be done as part of Resource in otel, not passed in with measurements.
Yeah, I agree - however, the static-unchanging-resource
attributes do not get attached to metrics in prometheus export. That means with Otel I had no way to disambiguate metrics from one server versus another on Grafana until I made this kludge.
Your example here looks like it's not the OTEL API but some other API.
Scroll up a bit, it's a class I wrote using the OTEL API to make the OTEL API more ergonomic.
Did you check the performance of the OTEL library?
It has not presented a problem yet. The profiler showed its overhead was acceptable.
Additionally, it looks like your instantiating a list of your labels in every call.
Nope, it's preallocated! I would like to be able to provide both preallocated-static, and dynamic labels.
In otel our API takes Attributes not a list of key-value pairs
Oh I see what you're saying, yeah - I just omitted the AttributesBuilder dance for sake of brevity. There are a bunch of builders and unrelated stuff that distracts from the main issue here: I'm looking to have both static-unchanging-resource
attributes and dynamic
attributes on Instrument
s' exported values.
the static-unchanging-resource attributes do not get attached to metrics in prometheus export
You can do this if you leverage the Otel collector (e.g. this configuration parameter available on many exporters) to export to prometheus and I think that's a great feature request for the prometheus exporter.
SPECIFICALLY, it'd be great if you raised a feature request around this use case (and I can link to that in an OTEL Specification feature request). I'd love to improve the interaction of OTEL Resource + Prometheus Export as I think we can do far better than now, and your use case exactly matches what resource is intended for. E.g. see this related issue: https://github.com/open-telemetry/opentelemetry-specification/issues/1782
Scroll up a bit, it's a class I wrote using the OTEL API to make the OTEL API more ergonomic.
I see. I think we're likely to provide a timer instrument in the future. Two main points:
Oh I see what you're saying, yeah - I just omitted the AttributesBuilder dance for sake of brevity. There are a bunch of builders and unrelated stuff that distracts from the main issue here: I'm looking to have both static-unchanging-resource attributes and dynamic attributes on Instruments' exported values.
That's not the purpose of the bound API, unfortunately. It's just there for when you know your attributes ahead of time and we can avoid a lock. It also degrades performance when you're leveraging Baggage
to attach labels to metrics from upstream distributed context. The implementation literally does not allow attributes to be added because you're locked into a specific timeseries writer when you use bound (and you've guaranteed that memory has been allocated + frozen for that set of attributes). I.e. "bound" is how you can ensure 0 memory allocations in a hot-path server that collects metrics, and our benchmarks show this. We literally cannot add dynamic attributes without sacrificing that guarantee.
LIke @jkwatson It might be we just drop it to lead to less confusion. I still think it's useful for internal instrumentation to provide the "zero hot-path allocation" guarantees.
I.e. "bound" is how you can ensure 0 memory allocations in a hot-path server that collects metrics, and our benchmarks show this. We literally cannot add dynamic attributes without sacrificing that guarantee.
I missed that this was a feature of bound instruments - please do not remove these unless you have an equivalent feature elsewhere in your api, and thank you for educating me on this semantic! I will keep the Timer wrapper then anyway to make the attribute binding more clean. Please know that I am currently depending on bound Instruments and if they are contributing to my observed (very) low garbage collector activity then I will sorely miss their removal. Tbh I will go as far as to fork the repo to keep the feature 😬 but would prefer not.
Powerful tools are worth confusion at first glance; let's keep bound instruments in the API if they're the performant option (which I now understand they are). I implore you: Keep bound instruments!
resource_to_telemetry_conversion
The PrometheusExporter supports resource_to_telemetry_conversion
, but the prometheusremotewriteexporter does not :-(
I'm currently using Promscale to bridge OpenTelemetry to Timescaledb and have to use the remote write exporter, or else add yet another time series db into my infra as a band aid 😄. resource_to_telemetry_conversion
is exactly what I'd like but really it seems like such a fundamental thing that it should be standalone, maybe in a processor, so that I don't have to feature request some person to add it out of the goodness of their heart in the fullness of time!
A Timescaledb exporter would be amazing and (if it supported resource_to_telemetry_conversion
) would alleviate the Resource binding weirdness I'm currently stuck doing.
@jsuereth Do you think you can push a PR to the spec for bound instruments? I'm a bit worried about going with them in one language because they do seem to correlate back to semantic conventions. In particular, I think in most if not all cases, a bound attribute could be encoded into the metric name. We already see this, e.g. http_client_duration and http_server_duration, where client/server could otherwise be a bound attribute. free_mem_heap, free_mem_direct also come to mind as similar. I think it would be unfortunate if we have a fairly large API that may not actually be reflected into semantic conventions (all potential bound attributes end up in the metric name instead) and think we should avoid it. We could consider removing now and adding later if we don't want to go through the spec process for now but think it's important to do.
The piece of the specification we had added for bound instruments is this line:
the client MAY allow attribute values to be passed in using a more efficient way (e.g. strong typed struct allocated on the callstack, tuple).
I think it'd be hard to specify the EXACT semantics of bound instruments in the specification across all languages, but I'm happy to do some kind of "For Java, here are guarantees of using bound
that we expect all SDK implementations to uphold".
The issue you mention in semantic conventions of metric name vs. attribute is one I actually think goes beyond the notion of "bound", and needs to be addressed irrespective of it. Have you opened a ticket on semantic conventions about whether http_client_duration + http_server_duration should just be http_duration with a set of attributes?
I think bound attributes are semantically ones that are known at instrument creation time vs record time. This seems language agnostic and the concept can be added to the spec. Languages could potentially skip any particular implementation if they want since it's always possible to merge at record time. But the concept being introduced would allow use from semantic conventions.
Have you opened a ticket on semantic conventions
I haven't opened an issue since they both seem ok. I think I've used a pattern like what we already have in apps before using Prometheus and didn't have a problem. I can imagine using attribute without a problem too.
Indeed it's convention specific whether it's appropriate to use an attribute vs the name usability wise. But without the concept of bound attributes in the spec, I can imagine actively avoiding them for performance reasons even if it is appropriate usability wise. So I think they are related at least somewhat.
@anuraaga now that the dust is settling in metric reader, I'll see if I can expand on the performance statement in the specification.
I still think @jkwatson question of whether we're using confusing terminology applies. I plan to write up a "how to avoid hot path allocation with otel metrics" markdown file and we can determine whether or not the API we expose is the right one. Given expectations of users, "bind" may be seen as pure convenience vs. performance all the time. I'd like to avoid that confusion
Great, thanks - avoiding allocation is important for us and having that as a concept that OpenTelemetry promotes as an option is incredibly valuable. Hardening it with docs & expectation around "convenience" versus "performance" would help.
Since I (on my current project) must choose obsessively low-cost instrumentation over convenience, I will choose performance-oriented paths without regard for the convenience & setting up that expectation would (I think) have headed off my opening this ticket in the first place. Btw in case it's unclear, I don't want what I asked for anymore because I am convinced that this is the performant pattern and that doing what I asked would erode the low cost that bound instruments promote. Thank y'all again for your education on this ❤️
I wonder if it would be useful to convert this to a discussion, @kvcache so others can find it and hopefully understand the nuance a bit better.
@jkwatson agree - I lack the permissions but any maintainer should feel free to do so from my perspective.
Is your feature request related to a problem? Please describe. Not all exporters support adding resource Attributes to outgoing Metrics. To get around that in my system I'm eagerly binding all of my Counters and whatnot to the Resource's Attributes. But the
BoundLongCounter
api (and friends) does not have the overload likevoid add(long value, Attributes attributes);
to allow me to optionally attach some observation-specific Attributes.Describe the solution you'd like I would like the
BoundLongCounter
interface to extendLongCounter
. Or at a minimum add the Attributes overloads fromLongCounter
.Perhaps obvious, but this issue is not limited to
BoundLongCounter
; it is shared by everyBound**
Instrument and the ask extends to all of them. Binding Attributes should not preclude the possibility of having dynamic Attributes (which I do).Describe alternatives you've considered