> Requiring the `Offer` method to receive all attributes when only the dropped attributes will be preserved is inefficient if the implementation already knows the dropped attributes at the time `Offer` is called.
I was actually looking closely at the Java implementation of exemplars just a few weeks ago, and started down a path of refactoring to do exactly what you're describing. However, after looking closely at the implementation, I decided that for the Java implementation the existing behavior was in fact optimal:
For Java, computing the diff of the full original set and the filtered set requires additional computation / allocation for each measurement passed to the exemplar filter, and it's done on the hot path. In contrast, doing it at collect time performs the computation / allocation once for each recorded exemplar (as opposed to once for each measurement), and does it off the hot path. Not sure how the Go implementation of attributes works, but if you treat them as an immutable set of key value pairs you might see the same thing we see in the Java implementation.
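To make that cost concrete, here is a rough Go sketch (the helper name is illustrative, not any SDK's actual API) of what recomputing the dropped attributes looks like when only the full original set and the filtered set are available: a per-measurement scan and allocation on the hot path.

```go
package exemplar

import "go.opentelemetry.io/otel/attribute"

// droppedDiff is a hypothetical helper: it recomputes which attributes were
// dropped by comparing the original measurement attributes against the
// filtered set. Doing this for every measurement allocates and scans on the
// hot path, which is the cost described above.
func droppedDiff(original []attribute.KeyValue, filtered attribute.Set) []attribute.KeyValue {
	var dropped []attribute.KeyValue
	for _, kv := range original {
		if _, ok := filtered.Value(kv.Key); !ok {
			dropped = append(dropped, kv)
		}
	}
	return dropped
}
```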
Attribute implementations do differ though so I would support this change to offer implementations flexibility. Perhaps we can pass both the original attributes and the diff set, just in case there is any reason why a filter might want to base some behavior off the full original set.
> if you treat them as an immutable set of key value pairs
Yeah, this is the difference. The passed attributes are reordered in place within the same backing memory, with the dropped attributes placed at the end. Because of this, we are able to just pass a reference to those dropped attributes without any computational or memory overhead.
> Perhaps we can pass both the original attributes and the diff set, just in case there is any reason why a filter might want to base some behavior off the full original set.
Given that Go slices are just references to the same underlying data, this should work for us. It might be a bit of a messy API though.
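For illustration, a minimal sketch (the `partition` helper is hypothetical, not the SDK's actual code) of how an in-place partition leaves the kept attributes at the front and the dropped attributes at the tail of the same backing array, so the dropped attributes can be handed to a reservoir as a plain sub-slice with no extra allocation:

```go
package main

import (
	"fmt"

	"go.opentelemetry.io/otel/attribute"
)

// partition reorders attrs in place so attributes accepted by keep come
// first. Both returned slices share attrs' backing array; nothing is copied.
func partition(attrs []attribute.KeyValue, keep attribute.Filter) (kept, dropped []attribute.KeyValue) {
	i := 0
	for j, kv := range attrs {
		if keep(kv) {
			attrs[i], attrs[j] = attrs[j], attrs[i]
			i++
		}
	}
	return attrs[:i], attrs[i:]
}

func main() {
	attrs := []attribute.KeyValue{
		attribute.String("http.method", "GET"),
		attribute.String("user.id", "42"), // assume this key is filtered out
	}
	kept, dropped := partition(attrs, attribute.NewAllowKeysFilter("http.method"))
	fmt.Println(kept, dropped) // dropped is a view into the tail of attrs
}
```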
This is an important bit of discussion to have here. Here are a few points rattling around in my brain.
`offer` is responsible for sampling measurements. As such, it needs to make a decision about which exemplars to keep. With only those facts, I can understand wanting to limit things; however:
I'm not positive we've "finished" the exemplar specification enough to lock down attributes at this point. Specifically, I planned to open the reservoir interface to allow users to provide their own implementation/sampling, and I think we'll want all attributes available on `offer` for that.
- We could provide a reservoir sampler that maximizes entropy by looking at diverse attribute sets.
- We could provide a sampler that ignores measurements with a given attribute.
These are good examples of use cases where the complete set is needed, thanks for including them!
I still think the exemplar reservoir can be structured to handle both of these cases and only receive the dropped attributes on Offer. As mentioned, since the exemplar reservoir is used per-aggregation per-attribute-set, this means the implementation's exemplar creation function, and any user-defined one, needs to take the following form:
```go
type ExemplarFunc func() ExemplarReservoir
```
That way, when the SDK determines a new aggregation/attribute-set pair, it can create a reservoir for that group of measurements.
This also means that, because these exemplar reservoirs are created when a new aggregation/attribute-set combination is determined, the retained attributes will already be known (assuming filtering on input). Therefore the implementation's exemplar creation function, and any user-defined one, could just as easily take this form:
```go
type ExemplarFunc func(retained []attribute.KeyValue) ExemplarReservoir
```
Whenever a measurement is made for the aggregation, the retained attributes will be computed, and, as mentioned above, the dropped attributes can be a natural byproduct. The aggregation will use the retained attributes to determine what value it needs to aggregate with its state, and it can simply Offer the reservoir of that state the dropped attributes:
```go
reservoir.Offer(ctx, value, droppedAttr)
```
That reservoir will know the complete set of attributes at this point because it knows the non-dropped attributes that were provided at creation. This means it can then handle the more complex filtering operations mentioned.
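To spell out that shape, here is a minimal sketch. The type and field names are hypothetical (they are not the SDK's exported API); it only assumes the ExemplarFunc form above and an Offer method that takes the dropped attributes:

```go
package exemplar

import (
	"context"

	"go.opentelemetry.io/otel/attribute"
)

// Hypothetical stand-ins for the interfaces discussed above.
type ExemplarReservoir interface {
	Offer(ctx context.Context, value float64, dropped []attribute.KeyValue)
}

// ExemplarFunc creates a reservoir for a new aggregation/attribute-set pair,
// receiving the attributes that will be retained for that pair.
type ExemplarFunc func(retained []attribute.KeyValue) ExemplarReservoir

// aggregator holds per-attribute-set aggregation state plus its reservoir.
type aggregator struct {
	reservoirs map[attribute.Distinct]ExemplarReservoir
	newRes     ExemplarFunc
}

func (a *aggregator) measure(ctx context.Context, value float64, retained attribute.Set, dropped []attribute.KeyValue) {
	key := retained.Equivalent()
	res, ok := a.reservoirs[key]
	if !ok {
		// A new aggregation/attribute-set pair: the retained attributes are
		// handed to the reservoir once, at creation.
		res = a.newRes(retained.ToSlice())
		a.reservoirs[key] = res
	}
	// ... aggregate value into the state for this attribute set ...

	// Only the dropped attributes need to be offered per measurement; the
	// reservoir can reconstitute the complete set from retained + dropped.
	res.Offer(ctx, value, dropped)
}
```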
@MrAlias That looks promising too. I think the question now is whether we can craft the specification so you don't need to include all attributes in Offer, as long as the reservoir can reconstitute them somehow.
This would allow the existing Java implementation to remain as-is while allowing Go to optimise as it desires, and it keeps the future use cases possible.
Currently the exemplar reservoir `Offer` method is defined as follows in the specification:
However, the reservoirs are only expected to produce exemplars that hold the attributes that have been dropped from a measurement:
Requiring the `Offer` method to receive all attributes when only the dropped attributes will be preserved is inefficient if the implementation already knows the dropped attributes at the time `Offer` is called. It means the exemplar implementation will need to independently determine the dropped attributes.
Background
The Go SDK filters measured attributes at the time they are recorded. This ensures that attribute filters can be used to restrict the attributes of instrumentation prone to cardinality explosion and safely prevent runaway memory use.
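As an illustration of that filtering, a view with an attribute filter might be configured roughly like this (the instrument and key names are made up for the example):

```go
package main

import (
	"go.opentelemetry.io/otel/attribute"
	sdkmetric "go.opentelemetry.io/otel/sdk/metric"
)

func newProvider() *sdkmetric.MeterProvider {
	// Only "http.method" is kept for this instrument; any other measurement
	// attribute (e.g. a high-cardinality "user.id") is dropped at record time.
	view := sdkmetric.NewView(
		sdkmetric.Instrument{Name: "http.server.duration"},
		sdkmetric.Stream{AttributeFilter: attribute.NewAllowKeysFilter("http.method")},
	)
	return sdkmetric.NewMeterProvider(sdkmetric.WithView(view))
}
```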
In the process of filtering these measured attributes, the calculation of the dropped attributes is trivial and the result is already known. Based on the current specification, however, the Go implementation cannot simply provide these dropped attributes to the exemplar reservoir `Offer` method.
Proposal
Update the specification to state that the attributes passed to `Offer` need to be either the dropped attributes or the complete set, based on the implementation (but not both).
What about filtering with the kept attributes?
If the reservoir does not receive the complete set of attributes that are being recorded during a measurement, it cannot filter based on those attributes. Therefore, this would reduce the functionality of the reservoir, right?
There are two points to be made here:
The second point is possible because each exemplar reservoir acts as a reservoir for a distinct set of attributes. Given this one-to-one relationship, when an exemplar reservoir is created it could be provided the set of attributes that will be retained, and for each measurement it will be provided the set of attributes that were dropped. Therefore, it will have access to the complete set of attributes.
If the functionality to filter on the complete set of attributes needs to be retained by the reservoir, an implementation can still provide this while keeping the measurement-process optimization of only receiving the dropped attributes.
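As a concrete sketch of that claim, reusing the hypothetical ExemplarReservoir and ExemplarFunc shapes from the earlier sketch (all names here are illustrative, not SDK API), a user-defined reservoir that ignores measurements carrying a given attribute could check the retained attributes once at creation and only the dropped attributes on each Offer:

```go
// ignoreKeyReservoir skips any measurement whose complete attribute set
// (retained + dropped) contains key, delegating everything else to wrapped.
type ignoreKeyReservoir struct {
	wrapped        ExemplarReservoir
	key            attribute.Key
	retainedHasKey bool // computed once, at construction
}

// IgnoreKey wraps next so measurements carrying key are never offered.
func IgnoreKey(key attribute.Key, next ExemplarFunc) ExemplarFunc {
	return func(retained []attribute.KeyValue) ExemplarReservoir {
		r := &ignoreKeyReservoir{wrapped: next(retained), key: key}
		for _, kv := range retained {
			if kv.Key == key {
				r.retainedHasKey = true
				break
			}
		}
		return r
	}
}

func (r *ignoreKeyReservoir) Offer(ctx context.Context, value float64, dropped []attribute.KeyValue) {
	if r.retainedHasKey {
		return // every measurement for this attribute set carries the key
	}
	for _, kv := range dropped {
		if kv.Key == r.key {
			return
		}
	}
	r.wrapped.Offer(ctx, value, dropped)
}
```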