Closed cijothomas closed 4 months ago
- The current design requires Sorting based on keys. Based on benchmarking, this is the biggest contributor to the hot path. One potential way to improve this is to add 2 entries to the dictionary - one with the 1st seen order of keys, and another with sorted order. If subsequent recordings are with same order as initial, it can be looked up without sort. If lookup fails, then need to sort and do another lookup. This might be good improvement, if user keeps providing the keys in same order.
I like this design. However, there might be a bug here. Let's say we add the first measurement of A
with attributes of (c = 3, b=2, a=1)
, then another one B
with attributes of (c=3, a=1, b=2)
. Based on the description above, the second lookup will fail. And then we sort the attributes of B
and get (a=1,b=2,c=3)
. Note that it's not the same as the attributes of A
, and we have to create a new metric here, which is obviously not what we want.
- The current design requires Sorting based on keys. Based on benchmarking, this is the biggest contributor to the hot path. One potential way to improve this is to add 2 entries to the dictionary - one with the 1st seen order of keys, and another with sorted order. If subsequent recordings are with same order as initial, it can be looked up without sort. If lookup fails, then need to sort and do another lookup. This might be good improvement, if user keeps providing the keys in same order.
I like this design. However, there might be a bug here. Let's say we add the first measurement of
A
with attributes of(c = 3, b=2, a=1)
, then another oneB
with attributes of(c=3, a=1, b=2)
. Based on the description above, the second lookup will fail. And then we sort the attributes ofB
and get(a=1,b=2,c=3)
. Note that it's not the same as the attributes ofA
, and we have to create a new metric here, which is obviously not what we want.
Not sure I fully understood the bug. When (c = 3, b=2, a=1)
is encountered, its lookup will fail. Then its sorted an another lookup is done, which also fails. Then both original order and sorted order is inserted to dictionary. Next, (c=3, a=1, b=2)
comes. Its initial lookup fails. Then sorted lookup occurs, which succeeds.
(btw, this is not implemented yet)
... Then both original order and sorted order is inserted to dictionary...
Now I see. Thanks for your explanation. I thought only the original order is inserted before.
@utpilla is looking at implementing the item 1 to avoid sorting costs.
The most critical improvements are already shipped as part of 1.2 itself. Removing this from 1.3 milestone, as it can occur anytime. (1.3 is releasing in May 2022 with other improvements)
@cijothomas Do you think there's anything left to do here, or could this be closed now?
Nothing very specific is remaining so okay to close. (There is still some improvements for the aggregation logic for better throughput, but I'll open separate issue with specifics.)
Opening an issue to track future improvements.
The current AggStore uses 2 data structures to find the
MetricPoint
. One is an pre-allocated array ofMetricPoint
, where the actual points are stored, and then a concurrentdictionary (2 level), to find the index within the array for a given keys/values combinations.Given this is purely internal implementation without any public API, this can be addressed after other high priority asks.