The PR is aimed at optimizing the way we build a tree from stack traces at read.
Specifically, I replaced the model.Tree with model.StacktaceTree that we've optimized for pprof symbolication.
Here is a comparison of queries that build a flame graph that includes 3M unique stack traces, and 10M nodes (real-case dataset) in a singe thread:
Before 9615096375 ns/op 4244433576 B/op 26203002 allocs/op
After 3626939417 ns/op 1399753624 B/op 497736 allocs/op
The improvement is appx. 60% (9.6s -> 3.8s). In the test, I used the default 16k max nodes limit. However, even without the limit (tested on ~1-10M node trees), the performance is sustainable:
Further optimization of tree symbolication seems to be complicated (although, there is an opportunity to decrease cache misses and improve branch prediction). I'd rather consider optimizing stack trace selection: instead of resolving each stack trace, it makes sense to trim insignificant ones based on their weight (various heuristics could be employed), if the stack trace set exceeds a threshold (e.g., cardinality > 1-2M). This will introduce some inaccuracy in the flame graphs, but I believe this is acceptable:
The PR is aimed at optimizing the way we build a tree from stack traces at read.
Specifically, I replaced the
model.Tree
withmodel.StacktaceTree
that we've optimized for pprof symbolication.Here is a comparison of queries that build a flame graph that includes 3M unique stack traces, and 10M nodes (real-case dataset) in a singe thread:
The improvement is appx. 60% (9.6s -> 3.8s). In the test, I used the default 16k max nodes limit. However, even without the limit (tested on ~1-10M node trees), the performance is sustainable:
Further optimization of tree symbolication seems to be complicated (although, there is an opportunity to decrease cache misses and improve branch prediction). I'd rather consider optimizing stack trace selection: instead of resolving each stack trace, it makes sense to trim insignificant ones based on their weight (various heuristics could be employed), if the stack trace set exceeds a threshold (e.g., cardinality > 1-2M). This will introduce some inaccuracy in the flame graphs, but I believe this is acceptable: