Closed simonswine closed 3 weeks ago
Found a good sample – CloudWatch exporter. AWS Smithy Go code has always been very "deep" because of the countless middlewares, callbacks, hooks, etc.
flamegraph_2024-06-07_1220-to-2024-06-07_1320.pb.gz
RE: connecting / overlapping frames – the easiest way to examine truncated stack traces I found is to switch to the sandwich view, where the "ladder" at the top indicates presence of the issue (this is mimir querier in the screenshot, btw)
There's one thing that may require an adjustment: a very conservative recursion check that may cause a situation where part of the profiles are not repaired, which makes the whole idea somewhat ineffective
I suggest that we double it regardless of the profile/sample type to relax the restriction.
If it doesn't help and CPU stack traces are still not repaired, we may need to play with constants – token size, suffix length, overlap size, etc.
RE: connecting / overlapping frames – the easiest way to examine truncated stack traces I found is to switch to the ? sandwich view, where the "ladder" at the top indicates presence of the issue (this is mimir querier in the screenshot, btw)
My strategy is using pprof's new sandwich view, it just takes a bit more than doing it in Pyroscope itself.
There's one thing that may require an adjustment: a very conservative recursion check that may cause a situation where part of the profiles are not repaired, which makes the whole idea somewhat ineffective
The first time I see a difference in cloud-watch-exporter's profiles is if I am using values fairly high 56 or higher.
The first time I see a difference in cloud-watch-exporter's profiles is if I am using values fairly high 56 or higher.
Yeah, this is more like a safety check. I also haven't seen too many examples. I'm suggesting increasing it because I see that the repair does not happen in some cases where it should have worked; and this limit is the only meaningful explanation I can find. Actually, I added this limit after testing it in one of our internal deployments, due to the observed CPU burn
This extends the truncation fixing of heap profiles to also cover CPU profiles.
I am also adding an mimir-querier profile, but that seems to be not fixable, as it is doens't contain the connecting stacks.