iovisor / bcc

BCC - Tools for BPF-based Linux IO analysis, networking, monitoring, and more
Apache License 2.0

Contributing a more advanced memory analyzer tool #3331

Open JubilantJerry opened 3 years ago

JubilantJerry commented 3 years ago

Hello BCC community!

I work on a large, complex application, and at times I've needed to analyze it to make memory-related bug fixes or optimizations. I tried a variant of the BCC memleak tool, but found that it wasn't useful from a dev-oriented perspective, where I'm trying to understand where memory allocations come from when the application performs a specific task. Long story short, I eventually developed a more in-depth memory analysis tool that helped greatly with my development work, and I was wondering whether the community has any interest in me contributing it to this repository?

I found I needed a different tool for the following reasons. My understanding is that the memleak tool is best suited to testing whether a memory leak exists at all in a continuously running, long-lived application, and that it can also save some basic metrics about when and where the memory was leaked. In such a scenario, the exact timing of when the memory stats are tracked and reported is not too critical, and the tool can focus on just the largest individual allocations.

When I try to study a specific action in an application, I end up having to use a stopwatch or a clock to time the periodic reporting so that I get information about the action I initiate. If I'm interested in reducing the peak memory usage of an application that has no leaks, I need the reporting to happen at some point in the middle of the workload. My actions always felt like a race against time because of the periodic-dumping style of user interface.

If there are many small allocated chunks from many stacks, the tool takes a long time to aggregate the results unless I enable eBPF-level stats aggregation (through the misleading --combined-only option). When I do use this option, I find that the tool gives inaccurate results for a multi-threaded application, causing me to hunt down phantom memory stacks. The "top-N allocations summary" output format also hasn't been very helpful in judging which functions in the application have the biggest memory impact, for two reasons. First, a function with many small allocations is ranked very low even if the majority of memory utilization comes from that function. Second, this format emphasizes low-level functions near the top of the call stack rather than higher-level functions closer to the bottom. In other words, a lot of information is lost in the output format even if I ask for the top-1000 stacks.
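To illustrate the ranking problem described above, here is a minimal Python sketch. It is not code from memleak or from my tool: the stack data, the function names (`load_config`, `handle_request`, `handler_N`), and the helpers `top_stacks` and `inclusive_bytes` are all hypothetical, and it assumes each stack is a tuple of frames listed root-first. It compares ranking individual stacks against summing bytes per function across every stack that function appears in.

```python
from collections import defaultdict

# Hypothetical data: outstanding bytes per call stack, frames listed root-first.
stacks = {("main", "load_config", "malloc"): 8192}
for i in range(100):
    # 100 distinct small stacks that all pass through handle_request
    stacks[("main", "handle_request", f"handler_{i}", "malloc")] = 1024


def top_stacks(stacks, n):
    """Rank individual stacks by outstanding bytes (top-N style)."""
    return sorted(stacks.items(), key=lambda kv: kv[1], reverse=True)[:n]


def inclusive_bytes(stacks):
    """Sum bytes for every function that appears anywhere in a stack."""
    totals = defaultdict(int)
    for frames, nbytes in stacks.items():
        # set(): count a recursive function only once per stack
        for func in set(frames):
            totals[func] += nbytes
    return dict(totals)


top = top_stacks(stacks, 1)
totals = inclusive_bytes(stacks)
```

With this made-up data, the single largest stack is the 8192-byte one through `load_config`, while the inclusive total for `handle_request` is 102400 bytes spread over 100 stacks that each rank below it individually.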

The version of the tool I made uses a terminal interface to let the user manually start and end profiling (something like the interface of gdb). It produces a top-down summary of allocations found during the profiling period, in a format similar to perf report. It also supports outputting to a format compatible with flamegraph. It uses the faster eBPF-level aggregation without the inaccuracies from the memleak tool, and can detect if the maps were saturated (which causes the stats to be corrupted). My main concern is that the existing tools all seem to use the periodic dumping style and top-N style output formats, so this one would have a different style. Would it still fit in as a contribution to the community?
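For anyone unfamiliar with the flamegraph-compatible output mentioned above: the "folded" input that flamegraph.pl reads is one line per stack, with frames joined by semicolons (root first), followed by a space and a numeric value. The sketch below only shows that format and is not the tool's actual code; the helper name `to_folded` and the sample data are hypothetical, and it assumes frame names contain no semicolons or spaces.

```python
def to_folded(stacks):
    """Convert {root-first frame tuple: bytes} into folded-stack lines."""
    lines = []
    for frames, nbytes in sorted(stacks.items()):
        # Folded format: frames joined by ';', then a space, then the value.
        lines.append(";".join(frames) + " " + str(nbytes))
    return lines


# Hypothetical aggregated allocation data (bytes per stack).
sample = {
    ("main", "handle_request", "malloc"): 102400,
    ("main", "load_config", "malloc"): 8192,
}
folded = to_folded(sample)
```

Writing the resulting lines to a file, one per line, gives something that can be passed to flamegraph.pl, with the bar widths then representing bytes rather than sample counts.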

brendangregg commented 3 years ago

Some points:

JubilantJerry commented 3 years ago

Thank you for the response!