Open JubilantJerry opened 3 years ago
Thank you for the response! Some points:
My tool is bigger than the memleak tool, closer in size to trace.py, but not quite the size of an individual repo. I understand where you're coming from if your repo has a philosophy of simplicity and wants all tools to be less than 1k lines, though.

To use the tool, attach with -p pid
, run the application to be analyzed, then type dump into the BCC tool interface after finishing the desired actions in the application. The printed output is a top-down view of the remaining memory allocations, as one might expect.

Probing malloc incurs a context switch on every allocation, even though this type of context switch is relatively cheap overall. This makes memleak risky as a non-invasive probing solution on a production system, but for a dev / debugging use case my opinion is that the reduced performance is not as big of a problem, since the tool will be used on a debug-only deployment. BCC is still very useful due to its flexibility, since applications don't need to be recompiled to get the probing information. My main pain point in these scenarios was instead that memleak doesn't provide the information needed to study memory-related bugs as a dev.
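To make the top-down dump concrete, here is a minimal sketch (with hypothetical stack data and function names, not the actual tool's code) of how remaining allocations keyed by call stack can be folded into an inclusive-bytes tree and printed:

```python
# Hypothetical sample of outstanding allocations, keyed by call stack
# (caller-first, so main is the outermost frame), with live bytes.
outstanding = {
    ("main", "load_config", "parse"): 4096,
    ("main", "run", "handle_request", "copy_buf"): 65536,
    ("main", "run", "handle_request", "log_line"): 512,
}

def build_tree(stacks):
    # Node layout: {func_name: [cumulative_bytes, children_dict]}.
    tree = {}
    for stack, nbytes in stacks.items():
        node = tree
        for func in stack:
            entry = node.setdefault(func, [0, {}])
            entry[0] += nbytes  # inclusive: bytes allocated here or in callees
            node = entry[1]
    return tree

def print_tree(node, depth=0):
    # Biggest cumulative consumers first, like a top-down profiler view.
    for func, (nbytes, children) in sorted(node.items(), key=lambda kv: -kv[1][0]):
        print("  " * depth + f"{nbytes:>8} {func}")
        print_tree(children, depth + 1)

print_tree(build_tree(outstanding))
```

The inclusive totals are what make the view useful: a caller high in the tree is charged for everything its callees allocate, so high-level functions surface even when each individual allocation is small.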
Hello BCC community!
I work on the development of a large, complex application, and at points I've needed to analyze it to make memory-related bug fixes or optimizations. I've tried to use a variant of the BCC memleak tool, but I found that it wasn't useful from a dev-oriented perspective, where I'm trying to understand where memory allocations come from when performing a specific task in the application. Long story short, I eventually developed a more in-depth memory analysis tool that helped greatly with my development work, and I was wondering if the community had any interest in me contributing it to this repository?

The reason I found a need to make a different tool was as follows. My understanding is that the memleak tool is best used for testing whether a memory leak exists at all on a continuously running, long-lived application, and that it can also be used to save some basic metrics about when and where the memory was leaked. In such a scenario, the exact timing of when the memory stats are tracked / reported is not too critical, and the tool can focus on just the largest individual allocations.

When I try to study a specific action in the application, I end up having to use a stopwatch or a clock to time the periodic reporting so that I can get information about the action I initiate. If I'm interested in reducing the peak memory usage of an application that has no leaks, I need the reporting to happen at some point in the middle of the workload. My actions always felt like a race against time because of the periodic-dumping style of user interface.
If there are many small allocated chunks from many stacks, the tool takes a long time to aggregate the results unless I enable eBPF-level stats aggregation (through the misleadingly named --combined-only option), and when I do use this option I find that the tool gives inaccurate results for a multi-threaded application, causing me to hunt down phantom memory stacks. The "top-N allocations summary" output format also hasn't been very helpful in judging which functions in the application create the biggest memory impact: a function with many small allocations is ranked very low even if the majority of memory utilization comes from it, and the format emphasizes low-level functions near the top of the call stack rather than higher-level functions closer to the bottom. In other words, a lot of information is lost in the output format even if I ask for the top-1000 stacks.

The version of the tool I made uses a terminal interface to let the user manually start and end profiling (something like the interface of gdb). It produces a top-down summary of allocations found during the profiling period, in a format similar to perf report. It also supports outputting to a format compatible with flamegraph. It uses the faster eBPF-level aggregation without the inaccuracies of the memleak tool, and can detect if the maps were saturated (which would corrupt the stats). My main concern is that the existing tools all seem to use the periodic-dumping style and top-N output formats, so this one would have a different style. Would it still fit in as a contribution to the community?
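For reference, the flamegraph-compatible output mentioned above is the collapsed-stack format: one line per stack, frames joined by semicolons, followed by a count (bytes, for a memory profile). A minimal sketch with hypothetical data:

```python
# Hypothetical aggregated allocation stacks (caller-first) and bytes.
stacks = {
    ("main", "run", "handle_request", "copy_buf"): 65536,
    ("main", "load_config", "parse"): 4096,
}

def to_folded(stacks):
    # flamegraph.pl (and compatible tools) read "frame;frame;frame value".
    return "\n".join(
        ";".join(stack) + f" {nbytes}" for stack, nbytes in sorted(stacks.items())
    )

print(to_folded(stacks))
```

Piping lines like these into flamegraph.pl yields an interactive SVG where frame width is proportional to attributed bytes, which complements the textual top-down summary.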