square / leakcanary

A memory leak detection library for Android.
https://square.github.io/leakcanary
Apache License 2.0

Write a page on "research insights" #2061

Open pyricau opened 3 years ago

pyricau commented 3 years ago

This is a bit different from a user manual. The key idea is to write down the fundamental discoveries we've made or leveraged in LeakCanary.

Random notes I took a while ago:

Memory leak: A programming error that causes an application to keep a reference to an object that is no longer needed.

Leak root

Retained object: An object that is strongly reachable but is expected to be at least weakly reachable.

Leak trace: Best strong reference path from garbage collection roots to the retained object.

Detecting retained objects

We can detect leaks by identifying retained objects. These objects aren't the leaks, but they're symptoms of leaks. We can identify them in two ways:

1) At runtime by registering hooks for lifecycle events, e.g. the Activity.onDestroy() callback, and checking that specific objects become weakly reachable as expected.
2) At heap analysis time by looking at object state, e.g. Activity.mDestroyed = true
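The runtime approach in 1) can be sketched with a weak reference and a reference queue. This is a minimal illustration, not LeakCanary's actual code: `RetainedObjectWatcher`, `expectWeaklyReachable` and `isRetained` are hypothetical names, and `System.gc()` is only a hint, so an object can be reported as retained shortly before the GC gets around to clearing it. An object that is still strongly reachable will always be reported as retained.

```java
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical sketch: watches objects that are expected to become weakly reachable.
final class RetainedObjectWatcher {

    /** Weak reference tagged with a key so it can be matched when it shows up in the queue. */
    private static final class KeyedWeakReference extends WeakReference<Object> {
        final String key;

        KeyedWeakReference(Object referent, String key, ReferenceQueue<Object> queue) {
            super(referent, queue);
            this.key = key;
        }
    }

    private final ReferenceQueue<Object> queue = new ReferenceQueue<>();
    private final Map<String, KeyedWeakReference> watched = new HashMap<>();

    /** Call from a lifecycle hook, e.g. once an Activity has been destroyed. */
    synchronized String expectWeaklyReachable(Object object) {
        String key = UUID.randomUUID().toString();
        watched.put(key, new KeyedWeakReference(object, key, queue));
        return key;
    }

    /** Forgets every watched reference that the GC has already cleared and enqueued. */
    private void removeWeaklyReachable() {
        KeyedWeakReference ref;
        while ((ref = (KeyedWeakReference) queue.poll()) != null) {
            watched.remove(ref.key);
        }
    }

    /** True if the object behind this key has not been observed as cleared yet. */
    synchronized boolean isRetained(String key) {
        System.gc(); // a request only; the VM may ignore or delay it
        removeWeaklyReachable();
        return watched.containsKey(key);
    }
}
```

The watcher never holds a strong reference to the watched object, so it cannot itself keep the object alive.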

Locating retained objects

LeakCanary leverages the runtime hooks to know when to dump the heap, and then locates the weak references in the heap to find leaking objects. LeakCanary can also scan the entire heap looking for objects with a known state.

In debug builds, LeakCanary dumps the heap when the number of retained objects reaches a threshold, and clears all weak refs after a heap dump to avoid reporting those leaks again, which would lead to noise as well as unnecessary heap dumps. For that reason, in debug builds LeakCanary does not scan the heap looking for objects with a known bad state by default, because that would lead to double reporting over successive heap dumps.

Note: LeakCanary could probably enable looking for objects with a known bad state in debug builds and keep an in-memory cache of known paths, but that could lead to eliminating new leaks with identical paths, which are currently reported (as not new). There might be a happy middle ground; this is mostly a UX problem.

Computing the leak trace

A leak is a programming error that causes an application to keep a reference to an object that is no longer needed. Identifying that reference helps

Once we've identified

Redux

BFS

Alphabetical order for stable results / path

pyricau commented 2 years ago

More high level ideas:

The Java heap is a directed graph: objects are nodes and references are directed edges. These edges / references can be strong, weak, etc. There are special nodes called GC roots, each of which holds a single reference. Any node not reachable from a GC root by a path made only of strong references is garbage collectable.
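To make the graph model concrete, here is a small Java sketch that computes which objects are strongly reachable from the GC roots. Everything in it is illustrative: `HeapGraphSketch`, `Ref` and `Strength` are made-up names, objects are plain string ids, and only two reference strengths are modeled.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the heap as a directed graph with typed edges.
final class HeapGraphSketch {
    enum Strength { STRONG, WEAK }

    /** A directed edge: a reference from one object to another. */
    static final class Ref {
        final String to;
        final Strength strength;

        Ref(String to, Strength strength) {
            this.to = to;
            this.strength = strength;
        }
    }

    private final Map<String, List<Ref>> outgoing = new HashMap<>();

    void addRef(String from, String to, Strength strength) {
        outgoing.computeIfAbsent(from, k -> new ArrayList<>()).add(new Ref(to, strength));
    }

    /** Objects reachable from the GC roots by following strong references only. */
    Set<String> stronglyReachable(List<String> gcRoots) {
        Set<String> visited = new LinkedHashSet<>(gcRoots);
        Deque<String> toVisit = new ArrayDeque<>(gcRoots);
        while (!toVisit.isEmpty()) {
            String node = toVisit.poll();
            for (Ref ref : outgoing.getOrDefault(node, Collections.emptyList())) {
                // Weak edges are ignored: they do not keep their target alive.
                if (ref.strength == Strength.STRONG && visited.add(ref.to)) {
                    toVisit.add(ref.to);
                }
            }
        }
        return visited;
    }
}
```

Any object in the graph that is missing from the returned set is garbage collectable under this model.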

pyricau commented 2 years ago

A memory leak is a bug where references are not properly updated, which causes objects to be strongly reachable from GC roots (and therefore not garbage collectable) when they shouldn't be.

Any such bad reference is called a "leak". The objects that remain strongly reachable only because they are directly or indirectly referenced by such a bad reference are "leaking". This is important: a "leak" is always a bad reference, never an object. And a single leak typically causes many objects to be "leaking".

Of course, a single bug can cause many bad references to be created, i.e. they're all the same leak. And an object can be "leaking" due to several entirely different leaks.

pyricau commented 2 years ago

Once a leak / bad reference is identified, we can easily find all the "leaking objects", i.e. all objects that would be GCed if that ref was cleared (note: the BLeaks paper claims that's less true for JS VMs because multiple bad refs are typically involved in a single leak). Typically that's how we compute the "impact" of a leak, i.e. the number of objects & the sum of object sizes (aka retained size) that would be GCed if the ref was cleared.

We do this by computing a dominator tree.
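The notion of impact can be illustrated without a dominator tree. The sketch below swaps in a simpler (and slower) technique: it computes strong reachability twice, once with the bad reference and once without it, and treats the difference as the leaking objects. `LeakImpactSketch` and its methods are hypothetical names, objects are string ids, and only strong references are modeled.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: impact of clearing one reference, by reachability difference.
final class LeakImpactSketch {
    /** Strong references only: object id -> ids of the objects it references. */
    private final Map<String, Set<String>> refs = new HashMap<>();
    /** Shallow size in bytes of each object. */
    private final Map<String, Integer> shallowSize = new HashMap<>();

    void addObject(String id, int sizeBytes) {
        shallowSize.put(id, sizeBytes);
    }

    void addRef(String from, String to) {
        refs.computeIfAbsent(from, k -> new HashSet<>()).add(to);
    }

    /** Objects reachable from root, ignoring the edge skipFrom -> skipTo (pass nulls to ignore nothing). */
    private Set<String> reachable(String root, String skipFrom, String skipTo) {
        Set<String> visited = new HashSet<>();
        Deque<String> stack = new ArrayDeque<>();
        visited.add(root);
        stack.push(root);
        while (!stack.isEmpty()) {
            String node = stack.pop();
            for (String next : refs.getOrDefault(node, Collections.emptySet())) {
                boolean skipped = node.equals(skipFrom) && next.equals(skipTo);
                if (!skipped && visited.add(next)) {
                    stack.push(next);
                }
            }
        }
        return visited;
    }

    /** Objects that would become unreachable if the reference from -> to was cleared. */
    Set<String> leakingObjects(String root, String from, String to) {
        Set<String> leaking = reachable(root, null, null);
        leaking.removeAll(reachable(root, from, to));
        return leaking;
    }

    /** Sum of the shallow sizes of the leaking objects, i.e. the retained size of the reference. */
    int retainedSize(String root, String from, String to) {
        int total = 0;
        for (String id : leakingObjects(root, from, to)) {
            total += shallowSize.getOrDefault(id, 0);
        }
        return total;
    }
}
```

A dominator tree gives the same answer for every reference at once, which is why it is the better fit for a real heap; the two-pass version only shows what is being measured.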

pyricau commented 2 years ago

However, finding the bad ref is the hard part: nothing distinguishes it from any good ref. What we do know is that any "leaking" object would be GCed if the bad ref was cleared, and therefore that the bad ref has to be on all paths (all!) from GC roots to every single leaking object (note: define "GC roots" as a single root node referencing all GC roots).

pyricau commented 2 years ago

So finding a leak takes a few steps: first, find the leaking objects. Then, find a path (i.e. one, it doesn't have to be all of them) from GC roots to a leaking object. The bad ref is one of the refs on that path.
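The path-finding step can be sketched as a breadth-first search, which the earlier notes mention together with alphabetical ordering for stable results. In this hypothetical Java sketch (`ShortestPathSketch`, `addRef` and `findPath` are made-up names), the outgoing references of each object are visited in alphabetical order of reference name, so two runs over the same heap pick the same shortest path. All GC roots are assumed to hang off a single root node.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical sketch: shortest strong reference path, with a stable tie-break.
final class ShortestPathSketch {
    /** Object id -> (reference name -> referenced object id), sorted by reference name. */
    private final Map<String, TreeMap<String, String>> refs = new HashMap<>();

    void addRef(String from, String refName, String to) {
        refs.computeIfAbsent(from, k -> new TreeMap<>()).put(refName, to);
    }

    /**
     * Returns the references ("owner.refName") of one shortest path from root to target.
     * The list is empty if target is the root itself or cannot be reached.
     */
    List<String> findPath(String root, String target) {
        Map<String, String> parentNode = new HashMap<>();
        Map<String, String> parentRef = new HashMap<>();
        Set<String> visited = new HashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        visited.add(root);
        queue.add(root);
        while (!queue.isEmpty()) {
            String node = queue.poll();
            if (node.equals(target)) {
                break;
            }
            TreeMap<String, String> outgoing = refs.get(node);
            if (outgoing == null) {
                continue;
            }
            // TreeMap iterates in key order, i.e. alphabetical order of reference names.
            for (Map.Entry<String, String> entry : outgoing.entrySet()) {
                String next = entry.getValue();
                if (visited.add(next)) {
                    parentNode.put(next, node);
                    parentRef.put(next, node + "." + entry.getKey());
                    queue.add(next);
                }
            }
        }
        LinkedList<String> path = new LinkedList<>();
        if (!visited.contains(target)) {
            return path;
        }
        // Walk back from the target to the root using the recorded parents.
        for (String node = target; !node.equals(root); node = parentNode.get(node)) {
            path.addFirst(parentRef.get(node));
        }
        return path;
    }
}
```

Because BFS visits nodes in order of distance from the root, the first time the target is reached is through a shortest path; the sorted iteration only decides which of several equally short paths wins.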

pyricau commented 2 years ago

Latest thoughts:

LeakCanary Internals

Start with definitions

Maybe worth defining watched vs retained vs leaking? if possible

Add a few notes pointing to lower bus factor + helping others onboard. Remind people to post about missing documentation.

Describe the high level components:

- Object watching
- Heap Dumping (including trigger + impls + stripping)
- Heap Analysis
- Results presentation

Object Watching

- API: mention watched vs retained, the executor API
- Implementation with weak ref queue
- Android automated detection

Heap Dumping

- Triggers: manual (UI code), on screen off, on background, when count of retained instances reaches a threshold
- Mention particulars of Android impl: freezes the VM. Point to source file. Seems to sometimes return too early? Point to alternative implementation.
- Mention stripping, + removal tricks with deleteOnExit

HeapAnalysis

Parsing Heap Dump

Goal is to provide a graph API that can be traversed, with low / constant memory overhead and good performance on mobile. The graph API leverages sequences so that navigating the graph is lazy, which allows different impls. A key decision was to create an in-memory representation of objects that holds minimal info. Optimizations like keeping classes around.

- Finding retained instances (from state or watched objects)
- Finding paths from GC roots
- Deduplicating paths, keeping the shorter ones.
- Leveraging state to infer leak cause. + labeling nodes
- Signature + grouping by signature
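The "signature + grouping by signature" step could look something like the Java sketch below. It rests on an assumption that these notes do not spell out: that a signature is a SHA-1 hash over the references of a trace suspected of causing the leak. `LeakSignatureSketch`, `signature` and `groupBySignature` are hypothetical names, and each trace is reduced to a list of "owner.refName" strings.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: grouping leak traces that share the same suspect references.
final class LeakSignatureSketch {

    /** Hex-encoded SHA-1 over the suspect references of one trace (assumed signature input). */
    static String signature(List<String> suspectRefs) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-1");
            for (String ref : suspectRefs) {
                digest.update(ref.getBytes(StandardCharsets.UTF_8));
                digest.update((byte) '\n'); // separator so ["ab", "c"] differs from ["a", "bc"]
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is one of the algorithms every Java platform must provide.
            throw new IllegalStateException(e);
        }
    }

    /** Groups traces by signature, keeping the order in which signatures were first seen. */
    static Map<String, List<List<String>>> groupBySignature(List<List<String>> traces) {
        Map<String, List<List<String>>> groups = new LinkedHashMap<>();
        for (List<String> trace : traces) {
            groups.computeIfAbsent(signature(trace), k -> new ArrayList<>()).add(trace);
        }
        return groups;
    }
}
```

Grouping this way means that several retained objects held by the same bad reference show up as one leak with several traces rather than as separate reports.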

Results presentation

The string representation and the UI