tracel-ai / burn

Burn is a comprehensive dynamic deep learning framework built in Rust, with extreme flexibility, compute efficiency, and portability as its primary goals.
https://burn.dev
Apache License 2.0

Memory leak in Wgpu backend #2042

Open · joshhansen opened this issue 1 month ago

joshhansen commented 1 month ago

Describe the bug
Memory leaks when using the Wgpu backend, but not when using the NdArray backend.

To Reproduce
A minimal reproduction is available at https://github.com/joshhansen/burnleak.

Check out the repository and run cargo run --release on the master branch. Watch the memory usage. In my case, it climbs steadily upward at a rapid pace.

Then check out the master_ndarray branch and run it again. In my case, the memory usage does not climb.

Running on Wgpu but with the WgpuDevice::Cpu device slows the memory leak but does not eliminate it.
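
For context, a minimal sketch of the kind of loop such a repro might run is shown below. This is an illustrative guess, not the actual contents of the burnleak repo, and it assumes a burn 0.13-style API (device passed to tensor constructors) with the wgpu and ndarray features enabled.

```rust
use burn::tensor::{Distribution, Tensor};

// Swap these two lines to compare backends (as the master_ndarray branch does):
type B = burn::backend::Wgpu;
// type B = burn::backend::NdArray;

fn main() {
    // For the WgpuDevice::Cpu case mentioned above, this would instead be
    // burn::backend::wgpu::WgpuDevice::Cpu.
    let device = Default::default();

    // Repeatedly allocate tensors and run a small computation while watching
    // resident memory (e.g. with htop or heaptrack).
    loop {
        let a = Tensor::<B, 2>::random([64, 64], Distribution::Default, &device);
        let b = Tensor::<B, 2>::random([64, 64], Distribution::Default, &device);
        let c = a.matmul(b);

        // Pull the result back to the host so the work is actually executed.
        let _data = c.sum().into_data();
    }
}
```

With the branches described above, it is the Wgpu variant whose resident memory climbs, while the NdArray variant stays flat.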

Expected behavior
The program should run without a memory leak on the Wgpu backend, as it does on the NdArray backend.

Screenshots
n/a


Additional context
Heaptrack memory leak profiles attached (heaptrack1, heaptrack2 screenshots).

kurtlawrence commented 1 month ago

I'm experiencing the same leak after upgrading from burn 0.12 to burn 0.13.

I tried your repro repo with burn 0.12 as a dependency and there is no aggressive leak, so it looks like a regression.


This is what I saw with my personal hobby project's monitoring (memory usage graph attached).

joshhansen commented 1 month ago

Thanks for the quick repro @kurtlawrence. I can confirm that downgrading to 0.12.1 results in much better memory usage: there are still leaks coming from wgpu, but smaller by an order of magnitude (82.6 MB of leaks over 1.4 GB of heaptrack samples vs. 2.1 GB of leaks over 626 MB of heaptrack samples).

mepatrick73 commented 1 month ago

I've been working on the memory management strategy currently implemented in the wgpu runtime on the master branch. The current approach results in higher average memory usage, which is intentional. The new strategy is designed to be lazier in freeing unused memory and more aggressive in reusing it. For dynamic memory workloads, this can lead to performance improvements of up to 60%.

Our assumption is that for most deep learning use cases, the GPU device will be primarily dedicated to the training or inference tasks being performed. Therefore, we've prioritized better performance at the cost of higher average memory usage. While we don't believe this strategy leads to significantly higher peak memory usage or more frequent out-of-memory situations, we recognize that this could be a potential issue.

If average memory usage is a concern, we could consider adding an option for users to configure the memory management behavior in the wgpu runtime.
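
To make the trade-off concrete, here is a simplified, illustrative sketch of the kind of pooling strategy being described: freed chunks are kept around and reused instead of being released eagerly, and a configurable policy decides when (if ever) idle chunks are actually dropped. This is not burn's actual implementation or API; the Pool, Chunk, and DeallocPolicy names are hypothetical.

```rust
use std::collections::HashMap;

/// Hypothetical knob a user could set to trade average memory for speed.
#[derive(Clone, Copy)]
enum DeallocPolicy {
    /// Never release pooled chunks; maximizes reuse, highest average memory.
    Never,
    /// Release chunks not reused within `max_idle_allocs` allocations.
    AfterIdle { max_idle_allocs: u64 },
}

struct Chunk {
    size: usize,
    last_used: u64, // allocation counter value when this chunk was last handed out
}

struct Pool {
    policy: DeallocPolicy,
    free: HashMap<usize, Vec<Chunk>>, // idle chunks, bucketed by size
    alloc_counter: u64,
    bytes_held: usize, // bytes currently sitting idle in the pool
}

impl Pool {
    fn new(policy: DeallocPolicy) -> Self {
        Self { policy, free: HashMap::new(), alloc_counter: 0, bytes_held: 0 }
    }

    /// Reuse an idle chunk of the right size if one exists, otherwise create one.
    fn alloc(&mut self, size: usize) -> Chunk {
        self.alloc_counter += 1;
        let reused = self.free.get_mut(&size).and_then(|bucket| bucket.pop());
        match reused {
            Some(mut chunk) => {
                chunk.last_used = self.alloc_counter;
                self.bytes_held -= size;
                chunk
            }
            None => Chunk { size, last_used: self.alloc_counter },
        }
    }

    /// Keep the chunk for reuse instead of releasing it, then lazily drop
    /// chunks that have sat idle too long according to the policy.
    fn free(&mut self, chunk: Chunk) {
        self.bytes_held += chunk.size;
        self.free.entry(chunk.size).or_default().push(chunk);

        if let DeallocPolicy::AfterIdle { max_idle_allocs } = self.policy {
            let cutoff = self.alloc_counter.saturating_sub(max_idle_allocs);
            let mut released = 0;
            for bucket in self.free.values_mut() {
                bucket.retain(|c| {
                    let keep = c.last_used >= cutoff;
                    if !keep {
                        released += c.size; // the chunk would actually be freed here
                    }
                    keep
                });
            }
            self.bytes_held -= released;
        }
    }
}

fn main() {
    // Reuse-friendly policy: keep chunks unless they sit idle for 100 allocations.
    let mut pool = Pool::new(DeallocPolicy::AfterIdle { max_idle_allocs: 100 });

    for _ in 0..1_000 {
        let a = pool.alloc(4096);
        let b = pool.alloc(4096);
        pool.free(a);
        pool.free(b);
    }
    println!("bytes idle in pool: {}", pool.bytes_held);
}
```

Under the Never policy idle chunks are only ever reused, so average memory stays near the high-water mark; AfterIdle trades some of that reuse for bounded idle memory, which is roughly the kind of user-configurable behaviour suggested above.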

nathanielsimard commented 1 month ago

@kurtlawrence Is it CPU RAM leakage or GPU memory?

kurtlawrence commented 1 month ago

> @kurtlawrence Is it CPU RAM leakage or GPU memory?

CPU RAM is leaking.

joshhansen commented 1 month ago

@mepatrick73 The issue is not the level of memory usage; it's that memory usage grows without bound. Eventually it would consume all system memory if I didn't terminate the program. I have 128 GB of system RAM and 4 GPUs with 48 GB of VRAM. The task in my case is inference on a small model: each instance consumes 28 MiB of VRAM according to nvidia-smi.