vulkano-rs / vulkano

Safe and rich Rust wrapper around the Vulkan API
Apache License 2.0

"Interactive Fractal" example is leaking memory on Windows #1785

Closed Mesoptier closed 1 year ago

Mesoptier commented 2 years ago


Issue

I'm running cargo run --bin interactive_fractal and seeing memory usage rise at around 2.7 MB/s in Task Manager (~27 MB in ~10 seconds). The same occurs if I run with the --release flag, but all of the following information was gathered without it.

I used the Windows Performance Recorder program to record about 20 seconds of the example running and you can clearly see the memory rising at a regular rate. All these VirtualAlloc commits seem to stay alive until the program is manually terminated after ~24 seconds.

[Image: Windows Performance Recorder trace showing memory rising at a regular rate]

When looking at the rows in the table, I'm noticing a lot (90%+) of commits have the exact same size (0.262 MB):

[Image: table of VirtualAlloc commit lifetimes]

Here are the contents of the table in the above screenshot: virtualalloc commit lifetimes.txt (666KB)

The commit stack for each of these is exactly the same:

[Root]
ntdll.dll!RtlUserThreadStart
kernel32.dll!BaseThreadInitThunk
interactive_fractal.exe!__scrt_common_main_seh
interactive_fractal.exe!main
interactive_fractal.exe!std::rt::lang_start<tuple$<> >
interactive_fractal.exe!std::rt::lang_start_internal
interactive_fractal.exe!std::rt::lang_start::closure$0<tuple$<> >
interactive_fractal.exe!std::sys_common::backtrace::__rust_begin_short_backtrace<void (*)(),tuple$<> >
interactive_fractal.exe!core::ops::function::FnOnce::call_once<void (*)(),tuple$<> >
interactive_fractal.exe!interactive_fractal::main
interactive_fractal.exe!interactive_fractal::compute_then_render
interactive_fractal.exe!interactive_fractal::app::FractalApp::compute
interactive_fractal.exe!interactive_fractal::fractal_compute_pipeline::FractalComputePipeline::compute
interactive_fractal.exe!vulkano::command_buffer::auto::AutoCommandBufferBuilder<vulkano::command_buffer::auto::PrimaryAutoCommandBuffer<vulkano::command_buffer::pool::standard::StandardCommandPoolAlloc>,vulkano::command_buffer::pool::standard::StandardCommandPoolBuilder>::build<vulkano::command_buffer::pool::standard::StandardCommandPoolBuilder>
interactive_fractal.exe!vulkano::command_buffer::synced::builder::SyncCommandBufferBuilder::build
interactive_fractal.exe!vulkano::command_buffer::synced::builder::commands::impl$0::bind_pipeline_compute::impl$0::send
interactive_fractal.exe!vulkano::command_buffer::sys::UnsafeCommandBufferBuilder::bind_pipeline_compute
interactive_fractal.exe!ash::vk::features::DeviceFnV1_0::cmd_bind_pipeline
nvoglv64.dll!<PDB not found>
nvoglv64.dll!<PDB not found>
nvoglv64.dll!<PDB not found>
KernelBase.dll!GlobalAlloc
ntdll.dll!RtlpAllocateHeapInternal
ntdll.dll!RtlpLowFragHeapAllocFromContext
ntdll.dll!RtlpAllocateUserBlock
ntdll.dll!RtlpAllocateUserBlockFromHeap
ntdll.dll!RtlpAllocateHeapInternal
ntdll.dll!RtlpAllocateHeap
ntdll.dll!RtlpExtendHeap
ntdll.dll!RtlpFindAndCommitPages
ntdll.dll!NtAllocateVirtualMemory
ntoskrnl.exe!KiSystemServiceCopyEnd
ntoskrnl.exe!NtAllocateVirtualMemory
ntoskrnl.exe!MiAllocateVirtualMemory

If I'm reading this correctly, cmd_bind_pipeline calls into nvoglv64.dll (an NVIDIA DLL?), which allocates memory for some reason, and that memory isn't freed until the program exits.


Context: I'm writing my own program based on this specific example (I'm toying with ray marching) and am experiencing the exact same issue there, including the repeated allocations of exactly 0.262 MB. In that program I limit the frame rate to 60 FPS, which reduces the amount of memory allocated per second from ~2.7 MB/s to ~0.2 MB/s. If I disable the frame rate limit, memory usage starts rising at a rate similar to the interactive fractal example.
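
As a rough sanity check on the figures above, the following sketch converts the reported rates into commits per second and bytes per frame. It assumes that "MB" means 10^6 bytes, that the 0.262 MB commits are 256 KiB (262,144 bytes) blocks, and that the leak is a fixed number of bytes per frame. None of this is confirmed by the trace, and the function names are made up for illustration.

```rust
// Assumption: each 0.262 MB commit is a 256 KiB block (262,144 bytes).
const COMMIT_BYTES: f64 = 256.0 * 1024.0;

/// Commits per second implied by a leak rate given in MB/s (1 MB = 10^6 bytes).
fn commits_per_second(leak_mb_per_s: f64) -> f64 {
    leak_mb_per_s * 1e6 / COMMIT_BYTES
}

/// Bytes leaked per frame, assuming the leak is spread evenly over all frames.
fn bytes_per_frame(leak_mb_per_s: f64, fps: f64) -> f64 {
    leak_mb_per_s * 1e6 / fps
}

/// Frame rate that would produce a given leak rate at a fixed per-frame leak.
fn implied_fps(leak_mb_per_s: f64, leaked_bytes_per_frame: f64) -> f64 {
    leak_mb_per_s * 1e6 / leaked_bytes_per_frame
}

fn main() {
    // Rates reported in this issue: ~2.7 MB/s uncapped, ~0.2 MB/s at 60 FPS.
    let per_frame = bytes_per_frame(0.2, 60.0);
    println!("uncapped: {:.1} commits/s", commits_per_second(2.7));
    println!("capped:   {:.1} commits/s", commits_per_second(0.2));
    println!("per frame at 60 FPS: {:.0} bytes", per_frame);
    println!("implied uncapped frame rate: {:.0} FPS", implied_fps(2.7, per_frame));
}
```

Under these assumptions the per-frame leak is only a few kilobytes, which would mean the 256 KiB commits are the heap growing in chunks rather than one allocation per frame.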


I really hope this is enough information for someone with more experience with Vulkano/Vulkan (or graphics programming in general... or memory leaks... I'm not in my element here :P) to figure out what is happening here. If more information is required, I'm happy to spend more time on this!

Rua commented 2 years ago

It's hard to say whether this is a problem in Vulkano or your graphics driver. Are you able to test it with another driver, and a non-Vulkano program that uses Vulkan?

Mesoptier commented 2 years ago

> It's hard to say whether this is a problem in Vulkano or your graphics driver. Are you able to test it with another driver, and a non-Vulkano program that uses Vulkan?

I'm not sure about another graphics driver. I know there's Nouveau for Ubuntu, but are there alternatives for Windows?

I will have a look for a similar Vulkan program that doesn't use Vulkano.

Mesoptier commented 2 years ago

I have cloned this repository, containing many Vulkan examples. I've tried a bunch of these examples, and none of them seem to leak any memory. In particular I tried the computeraytracing example, which I felt had the most similarity to the interactive_fractal example, due to its use of a compute shader for computing the ray tracing.

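I've also run all the Vulkano examples that use an EventLoop (so they stay alive long enough to observe the memory leak):

  • buffer-pool - no leak
  • clear_attachments - no leak
  • gl-interop - (did not run)
  • indirect - LEAK!
  • instancing - no leak
  • multi-window - no leak
  • occlusion-query - no leak
  • tessellation - no leak
  • triangle - no leak
  • deferred - no leak
  • image - no leak
  • immutable-sampler - no leak
  • interactive_fractal - LEAK!
  • push-descriptors - no leak
  • runtime-shader - (did not run)
  • runtime_array - no leak
  • teapot - no leak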

So, only the indirect and interactive_fractal examples seem to leak memory for me. If I'm not mistaken, those are also exactly the only ones that use a compute shader, further suggesting the problem is in that area.

Rua commented 2 years ago

cmd_bind_pipeline is a Vulkan API call; it is just the Ash binding for the vkCmdBindPipeline library function. So if the memory allocation is happening within that function (is it?) then it would seem that the driver is the problem. On the other hand, it may be behaving badly because of an error elsewhere in Vulkano that gives the driver incorrect data. Very hard to track down...

Rua commented 2 years ago

While I'm also not experienced with leaks, I ran the following and found no leaks: valgrind --tool=memcheck --leak-check=yes cargo run --bin interactive_fractal

Vulkano: master
OS: Linux Mint 20.2
GPU: AMD Radeon RX 580
Driver: Mesa/RADV 21.3.2 (Vulkan 1.2.195)

Mesoptier commented 2 years ago

When I ran the example on my Ubuntu machine (a laptop without a dedicated graphics card), I also didn't see any memory leaks. That would incline me to think it's just a driver issue, except that the non-Vulkano Vulkan examples ran without memory leaks...

I certainly agree that it's very hard to track down. Do you have any other ideas for what I might try to get this figured out?

Rua commented 2 years ago

I asked in a Rust group if anyone else can try it out and see if they can reproduce.

ryco117 commented 2 years ago

> I have cloned this repository, containing many Vulkan examples. I've tried a bunch of these examples, and none of them seem to leak any memory. In particular I tried the computeraytracing example, which I felt had the most similarity to the interactive_fractal example, due to its use of a compute shader for computing the ray tracing.
>
> I've also run all the Vulkano examples that use an EventLoop (so they stay alive long enough to observe the memory leak):
>
>   • buffer-pool - no leak
>   • clear_attachments - no leak
>   • gl-interop - (did not run)
>   • indirect - LEAK!
>   • instancing - no leak
>   • multi-window - no leak
>   • occlusion-query - no leak
>   • tessellation - no leak
>   • triangle - no leak
>   • deferred - no leak
>   • image - no leak
>   • immutable-sampler - no leak
>   • interactive_fractal - LEAK!
>   • push-descriptors - no leak
>   • runtime-shader - (did not run)
>   • runtime_array - no leak
>   • teapot - no leak
>
> So, only the indirect and interactive_fractal examples seem to leak memory for me. If I'm not mistaken, those are also exactly the only ones that use a compute shader, further suggesting the problem is in that area.

I have also been seeing memory leaks on Windows, and the same examples leak (and don't leak) on my machine, with the addition of multi_window_game_of_life, which also leaks for me. The common factor among the leaking examples appears to be the use of a compute shader in the rendering process.

In my own experiments, I've been able to get compute and rendering to work without leaking memory each frame by following the template in https://vulkano.rs/guide/windowing/event-handling ; specifically, each frame in the swapchain is given its own future in an array, and synchronization is built around that array. I haven't been able to identify exactly why this prevents the leak, or what is leaking in the examples. My current suspicion is that if a future is dropped before it completes, it loses track of its resources. This is based partly on the phrasing of "If possible, checks whether the submission has finished. If so, gives up ownership of the resources used by these submissions." (emphasis is mine, https://docs.rs/vulkano/0.29.0/vulkano/sync/trait.GpuFuture.html#tymethod.cleanup_finished ).
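
The hypothesis above can be illustrated with a toy model. This is only a sketch: it uses no Vulkano types, and `Submission`, `run_per_frame_slots` and `run_without_waiting` are hypothetical names invented for the illustration. It shows why waiting on a slot's previous submission before reusing the slot bounds the number of live resources, and it does not establish what was actually leaking in the examples.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Hypothetical stand-in for a GPU submission that owns resources until it is
/// known to have finished. This is NOT a Vulkano type.
struct Submission {
    live: Rc<Cell<usize>>, // shared count of resources currently held
}

impl Submission {
    fn new(live: &Rc<Cell<usize>>) -> Self {
        live.set(live.get() + 1);
        Submission { live: Rc::clone(live) }
    }

    /// Models "wait until finished, then give up ownership of the resources".
    fn wait_and_release(self) {
        self.live.set(self.live.get() - 1);
    }
}

/// One slot per frame in flight; the previous submission in a slot is waited
/// on before the slot is reused. Returns the peak number of live resources.
fn run_per_frame_slots(frames: usize, slot_count: usize) -> usize {
    let live = Rc::new(Cell::new(0));
    let mut slots: Vec<Option<Submission>> = (0..slot_count).map(|_| None).collect();
    let mut peak = 0;
    for frame in 0..frames {
        let i = frame % slot_count;
        if let Some(previous) = slots[i].take() {
            previous.wait_and_release();
        }
        slots[i] = Some(Submission::new(&live));
        peak = peak.max(live.get());
    }
    peak
}

/// Contrasting model: nothing ever confirms completion, so resources pile up.
/// Returns the number of live resources after the last frame.
fn run_without_waiting(frames: usize) -> usize {
    let live = Rc::new(Cell::new(0));
    let mut pending = Vec::new();
    for _ in 0..frames {
        pending.push(Submission::new(&live));
    }
    live.get()
}

fn main() {
    println!("per-frame slots, peak live: {}", run_per_frame_slots(1000, 3));
    println!("no waiting, live at end:    {}", run_without_waiting(1000));
}
```

In the model, the slot version never holds more resources than it has slots, while the other version grows by one per frame, which is the shape of growth reported in this issue.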

I'll keep investigating, but thought I'd share what I've learned so far.

trevex commented 1 year ago

Tried to reproduce on Windows 10 with NV 1080 Ti (driver 527.56), but on my machine using https://github.com/vulkano-rs/vulkano/commit/10d734955633aad8fe816d5cd12e6f3728749539 I do not experience a memory leak running either interactive_fractal or indirect.

However, the theory by @ryco117 sounds interesting to me and could imply that the leaks are caused by race conditions. @ryco117 do you still experience the leaks using current master?

ryco117 commented 1 year ago

@trevex I did not experience a memory leak in either example, in both debug and release modes. I think it is fair to say that the recent refactors and cleanup work may be helping 🙂

Mesoptier commented 1 year ago

I just tested the latest master on my machine and I no longer experienced any memory leaks in the interactive_fractal example!

I did a quick git bisect and found that the memory leak was fixed in 91dc54413507511bbb9df260ea9b984a5b1dcb67 (nice work @marc0246!)