iced-rs / iced

A cross-platform GUI library for Rust, inspired by Elm
https://iced.rs
MIT License

Memory usage on the Game of Life example #786

Open eduardobraun opened 3 years ago

eduardobraun commented 3 years ago

When running the Game of Life example, both in debug and release mode, I experience jumps in memory usage while painting some quads in the canvas. As the attached screenshot shows, with just a few quads it is using 14 GB of RAM, about 12 GB of which come from the example.

The problem seems to be related to vectors growing, since the memory usage doubles, but I did not test that extensively. Since the program uses close to no memory unless I draw quads, I did some rough math, assuming no vertex reuse when indexing:

- vertices: 16 bytes per quad
- indices: 24 bytes per quad
- memory: 12 GB × 1073741824 bytes per GB = 12884901888 bytes
- total: memory / 40 bytes per quad ≈ 322122547 quads

I did not count the number of quads on the screen, but it is nowhere near that. For reference, a 4K screen has 8294400 pixels.
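
As a sanity check on that estimate, here is a tiny standalone calculation (a sketch; the 16 and 24 bytes per quad are the assumptions above, not numbers taken from iced_wgpu):

fn main() {
    // Assumed per-quad cost from the estimate above (no vertex reuse).
    let bytes_per_quad: u64 = 16 + 24;
    // Roughly 12 GB attributed to the example.
    let observed_bytes: u64 = 12 * 1024 * 1024 * 1024;

    // ~322 million quads would be needed to account for 12 GB,
    // about 39x the pixel count of a 4K screen.
    println!("quads needed: {}", observed_bytes / bytes_per_quad);
    println!("4K pixels:    {}", 3840u64 * 2160);
}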

Is that a problem with the example implementation, or is that amount of memory consumption something I should expect when using the iced canvas?

Edit: I tried to profile the example and the results are closer to what I expected (see the profiler screenshot). However, the overall system memory usage still goes beyond 12 GB, and I have no clue what is going on (second screenshot).

eduardobraun commented 3 years ago

I updated the wgpu dependency to use master from GitHub, and now it panics in StagingBelt::write_buffer():

thread 'main' panicked at 'Error in Buffer::get_mapped_range: buffer offset invalid: offset 68 must be multiple of 8', /home/peer/.cargo/git/checkouts/wgpu-rs-40ea39809c03c5d8/645dc90/src/backend/direct.rs:112:9
stack backtrace:
   0: rust_begin_unwind
             at /rustc/cb75ad5db02783e8b0222fee363c5f63f7e2cf5b/library/std/src/panicking.rs:493:5
   1: std::panicking::begin_panic_fmt
             at /rustc/cb75ad5db02783e8b0222fee363c5f63f7e2cf5b/library/std/src/panicking.rs:435:5
   2: wgpu::backend::direct::Context::handle_error_fatal
   3: <wgpu::backend::direct::Context as wgpu::Context>::buffer_get_mapped_range
   4: wgpu::BufferSlice::get_mapped_range_mut
   5: wgpu::util::belt::StagingBelt::write_buffer
   6: iced_wgpu::quad::Pipeline::draw
   7: iced_wgpu::backend::Backend::flush
   8: iced_wgpu::backend::Backend::draw
   9: <iced_wgpu::window::compositor::Compositor as iced_graphics::window::compositor::Compositor>::draw
  10: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
  11: winit::platform_impl::platform::wayland::event_loop::EventLoop<T>::run
  12: winit::platform_impl::platform::EventLoop<T>::run
  13: winit::event_loop::EventLoop<T>::run
  14: iced_winit::application::run
  15: game_of_life::main
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
Memory block wasn't deallocated
Memory block wasn't deallocated
Memory block wasn't deallocated
[1]    28053 segmentation fault (core dumped)  RUST_BACKTRACE=1 target/release/game_of_life

My guess is that whenever the offset was not a multiple of 8, something was leaking? No idea.
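
For context, the panic is about buffer offset alignment. A minimal sketch of rounding an offset up to the required alignment (the align_up helper is hypothetical; the alignment of 8 and the offset of 68 come from the panic message):

// Hypothetical helper: round a byte offset up to the next multiple of
// `align`; works for any power-of-two alignment.
fn align_up(offset: u64, align: u64) -> u64 {
    debug_assert!(align.is_power_of_two());
    (offset + align - 1) & !(align - 1)
}

fn main() {
    assert_eq!(align_up(68, 8), 72); // the offending offset, padded to 8
    assert_eq!(align_up(64, 8), 64); // already aligned offsets are unchanged
}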

13r0ck commented 3 years ago

Changing wgpu versions is not a simple task. This PR is what was required just to switch between release versions.

eduardobraun commented 3 years ago

Well, maybe it will not work, but the tests are passing. And my point is that it is now panicking in a function related to buffers, saying that the offset must be a multiple of 8. Nobody adds a fatal check unless it is important.

Ssaely commented 3 years ago

Can you try running the example with Iced using wgpu 0.5? The commit hash for that should be fb015a85d22a7c4632bd251127a89259bfd0c346. I've noticed that memory usage in general has increased substantially since upgrading from 0.5.

eduardobraun commented 3 years ago
     Running `target/release/game_of_life`
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: AllocationError(OutOfMemory(Device))', /home/peer/.cargo/registry/src/github.com-1ecc6299db9ec823/wgpu-core-0.5.6/src/device/mod.rs:324:14
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

Same issue

Ssaely commented 3 years ago

Hm, I only encountered that issue when minimizing the window on Windows. Sounds like it's related to https://github.com/hecrj/iced/issues/551 & https://github.com/hecrj/iced/issues/603#issuecomment-727520902

eduardobraun commented 3 years ago

Well, I use Sway as my window manager, so technically my window is always the same size, and the memory usage only rises when I actively draw quads on the screen. It is weird that it is the application's virtual memory that increases, yet every GB of that virtual memory is taken from my main memory.

eduardobraun commented 3 years ago

So I did a profile with valgrind this time (screenshot attached).

eduardobraun commented 3 years ago

I was able to "fix" the issue by replacing the staging belt with a brand-new one instead of recalling it:

        // Recall staging buffers
        // self.local_pool
        //     .spawner()
        //     .spawn(self.staging_belt.recall())
        //     .expect("Recall staging belt");

        // self.local_pool.run_until_stalled();

        // Drop the old belt (and its buffers) and start from a fresh one.
        self.staging_belt = wgpu::util::StagingBelt::new(Self::CHUNK_SIZE);

in wgpu/src/window/compositor.rs

I also tried using async_std::task::block_on instead of the local_pool:

        async_std::task::block_on(async {
            self.staging_belt.recall().await;
        });

but it seems that the recall future blocks forever.
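
A possible explanation (my assumption, not verified against the wgpu source of this exact version): the future returned by StagingBelt::recall() waits for buffer mapping callbacks, and those callbacks only fire when the device is polled, so blocking on the future without anything polling the device never makes progress. A rough sketch of one way to drive it, assuming the mapping requests are registered when recall() is called:

        // Sketch only: create the recall future, poll the device so the
        // pending map callbacks can fire, then block on the future.
        let recall = self.staging_belt.recall();
        device.poll(wgpu::Maintain::Wait);
        async_std::task::block_on(recall);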

GiulianoCTRL commented 3 years ago

This is very interesting, I will try to test this for the scrollable as well.

Do you mind sharing the dependencies in your Cargo.toml?

eduardobraun commented 3 years ago

The issue happens out of the box on the game of life example. I have currently switched from tokio to async-std as I was testing:

[dependencies]
iced = { path = "../..", features = ["canvas", "async-std"] }
# tokio = { version = "1.0", features = ["sync"] }
async-std = "1.0"
version = "1.0"
itertools = "0.9"
rustc-hash = "1.1"

hecrj commented 3 years ago

I can reproduce when using iced_wgpu.

However, the iced_glow backend is unaffected; in that case, the game_of_life example only takes around 30-40 MB. Everything seems to point towards an issue with the StagingBelt in wgpu.

The next step would be to create an SSCCE using only wgpu and report the issue upstream.

eduardobraun commented 3 years ago

I gave the issue another look. After adding a debug function to wgpu::util::StagingBelt:

impl StagingBelt {
    // Debug helper: report how many chunks are in each state.
    pub fn print(&self) {
        println!(
            "Active: {} Closed: {} Free: {}",
            self.active_chunks.len(),
            self.closed_chunks.len(),
            self.free_chunks.len(),
        );
    }
    // ...
}

I got the following:

Draw
Active: 2 Closed: 0 Free: 12445
Finish
Active: 0 Closed: 2 Free: 12445
Submit
Active: 0 Closed: 2 Free: 12445
Recall
Active: 0 Closed: 0 Free: 12447
Draws: 15594 Recalls: 15593
Draw
Active: 2 Closed: 0 Free: 12445
Finish
Active: 0 Closed: 2 Free: 12445
Submit
Active: 0 Closed: 2 Free: 12445
Recall
Active: 0 Closed: 0 Free: 12451
Draws: 15595 Recalls: 15593

The StagingBelt keeps allocating new chunks even though it already has plenty of free ones. Looking at the allocation logic in src/util/belt.rs:97:

    pub fn write_buffer(
        &mut self,
        encoder: &mut CommandEncoder,
        target: &Buffer,
        offset: BufferAddress,
        size: BufferSize,
        device: &Device,
    ) -> BufferViewMut {

        // 1. Prefer an active chunk that still has room for this write...
        let mut chunk = if let Some(index) = self
            .active_chunks
            .iter()
            .position(|chunk| chunk.offset + size.get() <= chunk.size)
        {
            self.active_chunks.swap_remove(index)
        // 2. ...otherwise reuse a free chunk that is large enough...
        } else if let Some(index) = self
            .free_chunks
            .iter()
            .position(|chunk| size.get() <= chunk.size)
        {
            self.free_chunks.swap_remove(index)
        // 3. ...otherwise allocate a brand-new chunk.
        } else {
            let size = self.chunk_size.max(size.get());
            Chunk {
                buffer: device.create_buffer(&BufferDescriptor {
                    label: Some("staging"),
                    size,
                    usage: BufferUsage::MAP_WRITE | BufferUsage::COPY_SRC,
                    mapped_at_creation: true,
                }),
                size,
                offset: 0,
            }
        };
        // ...
    }

We can see that a free chunk is reused only if it is at least as large as the requested size. Since the number of vertices and indices in the game of life canvas only grows as you add more geometry, the requested size keeps increasing, old chunks are never reused, and the belt has no way to release its free chunks.
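
To illustrate the failure mode, here is a standalone simulation of just that selection policy (not wgpu code; the chunk size and the steadily growing request size are made-up numbers). With a request that grows a little every frame, no free chunk is ever large enough, so a new chunk is allocated each frame and the free list grows without bound:

fn main() {
    let chunk_size: u64 = 128 * 1024; // pretend CHUNK_SIZE
    let mut free_chunks: Vec<u64> = Vec::new(); // sizes of free chunks
    let mut request: u64 = 200 * 1024; // already larger than chunk_size

    for frame in 0..10 {
        // Reuse a free chunk only if it can hold the whole request...
        let reused = free_chunks
            .iter()
            .position(|&size| request <= size)
            .map(|i| free_chunks.swap_remove(i));

        // ...otherwise allocate a new chunk sized to the request.
        let chunk = reused.unwrap_or_else(|| chunk_size.max(request));

        // After the frame, the chunk is recalled and becomes free again.
        free_chunks.push(chunk);
        println!("frame {}: request {} B, {} free chunks", frame, request, free_chunks.len());

        // The canvas accumulates geometry, so the next request is larger.
        request += 4 * 1024;
    }
}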

eduardobraun commented 3 years ago

After adding a naive garbage collector to the StagingBelt:

    /// Advance the belt's internal tick counter.
    pub fn tick(&mut self) {
        self.tick += 1;
    }

    /// Drop free chunks that have not been used within the last 5 ticks.
    pub fn gc(&mut self) {
        let t = self.tick;
        self.free_chunks.retain(|chunk| (t - chunk.tick) < 5);
    }

The memory usage stays constant at ~2.8GB (virtual memory).
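
For completeness, a sketch of how the two helpers might be wired into the compositor's frame loop (the call sites are my assumption, and this also assumes each Chunk records the tick at which it was last freed):

        // Hypothetical call sites (sketch only): advance the counter once
        // per frame, recall the belt as before, then prune stale free chunks.
        self.staging_belt.tick();

        self.local_pool
            .spawner()
            .spawn(self.staging_belt.recall())
            .expect("Recall staging belt");
        self.local_pool.run_until_stalled();

        self.staging_belt.gc();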