eduardobraun opened this issue 3 years ago
I updated the wgpu dependency to use the master branch from GitHub, and now it's panicking in StagingBelt::write_buffer():
thread 'main' panicked at 'Error in Buffer::get_mapped_range: buffer offset invalid: offset 68 must be multiple of 8', /home/peer/.cargo/git/checkouts/wgpu-rs-40ea39809c03c5d8/645dc90/src/backend/direct.rs:112:9
stack backtrace:
0: rust_begin_unwind
at /rustc/cb75ad5db02783e8b0222fee363c5f63f7e2cf5b/library/std/src/panicking.rs:493:5
1: std::panicking::begin_panic_fmt
at /rustc/cb75ad5db02783e8b0222fee363c5f63f7e2cf5b/library/std/src/panicking.rs:435:5
2: wgpu::backend::direct::Context::handle_error_fatal
3: <wgpu::backend::direct::Context as wgpu::Context>::buffer_get_mapped_range
4: wgpu::BufferSlice::get_mapped_range_mut
5: wgpu::util::belt::StagingBelt::write_buffer
6: iced_wgpu::quad::Pipeline::draw
7: iced_wgpu::backend::Backend::flush
8: iced_wgpu::backend::Backend::draw
9: <iced_wgpu::window::compositor::Compositor as iced_graphics::window::compositor::Compositor>::draw
10: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
11: winit::platform_impl::platform::wayland::event_loop::EventLoop<T>::run
12: winit::platform_impl::platform::EventLoop<T>::run
13: winit::event_loop::EventLoop<T>::run
14: iced_winit::application::run
15: game_of_life::main
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
Memory block wasn't deallocated
Memory block wasn't deallocated
Memory block wasn't deallocated
[1] 28053 segmentation fault (core dumped) RUST_BACKTRACE=1 target/release/game_of_life
My guess is that if it's not a multiple of 8, it was leaking? No idea.
Changing versions of wgpu is not a simple task. This PR is what was required just to switch between release versions.
Well, maybe it will not work, but the tests are passing. And my point is that it's now panicking in a function related to buffers, saying that the offset must be a multiple of 8. No one adds a fatal check if it is not important.
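For context, staged writes are typically padded so that the next offset inside a chunk stays aligned. A minimal sketch (the 8 simply mirrors the alignment in the panic message; it is not a constant taken from this code base):

// Round a write size up to an 8-byte boundary so the following staging
// offset stays aligned (assumption: 8 matches the panic message above).
fn pad_to_alignment(size: u64, alignment: u64) -> u64 {
    debug_assert!(alignment.is_power_of_two());
    (size + alignment - 1) & !(alignment - 1)
}

// pad_to_alignment(68, 8) == 72, so a 68-byte write would be padded before
// being handed to StagingBelt::write_buffer.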
Can you try running the example with Iced when using wgpu 0.5? The commit hash for that should be fb015a85d22a7c4632bd251127a89259bfd0c346. I've noticed that memory usage in general has increased substantially ever since upgrading from 0.5.
Running `target/release/game_of_life`
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: AllocationError(OutOfMemory(Device))', /home/peer/.cargo/registry/src/github.com-1ecc6299db9ec823/wgpu-core-0.5.6/src/device/mod.rs:324:14
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Same issue.
Hm, I only encountered that issue when minimizing the window on Windows. Sounds like it's related to https://github.com/hecrj/iced/issues/551 & https://github.com/hecrj/iced/issues/603#issuecomment-727520902
Well, I use Sway as my window manager, so technically my window is always the same size. And the memory usage only rises when I actively draw quads on the screen. It's weird that it's the virtual memory of the application that increases, but every GB of that virtual memory is taken from my main memory.
So I did a profile with valgrind this time.
I've been able to "fix" the issue by replacing the staging belt recall with a brand new belt:
// Recall staging buffers
// self.local_pool
//     .spawner()
//     .spawn(self.staging_belt.recall())
//     .expect("Recall staging belt");
// self.local_pool.run_until_stalled();

self.staging_belt = wgpu::util::StagingBelt::new(Self::CHUNK_SIZE);
in wgpu/src/window/compositor.rs
I also tried using async_std::task::block_on instead of the local_pool:
async_std::task::block_on(async {
    self.staging_belt.recall().await;
});
but it seems that the recall is blocking forever.
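For reference, this is roughly the recall pattern from the commented-out iced code above, spelled out (a sketch; `staging_belt`, the encoder, and the queue are assumed to be in scope):

use futures::task::SpawnExt;

let mut local_pool = futures::executor::LocalPool::new();
let spawner = local_pool.spawner();

// ... encode the frame, then staging_belt.finish() and queue.submit(...) ...

// Spawn the recall future and drive the local pool instead of blocking on it.
spawner
    .spawn(staging_belt.recall())
    .expect("spawn staging belt recall");
local_pool.run_until_stalled();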
This is very interesting, I will try to test this for the scrollable as well.
Do you mind sharing the dependencies in your Cargo.toml?
The issue happens out of the box on the game of life example; currently I changed from tokio to async-std as I was testing.
[dependencies]
iced = { path = "../..", features = ["canvas", "async-std"] }
# tokio = { version = "1.0", features = ["sync"] }
async-std = "1.0"
itertools = "0.9"
rustc-hash = "1.1"
I can reproduce when using iced_wgpu. However, the iced_glow backend is unaffected. In that case, the game_of_life example takes only around 30-40 MB. Therefore, I believe everything points towards an issue with the StagingBelt in wgpu.

Next step would be to create an SSCCE only using wgpu and report the issue to the authors.
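A rough sketch of what such an SSCCE might look like (assumptions: `device` and `queue` come from the usual wgpu setup, the `futures` crate provides the LocalPool, and the growing write size imitates the canvas workload):

use futures::task::SpawnExt;

// Hypothetical reproduction loop: each iteration stages a slightly larger
// write than the last, which is the pattern that defeats chunk reuse.
fn reproduce(device: &wgpu::Device, queue: &wgpu::Queue) {
    let mut belt = wgpu::util::StagingBelt::new(1024);
    let mut local_pool = futures::executor::LocalPool::new();
    let spawner = local_pool.spawner();

    let target = device.create_buffer(&wgpu::BufferDescriptor {
        label: Some("target"),
        size: 1 << 24,
        usage: wgpu::BufferUsage::COPY_DST,
        mapped_at_creation: false,
    });

    for frame in 1..10_000u64 {
        let mut encoder =
            device.create_command_encoder(&wgpu::CommandEncoderDescriptor { label: None });

        let size = wgpu::BufferSize::new(1024 + frame * 8).unwrap();
        belt.write_buffer(&mut encoder, &target, 0, size, device)
            .copy_from_slice(&vec![0u8; size.get() as usize]);

        belt.finish();
        queue.submit(Some(encoder.finish()));

        spawner.spawn(belt.recall()).expect("spawn staging belt recall");
        local_pool.run_until_stalled();
        // The free chunk count should keep growing here even though chunks
        // are recalled every frame, since no freed chunk fits the next write.
    }
}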
I gave the issue another look. After adding a debug function to wgpu::util::StagingBelt:
impl StagingBelt {
    pub fn print(&self) {
        println!("Active: {} Closed: {} Free: {}", self.active_chunks.len(), self.closed_chunks.len(), self.free_chunks.len());
    }

    ...
}
I got the following:
Draw
Active: 2 Closed: 0 Free: 12445
Finish
Active: 0 Closed: 2 Free: 12445
Submit
Active: 0 Closed: 2 Free: 12445
Recall
Active: 0 Closed: 0 Free: 12447
Draws: 15594 Recalls: 15593
Draw
Active: 2 Closed: 0 Free: 12445
Finish
Active: 0 Closed: 2 Free: 12445
Submit
Active: 0 Closed: 2 Free: 12445
Recall
Active: 0 Closed: 0 Free: 12451
Draws: 15595 Recalls: 15593
The StagingBelt keeps allocating new chunks even though it already has many free chunks.
So, looking at the allocation logic in src/util/belt.rs at line 97:
pub fn write_buffer(
    &mut self,
    encoder: &mut CommandEncoder,
    target: &Buffer,
    offset: BufferAddress,
    size: BufferSize,
    device: &Device,
) -> BufferViewMut {
    let mut chunk = if let Some(index) = self
        .active_chunks
        .iter()
        .position(|chunk| chunk.offset + size.get() <= chunk.size)
    {
        self.active_chunks.swap_remove(index)
    } else if let Some(index) = self
        .free_chunks
        .iter()
        .position(|chunk| size.get() <= chunk.size)
    {
        self.free_chunks.swap_remove(index)
    } else {
        let size = self.chunk_size.max(size.get());
        Chunk {
            buffer: device.create_buffer(&BufferDescriptor {
                label: Some("staging"),
                size,
                usage: BufferUsage::MAP_WRITE | BufferUsage::COPY_SRC,
                mapped_at_creation: true,
            }),
            size,
            offset: 0,
        }
    };
    ...
}
We see that it will only reuse a free chunk if the requested size is less than or equal to that chunk's size. Since the number of vertices and indices in the game of life canvas only goes up as you add more geometry, the requested size keeps growing, old chunks are never reused, and the belt has no way to release the free chunks.
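One possible fix would be to quantize oversized requests to multiples of the belt's chunk size, so a workload that grows by a few bytes per frame keeps landing on reusable bucket sizes. A sketch of that sizing policy (not the upstream code):

// Hypothetical sizing policy: round requests up to a multiple of chunk_size
// so previously freed chunks can satisfy slightly larger future requests.
fn quantized_chunk_size(chunk_size: u64, requested: u64) -> u64 {
    // Ceiling division: how many whole chunks the request needs.
    let buckets = (requested + chunk_size - 1) / chunk_size;
    buckets.max(1) * chunk_size
}

The approach tried below goes the other way and simply drops free chunks that have gone unused for too long.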
After adding a naive garbage collector to the StagingBelt:
// `tick` is a new counter field on the StagingBelt; each Chunk also gets a
// `tick` field recording when it was last freed.
pub fn tick(&mut self) {
    self.tick += 1;
}

pub fn gc(&mut self) {
    let t = self.tick;
    // Drop free chunks that have not been reused for 5 ticks.
    self.free_chunks.retain(|chunk| (t - chunk.tick) < 5);
}
The memory usage stays constant at ~2.8GB (virtual memory).
When running the Game of Life example, both in debug and release mode, I experience jumps in memory usage while painting some quads on the canvas. As you can see in the image, with just a few quads it's using 14GB of RAM, about 12GB of which are from the example.
The problem seems to be related to the growth of vectors, since the memory usage doubles, but I did not test this extensively. Since the program uses close to no memory unless I draw quads, I did some rough math assuming no vertex reuse when indexing:

vertex: 16 bytes per quad
indices: 24 bytes per quad
memory: 1073741824 bytes in one GB * 12
total: memory / quad memory = 322122547.2 quads

I did not count the number of quads on the screen, but it is nowhere near that. For reference, the number of pixels on a 4K screen is 8294400.
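A quick check of that arithmetic (the per-quad byte counts are taken from the estimate above):

fn main() {
    let bytes_per_quad = 16u64 + 24; // estimated vertex + index bytes per quad
    let twelve_gb = 12 * 1_073_741_824u64;
    // Prints 322122547.2: the number of quads 12 GB could hold at 40 bytes each.
    println!("{}", twelve_gb as f64 / bytes_per_quad as f64);
}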
Is that a problem with the example implementation or is that amount of memory consumption something I should expect when using the iced canvas?
Edit: I tried to profile the example and the results are closer to what I expected. But the overall system memory usage still goes beyond 12GB. I have no clue what is going on.