Open liamkinne opened 2 months ago
There is some amount of memory being allocated during the `remap` function that isn't getting cleaned up. This is with the latest commit on main, so I imagine it's something to do with rayon or the new tensor implementation.

This simple program will slowly grow in memory usage:
That’s weird. However, the idea with the new function signature allowing mutable outputs is that you allocate once, outside of the for loop. If you check the benches, that's how I'm doing it.

If you're up for it, it would be great to provide a bench for remap, also using [flamegraph](https://github.com/flamegraph-rs/flamegraph) or similar, to better understand the memory usage pattern.
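A minimal sketch of that allocate-once pattern; the `remap_into` name and signature below are hypothetical, standing in for any remap-style function that writes into a caller-provided `&mut` output rather than allocating per call:

```rust
// Hypothetical signature: a function that fills a caller-provided
// output buffer instead of returning a freshly allocated one.
fn remap_into(src: &[f32], map_x: &[f32], map_y: &[f32], dst: &mut [f32]) {
    let _ = (src, map_x, map_y); // interpolation elided in this sketch
    dst.fill(0.0);
}

fn main() {
    let (w, h) = (640, 480);
    let src = vec![0.0f32; w * h];
    let (map_x, map_y) = (vec![0.0f32; w * h], vec![0.0f32; w * h]);

    // Allocate the output once, outside the hot loop...
    let mut dst = vec![0.0f32; w * h];
    for _ in 0..1_000 {
        // ...and reuse it on every iteration, so the loop itself
        // performs no per-frame allocation.
        remap_into(&src, &map_x, &map_y, &mut dst);
    }
}
```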
BTW, remap is not the fastest implementation for now; it still needs some optimisations. For example, the interpolation function currently recomputes the offset indices of the tensor from which to interpolate on every call, which could instead be done as a pre-step by iterating row by row with step 1. A second optimisation would be using SIMD to compute the weighted interpolation.
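To illustrate the first idea, here is a rough sketch (assuming a row-major, single-channel layout; the names are illustrative, not the actual kornia-rs internals) where the per-row base offsets are precomputed once instead of being recomputed inside every pixel's interpolation:

```rust
// Precompute each row's base offset (y * width) once, as a pre-step.
fn row_offsets(width: usize, height: usize) -> Vec<usize> {
    (0..height).map(|y| y * width).collect()
}

// Bilinear interpolation that looks up the precomputed row offsets
// instead of recomputing `y * width + x` for each of the 4 neighbours.
fn bilinear(src: &[f32], offsets: &[usize], width: usize, x: f32, y: f32) -> f32 {
    let (x0, y0) = (x as usize, y as usize);
    let (x1, y1) = ((x0 + 1).min(width - 1), (y0 + 1).min(offsets.len() - 1));
    let (dx, dy) = (x - x0 as f32, y - y0 as f32);
    let (r0, r1) = (offsets[y0], offsets[y1]);
    let (p00, p01) = (src[r0 + x0], src[r0 + x1]);
    let (p10, p11) = (src[r1 + x0], src[r1 + x1]);
    // The weighted sum below is the part SIMD could further speed up.
    p00 * (1.0 - dx) * (1.0 - dy) + p01 * dx * (1.0 - dy)
        + p10 * (1.0 - dx) * dy + p11 * dx * dy
}

fn main() {
    let (w, h) = (4usize, 3usize);
    let src: Vec<f32> = (0..w * h).map(|i| i as f32).collect();
    let offs = row_offsets(w, h);
    // Sample between grid points; the result mixes the 4 neighbours.
    println!("{}", bilinear(&src, &offs, w, 1.5, 0.5));
}
```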
I couldn't reproduce the growing memory usage issue while running for about 5 minutes. But when testing with valgrind, an unrelated leak was found in the `correction_map` I used (included below). (No confirmed leaks within the original snippet.)

The leak detected by valgrind is here:
https://github.com/kornia/kornia-rs/blob/a6526627799c21a23bfe94723f3203ee8d995748/crates/kornia-core/src/storage.rs#L60-L64
The issue is that the memory allocated in `ptr` won't be freed when the `Arc::new(Vec::<T>::with_capacity(len))` is freed.
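A distilled sketch of that mismatch (plain std Rust, not the actual kornia-rs code): the `Arc<Vec<T>>` handed over as the buffer's owner is a brand-new, empty Vec that has nothing to do with the allocation behind `ptr`, so dropping it can never free `ptr`:

```rust
use std::alloc::{alloc, Layout};
use std::sync::Arc;

fn main() {
    let len = 1024usize;
    let layout = Layout::array::<u8>(len).unwrap();
    // Raw allocation standing in for the tensor's data pointer.
    let ptr = unsafe { alloc(layout) };
    assert!(!ptr.is_null());

    // The supposed "owner" is a separate, empty Vec; it does not own `ptr`.
    let owner: Arc<Vec<u8>> = Arc::new(Vec::with_capacity(len));
    drop(owner); // frees only the empty Vec's allocation

    // `ptr` is never deallocated: this is the kind of leak valgrind flags.
    // A fix is to make the owner be the allocation behind `ptr` itself,
    // e.g. by constructing the buffer directly from a Vec that owns the data.
}
```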
@emilmgeorge, thanks for your investigation. It seems there’s definitely an issue with `Buffer::from_custom_allocation`, or possibly the way I implemented it initially. I've opened a ticket to address this: kornia-rs/issues/127.
It might be worthwhile to revisit the Tensor constructor API and the use of the Allocator.
Here are my thoughts on the design: