Memory leak in `remap` - Githubissues

liamkinne commented 2 months ago

There is some amount of memory being allocated during the remap function that isn't getting cleaned up.

This is with the latest commit on main. So I imagine it's something to do with rayon or the new tensor implementation.

This simple program will slowly grow in memory usage:

fn main() {
    let size = ImageSize {
        width: 1440,
        height: 1440,
    };

    let correction_map = correction_map(&size);

    loop {
        let input = Image::<u8, 3>::from_size_val(size, 0).unwrap();
        let mut output = Image::<f32, 3>::from_size_val(size, 0.0).unwrap();

        remap(
            &input.cast().unwrap(),
            &mut output,
            &correction_map.0,
            &correction_map.1,
            InterpolationMode::Bilinear,
        )
        .unwrap();
    }
}

edgarriba commented 2 months ago

That’s weird, however the idea with the new function signature allowing mutable outputs is that you allocate once outside of the for loop. If you check in the benches is how I’m doing.

If you’re up to, would be great to provide a bench for remap using also [flamegraph](https://github.com/flamegraph-rs/flamegraph) or similar to understand better the memory usage pattern.

BTW, remap is not the fastest implementation for now, it needs some optimisations as eg. the interpolation function it´s computing everytime the offset indices of the tensor from where to interpolate which could be done as a pre-step by iterating row by row with step 1. A second one, would be using simd to compute the weighted interpolation.

emilmgeorge commented 1 month ago

I couldn't reproduce the growing memory usage issue while running for about 5 minutes. But, when testing with valgrind, an unrelated leak is found in the correction_map I used (included below). (No confirmed leaks within the original snippet).

This is the correction_map that I used.

```rust use kornia::image::{Image, ImageSize}; use kornia::core::Tensor2; use kornia::imgproc::interpolation::{remap, InterpolationMode, grid::meshgrid_from_fn}; fn correction_map(size: &ImageSize) -> (Tensor2, Tensor2) { // Identity mapping let (map_x, map_y) = meshgrid_from_fn(size.width, size.height, |x, y| { Ok((x as f32, y as f32)) }).unwrap(); (map_x, map_y) } ```

Valgrind log

```valgrind ==419002== Memcheck, a memory error detector ==419002== Copyright (C) 2002-2024, and GNU GPL'd, by Julian Seward et al. ==419002== Using Valgrind-3.23.0 and LibVEX; rerun with -h for copyright info ==419002== Command: target/debug/mem_test ==419002== Parent PID: 418734 ==419002== ==419002== ==419002== HEAP SUMMARY: ==419002== in use at exit: 16,653,640 bytes in 129 blocks ==419002== total heap usage: 230 allocs, 101 frees, 229,673,704 bytes allocated ==419002== ==419002== 2,432 bytes in 8 blocks are possibly lost in loss record 16 of 23 ==419002== at 0x484BC13: calloc (vg_replace_malloc.c:1675) ==419002== by 0x4011003: calloc (rtld-malloc.h:44) ==419002== by 0x4011003: allocate_dtv (dl-tls.c:395) ==419002== by 0x4011AE1: _dl_allocate_tls (dl-tls.c:664) ==419002== by 0x4A44F14: allocate_stack (allocatestack.c:429) ==419002== by 0x4A44F14: pthread_create@@GLIBC_2.34 (pthread_create.c:655) ==419002== by 0x18AA61: std::sys::pal::unix::thread::Thread::new (thread.rs:87) ==419002== by 0x14F581: std::thread::Builder::spawn_unchecked_ (mod.rs:577) ==419002== by 0x14EA5E: std::thread::Builder::spawn_unchecked (mod.rs:456) ==419002== by 0x15013D: std::thread::Builder::spawn (mod.rs:388) ==419002== by 0x13D92E: ::spawn (registry.rs:98) ==419002== by 0x13EDFD: rayon_core::registry::Registry::new (registry.rs:304) ==419002== by 0x13DD5B: rayon_core::registry::default_global_registry (registry.rs:201) ==419002== by 0x13B9E0: core::ops::function::FnOnce::call_once (function.rs:250) ==419002== ==419002== 8,294,400 bytes in 1 blocks are definitely lost in loss record 22 of 23 ==419002== at 0x48447A8: malloc (vg_replace_malloc.c:446) ==419002== by 0x162139: std::sys::pal::unix::alloc::::alloc (alloc.rs:14) ==419002== by 0x161F3E: ::alloc (allocator.rs:58) ==419002== by 0x12FA86: kornia_core::storage::TensorStorage::new (storage.rs:55) ==419002== by 0x12F817: kornia_core::tensor::Tensor::new_uninitialized (tensor.rs:112) ==419002== by 0x121EE3: kornia_imgproc::interpolation::grid::meshgrid_from_fn (grid.rs:43) ==419002== by 0x11A3FF: mem_test::correction_map (main.rs:7) ==419002== by 0x11A567: mem_test::main (main.rs:19) ==419002== by 0x11E1FA: core::ops::function::FnOnce::call_once (function.rs:250) ==419002== by 0x1241ED: std::sys::backtrace::__rust_begin_short_backtrace (backtrace.rs:152) ==419002== by 0x1253D0: std::rt::lang_start::{{closure}} (rt.rs:162) ==419002== by 0x18093F: call_once<(), (dyn core::ops::function::Fn<(), Output=i32> + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe)> (function.rs:284) ==419002== by 0x18093F: do_call<&(dyn core::ops::function::Fn<(), Output=i32> + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe), i32> (panicking.rs:557) ==419002== by 0x18093F: try + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe)> (panicking.rs:521) ==419002== by 0x18093F: catch_unwind<&(dyn core::ops::function::Fn<(), Output=i32> + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe), i32> (panic.rs:350) ==419002== by 0x18093F: {closure#2} (rt.rs:141) ==419002== by 0x18093F: do_call (panicking.rs:557) ==419002== by 0x18093F: try (panicking.rs:521) ==419002== by 0x18093F: catch_unwind (panic.rs:350) ==419002== by 0x18093F: std::rt::lang_start_internal (rt.rs:141) ==419002== ==419002== 8,294,400 bytes in 1 blocks are definitely lost in loss record 23 of 23 ==419002== at 0x48447A8: malloc (vg_replace_malloc.c:446) ==419002== by 0x162139: std::sys::pal::unix::alloc::::alloc (alloc.rs:14) ==419002== by 0x161F3E: ::alloc (allocator.rs:58) ==419002== by 0x12FA86: kornia_core::storage::TensorStorage::new (storage.rs:55) ==419002== by 0x12F817: kornia_core::tensor::Tensor::new_uninitialized (tensor.rs:112) ==419002== by 0x121FEE: kornia_imgproc::interpolation::grid::meshgrid_from_fn (grid.rs:44) ==419002== by 0x11A3FF: mem_test::correction_map (main.rs:7) ==419002== by 0x11A567: mem_test::main (main.rs:19) ==419002== by 0x11E1FA: core::ops::function::FnOnce::call_once (function.rs:250) ==419002== by 0x1241ED: std::sys::backtrace::__rust_begin_short_backtrace (backtrace.rs:152) ==419002== by 0x1253D0: std::rt::lang_start::{{closure}} (rt.rs:162) ==419002== by 0x18093F: call_once<(), (dyn core::ops::function::Fn<(), Output=i32> + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe)> (function.rs:284) ==419002== by 0x18093F: do_call<&(dyn core::ops::function::Fn<(), Output=i32> + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe), i32> (panicking.rs:557) ==419002== by 0x18093F: try + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe)> (panicking.rs:521) ==419002== by 0x18093F: catch_unwind<&(dyn core::ops::function::Fn<(), Output=i32> + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe), i32> (panic.rs:350) ==419002== by 0x18093F: {closure#2} (rt.rs:141) ==419002== by 0x18093F: do_call (panicking.rs:557) ==419002== by 0x18093F: try (panicking.rs:521) ==419002== by 0x18093F: catch_unwind (panic.rs:350) ==419002== by 0x18093F: std::rt::lang_start_internal (rt.rs:141) ==419002== ==419002== LEAK SUMMARY: ==419002== definitely lost: 16,588,800 bytes in 2 blocks ==419002== indirectly lost: 0 bytes in 0 blocks ==419002== possibly lost: 2,432 bytes in 8 blocks ==419002== still reachable: 62,408 bytes in 119 blocks ==419002== suppressed: 0 bytes in 0 blocks ==419002== Reachable blocks (those to which a pointer was found) are not shown. ==419002== To see them, rerun with: --leak-check=full --show-leak-kinds=all ==419002== ==419002== For lists of detected and suppressed errors, rerun with: -s ==419002== ERROR SUMMARY: 3 errors from 3 contexts (suppressed: 0 from 0) ```

The leak detected by valgrind is here: https://github.com/kornia/kornia-rs/blob/a6526627799c21a23bfe94723f3203ee8d995748/crates/kornia-core/src/storage.rs#L60-L64 The issue is that the memory allocated in ptr won't be freed when the Arc::new(Vec::<T>::with_capacity(len)) is freed.

edgarriba commented 1 month ago

@emilmgeorge, thanks for your investigation. It seems there’s definitely an issue with Buffer::from_custom_allocation or possibly the way I implemented it initially. I've opened a ticket to address this: kornia-rs/issues/127.

It might be worthwhile to revisit the Tensor constructor API and the use of the Allocator.

Here are my thoughts on the design:

Make the Tensor struct extensible via Buffer::Allocation, allowing it to support data from various backends, such as CUDA, Wpu, etc.
For instance, one could implement a CudaAllocator and initialize a Tensor by allocating CUDA memory, or by using a pre-allocated CUDA pointer. This could be particularly useful for implementing image operations directly on CUDA, like a remap function.
Ideally, the kornia-core crate and the Tensor struct should be lightweight, acting primarily as a managed data container. This would keep it decoupled from operations and allow for interchangeable use with other library containers at zero copy cost.

kornia / kornia-rs

Memory leak in `remap` #134