tokio-rs / tokio

A runtime for writing reliable asynchronous applications with Rust. Provides I/O, networking, scheduling, timers, ...
https://tokio.rs
MIT License
26.36k stars 2.42k forks source link

tokio::sync::Lock leaks memory #2237

Closed olix0r closed 4 years ago

olix0r commented 4 years ago

Version

0.1.22

Platform

Linux golly 5.3.0-7625-generic #27~1576774585~18.04~c7868f8-Ubuntu SMP Thu Dec 19 20:37:47 UTC  x86_64 x86_64 x86_64 GNU/Linux

Subcrates

tokio_sync

Description

Under concurrency, tokio_sync::Lock leaks memory when acquired locks are dropped. We have a reliable repro for this.

:; cargo build --release && valgrind --leak-check=full target/release/lockyleaks
    Finished release [optimized] target(s) in 0.02s
==16442== Memcheck, a memory error detector
==16442== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==16442== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==16442== Command: target/release/lockyleaks
==16442==
==16442==
==16442== HEAP SUMMARY:
==16442==     in use at exit: 123,726,189 bytes in 1,100,504 blocks
==16442==   total heap usage: 2,222,200 allocs, 1,121,696 frees, 720,964,308 bytes allocated
==16442==
==16442== 48 bytes in 2 blocks are possibly lost in loss record 7 of 50
==16442==    at 0x4C2FB0F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==16442==    by 0x1289D2: tokio_threadpool::worker::Worker::run (in /home/ver/b/lockyleaks/target/release/lockyleaks)
==16442==    by 0x1163ED: tokio_timer::clock::clock::with_default (in /home/ver/b/lockyleaks/target/release/lockyleaks)
==16442==    by 0x1174E2: tokio::runtime::threadpool::builder::Builder::build::{{closure}} (in /home/ver/b/lockyleaks/target/release/lockyleaks)
==16442==    by 0x130BD8: std::thread::local::LocalKey<T>::with (in /home/ver/b/lockyleaks/target/release/lockyleaks)
==16442==    by 0x12BBA7: std::sys_common::backtrace::__rust_begin_short_backtrace (in /home/ver/b/lockyleaks/target/release/lockyleaks)
==16442==    by 0x1321DB: _ZN3std9panicking3try7do_call17h7aab2d4390bd4174E.llvm.4420981283127027568 (in /home/ver/b/lockyleaks/target/release/lockyleaks)
==16442==    by 0x148A99: __rust_maybe_catch_panic (lib.rs:78)
==16442==    by 0x13278F: core::ops::function::FnOnce::call_once{{vtable-shim}} (in /home/ver/b/lockyleaks/target/release/lockyleaks)
==16442==    by 0x13FE5E: <alloc::boxed::Box<F> as core::ops::function::FnOnce<A>>::call_once (boxed.rs:1022)
==16442==    by 0x14822F: call_once<(),alloc::boxed::Box<FnOnce<()>>> (boxed.rs:1022)
==16442==    by 0x14822F: start_thread (thread.rs:13)
==16442==    by 0x14822F: std::sys::unix::thread::Thread::new::thread_start (thread.rs:80)
==16442==    by 0x524F6DA: start_thread (pthread_create.c:463)
==16442==
==16442== 23,144 bytes in 11 blocks are possibly lost in loss record 41 of 50
==16442==    at 0x4C2FB0F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==16442==    by 0x136A5A: crossbeam_epoch::internal::Local::register (in /home/ver/b/lockyleaks/target/release/lockyleaks)
==16442==    by 0x1381B4: _ZN3std6thread5local4fast12Key$LT$T$GT$14try_initialize17h902033f5292db545E.llvm.635284287880880640 (in /home/ver/b/lockyleaks/target/release/lockyleaks)
==16442==    by 0x133E33: crossbeam_deque::Stealer<T>::steal_batch_and_pop (in /home/ver/b/lockyleaks/target/release/lockyleaks)
==16442==    by 0x128C75: tokio_threadpool::worker::Worker::run (in /home/ver/b/lockyleaks/target/release/lockyleaks)
==16442==    by 0x1163ED: tokio_timer::clock::clock::with_default (in /home/ver/b/lockyleaks/target/release/lockyleaks)
==16442==    by 0x1174E2: tokio::runtime::threadpool::builder::Builder::build::{{closure}} (in /home/ver/b/lockyleaks/target/release/lockyleaks)
==16442==    by 0x130BD8: std::thread::local::LocalKey<T>::with (in /home/ver/b/lockyleaks/target/release/lockyleaks)
==16442==    by 0x12BBA7: std::sys_common::backtrace::__rust_begin_short_backtrace (in /home/ver/b/lockyleaks/target/release/lockyleaks)
==16442==    by 0x1321DB: _ZN3std9panicking3try7do_call17h7aab2d4390bd4174E.llvm.4420981283127027568 (in /home/ver/b/lockyleaks/target/release/lockyleaks)
==16442==    by 0x148A99: __rust_maybe_catch_panic (lib.rs:78)
==16442==    by 0x13278F: core::ops::function::FnOnce::call_once{{vtable-shim}} (in /home/ver/b/lockyleaks/target/release/lockyleaks)
==16442==
==16442== 1,147,648 bytes in 8,966 blocks are possibly lost in loss record 46 of 50
==16442==    at 0x4C2FB0F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==16442==    by 0x13188E: <&tokio_threadpool::sender::Sender as tokio_executor::executor::Executor>::spawn (in /home/ver/b/lockyleaks/target/release/lockyleaks)
==16442==    by 0x13170D: <tokio_threadpool::sender::Sender as tokio_executor::executor::Executor>::spawn (in /home/ver/b/lockyleaks/target/release/lockyleaks)
==16442==    by 0x1387EA: std::thread::local::LocalKey<T>::with (in /home/ver/b/lockyleaks/target/release/lockyleaks)
==16442==    by 0x13869C: <tokio_executor::global::DefaultExecutor as tokio_executor::executor::Executor>::spawn (in /home/ver/b/lockyleaks/target/release/lockyleaks)
==16442==    by 0x112FC6: tokio::executor::spawn (in /home/ver/b/lockyleaks/target/release/lockyleaks)
==16442==    by 0x11406F: <futures::future::lazy::Lazy<F,R> as futures::future::Future>::poll (in /home/ver/b/lockyleaks/target/release/lockyleaks)
==16442==    by 0x130221: futures::task_impl::std::set (in /home/ver/b/lockyleaks/target/release/lockyleaks)
==16442==    by 0x12CAC5: futures::task_impl::Spawn<T>::poll_future_notify (in /home/ver/b/lockyleaks/target/release/lockyleaks)
==16442==    by 0x132180: _ZN3std9panicking3try7do_call17h0321a6d6b023a5f6E.llvm.4420981283127027568 (in /home/ver/b/lockyleaks/target/release/lockyleaks)
==16442==    by 0x148A99: __rust_maybe_catch_panic (lib.rs:78)
==16442==    by 0x1304E8: tokio_threadpool::task::Task::run (in /home/ver/b/lockyleaks/target/release/lockyleaks)
==16442==
==16442== 2,164,064 bytes in 19,322 blocks are possibly lost in loss record 47 of 50
==16442==    at 0x4C2FB0F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==16442==    by 0x1261D8: tokio_sync::semaphore::Permit::poll_acquire (in /home/ver/b/lockyleaks/target/release/lockyleaks)
==16442==    by 0x115D07: <lockyleaks::AcquireAndIdle as futures::future::Future>::poll (in /home/ver/b/lockyleaks/target/release/lockyleaks)
==16442==    by 0x115AB7: <lockyleaks::Loop as futures::future::Future>::poll (in /home/ver/b/lockyleaks/target/release/lockyleaks)
==16442==    by 0x130221: futures::task_impl::std::set (in /home/ver/b/lockyleaks/target/release/lockyleaks)
==16442==    by 0x12CAC5: futures::task_impl::Spawn<T>::poll_future_notify (in /home/ver/b/lockyleaks/target/release/lockyleaks)
==16442==    by 0x132180: _ZN3std9panicking3try7do_call17h0321a6d6b023a5f6E.llvm.4420981283127027568 (in /home/ver/b/lockyleaks/target/release/lockyleaks)
==16442==    by 0x148A99: __rust_maybe_catch_panic (lib.rs:78)
==16442==    by 0x1304E8: tokio_threadpool::task::Task::run (in /home/ver/b/lockyleaks/target/release/lockyleaks)
==16442==    by 0x1296B9: tokio_threadpool::worker::Worker::run_task (in /home/ver/b/lockyleaks/target/release/lockyleaks)
==16442==    by 0x128D54: tokio_threadpool::worker::Worker::run (in /home/ver/b/lockyleaks/target/release/lockyleaks)
==16442==    by 0x1163ED: tokio_timer::clock::clock::with_default (in /home/ver/b/lockyleaks/target/release/lockyleaks)
==16442==
==16442== 2,994,208 (112 direct, 2,994,096 indirect) bytes in 1 blocks are definitely lost in loss record 49 of 50
==16442==    at 0x4C2FB0F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==16442==    by 0x1261D8: tokio_sync::semaphore::Permit::poll_acquire (in /home/ver/b/lockyleaks/target/release/lockyleaks)
==16442==    by 0x115D07: <lockyleaks::AcquireAndIdle as futures::future::Future>::poll (in /home/ver/b/lockyleaks/target/release/lockyleaks)
==16442==    by 0x115AB7: <lockyleaks::Loop as futures::future::Future>::poll (in /home/ver/b/lockyleaks/target/release/lockyleaks)
==16442==    by 0x130221: futures::task_impl::std::set (in /home/ver/b/lockyleaks/target/release/lockyleaks)
==16442==    by 0x12CAC5: futures::task_impl::Spawn<T>::poll_future_notify (in /home/ver/b/lockyleaks/target/release/lockyleaks)
==16442==    by 0x132180: _ZN3std9panicking3try7do_call17h0321a6d6b023a5f6E.llvm.4420981283127027568 (in /home/ver/b/lockyleaks/target/release/lockyleaks)
==16442==    by 0x148A99: __rust_maybe_catch_panic (lib.rs:78)
==16442==    by 0x1304E8: tokio_threadpool::task::Task::run (in /home/ver/b/lockyleaks/target/release/lockyleaks)
==16442==    by 0x1296B9: tokio_threadpool::worker::Worker::run_task (in /home/ver/b/lockyleaks/target/release/lockyleaks)
==16442==    by 0x128B7F: tokio_threadpool::worker::Worker::run (in /home/ver/b/lockyleaks/target/release/lockyleaks)
==16442==    by 0x1163ED: tokio_timer::clock::clock::with_default (in /home/ver/b/lockyleaks/target/release/lockyleaks)
==16442==
==16442== 116,918,928 bytes in 1,043,919 blocks are possibly lost in loss record 50 of 50
==16442==    at 0x4C2FB0F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==16442==    by 0x1261D8: tokio_sync::semaphore::Permit::poll_acquire (in /home/ver/b/lockyleaks/target/release/lockyleaks)
==16442==    by 0x115D07: <lockyleaks::AcquireAndIdle as futures::future::Future>::poll (in /home/ver/b/lockyleaks/target/release/lockyleaks)
==16442==    by 0x115AB7: <lockyleaks::Loop as futures::future::Future>::poll (in /home/ver/b/lockyleaks/target/release/lockyleaks)
==16442==    by 0x130221: futures::task_impl::std::set (in /home/ver/b/lockyleaks/target/release/lockyleaks)
==16442==    by 0x12CAC5: futures::task_impl::Spawn<T>::poll_future_notify (in /home/ver/b/lockyleaks/target/release/lockyleaks)
==16442==    by 0x132180: _ZN3std9panicking3try7do_call17h0321a6d6b023a5f6E.llvm.4420981283127027568 (in /home/ver/b/lockyleaks/target/release/lockyleaks)
==16442==    by 0x148A99: __rust_maybe_catch_panic (lib.rs:78)
==16442==    by 0x1304E8: tokio_threadpool::task::Task::run (in /home/ver/b/lockyleaks/target/release/lockyleaks)
==16442==    by 0x1296B9: tokio_threadpool::worker::Worker::run_task (in /home/ver/b/lockyleaks/target/release/lockyleaks)
==16442==    by 0x128B7F: tokio_threadpool::worker::Worker::run (in /home/ver/b/lockyleaks/target/release/lockyleaks)
==16442==    by 0x1163ED: tokio_timer::clock::clock::with_default (in /home/ver/b/lockyleaks/target/release/lockyleaks)
==16442==
==16442== LEAK SUMMARY:
==16442==    definitely lost: 112 bytes in 1 blocks
==16442==    indirectly lost: 2,994,096 bytes in 26,733 blocks
==16442==      possibly lost: 120,253,832 bytes in 1,072,220 blocks
==16442==    still reachable: 478,149 bytes in 1,550 blocks
==16442==         suppressed: 0 bytes in 0 blocks
==16442== Reachable blocks (those to which a pointer was found) are not shown.
==16442== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==16442==
==16442== For counts of detected and suppressed errors, rerun with: -v
==16442== ERROR SUMMARY: 6 errors from 6 contexts (suppressed: 0 from 0)

We believe that this leak may be present in 0.2.x versions as well, though. @hawkw is investigating.

hawkw commented 4 years ago

I've ported the repro to Tokio 0.2, and I cannot reproduce the leak.

Valgrind:

eliza@ares:~/lockyleaks$ git status
On branch eliza/0.2
Your branch is up-to-date with 'origin/eliza/0.2'.
nothing to commit, working tree clean
eliza@ares:~/lockyleaks$ valgrind --leak-check=full target/release/lockyleaks
==14193== Memcheck, a memory error detector
==14193== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==14193== Using Valgrind-3.12.0.SVN and LibVEX; rerun with -h for copyright info
==14193== Command: target/release/lockyleaks
==14193==
==14193==
==14193== HEAP SUMMARY:
==14193==     in use at exit: 32 bytes in 1 blocks
==14193==   total heap usage: 2,201,504 allocs, 2,201,503 frees, 465,264,680 bytes allocated
==14193==
==14193== LEAK SUMMARY:
==14193==    definitely lost: 0 bytes in 0 blocks
==14193==    indirectly lost: 0 bytes in 0 blocks
==14193==      possibly lost: 0 bytes in 0 blocks
==14193==    still reachable: 32 bytes in 1 blocks
==14193==         suppressed: 0 bytes in 0 blocks
==14193== Reachable blocks (those to which a pointer was found) are not shown.
==14193== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==14193==
==14193== For counts of detected and suppressed errors, rerun with: -v
==14193== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

running the 0.1 repro (blue line is RSS): mem_plot_01 running the 0.2 repro (blue line is RSS): mem_plot_02_fixed

AFAICT, this issue only exists in 0.1. I suspect we can probably fix it pretty easily by backporting the 0.2 impl to 0.1

hawkw commented 4 years ago

I have a branch (https://github.com/tokio-rs/tokio/tree/eliza/0.2-semaphore) that backports the 0.2 semaphore implementation to tokio-sync from v0.1.x (by copying and pasting + fixing up API differences). Valgrind no longer reports any of the semaphore-related leaks (and I think the crossbeam-deque one may be a false positive/something that exists for all tokio 0.1 programs):

eliza@ares:~/lockyleaks$ valgrind --leak-check=full target/release/lockyleaks
==40575== Memcheck, a memory error detector
==40575== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==40575== Using Valgrind-3.12.0.SVN and LibVEX; rerun with -h for copyright info
==40575== Command: target/release/lockyleaks
==40575==
==40575==
==40575== HEAP SUMMARY:
==40575==     in use at exit: 279,904 bytes in 199 blocks
==40575==   total heap usage: 2,215,615 allocs, 2,215,416 frees, 681,160,960 bytes allocated
==40575==
==40575== 69,432 bytes in 33 blocks are possibly lost in loss record 17 of 18
==40575==    at 0x4C2BBAF: malloc (vg_replace_malloc.c:299)
==40575==    by 0x13FE0A: crossbeam_epoch::internal::Local::register::hf487475157c3d28f (in /home/eliza/lockyleaks/target/release/lockyleaks)
==40575==    by 0x141564: _ZN3std6thread5local4fast12Key$LT$T$GT$14try_initialize17h902033f5292db545E.llvm.635284287880880640 (in /home/eliza/lockyleaks/target/release/lockyleaks)
==40575==    by 0x13D1E3: crossbeam_deque::Stealer$LT$T$GT$::steal_batch_and_pop::he4c6fad24a4c4a1e (in /home/eliza/lockyleaks/target/release/lockyleaks)
==40575==    by 0x132025: tokio_threadpool::worker::Worker::run::ha87b7d0e3fcf61d4 (in /home/eliza/lockyleaks/target/release/lockyleaks)
==40575==    by 0x11F57D: tokio_timer::clock::clock::with_default::hbbd61c3a192ca8a3 (in /home/eliza/lockyleaks/target/release/lockyleaks)
==40575==    by 0x120EF2: tokio::runtime::threadpool::builder::Builder::build::_$u7b$$u7b$closure$u7d$$u7d$::hefd82a6e3b44000c (in /home/eliza/lockyleaks/target/release/lockyleaks)
==40575==    by 0x139F88: std::thread::local::LocalKey$LT$T$GT$::with::h47b196d4586ff021 (in /home/eliza/lockyleaks/target/release/lockyleaks)
==40575==    by 0x134F57: std::sys_common::backtrace::__rust_begin_short_backtrace::h19e070dd689ab654 (in /home/eliza/lockyleaks/target/release/lockyleaks)
==40575==    by 0x13B58B: _ZN3std9panicking3try7do_call17h7aab2d4390bd4174E.llvm.4420981283127027568 (in /home/eliza/lockyleaks/target/release/lockyleaks)
==40575==    by 0x151E49: __rust_maybe_catch_panic (lib.rs:78)
==40575==    by 0x13BB3F: core::ops::function::FnOnce::call_once$u7b$$u7b$vtable.shim$u7d$$u7d$::h9d9d0735a48e18a1 (in /home/eliza/lockyleaks/target/release/lockyleaks)
==40575==
==40575== LEAK SUMMARY:
==40575==    definitely lost: 0 bytes in 0 blocks
==40575==    indirectly lost: 0 bytes in 0 blocks
==40575==      possibly lost: 69,432 bytes in 33 blocks
==40575==    still reachable: 210,472 bytes in 166 blocks
==40575==         suppressed: 0 bytes in 0 blocks
==40575== Reachable blocks (those to which a pointer was found) are not shown.
==40575== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==40575==
==40575== For counts of detected and suppressed errors, rerun with: -v
==40575== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)

I'm surprised that the RSS graphs still look similar to the ones for the unpatched 0.1 repro, but that may be due to jemalloc; I'm inclined to treat valgrind as a source of truth.

@carllerche any thoughts? Shall I go ahead and open a PR against v0.1.x to backport the 0.2 semaphore impl?

hawkw commented 4 years ago

An update on this: we believe that an issue still exists in the tokio 0.2 version of Semaphore where the memory used by the semaphore grows when a clone is polled and then dropped without having acquired a permit. However, all allocated memory is released when the Semaphore itself is dropped, so valgrind doesn't report a leak. In the 0.1 semaphore, it's likely that allocations were actually lost (and not freed when dropping the semaphore), resulting in a leak detected by valgrind.