rust-lang / futures-rs

Zero-cost asynchronous programming in Rust
https://rust-lang.github.io/futures-rs/
Apache License 2.0
5.43k stars 629 forks source link

"malloc_consolidate(): unaligned fastbin chunk detected" when attempting to block single-threaded tokio runtime #2863

Closed nazar-pc closed 1 month ago

nazar-pc commented 5 months ago

Originally reported as https://github.com/tokio-rs/tokio/issues/6452, but I think this might be a futures problem actually.

I created a (non-minimal unfortunately) reproduction branch where last commit is showing an issue: https://github.com/subspace/subspace/tree/futures-malloc_consolidate-reproduction

On Linux it looks like this:

$ cd shared/subspace-cluster-networking
$ cargo t
   Compiling subspace-cluster-networking v0.1.0 (/web/subspace/subspace/shared/subspace-cluster-networking)
    Finished `test` profile [unoptimized + debuginfo] target(s) in 1.67s
     Running unittests src/lib.rs (/home/nazar-pc/.cache/cargo/target/debug/deps/subspace_cluster_networking-c3975a08282e0a03)

running 2 tests
test request_response::tests::test_codec ... ok
malloc_consolidate(): unaligned fastbin chunk detected
error: test failed, to rerun pass `--lib`

Caused by:
  process didn't exit successfully: `/home/nazar-pc/.cache/cargo/target/debug/deps/subspace_cluster_networking-c3975a08282e0a03` (signal: 6, SIGABRT: process abort signal)

Here is a backtrace I collected originally:

``` running 2 tests [New Thread 0x7ffff7a00640 (LWP 368334)] [New Thread 0x7ffff7600640 (LWP 368335)] [Thread 0x7ffff7a00640 (LWP 368334) exited] test request_response::tests::test_codec ... ok [New Thread 0x7ffff7a00640 (LWP 368336)] malloc_consolidate(): unaligned fastbin chunk detected Thread 3 "tests::basic" received signal SIGABRT, Aborted. [Switching to Thread 0x7ffff7600640 (LWP 368335)] __pthread_kill_implementation (no_tid=0, signo=6, threadid=140737343653440) at ./nptl/pthread_kill.c:44 44 ./nptl/pthread_kill.c: Немає такого файла або каталогу. (gdb) bt #0 __pthread_kill_implementation (no_tid=0, signo=6, threadid=140737343653440) at ./nptl/pthread_kill.c:44 #1 __pthread_kill_internal (signo=6, threadid=140737343653440) at ./nptl/pthread_kill.c:78 #2 __GI___pthread_kill (threadid=140737343653440, signo=signo@entry=6) at ./nptl/pthread_kill.c:89 #3 0x00007ffff7c42476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26 #4 0x00007ffff7c287f3 in __GI_abort () at ./stdlib/abort.c:79 #5 0x00007ffff7c89676 in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x7ffff7ddbb77 "%s\n") at ../sysdeps/posix/libc_fatal.c:155 #6 0x00007ffff7ca0cfc in malloc_printerr (str=str@entry=0x7ffff7dde5b8 "malloc_consolidate(): unaligned fastbin chunk detected") at ./malloc/malloc.c:5664 #7 0x00007ffff7ca198c in malloc_consolidate (av=av@entry=0x7ffff0000030) at ./malloc/malloc.c:4750 #8 0x00007ffff7ca2ea0 in _int_free (av=0x7ffff0000030, p=0x7ffff00558c0, have_lock=) at ./malloc/malloc.c:4674 #9 0x00007ffff7ca5453 in __GI___libc_free (mem=) at ./malloc/malloc.c:3391 #10 0x000055555560aa90 in alloc::alloc::dealloc () at /rustc/1684a753dbca5d23b2e03568e6fbbb48eb72d0e6/library/alloc/src/alloc.rs:119 #11 alloc::alloc::{impl#1}::deallocate (self=0x7ffff0055640, ptr=..., layout=...) at /rustc/1684a753dbca5d23b2e03568e6fbbb48eb72d0e6/library/alloc/src/alloc.rs:256 #12 0x0000555555794819 in alloc::boxed::{impl#8}::drop (self=0x7ffff0055638) at /rustc/1684a753dbca5d23b2e03568e6fbbb48eb72d0e6/library/alloc/src/boxed.rs:1243 #13 0x0000555555779d9e in core::ptr::drop_in_place> () at /rustc/1684a753dbca5d23b2e03568e6fbbb48eb72d0e6/library/core/src/ptr/mod.rs:514 #14 0x000055555577b4aa in core::ptr::drop_in_place>> () at /rustc/1684a753dbca5d23b2e03568e6fbbb48eb72d0e6/library/core/src/ptr/mod.rs:514 #15 0x000055555577de1f in core::ptr::drop_in_place>>> () at /rustc/1684a753dbca5d23b2e03568e6fbbb48eb72d0e6/library/core/src/ptr/mod.rs:514 #16 0x00005555557bf43b in tokio::runtime::task::core::{impl#6}::set_stage::{closure#0}>, alloc::sync::Arc> (ptr=0x7ffff0055630) at /home/nazar-pc/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/task/core.rs:382 #17 0x00005555557bc5f1 in tokio::loom::std::unsafe_cell::UnsafeCell>>>::with_mut>>, (), tokio::runtime::task::core::{impl#6}::set_stage::{closure_env#0}>, alloc::sync::Arc>> (self=0x7ffff0055630, f=...) at /home/nazar-pc/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/loom/std/unsafe_cell.rs:16 #18 tokio::runtime::task::core::Core>, alloc::sync::Arc>::set_stage>, alloc::sync::Arc> (self=0x7ffff0055620, stage=...) at /home/nazar-pc/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/task/core.rs:382 #19 0x00005555557b4d59 in tokio::runtime::task::core::Core>, alloc::sync::Arc>::drop_future_or_output>, alloc::sync::Arc> (self=0x7ffff0055620) at /home/nazar-pc/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/task/core.rs:347 #20 0x00005555557703ad in tokio::runtime::task::harness::poll_future::{closure#0}::{impl#0}::drop>, alloc::sync::Arc> (self=0x7ffff75fdde0) at /home/nazar-pc/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/task/harness.rs:481 #21 0x00005555557871ba in core::ptr::drop_in_place>, alloc::sync::Arc>> () at /rustc/1684a753dbca5d23b2e03568e6fbbb48eb72d0e6/library/core/src/ptr/mod.rs:514 #22 0x000055555583143e in tokio::runtime::task::harness::poll_future::{closure#0}>, alloc::sync::Arc> () at /home/nazar-pc/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/task/harness.rs:488 #23 0x00005555555ffcd8 in core::panic::unwind_safe::{impl#23}::call_once, tokio::runtime::task::harness::poll_future::{closure_env#0}>, alloc::sync::Arc>> (self=...) at /rustc/1684a753dbca5d23b2e03568e6fbbb48eb72d0e6/library/core/src/panic/unwind_safe.rs:272 #24 0x0000555555737651 in std::panicking::try::do_call>, alloc::sync::Arc>>, core::task::poll::Poll<()>> (data=0x7ffff75fded8) at /rustc/1684a753dbca5d23b2e03568e6fbbb48eb72d0e6/library/std/src/panicking.rs:552 #25 0x000055555573d37b in __rust_try () #26 0x0000555555733d48 in std::panicking::try, core::panic::unwind_safe::AssertUnwindSafe>, alloc::sync::Arc>>> (f=...) at /rustc/1684a753dbca5d23b2e03568e6fbbb48eb72d0e6/library/std/src/panicking.rs:516 #27 0x0000555555650c8d in std::panic::catch_unwind>, alloc::sync::Arc>>, core::task::poll::Poll<()>> ( f=) at /rustc/1684a753dbca5d23b2e03568e6fbbb48eb72d0e6/library/std/src/panic.rs:149 #28 0x00005555558302b0 in tokio::runtime::task::harness::poll_future>, alloc::sync::Arc> (core=0x7ffff0055620, cx=...) at /home/nazar-pc/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/task/harness.rs:473 #29 0x000055555583440b in tokio::runtime::task::harness::Harness>, alloc::sync::Arc>::poll_inner>, alloc::sync::Arc> (self=0x7ffff75fe100) at /home/nazar-pc/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/task/harness.rs:208 #30 0x0000555555839683 in tokio::runtime::task::harness::Harness>, alloc::sync::Arc>::poll>, alloc::sync::Arc> (self=...) at /home/nazar-pc/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/task/harness.rs:153 #31 0x00005555557cf07b in tokio::runtime::task::raw::poll>, alloc::sync::Arc> (ptr=...) at /home/nazar-pc/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/task/raw.rs:271 #32 0x0000555555b99ee7 in tokio::runtime::task::raw::RawTask::poll (self=...) at /home/nazar-pc/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/task/raw.rs:201 #33 0x0000555555a11a72 in tokio::runtime::task::LocalNotified>::run> (self=...) at /home/nazar-pc/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/task/mod.rs:416 #34 0x00005555558507db in tokio::runtime::scheduler::current_thread::{impl#8}::block_on::{closure#0}::{closure#1}>>> () at /home/nazar-pc/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/scheduler/current_thread/mod.rs:706 #35 0x000055555584f973 in tokio::runtime::coop::with_budget<(), tokio::runtime::scheduler::current_thread::{impl#8}::block_on::{closure#0}::{closure_env#1}>>>> (budget=..., f=...) at /home/nazar-pc/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/coop.rs:107 #36 tokio::runtime::coop::budget<(), tokio::runtime::scheduler::current_thread::{impl#8}::block_on::{closure#0}::{closure_env#1}>>>> (f=...) at /home/nazar-pc/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/coop.rs:73 #37 tokio::runtime::scheduler::current_thread::{impl#3}::run_task::{closure#0}<(), tokio::runtime::scheduler::current_thread::{impl#8}::block_on::{closure#0}::{closure_env#1}>>>> () at /home/nazar-pc/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/scheduler/current_thread/mod.rs:343 #38 0x000055555584f606 in tokio::runtime::scheduler::current_thread::Context::enter<(), tokio::runtime::scheduler::current_thread::{impl#3}::run_task::{closure_env#0}<(), tokio::runtime::scheduler::current_thread::{impl#8}::block_on::{closure#0}::{closure_env#1}>>>>> (self=0x7ffff75fe938, core=0x7ffff0006550, f=...) at /home/nazar-pc/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/scheduler/current_thread/mod.rs:410 #39 0x000055555584f808 in tokio::runtime::scheduler::current_thread::Context::run_task<(), tokio::runtime::scheduler::current_thread::{impl#8}::block_on::{closure#0}::{closure_env#1}>>>> (self=0x7ffff75fe938, core=0x7ffff0006550, f=...) at /home/nazar-pc/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/scheduler/current_thread/mod.rs:343 #40 0x0000555555850746 in tokio::runtime::scheduler::current_thread::{impl#8}::block_on::{closure#0}>>> (core=0x7ffff0006550, context=0x7ffff75fe938) at /home/nazar-pc/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/scheduler/current_thread/mod.rs:705 #41 0x000055555584fe4b in tokio::runtime::scheduler::current_thread::{impl#8}::enter::{closure#0}>>>, core::option::Option<()>> () at /home/nazar-pc/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/scheduler/current_thread/mod.rs:743 #42 0x00005555556cd69d in tokio::runtime::context::scoped::Scoped::set>>>, core::option::Option<()>>, (alloc::boxed::Box, core::option::Option<()>)> (self=0x7ffff76004b8, t=0x7ffff75fe930, f=...) at /home/nazar-pc/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/context/scoped.rs:40 #43 0x00005555557c05ab in tokio::runtime::context::set_scheduler::{closure#0}<(alloc::boxed::Box, core::option::Option<()>), tokio::runtime::scheduler::current_thread::{impl#8}::enter::{closure_env#0}>>>, core::option::Option<()>>> (c=0x7ffff7600480) at /home/nazar-pc/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/context.rs:176 #44 0x000055555573d982 in std::thread::local::LocalKey::try_with, core::option::Option<()>), tokio::runtime::scheduler::current_thread::{impl#8}::enter::{closure_env#0}>>>, core::option::Option<()>>>, (alloc::boxed::Box, core::option::Option<()>)> (self=0x555556007d28, f=...) at /rustc/1684a753dbca5d23b2e03568e6fbbb48eb72d0e6/library/std/src/thread/local.rs:284 #45 0x000055555573d3aa in std::thread::local::LocalKey::with, core::option::Option<()>), tokio::runtime::scheduler::current_thread::{impl#8}::enter::{closure_env#0}>>>, core::option::Option<()>>>, (alloc::boxed::Box, core::option::Option<()>)> (self=0x555556007d28, f=) at /rustc/1684a753dbca5d23b2e03568e6fbbb48eb72d0e6/library/std/src/thread/local.rs:260 #46 0x00005555557c055f in tokio::runtime::context::set_scheduler<(alloc::boxed::Box, core::option::Option<()>), tokio::runtime::scheduler::current_thread::{impl#8}::enter::{closure_env#0}>>>, core::option::Option<()>>> ( v=0x7ffff75fe930, f=...) at /home/nazar-pc/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/context.rs:176 #47 0x000055555584fbe0 in tokio::runtime::scheduler::current_thread::CoreGuard::enter>>>, core::option::Option<()>> (self=..., f=...) at /home/nazar-pc/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/scheduler/current_thread/mod.rs:743 #48 0x000055555584fe6d in tokio::runtime::scheduler::current_thread::CoreGuard::block_on>>> (self=, future=...) at /home/nazar-pc/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/scheduler/current_thread/mod.rs:652 #49 0x000055555584c9ba in tokio::runtime::scheduler::current_thread::{impl#0}::block_on::{closure#0}>> (blocking=0x7ffff75fead0) at /home/nazar-pc/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/scheduler/current_thread/mod.rs:175 #50 0x000055555564e0e7 in tokio::runtime::context::runtime::enter_runtime>>, ()> (handle=0x7ffff75ff340, allow_block_in_place=false, f=...) at /home/nazar-pc/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/context/runtime.rs:65 #51 0x000055555584c8de in tokio::runtime::scheduler::current_thread::CurrentThread::block_on>> (self=0x7ffff75ff318, handle=0x7ffff75ff340, future=...) at /home/nazar-pc/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/scheduler/current_thread/mod.rs:167 #52 0x0000555555657acd in tokio::runtime::runtime::Runtime::block_on>> (self=0x7ffff75ff310, future=...) at /home/nazar-pc/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/runtime.rs:348 #53 0x000055555575001c in subspace_cluster_networking::tests::basic () at shared/subspace-cluster-networking/src/tests.rs:155 #54 0x00005555556404a7 in subspace_cluster_networking::tests::basic::{closure#0} () at shared/subspace-cluster-networking/src/tests.rs:59 #55 0x0000555555772db6 in core::ops::function::FnOnce::call_once () at /rustc/1684a753dbca5d23b2e03568e6fbbb48eb72d0e6/library/core/src/ops/function.rs:250 #56 0x00005555558a79b2 in core::ops::function::FnOnce::call_once core::result::Result<(), alloc::string::String>, ()> () at library/core/src/ops/function.rs:250 #57 test::__rust_begin_short_backtrace, fn() -> core::result::Result<(), alloc::string::String>> () at library/test/src/lib.rs:621 #58 0x00005555558a7131 in test::run_test_in_process::{closure#0} () at library/test/src/lib.rs:644 #59 core::panic::unwind_safe::{impl#23}::call_once, test::run_test_in_process::{closure_env#0}> () at library/core/src/panic/unwind_safe.rs:272 #60 std::panicking::try::do_call, core::result::Result<(), alloc::string::String>> () at library/std/src/panicking.rs:552 #61 std::panicking::try, core::panic::unwind_safe::AssertUnwindSafe> () at library/std/src/panicking.rs:516 #62 std::panic::catch_unwind, core::result::Result<(), alloc::string::String>> () at library/std/src/panic.rs:149 #63 test::run_test_in_process () at library/test/src/lib.rs:644 #64 test::run_test::{closure#0} () at library/test/src/lib.rs:567 --Type for more, q to quit, c to continue without paging--c #65 0x000055555586ef84 in test::run_test::{closure#1} () at library/test/src/lib.rs:595 #66 std::sys_common::backtrace::__rust_begin_short_backtrace () at library/std/src/sys_common/backtrace.rs:155 #67 0x0000555555873a3b in std::thread::{impl#0}::spawn_unchecked_::{closure#1}::{closure#0} () at library/std/src/thread/mod.rs:523 #68 core::panic::unwind_safe::{impl#23}::call_once<(), std::thread::{impl#0}::spawn_unchecked_::{closure#1}::{closure_env#0}> () at library/core/src/panic/unwind_safe.rs:272 #69 std::panicking::try::do_call>, ()> () at library/std/src/panicking.rs:552 #70 std::panicking::try<(), core::panic::unwind_safe::AssertUnwindSafe>> () at library/std/src/panicking.rs:516 #71 std::panic::catch_unwind>, ()> () at library/std/src/panic.rs:149 #72 std::thread::{impl#0}::spawn_unchecked_::{closure#1} () at library/std/src/thread/mod.rs:522 #73 core::ops::function::FnOnce::call_once, ()> () at library/core/src/ops/function.rs:250 #74 0x0000555555da841b in alloc::boxed::{impl#48}::call_once<(), dyn core::ops::function::FnOnce<(), Output=()>, alloc::alloc::Global> () at library/alloc/src/boxed.rs:2018 #75 alloc::boxed::{impl#48}::call_once<(), alloc::boxed::Box, alloc::alloc::Global>, alloc::alloc::Global> () at library/alloc/src/boxed.rs:2018 #76 std::sys::pal::unix::thread::{impl#2}::new::thread_start () at library/std/src/sys/pal/unix/thread.rs:108 #77 0x00007ffff7c94ac3 in start_thread (arg=) at ./nptl/pthread_create.c:442 #78 0x00007ffff7d26850 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81 ```

The change in last commit is actually incorrect and causes panic, specifically this code is not allowed to block in single-threaded tokio runtime:

/// Joins async join handle on drop
pub(crate) struct AsyncJoinOnDrop<T>(Option<Fuse<task::JoinHandle<T>>>);

impl<T> Drop for AsyncJoinOnDrop<T> {
    fn drop(&mut self) {
        let handle = self.0.take().expect("Always called exactly once; qed");
        if !handle.is_terminated() {
            task::block_in_place(move || {
                let _ = Handle::current().block_on(handle);
            });
        }
    }
}

I expected to see a panic, but getting memory issues instead.

As shown in https://github.com/tokio-rs/tokio/issues/6452#issuecomment-2031837214 the likely cause is futures crate, FuturesUnordered to be specific.

taiki-e commented 5 months ago

Maybe related: https://github.com/rust-lang/futures-rs/issues/2781

nazar-pc commented 5 months ago

Pushed another commit into above branch with a lot of code and dependencies removed, while still reproducing an issue.

There are 2 FuturesUnordered usages there in redials and inbound_requests, having just one of them no longer triggers an issue for some reason.

Also if replacing result_receiver.await.map_err(|_| ()) with this:

result_receiver.await.unwrap();
Ok(())

Then I'm getting following interesting error:

thread 'tests::basic' panicked at library/core/src/panicking.rs:223:5:
panic in a destructor during cleanup
thread caused non-unwinding panic. aborting.

Which might be of interest as well.

nazar-pc commented 4 months ago

Tried again today and the error is slightly different:

tcache_thread_shutdown(): unaligned tcache chunk detected

I guess might be related to side-effects of incremental compilation.

nazar-pc commented 4 months ago

@taiki-e I was able to reduce it to this:

#[test]
fn basic() {
    use futures::stream::FuturesUnordered;
    use std::future::Future;
    use std::pin::Pin;
    use std::task::{Context, Poll};

    struct BadFuture;

    impl Drop for BadFuture {
        fn drop(&mut self) {
            panic!()
        }
    }

    impl Future for BadFuture {
        type Output = ();

        fn poll(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<Self::Output> {
            Poll::Pending
        }
    }

    FuturesUnordered::default().push(BadFuture);
}

Note that for some reason it only segfaults in a test, not when used in fn main.

taiki-e commented 2 months ago

Thanks for the repro. If the problem is that we are not properly handling buggy futures that cause panic on drop, then all we may really be able to do is leak the rest on panic or convert panic to abort.

nazar-pc commented 2 months ago

I would have expected panic to be propagated further somehow, but I don't know the underlying reason why this is happening