tokio-rs / tracing

Application level tracing for Rust.
https://tracing.rs
MIT License
5.17k stars 676 forks source link

Thread count negative overflow in tracing_subscriber use of sharded-slab #2856

Open PPakalns opened 5 months ago

PPakalns commented 5 months ago

Bug Report

thread 'main' panicked at ~/.cargo/registry/src/index.crates.io-6f17d22bba15001f/sharded-slab-0.1.7/src/shard.rs:295:9: Thread count overflowed the configured max count. Thread index = 18446744073709551615, max threads = 4096.

It is negative overflow: 2^64 - 1

16: 0x55837d5c11b6 - sharded_slab::shard::Array<T,C>::current::h9c8364ca0319bb82 at /home/user/.cargo/registry/src/index.crates.io-6f17d22bba15001f/sharded-slab-0.1.7/src/shard.rs:295:9 17: 0x55837d5c11b6 - sharded_slab::pool::Pool<T,C>::create::ha181449469e8b97d at /home/user/.cargo/registry/src/index.crates.io-6f17d22bba15001f/sharded-slab-0.1.7/src/pool.rs:452:28 18: 0x55837d5c11b6 - sharded_slab::pool::Pool<T,C>::create_with::hacbe4cdef22cb63d at /home/user/.cargo/registry/src/index.crates.io-6f17d22bba15001f/sharded-slab-0.1.7/src/pool.rs:656:25 19: 0x55837d5bb4f4 - ::new_span::h9f9bc1117a6dba8e at /home/user/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tracing-subscriber-0.3.18/src/registry/sharded.rs:246:18 20: 0x55837d5c07d4 - <tracing_subscriber::layer::layered::Layered<L,S> as tracing_core::subscriber::Subscriber>::new_span::h2477b245da36d662 at /home/user/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tracing-subscriber-0.3.18/src/layer/layered.rs:126:18 21: 0x55837d5c07d4 - <tracing_subscriber::layer::layered::Layered<L,S> as tracing_core::subscriber::Subscriber>::new_span::h02f87f11e8eaf417 at /home/user/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tracing-subscriber-0.3.18/src/layer/layered.rs:126:18 22: 0x55837d5c07d4 - <tracing_subscriber::fmt::Subscriber<N,E,F,W> as tracing_core::subscriber::Subscriber>::new_span::h07a61e6169961b07 at /home/user/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tracing-subscriber-0.3.18/src/fmt/mod.rs:388:9 23: 0x55837e38dad0 - tracing_core::dispatcher::get_default::h7e92e67fdcb34f8e 24: 0x55837e38e3fe - tracing::span::Span::new::h894772b3a271d44d at /home/user/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tracing-0.1.40/src/span.rs:437:9 25: 0x55837ddbcfbe - polling::epoll::Poller::delete::h8707dd73e8b65399 at /home/user/.cargo/registry/src/index.crates.io-6f17d22bba15001f/polling-3.3.2/src/epoll.rs:136:20 26: 0x55837db3117d - polling::Poller::delete::h013e9ec36d8a9e1c at /home/user/.cargo/registry/src/index.crates.io-6f17d22bba15001f/polling-3.3.2/src/lib.rs:607:9

Version

│ │ │ │ │ │ ├── tracing v0.1.40 │ │ │ │ │ │ │ ├── tracing-attributes v0.1.27 (proc-macro) │ │ │ │ │ │ │ └── tracing-core v0.1.32 │ │ ├── tracing v0.1.40 () │ │ │ │ └── tracing v0.1.40 () │ ├── tracing v0.1.40 () │ ├── tracing-unwrap v0.10.0 │ │ └── tracing v0.1.40 () ├── tracing v0.1.40 () ├── tracing-subscriber v0.3.18 │ ├── tracing v0.1.40 () │ ├── tracing-core v0.1.32 () │ └── tracing-log v0.2.0 │ └── tracing-core v0.1.32 () │ ├── tracing v0.1.40 () │ ├── tracing-subscriber v0.3.18 ()

Platform

Linux 6.5.0-14-generic #14-Ubuntu SMP PREEMPT_DYNAMIC Tue Nov 14 14:59:49 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

Crates

tracing_subscriber

Description

When closing (egui/eframe) window the following error is thrown. Doesn't happen when tracing_subscriber is disabled.

ramyak-mehra commented 4 months ago

I also faced this issue. Did u find a solution for this? Unfortunately I don't have stack trace to help the maintainers

hawkw commented 4 months ago

Your program appears to be spawning a very large number of threads at the same time. Can you confirm that there are 4096 threads concurrently active in your program?

ramyak-mehra commented 4 months ago

While I don't have an exact number. Bt yes certainly there are places where such behavior is expected.
I can try to limit the number of concurrent thread spawns but is there a way to get around this. Also one thing to note they are tokio tasks and not std::thread. I don't think so tokio tasks should cause this.

The only explanation I could think of is there is some library spawning them internally. Hard to say because AFAIK I am using async version of libraries where parallelism/concurrency is a requirement

ramyak-mehra commented 4 months ago

What are the implications of changing the default 4096? I saw in sharded_slab library that there are performance implications. Is there a way to measure these benchmark? I saw the bench.rs in sharded_slab but that doesn't spawn a lot of threads. I tried modifying the benchmark to spin that many threads but failed to do so.

PPakalns commented 4 months ago

Hi, i don't think that there is any issue with upper limit 4096. EDIT: I have ~10-30 threads in my use case. It was negative overflow and was detected in debug mode.

It happens when program is being closed (aka destructors are called and threads are terminated).

Probably some thread is counted twice when terminating or some thread is not counted when it has been started.

In close future will try to upgrade my project to newest library versions again and will see if the issue still happens.

zkrising commented 3 months ago

Hey, I'm also having this issue. Did you manage to fix this @PPakalns?

PPakalns commented 3 months ago

Hey, I'm also having this issue. Did you manage to fix this @PPakalns?

Hi, updated library versions of egui, wgpu, bevy again in my project.

The same tracing crate versions, sharded slab versions are used. Problem doesn't happen now at least for me. I can not reproduce this problem anymore.

zkrising commented 3 months ago

Heya, wonderfully I've managed to replicate this bug reliably. See https://github.com/zkldi/tracing_bug

The short of it is that the bug occurs with those dependencies if you are using a custom implementation of tracing::Subscriber. I don't know why or how. This manifests inside sharded-slab for some reason.

It occurs in my app because we use tracing-tracy. For some reason, things like tracing_subscriber::fmt::layer() don't cause any issues on close.

gerwin3 commented 3 months ago

I think it's the same as this bug I filed with thread_local-rs here: https://github.com/Amanieu/thread_local-rs/issues/74 -- I replicated it too here: https://github.com/gerwin3/thread-local-accesserror-repro/blob/main/src/main.rs

Our repros looks awfully similar ;)