jhpratt / num_threads

Obtain the number of threads in the current process
https://docs.rs/num_threads
Apache License 2.0

Incorrect no. of threads in `x86_64-apple-darwin` targets run on Apple chips (M3 at least) #18

Open TicClick opened 9 months ago

TicClick commented 9 months ago
```rust
fn main() {
    assert_eq!(
        num_threads::num_threads(),
        std::num::NonZeroUsize::new(1)
    )
}
```

```
cargo run --target aarch64-apple-darwin
   Compiling test-threadcount v1.0.0 (...)
    Finished dev [unoptimized + debuginfo] target(s) in 0.51s
     Running `target/aarch64-apple-darwin/debug/test-threadcount`
```

```
cargo run --target x86_64-apple-darwin
   Compiling test-threadcount v1.0.0 (...)
    Finished dev [unoptimized + debuginfo] target(s) in 0.19s
     Running `target/x86_64-apple-darwin/debug/test-threadcount`
thread 'main' panicked at src/main.rs:2:5:
assertion `left == right` failed
  left: Some(2)
 right: Some(1)
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
```

LLDB says it's 1 in both cases, which appears correct to me:

```
lldb target/x86_64-apple-darwin/debug/test-threadcount
(lldb) target create "target/x86_64-apple-darwin/debug/test-threadcount"
Current executable set to '.../target/x86_64-apple-darwin/debug/test-threadcount' (x86_64).
(lldb) break set --file src/main.rs --line 3
Breakpoint 1: where = test-threadcount`test_threadcount::main::h0b4a5b9d24100f68 + 11 at main.rs:3:9, address = 0x00000001000019ab
(lldb) run
Process 48950 launched: '.../target/x86_64-apple-darwin/debug/test-threadcount' (x86_64)
warning: libobjc.A.dylib is being read from process memory. This indicates that LLDB could not read from the host's in-memory shared cache. This will likely reduce debugging performance.

Process 48950 stopped
* thread #1, name = 'main', queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
    frame #0: 0x00000001000019ab test-threadcount`test_threadcount::main::h0b4a5b9d24100f68 at main.rs:3:9
   1    fn main() {
   2        assert_eq!(
-> 3            num_threads::num_threads(),
   4            std::num::NonZeroUsize::new(1)
   5        )
   6    }
Target 0: (test-threadcount) stopped.
(lldb) thread list
Process 48950 stopped
* thread #1: tid = 0x2dfb06, 0x00000001000019ab test-threadcount`test_threadcount::main::h0b4a5b9d24100f68 at main.rs:3:9, name = 'main', queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
(lldb) ^D
```
Project info

```
rustc --version
rustc 1.75.0 (82e1608df 2023-12-21)
cargo --version
cargo 1.75.0 (1d8b05cdd 2023-11-20)
```

`Cargo.toml`

```toml
[package]
name = "test-threadcount"
version = "1.0.0"

[dependencies]
num_threads = "0.1.6"
```

`Cargo.lock`

```toml
# This file is automatically @generated by Cargo.
# It is not intended for manual editing.
version = 3

[[package]]
name = "libc"
version = "0.2.153"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "9c198f91728a82281a64e1f4f9eeb25d82cb32a5de251c6bd1b5154d63a8e7bd"

[[package]]
name = "num_threads"
version = "0.1.6"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "2819ce041d2ee131036f4fc9d6ae7ae125a3a40e97ba64d04fe799ad9dabbb44"
dependencies = [
 "libc",
]

[[package]]
name = "test-threadcount"
version = "1.0.0"
dependencies = [
 "num_threads",
]
```

jhpratt commented 9 months ago

Confirmed on an M1 chip, which is all I have access to. I'm not sure what could be causing this, as it's the same code being run in both situations.

TicClick commented 9 months ago

no idea (could be that SMT is involved, but it's unlikely), but it's pretty persistent, and I wasn't able to find a reliable indicator for discerning between "real" and "imaginary" threads: https://gist.github.com/TicClick/360cc924b6e8abfd64e3f8e70014b527

the only oddity is that one of these threads (the imaginary one?) had a significantly higher CPU time

TicClick commented 9 months ago

sampling the binary's activity using Activity Monitor yields this:

```
Call graph:
    2167 Thread_3938515   DispatchQueue_1: com.apple.main-thread  (serial)
    + 2167 start  (in dyld) + 1942  [0x204a13386]
    +   2167 std::rt::lang_start::h6223c8693c27d4d8  (in test-threadcount) + 37  [0x104770045]
    +     2167 std::rt::lang_start_internal::h13f88294184c535a  (in test-threadcount) + 790  [0x10478a2e6]
    +       2167 std::rt::lang_start::_$u7b$$u7b$closure$u7d$$u7d$::h6d974bea167e02c2  (in test-threadcount) + 12  [0x10477005c]
    +         2167 std::sys_common::backtrace::__rust_begin_short_backtrace::h920234205834e928  (in test-threadcount) + 6  [0x104770016]
    +           2167 test_threadcount::main::h2b5ec1a26c21b16a  (in test-threadcount) + 16  [0x104770090]
    2167 Thread_3938562: com.apple.rosetta.exceptionserver
      2167 ???  (in runtime)  load address 0x7ff7ffc09000 + 0x4294  [0x7ff7ffc0d294]
```

while being a very fair result, it's still not what LLDB sees -- somehow it's able to narrow it down just to the binary's specific threads excluding the translation layer, and I think num_threads should, too (if possible!)
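
For illustration only, here is a rough sketch of what name-based filtering could look like, going by the `com.apple.rosetta.exceptionserver` name in the sample above. The function `count_own_threads` and the constant `ROSETTA_PREFIX` are hypothetical and not part of `num_threads`. It assumes the thread names have already been collected somehow, and that every Rosetta-owned thread carries this prefix, which nothing in this thread confirms.

```rust
/// Thread-name prefix taken from the Activity Monitor sample.
/// ASSUMPTION: all Rosetta-owned threads share it; this is unverified.
const ROSETTA_PREFIX: &str = "com.apple.rosetta.";

/// Counts the threads whose name does not start with the Rosetta prefix.
/// `names` holds one entry per thread; `None` stands for an unnamed thread,
/// which is counted as belonging to the program.
pub fn count_own_threads(names: &[Option<&str>]) -> usize {
    names
        .iter()
        .filter(|name| match **name {
            Some(n) => !n.starts_with(ROSETTA_PREFIX),
            None => true,
        })
        .count()
}
```

With the two threads from the sample, `count_own_threads(&[Some("main"), Some("com.apple.rosetta.exceptionserver")])` gives 1, which matches what LLDB reports.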

TicClick commented 9 months ago

https://discourse.llvm.org/t/76874 well, LLDB uses a separate debugserver, and that's where it ends -- either find a feature that identifies Rosetta threads, or return `num_threads() - 1` when the executable is being emulated (which I think isn't a viable strategy, since God knows how many of those sidecar threads it can have)
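
A minimal sketch of the second option, to make the trade-off concrete. Apple documents the `sysctl.proc_translated` key for checking whether a process runs under Rosetta; everything else here is an assumption. The names `is_rosetta_translated` and `subtract_helper_threads` are hypothetical and not part of `num_threads`, and the number of helper threads is a caller-supplied guess, which is exactly the weak point mentioned above.

```rust
use std::num::NonZeroUsize;

/// Returns Some(true) when running under Rosetta, Some(false) when native,
/// and None when the sysctl query fails for an unexpected reason.
#[cfg(target_os = "macos")]
pub fn is_rosetta_translated() -> Option<bool> {
    use std::os::raw::{c_char, c_int, c_void};

    extern "C" {
        // Declared by hand so the sketch needs no external crate.
        fn sysctlbyname(
            name: *const c_char,
            oldp: *mut c_void,
            oldlenp: *mut usize,
            newp: *mut c_void,
            newlen: usize,
        ) -> c_int;
    }

    let mut flag: c_int = 0;
    let mut len = std::mem::size_of::<c_int>();
    let name = b"sysctl.proc_translated\0";
    // SAFETY: `name` is NUL-terminated, and `flag`/`len` describe a valid
    // buffer of exactly `len` bytes.
    let rc = unsafe {
        sysctlbyname(
            name.as_ptr().cast(),
            (&mut flag as *mut c_int).cast::<c_void>(),
            &mut len,
            std::ptr::null_mut(),
            0,
        )
    };
    if rc == 0 {
        return Some(flag == 1);
    }
    // ENOENT (2): the key does not exist, so the process cannot be translated.
    match std::io::Error::last_os_error().raw_os_error() {
        Some(2) => Some(false),
        _ => None,
    }
}

/// Rosetta only exists on macOS, so other platforms report "not translated".
#[cfg(not(target_os = "macos"))]
pub fn is_rosetta_translated() -> Option<bool> {
    Some(false)
}

/// Subtracts a guessed number of translation-layer threads from a raw count.
/// Returns None if nothing would be left. `helpers` is an ASSUMPTION.
pub fn subtract_helper_threads(raw: NonZeroUsize, helpers: usize) -> Option<NonZeroUsize> {
    NonZeroUsize::new(raw.get().saturating_sub(helpers))
}
```

A caller would apply `subtract_helper_threads(raw, 1)` only when `is_rosetta_translated()` returns `Some(true)`. The hard-coded `1` matches the single `com.apple.rosetta.exceptionserver` thread seen in the sample, but there is no guarantee that Rosetta never spawns more.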