m-j-w / CpuId.jl

Ask the CPU for cache sizes, SIMD feature support, a running hypervisor, and more.

CpuId-based sched_getcpu counterpart for macOS #46

Open carstenbauer opened 3 years ago

carstenbauer commented 3 years ago

On Linux one can use sched_getcpu to query the ID of the CPU a thread is running on:

@ccall sched_getcpu()::Cint

and

using ThreadPools
cpuid(i::Integer) = fetch(@tspawnat i @ccall sched_getcpu()::Cint)
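
For reference, the Linux-only call above could be wrapped so that it at least compiles on other platforms. This is a minimal sketch, not part of CpuId.jl: the name current_cpu is hypothetical, and returning nothing on non-Linux systems is just a placeholder for the macOS alternative this issue asks about.

```julia
# Sketch: OS-level CPU number of the calling thread, or `nothing`
# on platforms where sched_getcpu is not available (e.g. macOS).
function current_cpu()
    @static if Sys.islinux()
        return Int(@ccall sched_getcpu()::Cint)  # provided by glibc, Linux only
    else
        return nothing  # placeholder: no portable equivalent assumed here
    end
end

println(current_cpu())
```

Using @static means the ccall is only compiled on Linux, so the symbol is never looked up on other systems.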

I want to do the same on macOS, where sched_getcpu isn't available. Looking for an alternative, I came across https://stackoverflow.com/questions/33745364/sched-getcpu-equivalent-for-os-x which suggests an approach based on the "cpuid" instruction:

#include <stdint.h>  /* uint32_t */
#include <cpuid.h>

#define CPUID(INFO, LEAF, SUBLEAF) __cpuid_count(LEAF, SUBLEAF, INFO[0], INFO[1], INFO[2], INFO[3])

#define GETCPU(CPU) {                                   \
        uint32_t CPUInfo[4];                            \
        CPUID(CPUInfo, 1, 0);                           \
        /* CPUInfo[1] is EBX, bits 24-31 are APIC ID */ \
        /* CPUInfo[3] is EDX, bit 9 flags an APIC */    \
        if ( (CPUInfo[3] & (1 << 9)) == 0) {            \
          CPU = -1;  /* no APIC on chip */              \
        }                                               \
        else {                                          \
          CPU = (unsigned)CPUInfo[1] >> 24;             \
        }                                               \
        if (CPU < 0) CPU = 0;                           \
      }

Unfortunately, my knowledge of both C and "cpuid" is limited, so I can't translate this to Julia myself. Can someone help me out here? Personally, I think it would be a great addition to this package. Being able to ask which CPU a thread is running on, across different operating systems, would be very useful in some cases.

Any help is very much appreciated. (And forgive me if I'm asking for too much here.)

chriselrod commented 3 years ago

using CpuId
function coreid()
    # cpuid leaf 1: EDX bit 9 flags an on-chip APIC, EBX bits 24-31 hold the initial APIC ID
    eax, ebx, ecx, edx = CpuId.cpuid(1, 0)
    if (edx & (0x00000001 << 9)) == 0x00000000
        CPU = -1  # no APIC on chip
    else
        CPU = (ebx % Int) >> 24
    end
    CPU < 0 ? 0 : CPU
end

carstenbauer commented 3 years ago

Great, thanks for the translation!

However, I'm not sure it works correctly (but it may well be that the original C code isn't working anymore either):

Script:

# threads_cpuids.jl
using CpuId
function cpuid_coreid()
    eax, ebx, ecx, edx =  CpuId.cpuid(1, 0)
    if ( (edx & (0x00000001 << 9)) == 0x00000000)
        CPU = -1;  # no APIC on chip
    else
        CPU = (ebx%Int) >> 24;
    end
    CPU < 0 ? 0 : CPU
end

glibc_coreid() = @ccall sched_getcpu()::Cint

using ThreadPools
using Base.Threads: nthreads

tglibc_coreid(i::Integer) = fetch(@tspawnat i glibc_coreid());
tcpuid_coreid(i::Integer) = fetch(@tspawnat i cpuid_coreid());

for i in 1:nthreads()
    println("Running on thread $i (glibc_coreid: $(tglibc_coreid(i)), cpuid_coreid: $(tcpuid_coreid(i)))")
    # @sync @tspawnat i sum(abs2, rand()^2 + rand()^2 for i in 1:500_000_000)
end

Output on a Linux machine (where I checked that glibc_coreid is correct):

$ julia -t10 threads_cpuids.jl
Running on thread 1 (glibc_coreid: 2, cpuid_coreid: 2)
Running on thread 2 (glibc_coreid: 4, cpuid_coreid: 4)
Running on thread 3 (glibc_coreid: 3, cpuid_coreid: 34)
Running on thread 4 (glibc_coreid: 6, cpuid_coreid: 6)
Running on thread 5 (glibc_coreid: 5, cpuid_coreid: 36)
Running on thread 6 (glibc_coreid: 8, cpuid_coreid: 8)
Running on thread 7 (glibc_coreid: 7, cpuid_coreid: 38)
Running on thread 8 (glibc_coreid: 10, cpuid_coreid: 16)
Running on thread 9 (glibc_coreid: 9, cpuid_coreid: 40)
Running on thread 10 (glibc_coreid: 12, cpuid_coreid: 18)

m-j-w commented 3 years ago

You can also try cpucycle_id, which returns a tuple. You would be interested in the second element, I believe, which is also the APIC id.

"""
    cpucycle_id()
Read the CPU's [Time Stamp Counter, TSC](https://en.wikipedia.org/wiki/Time_Stamp_Counter),
and executing CPU id directly with a `rdtscp` instruction.  This function is
similar to the `cpucycle()`, but uses an instruction that also allows to
detect if the code has been moved to a different executing CPU.  See also the
comments for `cpucycle()` which equally apply.
"""
function cpucycle_id end
@eval cpucycle_id() = $(cpufeature(RDTSCP)) ? rdtscp() : (zero(UInt64),zero(UInt64))

carstenbauer commented 3 years ago

Doesn't seem to work either:

# threads_cpuids.jl
using CpuId
function cpuid_coreid()
    eax, ebx, ecx, edx =  CpuId.cpuid(1, 0)
    if ( (edx & (0x00000001 << 9)) == 0x00000000)
        CPU = -1;  # no APIC on chip
    else
        CPU = (ebx%Int) >> 24;
    end
    CPU < 0 ? 0 : CPU
end

glibc_coreid() = @ccall sched_getcpu()::Cint

cpucycle_coreid() = Int(cpucycle_id()[2])

using ThreadPools
using Base.Threads: nthreads

tglibc_coreid(i::Integer) = fetch(@tspawnat i glibc_coreid());
tcpuid_coreid(i::Integer) = fetch(@tspawnat i cpuid_coreid());
tcpucycle_coreid(i::Integer) = fetch(@tspawnat i cpucycle_coreid());

for i in 1:nthreads()
    println("Running on thread $i (glibc_coreid: $(tglibc_coreid(i)), cpuid_coreid: $(tcpuid_coreid(i)), cpucycle_coreid: $(tcpucycle_coreid(i)))")
    # @sync @tspawnat i sum(abs2, rand()^2 + rand()^2 for i in 1:500_000_000)
end

$ julia -t10 threads_cpuids.jl
Running on thread 1 (glibc_coreid: 2, cpuid_coreid: 2, cpucycle_coreid: 2)
Running on thread 2 (glibc_coreid: 4, cpuid_coreid: 4, cpucycle_coreid: 4)
Running on thread 3 (glibc_coreid: 3, cpuid_coreid: 34, cpucycle_coreid: 4099)
Running on thread 4 (glibc_coreid: 6, cpuid_coreid: 6, cpucycle_coreid: 6)
Running on thread 5 (glibc_coreid: 5, cpuid_coreid: 36, cpucycle_coreid: 4101)
Running on thread 6 (glibc_coreid: 8, cpuid_coreid: 8, cpucycle_coreid: 8)
Running on thread 7 (glibc_coreid: 7, cpuid_coreid: 38, cpucycle_coreid: 4103)
Running on thread 8 (glibc_coreid: 10, cpuid_coreid: 16, cpucycle_coreid: 10)
Running on thread 9 (glibc_coreid: 9, cpuid_coreid: 40, cpucycle_coreid: 4105)
Running on thread 10 (glibc_coreid: 12, cpuid_coreid: 18, cpucycle_coreid: 12)

chriselrod commented 3 years ago

cpucycle_coreid works for me:

julia> for i in 1:nthreads()
       println("Running on thread $i (glibc_coreid: $(tglibc_coreid(i)), cpuid_coreid: $(tcpuid_coreid(i)), cpucycle_coreid: $(tcpucycle_coreid(i)))")
           # @sync @tspawnat i sum(abs2, rand()^2 + rand()^2 for i in 1:500_000_000)
       end
Running on thread 1 (glibc_coreid: 0, cpuid_coreid: 0, cpucycle_coreid: 0)
Running on thread 2 (glibc_coreid: 1, cpuid_coreid: 2, cpucycle_coreid: 1)
Running on thread 3 (glibc_coreid: 2, cpuid_coreid: 4, cpucycle_coreid: 2)
Running on thread 4 (glibc_coreid: 3, cpuid_coreid: 6, cpucycle_coreid: 3)
Running on thread 5 (glibc_coreid: 4, cpuid_coreid: 8, cpucycle_coreid: 4)
Running on thread 6 (glibc_coreid: 5, cpuid_coreid: 16, cpucycle_coreid: 5)
Running on thread 7 (glibc_coreid: 6, cpuid_coreid: 18, cpucycle_coreid: 6)
Running on thread 8 (glibc_coreid: 7, cpuid_coreid: 20, cpucycle_coreid: 7)
Running on thread 9 (glibc_coreid: 8, cpuid_coreid: 22, cpucycle_coreid: 8)
Running on thread 10 (glibc_coreid: 9, cpuid_coreid: 24, cpucycle_coreid: 9)
Running on thread 11 (glibc_coreid: 10, cpuid_coreid: 1, cpucycle_coreid: 10)
Running on thread 12 (glibc_coreid: 11, cpuid_coreid: 3, cpucycle_coreid: 11)
Running on thread 13 (glibc_coreid: 12, cpuid_coreid: 5, cpucycle_coreid: 12)
Running on thread 14 (glibc_coreid: 13, cpuid_coreid: 7, cpucycle_coreid: 13)
Running on thread 15 (glibc_coreid: 14, cpuid_coreid: 9, cpucycle_coreid: 14)
Running on thread 16 (glibc_coreid: 15, cpuid_coreid: 17, cpucycle_coreid: 15)
Running on thread 17 (glibc_coreid: 16, cpuid_coreid: 19, cpucycle_coreid: 16)
Running on thread 18 (glibc_coreid: 17, cpuid_coreid: 21, cpucycle_coreid: 17)
Running on thread 19 (glibc_coreid: 18, cpuid_coreid: 23, cpucycle_coreid: 18)
Running on thread 20 (glibc_coreid: 19, cpuid_coreid: 25, cpucycle_coreid: 19)

If you mask off the result:

cpucycle_coreid() & 0x00000fff

Then all those results will match glibc_coreid

(although cpuid_coreid will still be wrong, it seems like cpucycle_coreid should work)
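
One possible explanation for the values 4099, 4101, 4103 and 4105 reported earlier in this thread: on Linux, the kernel stores (NUMA node << 12) | CPU number in the TSC_AUX register that rdtscp reads, so the low 12 bits are the logical CPU and the bits above them are the node. The sketch below decodes raw values under that assumption. decode_tsc_aux is a hypothetical helper, not part of CpuId.jl, and other operating systems (macOS included) may store something else in this register.

```julia
# Assumption: Linux layout of TSC_AUX, i.e. (numa_node << 12) | cpu.
const CPU_MASK = 0x0fff  # low 12 bits hold the logical CPU number

# Split a raw TSC_AUX value (second element of cpucycle_id()) into its two fields.
decode_tsc_aux(aux::Integer) = (cpu = Int(aux & CPU_MASK), node = Int(aux >> 12))

# Raw values taken from the output posted above.
for raw in (2, 4099, 4101, 4103, 4105, 12)
    println(raw, " => ", decode_tsc_aux(raw))
end
```

Under this reading, 4099 is CPU 3 on node 1, which agrees with the glibc_coreid value of 3 printed for the same thread.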

It may be worth checking on more architectures whether 0x00000fff is really an appropriate mask. One could instead calculate a mask based on the number of hardware threads:

julia> ((1 << (64 - leading_zeros(CpuId.cputhreads()-1))) - 1) % UInt32 # 20 threads on this system
0x0000001f

This rounds the thread count up to the next power of 2, subtracts 1, and truncates to a 32-bit integer.

It would probably be better to query the cpuid instruction to figure out which mask to apply.

m-j-w commented 3 years ago

Check e.g. this AMD specification, page 27: https://www.amd.com/system/files/TechDocs/25481.pdf

CPUID Fn8000_0008_ECX APIC ID Size and Core Count

The width of the APIC ID is variable across architectures. The above page should give the bit width. However, there is also a legacy method mentioned.
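
To illustrate why the raw APIC ID need not equal the OS CPU number, here is a rough sketch that splits an APIC ID into a hardware-thread (SMT) field and a core field. The helper split_apic_id and its smt_bits parameter are hypothetical; the default of one SMT bit is only an assumption that happens to fit the APIC IDs in chriselrod's output above, and real code would have to read the field widths from cpuid.

```julia
# Hypothetical helper: split an APIC ID into (core, smt) fields.
# `smt_bits` is an assumption supplied by the caller, not queried from the CPU.
function split_apic_id(apic::Integer; smt_bits::Integer = 1)
    smt  = apic & ((1 << smt_bits) - 1)  # lowest bits select the hardware thread
    core = apic >> smt_bits              # remaining bits: core (and package) index
    return (core = Int(core), smt = Int(smt))
end

# APIC IDs taken from the cpuid_coreid column posted above.
for apic in (0, 2, 16, 1, 3, 17)
    println(apic, " => ", split_apic_id(apic))
end
```

With this split, APIC IDs 16 and 17 land on the same core with different smt values, while the OS numbered those two hardware threads 5 and 15 in that run.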

JBlaschke commented 1 year ago

If you mask off the result:

cpucycle_coreid() & 0x00000fff

Then all those results will match glibc_coreid

(although cpuid_coreid will still be wrong, it seems like cpucycle_coreid should work)

May be worth checking for more architectures whether 0x00000fff is really an appropriate mask. Could calculate a mask based on the number of cores:

julia> ((1 << (64 - leading_zeros(CpuId.cputhreads()-1))) - 1) % UInt32 # 20 threads on this system
0x0000001f

This rounds up to the next power of 2, subtracts 1, and truncates to a 32 bit integer.

Would probably be better to look at the CpuId instruction to figure out what mask to apply.

For the record, I find that the following masks:

But ((1 << (64 - leading_zeros(CpuId.cputhreads()-1))) - 1) doesn't work. I don't know why yet though.

JBlaschke commented 1 year ago

Quick update: when I wrote my last comment (above), I was running this on Perlmutter's login nodes (AMD Milan). On my Intel laptop, the mask 0x00000fff doesn't work, but ((1 << (64 - leading_zeros(CpuId.cputhreads()))) - 1) % UInt32 still works.
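
Note that the expression in this last comment omits the "-1" inside leading_zeros that the earlier formula had. The following sketch compares the two variants for a few thread counts. It uses only Base, takes the count as a plain argument instead of calling CpuId.cputhreads(), and the names mask_ids and mask_count are hypothetical. It shows what each mask evaluates to, not which one is correct on a given system.

```julia
# Earlier formula: enough bits to represent the ids 0:(n-1).
mask_ids(n::Integer) = ((1 << (64 - leading_zeros(Int64(n) - 1))) - 1) % UInt32

# Variant from the last comment (no "-1"): enough bits to represent n itself.
mask_count(n::Integer) = ((1 << (64 - leading_zeros(Int64(n)))) - 1) % UInt32

for n in (16, 20, 128)
    println(n, " threads: ", mask_ids(n), " vs ", mask_count(n))
end
```

The two variants agree for 20 threads (both give 0x0000001f) and differ when the count is an exact power of two, where the second variant is one bit wider.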