Open carstenbauer opened 3 years ago
using CpuId
function coreid()
    eax, ebx, ecx, edx = CpuId.cpuid(1, 0)
    if (edx & (0x00000001 << 9)) == 0x00000000
        CPU = -1  # no APIC on chip
    else
        CPU = (ebx % Int) >> 24
    end
    CPU < 0 ? 0 : CPU
end
Great, thanks for the translation!
However, I'm not sure it works correctly (but it may well be that the original C code isn't working anymore either):
Script:
# threads_cpuids.jl
using CpuId
function cpuid_coreid()
    eax, ebx, ecx, edx = CpuId.cpuid(1, 0)
    if (edx & (0x00000001 << 9)) == 0x00000000
        CPU = -1  # no APIC on chip
    else
        CPU = (ebx % Int) >> 24
    end
    CPU < 0 ? 0 : CPU
end
glibc_coreid() = @ccall sched_getcpu()::Cint
using ThreadPools
using Base.Threads: nthreads
tglibc_coreid(i::Integer) = fetch(@tspawnat i glibc_coreid());
tcpuid_coreid(i::Integer) = fetch(@tspawnat i cpuid_coreid());
for i in 1:nthreads()
    println("Running on thread $i (glibc_coreid: $(tglibc_coreid(i)), cpuid_coreid: $(tcpuid_coreid(i)))")
    # @sync @tspawnat i sum(abs2, rand()^2 + rand()^2 for i in 1:500_000_000)
end
Output on a Linux machine (where I checked that glibc_coreid is correct):
$ julia -t10 threads_cpuids.jl
Running on thread 1 (glibc_coreid: 2, cpuid_coreid: 2)
Running on thread 2 (glibc_coreid: 4, cpuid_coreid: 4)
Running on thread 3 (glibc_coreid: 3, cpuid_coreid: 34)
Running on thread 4 (glibc_coreid: 6, cpuid_coreid: 6)
Running on thread 5 (glibc_coreid: 5, cpuid_coreid: 36)
Running on thread 6 (glibc_coreid: 8, cpuid_coreid: 8)
Running on thread 7 (glibc_coreid: 7, cpuid_coreid: 38)
Running on thread 8 (glibc_coreid: 10, cpuid_coreid: 16)
Running on thread 9 (glibc_coreid: 9, cpuid_coreid: 40)
Running on thread 10 (glibc_coreid: 12, cpuid_coreid: 18)
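For reference, here is a minimal sketch of what the leaf-1 decoding computes, written as a pure function on register values so it runs without CpuId.jl. The function name and the sample register values are made up for illustration. Note that the initial APIC ID is a hardware identifier and is not guaranteed to coincide with the logical CPU numbers that sched_getcpu returns, which may be why some of the rows above differ.

```julia
# Decode the initial APIC ID from the EBX/EDX values returned by CPUID leaf 1.
# Hypothetical helper: it mirrors cpuid_coreid() but takes the registers as arguments.
function apicid_from_leaf1(ebx::UInt32, edx::UInt32)
    has_apic = (edx & (0x00000001 << 9)) != 0x00000000  # EDX bit 9: on-chip APIC
    has_apic || return 0                                # same fallback as cpuid_coreid()
    return Int(ebx >> 24)                               # EBX[31:24]: initial APIC ID
end

edx_apic = 0x00000001 << 9     # made-up EDX with only the APIC bit set
ebx_34   = UInt32(34) << 24    # made-up EBX carrying APIC ID 34

apicid_from_leaf1(ebx_34, edx_apic)     # 34, the value reported for thread 3 above
apicid_from_leaf1(ebx_34, 0x00000000)   # 0, APIC flag not set
```

If this reading is right, mapping an APIC ID back to an OS CPU number needs extra topology information and cannot be done with a shift alone.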
You can also try cpucycle_id, which returns a tuple. You would be interested in the second element, I believe, which is also the APIC id.
"""
cpucycle_id()
Read the CPU's [Time Stamp Counter, TSC](https://en.wikipedia.org/wiki/Time_Stamp_Counter),
and executing CPU id directly with a `rdtscp` instruction. This function is
similar to the `cpucycle()`, but uses an instruction that also allows to
detect if the code has been moved to a different executing CPU. See also the
comments for `cpucycle()` which equally apply.
"""
function cpucycle_id end
@eval cpucycle_id() = $(cpufeature(RDTSCP)) ? rdtscp() : (zero(UInt64),zero(UInt64))
Doesn't seem to work either:
# threads_cpuids.jl
using CpuId
function cpuid_coreid()
    eax, ebx, ecx, edx = CpuId.cpuid(1, 0)
    if (edx & (0x00000001 << 9)) == 0x00000000
        CPU = -1  # no APIC on chip
    else
        CPU = (ebx % Int) >> 24
    end
    CPU < 0 ? 0 : CPU
end
glibc_coreid() = @ccall sched_getcpu()::Cint
cpucycle_coreid() = Int(cpucycle_id()[2])
using ThreadPools
using Base.Threads: nthreads
tglibc_coreid(i::Integer) = fetch(@tspawnat i glibc_coreid());
tcpuid_coreid(i::Integer) = fetch(@tspawnat i cpuid_coreid());
tcpucycle_coreid(i::Integer) = fetch(@tspawnat i cpucycle_coreid());
for i in 1:nthreads()
    println("Running on thread $i (glibc_coreid: $(tglibc_coreid(i)), cpuid_coreid: $(tcpuid_coreid(i)), cpucycle_coreid: $(tcpucycle_coreid(i)))")
    # @sync @tspawnat i sum(abs2, rand()^2 + rand()^2 for i in 1:500_000_000)
end
$ julia -t10 threads_cpuids.jl
Running on thread 1 (glibc_coreid: 2, cpuid_coreid: 2, cpucycle_coreid: 2)
Running on thread 2 (glibc_coreid: 4, cpuid_coreid: 4, cpucycle_coreid: 4)
Running on thread 3 (glibc_coreid: 3, cpuid_coreid: 34, cpucycle_coreid: 4099)
Running on thread 4 (glibc_coreid: 6, cpuid_coreid: 6, cpucycle_coreid: 6)
Running on thread 5 (glibc_coreid: 5, cpuid_coreid: 36, cpucycle_coreid: 4101)
Running on thread 6 (glibc_coreid: 8, cpuid_coreid: 8, cpucycle_coreid: 8)
Running on thread 7 (glibc_coreid: 7, cpuid_coreid: 38, cpucycle_coreid: 4103)
Running on thread 8 (glibc_coreid: 10, cpuid_coreid: 16, cpucycle_coreid: 10)
Running on thread 9 (glibc_coreid: 9, cpuid_coreid: 40, cpucycle_coreid: 4105)
Running on thread 10 (glibc_coreid: 12, cpuid_coreid: 18, cpucycle_coreid: 12)
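A possible explanation for values like 4099: rdtscp returns whatever the operating system stored in the IA32_TSC_AUX register, and Linux, as far as I know, stores (node << 12) | cpu there. Under that assumption the large values are a CPU number with a NUMA node index on top. The sketch below decodes them; the function and constant names are made up for illustration, and other operating systems may use a different layout.

```julia
# Assumption: the Linux convention of storing (node << 12) | cpu in IA32_TSC_AUX.
const CPU_BITS = 12
const CPU_MASK = (UInt64(1) << CPU_BITS) - 1   # 0x0000000000000fff

# Split an rdtscp auxiliary value into a CPU number and a NUMA node index.
decode_tsc_aux(aux::Integer) = (cpu = Int(aux & CPU_MASK), node = Int(aux >> CPU_BITS))

decode_tsc_aux(4099)   # (cpu = 3, node = 1), thread 3 in the output above
decode_tsc_aux(10)     # (cpu = 10, node = 0), thread 8 in the output above
```

With this decoding, every cpucycle_coreid value in the output above agrees with glibc_coreid.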
cpucycle_coreid works for me:
julia> for i in 1:nthreads()
           println("Running on thread $i (glibc_coreid: $(tglibc_coreid(i)), cpuid_coreid: $(tcpuid_coreid(i)), cpucycle_coreid: $(tcpucycle_coreid(i)))")
           # @sync @tspawnat i sum(abs2, rand()^2 + rand()^2 for i in 1:500_000_000)
       end
Running on thread 1 (glibc_coreid: 0, cpuid_coreid: 0, cpucycle_coreid: 0)
Running on thread 2 (glibc_coreid: 1, cpuid_coreid: 2, cpucycle_coreid: 1)
Running on thread 3 (glibc_coreid: 2, cpuid_coreid: 4, cpucycle_coreid: 2)
Running on thread 4 (glibc_coreid: 3, cpuid_coreid: 6, cpucycle_coreid: 3)
Running on thread 5 (glibc_coreid: 4, cpuid_coreid: 8, cpucycle_coreid: 4)
Running on thread 6 (glibc_coreid: 5, cpuid_coreid: 16, cpucycle_coreid: 5)
Running on thread 7 (glibc_coreid: 6, cpuid_coreid: 18, cpucycle_coreid: 6)
Running on thread 8 (glibc_coreid: 7, cpuid_coreid: 20, cpucycle_coreid: 7)
Running on thread 9 (glibc_coreid: 8, cpuid_coreid: 22, cpucycle_coreid: 8)
Running on thread 10 (glibc_coreid: 9, cpuid_coreid: 24, cpucycle_coreid: 9)
Running on thread 11 (glibc_coreid: 10, cpuid_coreid: 1, cpucycle_coreid: 10)
Running on thread 12 (glibc_coreid: 11, cpuid_coreid: 3, cpucycle_coreid: 11)
Running on thread 13 (glibc_coreid: 12, cpuid_coreid: 5, cpucycle_coreid: 12)
Running on thread 14 (glibc_coreid: 13, cpuid_coreid: 7, cpucycle_coreid: 13)
Running on thread 15 (glibc_coreid: 14, cpuid_coreid: 9, cpucycle_coreid: 14)
Running on thread 16 (glibc_coreid: 15, cpuid_coreid: 17, cpucycle_coreid: 15)
Running on thread 17 (glibc_coreid: 16, cpuid_coreid: 19, cpucycle_coreid: 16)
Running on thread 18 (glibc_coreid: 17, cpuid_coreid: 21, cpucycle_coreid: 17)
Running on thread 19 (glibc_coreid: 18, cpuid_coreid: 23, cpucycle_coreid: 18)
Running on thread 20 (glibc_coreid: 19, cpuid_coreid: 25, cpucycle_coreid: 19)
If you mask off the result:
cpucycle_coreid() & 0x00000fff
Then all those results will match glibc_coreid (although cpuid_coreid will still be wrong, it seems like cpucycle_coreid should work).
May be worth checking for more architectures whether 0x00000fff is really an appropriate mask.
Could calculate a mask based on the number of cores:
julia> ((1 << (64 - leading_zeros(CpuId.cputhreads()-1))) - 1) % UInt32 # 20 threads on this system
0x0000001f
This rounds up to the next power of 2, subtracts 1, and truncates to a 32 bit integer.
Would probably be better to look at the CpuId instruction to figure out what mask to apply.
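To make the arithmetic concrete, here is the same formula wrapped in a small function that takes the thread count as an argument, so it can be tried without CpuId.jl. The function name is made up, and the count would normally come from CpuId.cputhreads().

```julia
# Smallest all-ones mask that covers the IDs 0:(n-1), following the formula above.
# Assumes a 64-bit Int, as the literal 64 in the original expression does.
id_mask(n::Int) = ((1 << (64 - leading_zeros(n - 1))) - 1) % UInt32

id_mask(20)          # 0x0000001f: IDs 0:19 need 5 bits
4099 & id_mask(20)   # 3, i.e. the raw value 4099 from the earlier output reduces to CPU 3
```

The mask only strips the upper bits correctly if the thread count really bounds the IDs the OS hands out, which is the part worth checking on other machines.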
Check e.g. this AMD specification, page 27: https://www.amd.com/system/files/TechDocs/25481.pdf
CPUID Fn8000_0008_ECX APIC ID Size and Core Count
The width of the APIC ID is variable across architectures. The above page should give the bit width. However, there is also a legacy method mentioned.
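A rough sketch of how that field could be turned into a mask. The bit positions are my reading of the specification (ECX[7:0] = NC, the core count minus one; ECX[15:12] = ApicIdCoreIdSize) and should be checked against the page cited above. The function name is made up, and the legacy branch is only one plausible interpretation of the legacy method.

```julia
# Build a core-ID mask from the ECX value of CPUID Fn8000_0008 (AMD).
# Assumed layout: ECX[7:0] = NC (cores - 1), ECX[15:12] = ApicIdCoreIdSize.
# On a real machine ecx would presumably come from CpuId.cpuid(0x80000008, 0).
function amd_coreid_mask(ecx::UInt32)
    nc   = Int(ecx & 0x000000ff)           # number of cores minus one
    bits = Int((ecx >> 12) & 0x0000000f)   # APIC ID bits that hold the core ID
    # A size of zero is taken to mean "legacy method": derive the width from NC.
    if bits == 0
        bits = 64 - leading_zeros(nc)
    end
    return ((1 << bits) - 1) % UInt32
end

amd_coreid_mask(0x0000703f)   # 0x0000007f: ApicIdCoreIdSize = 7
amd_coreid_mask(0x00000003)   # 0x00000003: legacy path, NC = 3
```

This width describes the APIC ID read via cpuid; whether it also applies to the value returned by rdtscp depends on what the OS stores there.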
For the record, I find that the following masks:
((1 << (64 - leading_zeros(CpuId.cputhreads()))) - 1) % UInt32
0x00000fff
work -- eg:
const cpucycle_mask = ((1 << (64 - leading_zeros(CpuId.cputhreads()))) - 1) % UInt32
cpucycle_coreid() = Int(cpucycle_id()[2] & cpucycle_mask)
But ((1 << (64 - leading_zeros(CpuId.cputhreads()-1))) - 1) doesn't work. I don't know why yet though.
Quick update: When I wrote my last comment (above), I was running this on Perlmutter's login nodes (AMD Milan). On my Intel laptop, the mask 0x00000fff doesn't work, but ((1 << (64 - leading_zeros(CpuId.cputhreads()))) - 1) % UInt32 still works.
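One observation that might help narrow this down: the two formulas only differ when the thread count is an exact power of two, in which case the variant without the -1 yields a mask that is one bit wider. The sketch below compares them on plain integers; the function names are made up. Whether this is the actual cause is a guess on my part. It would fit if CpuId.cputhreads() reports fewer threads than the OS numbers (for example a per-package count), but that is not confirmed in this thread.

```julia
# The two mask formulas from this thread, with the thread count passed in as n.
mask_n(n::Int)       = ((1 << (64 - leading_zeros(n))) - 1) % UInt32
mask_nminus1(n::Int) = ((1 << (64 - leading_zeros(n - 1))) - 1) % UInt32

(mask_nminus1(20),  mask_n(20))    # (0x0000001f, 0x0000001f): identical
(mask_nminus1(128), mask_n(128))   # (0x0000007f, 0x000000ff): one bit apart

# With n = 128, an ID of 130 survives only the wider mask:
130 & mask_nminus1(128)   # 2
130 & mask_n(128)         # 130
```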
On Linux one can use sched_getcpu to query the ID of the CPU a thread is running on.
I want to do the same on macOS, where sched_getcpu isn't available. Looking for an alternative, I stumbled across https://stackoverflow.com/questions/33745364/sched-getcpu-equivalent-for-os-x which mentions an alternative based on "cpuid".
Unfortunately, both my C and "cpuid" knowledge are limited, which is why I can't translate this to Julia. Can someone help me out here? Personally, I think it would be a great addition to this package. Being able to ask which CPU a thread is running on across different OSs would be very useful in some cases.
Any help is very much appreciated. (And forgive me if I'm asking for too much here.)