m-j-w / CpuId.jl

Ask the CPU for cache sizes, SIMD feature support, a running hypervisor, and more.

`cachesize(::Int)` returns zero #47

Closed cscherrer closed 2 years ago

cscherrer commented 2 years ago

Hi, we saw a problem coming up here: https://github.com/JuliaSIMD/TriangularSolve.jl/issues/20

On this machine:

julia> versioninfo()
Julia Version 1.8.0-beta1
Commit 7b711ce699 (2022-02-23 15:09 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: 32 × AMD Ryzen Threadripper 2950X 16-Core Processor
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-13.0.1 (ORCJIT, znver1)
  Threads: 1 on 32 virtual cores

julia> cpuinfo()
  Cpu Property       Value                                                     
  –––––––––––––––––– ––––––––––––––––––––––––––––––––––––––––––––––––––––––––––
  Brand              AMD Ryzen Threadripper 2950X 16-Core Processor            
  Vendor             :AMD                                                      
  Architecture       :Zen                                                      
  Model              Family: 0x8f, Model: 0x08, Stepping: 0x02, Type: 0x00     
  Cores              32 physical cores, 32 logical cores (on executing CPU)    
                     No Hyperthreading hardware capability detected            
  Clock Frequencies  Not supported by CPU                                      
  Data Cache         Level 1:3 : (32, 512, 8192) kbytes                        
                     64 byte cache line size                                   
  Address Size       48 bits virtual, 48 bits physical                         
  SIMD               256 bit = 32 byte max. SIMD vector size                   
  Time Stamp Counter TSC is accessible via `rdtsc`                             
                     TSC runs at constant rate (invariant from clock frequency)
  Perf. Monitoring   Performance Monitoring Counters (PMC) are not supported   
  Hypervisor         No                                                        

Here's the result of calling cachesize(::Int):

julia> CpuId.cachesize.(1:4)
4-element Vector{Int64}:
 0
 0
 0
 0

OTOH cachesize() seems to work fine:

julia> cachesize()
(32768, 524288, 8388608)

cc @chriselrod

chriselrod commented 2 years ago

Hmm. As a workaround, I can call cachesize() instead.
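A minimal sketch of that workaround, assuming the tuple returned by `cachesize()` is ordered by cache level as in the output above. The helper name `cachesize_workaround` is hypothetical and not part of CpuId.jl; the tuple is hard-coded here so the snippet runs without the package.

```julia
# Hypothetical helper (not part of CpuId.jl): pick one level out of the tuple
# that `cachesize()` returns. Levels that are not reported yield 0.
function cachesize_workaround(sizes::NTuple{N,Int}, lvl::Integer) where {N}
    return 1 <= lvl <= N ? sizes[lvl] : 0
end

# Tuple copied from the `cachesize()` output above; with CpuId loaded one
# would pass `CpuId.cachesize()` instead.
sizes = (32768, 524288, 8388608)

# Per-level sizes in bytes for levels 1 through 4.
per_level = [cachesize_workaround(sizes, lvl) for lvl in 1:4]
```

Returning 0 for a level that is not present is a choice made for this sketch, so that broadcasting over `1:4` does not throw.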

m-j-w commented 2 years ago

Interesting. I'll take a look tomorrow.

m-j-w commented 2 years ago

Diagnosis is trivial: only cachesize() has the AMD CPU call implemented, hence the zeros from cachesize(lvl). Should be simple to fix tomorrow.

cscherrer commented 2 years ago

Great! Thanks for the quick response 😄

m-j-w commented 2 years ago

I've added a possible solution. Would you please try with #49 ? Many thanks !

cscherrer commented 2 years ago

julia> CpuId.cachesize.(1:4)
4-element Vector{Int64}:
   32768
  524288
 8388608
       0

Just to be sure, compare

chad@boondoggle ~> inxi -Cxxx
CPU:
  Info: 8-core model: AMD Ryzen Threadripper 2950X bits: 64 type: MT MCP MCM
    smt: enabled arch: Zen+ rev: 2 cache: L1: 1.5 MiB L2: 8 MiB L3: 32 MiB
  Speed (MHz): avg: 1837 high: 2485 min/max: 2200/3500 boost: enabled
    cores: 1: 2485 2: 2434 3: 1716 4: 1717 5: 1718 6: 1766 7: 1936 8: 1830
    9: 1897 10: 1886 11: 1892 12: 1884 13: 2001 14: 1906 15: 1886 16: 1876
    17: 1890 18: 1886 19: 1887 20: 1886 21: 1887 22: 1886 23: 1886 24: 1884
    25: 1718 26: 1713 27: 1716 28: 1710 29: 1825 30: 1712 31: 2484 32: 0
    bogomips: 223654
  Flags: avx avx2 ht lm nx pae sse sse2 sse3 sse4_1 sse4_2 sse4a ssse3 svm

To me these don't match up, but I may be missing some subtlety.

chriselrod commented 2 years ago

I think it is correct. Each core has 1x 32 KiB L1d and 1x 512 KiB L2. Total L1d and L2 cache are thus 16*32 KiB and 16*512 KiB, but people normally report the per-core numbers.

The CPU itself has 4x 8 MiB L3, for 32 MiB L3 total. For L3, I often see marketing report total, but they're also separate caches, shared by complexes of 4 cores on the 2950X.

So these numbers look correct / are consistent with the pattern of reporting the sizes of the individual caches.
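As a rough check, the per-cache sizes can be multiplied out to the totals that inxi prints. This is only a sketch: the 64 KiB per-core L1 instruction cache is an assumption not stated in this thread, as is the guess that inxi's L1 figure sums data and instruction caches.

```julia
const KiB = 1024
const MiB = 1024 * KiB

cores       = 16          # physical cores on the 2950X
l3_caches   = 4           # separate L3 caches, one per 4-core complex

l1d_per_core = 32 * KiB   # reported by cachesize()
l1i_per_core = 64 * KiB   # ASSUMPTION: L1 instruction cache size, not from this thread
l2_per_core  = 512 * KiB  # reported by cachesize()
l3_per_cache = 8 * MiB    # reported by cachesize()

# Totals across the whole package, in bytes.
total_l1 = cores * (l1d_per_core + l1i_per_core)
total_l2 = cores * l2_per_core
total_l3 = l3_caches * l3_per_cache

println("L1: ", total_l1 / MiB, " MiB, L2: ", total_l2 / MiB,
        " MiB, L3: ", total_l3 / MiB, " MiB")
```

Under those assumptions the totals come out to 1.5 MiB, 8 MiB, and 32 MiB, which lines up with the inxi line above.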

m-j-w commented 2 years ago

New release tomorrow.

m-j-w commented 2 years ago

@chriselrod @cscherrer The cpuid function can also report whether a cache level is shared, and if so, between how many cores. This would also answer exactly the point mentioned by both of you (see cpuid leaf 8000_001d for AMD, bits 25:14).

In this case, you could calculate the "cache fair share per core".
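A sketch of how such a calculation might look, assuming bits 25:14 of EAX hold the number of logical processors sharing the cache minus one. The names `sharing_count` and `fair_share` are hypothetical and not part of CpuId.jl, and the register value is made up for illustration rather than read from hardware.

```julia
# Hypothetical helpers, not part of CpuId.jl.

# Extract the 12-bit field at bits 25:14 of EAX and add 1.
# ASSUMPTION: the field stores (logical processors sharing this cache) - 1.
sharing_count(eax::UInt32) = Int((eax >> 14) & 0x00000fff) + 1

# Cache bytes divided evenly among the logical processors that share it.
fair_share(cache_bytes::Integer, eax::UInt32) = div(cache_bytes, sharing_count(eax))

# Made-up register value: the field holds 7, i.e. 8 logical processors share.
example_eax = UInt32(7) << 14

# Fair share of an 8 MiB cache (the L3 size reported above), in bytes.
share = fair_share(8388608, example_eax)
```

Note that the field counts logical processors, so with SMT enabled the per-core share would be larger than the per-thread figure computed here.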

If this is helpful, then please open an issue and I'll add the function.

cscherrer commented 2 years ago

Thanks @m-j-w. @chriselrod has a deeper understanding of how this is used to help packages like LoopVectorization.jl, so I'll leave it to him to judge whether there are downstream advantages to having this available.