m-j-w / CpuId.jl

Ask the CPU for cache sizes, SIMD feature support, a running hypervisor, and more.

Add comparison to Hwloc.jl to the README? #7

Closed by juliohm 7 years ago

juliohm commented 7 years ago

Can you please explain the advantages/disadvantages of this package compared to Hwloc.jl? What is the overlap? Maybe this should be written up in the README.

m-j-w commented 7 years ago

Good point, I'll add a direct comparison to the README.

Production-ready alternatives: The Julia package Hwloc.jl provides similar and further information regarding the topology of your CPUs, viz. the number of packages, physical and logical cores, and associated caches. However, it relies on the external hwloc library, which pulls in an additional binary dependency and implies considerable computational overhead per query. Whether this is an issue in the first place depends largely on your use case.

CpuId takes a different approach in that it talks directly to the CPU. For instance, asking the CPU for its number of cores or whether it supports AVX2 can be achieved in roughly 250..500 CPU cycles, thanks to Julia's JIT compilation and inlining. For comparison, 100..200 CPU cycles is roughly the cost of loading one integer from main memory, or of two integer divisions; calling any external library function costs at least an order of magnitude more cycles. This allows moving such feature checks much closer to, or even directly into, a hot zone (which, however, might also hint at a questionable coding pattern). CpuId also provides additional feature checks, such as whether you're executing on a virtual machine, which again may or may not influence how you set up your high-performance computing tasks in a more general way. Finally, the cpuid(...) function exposes this low-level interface to users, enabling them to make equally fast and reliable run-time feature checks on new or other hardware.
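
For illustration, here is a minimal sketch of what such checks could look like next to performance-sensitive code. It is a sketch only: the high-level calls (`hypervised()`, `cachesize()`) are the ones discussed in this thread, but the exact signatures, and in particular the register layout returned by `cpuid(...)`, should be taken from the package documentation.

```julia
using CpuId

# Hypervisor check: costs on the order of a few hundred CPU cycles, so it can
# sit close to the code whose setup it influences.
if hypervised()
    println("Running inside a virtual machine; timings may be noisy.")
end

# Data cache sizes per level, e.g. to choose a blocking factor for a kernel.
caches = cachesize()                 # tuple of cache sizes in bytes
l1size = isempty(caches) ? 32 * 1024 : caches[1]

# Low-level escape hatch: query an arbitrary cpuid leaf directly.
# The returned register layout is an assumption here; consult the docs.
eax, ebx, ecx, edx = cpuid(0x00)     # leaf 0: highest supported leaf + vendor string
```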

Would something like this answer your question?

m-j-w commented 7 years ago

And just to make my case regarding computational speed, here are some results from my current experimental code (not yet pushed); a rough sketch of the measurement loop follows the plots:

Measured benchmarking overhead using `rdtsc` instruction:
    25 CPU cycles per measurement.
Measuring average of 1000000 calls to `cachesize(n)` for each of n = 0:8...
Taking measured overhead into account.

            Cost of calling `cachesize(N)`
       ┌────────────────────────────────────────┐ 
     0 │ 1                                      │ 
     1 │▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪ 324                │ 
     2 │▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪ 560 │ 
     3 │▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪ 560 │ 
   N 4 │▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪ 513    │ 
     5 │▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪ 274                   │ 
     6 │▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪ 279                   │ 
     7 │▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪ 276                   │ 
     8 │▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪ 274                   │ 
       └────────────────────────────────────────┘ 
         CPU cycles per invocation for given N

Measuring average of 1000000 calls to `cachesize()`...
Taking measured overhead into account.

            Cost of calling `cachesize()`
      ┌────────────────────────────────────────┐ 
   () │▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪ 4752 │ 
      └────────────────────────────────────────┘ 
              CPU cycles per invocation

Measuring average of 1000000 calls to `hypervised()`...
Taking measured overhead into account.

           Cost of calling `hypervised()`
      ┌────────────────────────────────────────┐ 
   () │▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪ 208 │ 
      └────────────────────────────────────────┘ 
              CPU cycles per invocation
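
For reference, a rough sketch of the measurement loop behind these plots (the experimental code is not pushed yet, so the details are assumptions; in particular, a TSC reader named `cpucycle()` wrapping `rdtsc` is assumed here):

```julia
using CpuId

# Average cost in CPU cycles of calling `f()` over `n` repetitions, with the
# overhead of reading the time stamp counter subtracted.
function cycles_per_call(f, n::Int = 1_000_000)
    # Estimate the per-measurement overhead of the counter read-out itself.
    t0 = cpucycle()
    for _ in 1:n
        cpucycle()
    end
    overhead = (cpucycle() - t0) / n

    # Time the function under test. A real benchmark would also keep the
    # results alive to prevent the compiler from eliminating the calls.
    t0 = cpucycle()
    for _ in 1:n
        f()
    end
    cycles = (cpucycle() - t0) / n

    return max(cycles - overhead, 0.0)
end

cycles_per_call(() -> cachesize(1))   # single cache level, a few hundred cycles
cycles_per_call(hypervised)           # hypervisor check
```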

juliohm commented 7 years ago

That is great @m-j-w, the explanation you gave is very clear and will definitely improve the README.

Feel free to close the issue anytime.

m-j-w commented 7 years ago

fixed by 8637713c6f9eeee356a31b418bcb1bd77fb8f6cc