steinwurf / cpuid

C++ library for detecting CPU capabilities
https://cpuid.steinwurf.com
Other
105 stars 21 forks source link

logical cores for x86 #9

Closed nestorjhernandezm closed 10 years ago

nestorjhernandezm commented 10 years ago

Hi @mortenvp , @jpihl and @petya2164

Here is the number of logical cpus for x86. It was available on ebx when cpuid() input is eax = 1. However, for the physical cpu count, cpuid() must be fed with different eax values according to the vendor ID (e.g. Intel or AMD). So I will split this functionality a little bit in order to not convolute the constructor. what do you think guys?

jpihl commented 10 years ago

Cool! Looks like a good start.. :+1:

mortenvp commented 10 years ago

Yes, good start :) Not convoluting thinks is good - so if you feel it becomes "cleaner" by splitting then do that.

nestorjhernandezm commented 10 years ago

Great! Will implement these minor changes and keep pushing! :smile:

petya2164 commented 10 years ago

Great work, Néstor! I think the problem is covered well in the SO question that you referenced. The code in the 1st answer is a bit ugly, but I am sure that we can "extract" a clean-looking solution from it.

nestorjhernandezm commented 10 years ago

Hi guys, @mortenvp @jpihl and @petya2164

Now we have both counts :+1: I still have to correct some minor details and extend the implementation for other compilers and ARM. But it works for gcc, clang and llvm... take a look...

petya2164 commented 10 years ago

@nestorjhernandezm Interesting idea with the total separation of functions ;) I think that it is a bit too much to have only one function per file. I would put all these cpuid_*_calls into one header file. Then we would have some coherence in the implementation. I feel that it is getting too fragmented ;)

mortenvp commented 10 years ago

@nestorjhernandezm Yes, I don't like the naming here.. A name like this is not very descriptive cpuid_0_calls instead what does it do?

Maybe you can make one which is called:

void invoke_x86_cpuid(uint32_t& eax = 0, uint32_t& ebx = 0, uint32_t& ecx = 0, uint32_t& edx = 0)
{

        __asm__("cpuid"
                : "=a"(eax), "=b"(ebx),
                  "=c"(ecx), "=d"(edx)
                : "a"(eax), "b"(ebx), "c"(ecx), "d"(edx) );
}

I haven't checked whether the above would have some problems but I think it should be ok. But otherwise I would just put it in the init_linux_gcc_x86.hpp I'm not sure whether splitting the asm call into more functions makes it more readable.

nestorjhernandezm commented 10 years ago

Yes guys, actually I was thinking the same when I was designing them. Also, I thought what if we need more of these flags in the near future? I guess that we will address that then and it seems that it is not the most useful approach to split them now. I agree and like to put void init_cpuinfo(cpuinfo::impl& info) internal functions in the same header. Would not be better something like:

init_linux_gcc_x86.hpp:

void init_cpuinfo(cpuinfo::impl& info)
{
    extract_x86_flags_and_logic_cores(info);
    extract_x86_physical_cores(info);
}

void extract_x86_flags_and_logic_cores(cpuinfo::impl& info) // Q: is this name too long?
{
    uint32_t eax; 
    /* and the other registers*/
    invoke_cpuid(1); 
    get_x86_flags(info,ecx,edx); // get bool has_*() const
    get_x86_logic_cores(info,ebx); // get uint32_t logical_cores() const; Q: should this be splitted?
}

void extract_x86_physical_cores(cpuinfo::impl& info)
{
    uint32_t eax; 
    /* and the other registers*/
    invoke_cpuid();
    get_vendor_id(info,eax,ebx,ecx,edx);

    if (info.m_vendor_id == "Intel")
    {
         // calculate with invoke_cpuid(4);
     }
     else if (info.m_vendor_id == "AuthenticAMD")
     {
         // calcultate with invoke_cpuid(0x80000008);
     }
}

// define invoke_cpuid
// define get_x86_flags
// define get_x86_logic_cores

<EOF>

...?

petya2164 commented 10 years ago

@nestorjhernandezm I like this approach, just because I can immediately understand what is happening by looking at one file ;) The function "extract_x86_flags_and_logic_cores" does not make things better, we don't have to split everything into functions. The same is true for "extract_x86_physical_cores". I would keep it simple.

nestorjhernandezm commented 10 years ago

@petya2164 Ok, then it will look something like:

uint32_t eax; 
/* and the other registers*/
invoke_cpuid(1); 
get_x86_flags(info,ecx,edx); // get bool has_*() const
get_x86_logic_cores(info,ebx); // get uint32_t logical_cores() const

invoke_cpuid();
get_vendor_id(info,eax,ebx,ecx,edx);

if (info.m_vendor_id == "Intel")
{
    // calculate with invoke_cpuid(4);
}
else if (info.m_vendor_id == "AuthenticAMD")
{
    // calcultate with invoke_cpuid(0x80000008);
}

// define invoke_cpuid(){}
// define get_x86_flags(){}
// define get_x86_logic_cores(){}
// define get_vendor_id and the other buddies(){}

The only thing that worries me is not keeping isolated invoke_cpuid() ... I guess that if used properly it should be fine. Do you guys?

mortenvp commented 10 years ago

The reason we have the init_linux_gcc_x86.hpp function is that gcc has a specific way of invoking the inline assember (differnt than e.g. msvc) and that x86 is different than arm (which does not have the cpuid function). Now after extracting the valid registers you dispatch the register values to the extract_x86_flags which we could rename to extract_x86_info and then have extract_x86_cpu_flags(...), extract_x86_vendor_string(...) and extract_x86_logical_cores(...) these functions should specify which registers they need to extract the information. The job of the init_cpuinfo function is then to call the extract functions with the proper registers. Does that make sense?

mortenvp commented 10 years ago

I would drop the seperate function for invoke_cpuid I think and just call the inline assembler in the init_cpuinfo

nestorjhernandezm commented 10 years ago

Yes, I didn't specify them in my previous example in order to not make very large the fence code, but surely they will if that's the case. Good, not too modular not too monolithic. Will make some changes then

nestorjhernandezm commented 10 years ago

@mortenvp @petya2164 @jpihl You may want to take a look whenever you are available. I included all the functions in a single file in a more simpler way in init_linux_gcc_x86.hpp. The call for the invoke_cpuid was kinda tricky, but now it does the work :wink:

nestorjhernandezm commented 10 years ago

@mortenvp @petya2164 @jpihl Also included the changes for msvc compiler. If you are ok with it, we can merge this one. The only remaining part will be to replicate this for ARM. Let me know...

nestorjhernandezm commented 10 years ago

Hi guys, I also included the count for ARM on init_linux_gcc_arm.hpp, but I'm not quite if it works. The .log files are reporting single cores (from the initialization of the class I guess). Take a look here: http://buildbot.steinwurf.dk/builders/cpuid-android44-cxx_android_gxx46_arm-py27/builds/91/steps/test/logs/stdio and give me your opinion. At night will be reviewing what does sysconf() and std::thread::hardware_concurrency() return

mortenvp commented 10 years ago

Do we have unit tests for the physical and logical core detection? Otherwise we cannot merge.. I will take a look at the code later today.

petya2164 commented 10 years ago

@nestorjhernandezm I looked at the code, and I like the changes! This is definitely a good step towards clarity. I will add some minor inline comments about coding style. The ARM core counts are not correct now, our Kindle is dual-core and the Nexus 4 should be quad-core. As Morten says, we also need to check the core counts in the unit tests. So we need to add parameters like "--has_logical_cores=8" and "--has_physical_cores=4" to verify the results. We can easily do that when we have the right counts for all platforms.

nestorjhernandezm commented 10 years ago

@petya2164 Thanks Péter! I already made all this corrections.

@mortenvp Yes, you are right. Actually this was going to be my next step :blush: I also think that new test parameters should be set up at the buildbot

nestorjhernandezm commented 10 years ago

I tried to check if the number is available by using boost::thread::hardware_concurrency(); but unfortunately I'm still the constructor's initialization. For the physical number of cores it seems better to read /sys/devices/system/cpu/present. Also it seems possible to read /proc/cpuinfo. I'll keep an eye on this

petya2164 commented 10 years ago

@nestorjhernandezm I will add the new parameters on the buildbot, and also in the program options here. Then we will get red builds for the incorrect counts.

petya2164 commented 10 years ago

So I found that none of the detected core counts are accurate on any of our machines :( Most of our build machines feature these 2 CPUs: http://ark.intel.com/products/52214/Intel-Core-i7-2600K-Processor-8M-Cache-up-to-3_80-GHz http://ark.intel.com/products/65524/Intel-Core-i7-3770S-Processor-8M-Cache-up-to-3_90-GHz Cores: 4 Threads: 8 We should detect 4 physical and 8 logical cores.

nestorjhernandezm commented 10 years ago

Yes, this should be modified to. Also the unit test should be splitted in let's say:

  1. Test instructions sets
  2. Test cores counts

Maybe we can include the platform too. I will take a look at machines directly to see what do they return.

nestorjhernandezm commented 10 years ago

@mortenvp @jpihl @petya2164 corrections made here. Take a look at this and the buildbot pull request that I left a few minutes ago. I think I'll also remove the cout prints that I made too. If something is missing or should be corrected, let me know

petya2164 commented 10 years ago

@nestorjhernandezm I just fixed a segfault on 32-bit Linux. Apparently, it is not safe to only capture 2 registers when we use cpuid with a 32-bit compiler. It is all green now.

nestorjhernandezm commented 10 years ago

Great Péter, actually I wanted to check this on the buildslaves because I saw something similar locally in my laptop. I thought it will be good to remove unnecesary variables but definitively it is better if we replicated the issue on the buildslaves in order to correct this. Good, so are we missing something extra here?