Closed nestorjhernandezm closed 10 years ago
Cool! Looks like a good start.. :+1:
Yes, good start :) Not convoluting thinks is good - so if you feel it becomes "cleaner" by splitting then do that.
Great! Will implement these minor changes and keep pushing! :smile:
Great work, Néstor! I think the problem is covered well in the SO question that you referenced. The code in the 1st answer is a bit ugly, but I am sure that we can "extract" a clean-looking solution from it.
Hi guys, @mortenvp @jpihl and @petya2164
Now we have both counts :+1: I still have to correct some minor details and extend the implementation for other compilers and ARM. But it works for gcc
, clang
and llvm
... take a look...
@nestorjhernandezm Interesting idea with the total separation of functions ;) I think that it is a bit too much to have only one function per file. I would put all these cpuid_*_calls into one header file. Then we would have some coherence in the implementation. I feel that it is getting too fragmented ;)
@nestorjhernandezm Yes, I don't like the naming here.. A name like this is not very descriptive cpuid_0_calls
instead what does it do?
Maybe you can make one which is called:
void invoke_x86_cpuid(uint32_t& eax = 0, uint32_t& ebx = 0, uint32_t& ecx = 0, uint32_t& edx = 0)
{
__asm__("cpuid"
: "=a"(eax), "=b"(ebx),
"=c"(ecx), "=d"(edx)
: "a"(eax), "b"(ebx), "c"(ecx), "d"(edx) );
}
I haven't checked whether the above would have some problems but I think it should be ok.
But otherwise I would just put it in the init_linux_gcc_x86.hpp
I'm not sure whether splitting the asm call into more functions makes it more readable.
Yes guys, actually I was thinking the same when I was designing them. Also, I thought what if we need more of these flags in the near future? I guess that we will address that then and it seems that it is not the most useful approach to split them now. I agree and like to put void init_cpuinfo(cpuinfo::impl& info)
internal functions in the same header. Would not be better something like:
init_linux_gcc_x86.hpp:
void init_cpuinfo(cpuinfo::impl& info)
{
extract_x86_flags_and_logic_cores(info);
extract_x86_physical_cores(info);
}
void extract_x86_flags_and_logic_cores(cpuinfo::impl& info) // Q: is this name too long?
{
uint32_t eax;
/* and the other registers*/
invoke_cpuid(1);
get_x86_flags(info,ecx,edx); // get bool has_*() const
get_x86_logic_cores(info,ebx); // get uint32_t logical_cores() const; Q: should this be splitted?
}
void extract_x86_physical_cores(cpuinfo::impl& info)
{
uint32_t eax;
/* and the other registers*/
invoke_cpuid();
get_vendor_id(info,eax,ebx,ecx,edx);
if (info.m_vendor_id == "Intel")
{
// calculate with invoke_cpuid(4);
}
else if (info.m_vendor_id == "AuthenticAMD")
{
// calcultate with invoke_cpuid(0x80000008);
}
}
// define invoke_cpuid
// define get_x86_flags
// define get_x86_logic_cores
<EOF>
...?
@nestorjhernandezm I like this approach, just because I can immediately understand what is happening by looking at one file ;) The function "extract_x86_flags_and_logic_cores" does not make things better, we don't have to split everything into functions. The same is true for "extract_x86_physical_cores". I would keep it simple.
@petya2164 Ok, then it will look something like:
uint32_t eax;
/* and the other registers*/
invoke_cpuid(1);
get_x86_flags(info,ecx,edx); // get bool has_*() const
get_x86_logic_cores(info,ebx); // get uint32_t logical_cores() const
invoke_cpuid();
get_vendor_id(info,eax,ebx,ecx,edx);
if (info.m_vendor_id == "Intel")
{
// calculate with invoke_cpuid(4);
}
else if (info.m_vendor_id == "AuthenticAMD")
{
// calcultate with invoke_cpuid(0x80000008);
}
// define invoke_cpuid(){}
// define get_x86_flags(){}
// define get_x86_logic_cores(){}
// define get_vendor_id and the other buddies(){}
The only thing that worries me is not keeping isolated invoke_cpuid()
... I guess that if used properly it should be fine. Do you guys?
The reason we have the init_linux_gcc_x86.hpp
function is that gcc has a specific way of invoking the inline assember (differnt than e.g. msvc) and that x86 is different than arm (which does not have the cpuid function). Now after extracting the valid registers you dispatch the register values to the extract_x86_flags
which we could rename to extract_x86_info
and then have extract_x86_cpu_flags(...)
, extract_x86_vendor_string(...)
and extract_x86_logical_cores(...)
these functions should specify which registers they need to extract the information. The job of the init_cpuinfo
function is then to call the extract functions with the proper registers. Does that make sense?
I would drop the seperate function for invoke_cpuid
I think and just call the inline assembler in the init_cpuinfo
Yes, I didn't specify them in my previous example in order to not make very large the fence code, but surely they will if that's the case. Good, not too modular not too monolithic. Will make some changes then
@mortenvp @petya2164 @jpihl You may want to take a look whenever you are available. I included all the functions in a single file in a more simpler way in init_linux_gcc_x86.hpp
. The call for the invoke_cpuid
was kinda tricky, but now it does the work :wink:
@mortenvp @petya2164 @jpihl Also included the changes for msvc
compiler. If you are ok with it, we can merge this one. The only remaining part will be to replicate this for ARM. Let me know...
Hi guys, I also included the count for ARM on init_linux_gcc_arm.hpp
, but I'm not quite if it works. The .log
files are reporting single cores (from the initialization of the class I guess). Take a look here: http://buildbot.steinwurf.dk/builders/cpuid-android44-cxx_android_gxx46_arm-py27/builds/91/steps/test/logs/stdio and give me your opinion. At night will be reviewing what does sysconf()
and std::thread::hardware_concurrency()
return
Do we have unit tests for the physical and logical core detection? Otherwise we cannot merge.. I will take a look at the code later today.
@nestorjhernandezm I looked at the code, and I like the changes! This is definitely a good step towards clarity. I will add some minor inline comments about coding style. The ARM core counts are not correct now, our Kindle is dual-core and the Nexus 4 should be quad-core. As Morten says, we also need to check the core counts in the unit tests. So we need to add parameters like "--has_logical_cores=8" and "--has_physical_cores=4" to verify the results. We can easily do that when we have the right counts for all platforms.
@petya2164 Thanks Péter! I already made all this corrections.
@mortenvp Yes, you are right. Actually this was going to be my next step :blush: I also think that new test parameters should be set up at the buildbot
I tried to check if the number is available by using boost::thread::hardware_concurrency();
but unfortunately I'm still the constructor's initialization. For the physical number of cores it seems better to read /sys/devices/system/cpu/present
. Also it seems possible to read /proc/cpuinfo
. I'll keep an eye on this
@nestorjhernandezm I will add the new parameters on the buildbot, and also in the program options here. Then we will get red builds for the incorrect counts.
So I found that none of the detected core counts are accurate on any of our machines :( Most of our build machines feature these 2 CPUs: http://ark.intel.com/products/52214/Intel-Core-i7-2600K-Processor-8M-Cache-up-to-3_80-GHz http://ark.intel.com/products/65524/Intel-Core-i7-3770S-Processor-8M-Cache-up-to-3_90-GHz Cores: 4 Threads: 8 We should detect 4 physical and 8 logical cores.
Yes, this should be modified to. Also the unit test should be splitted in let's say:
Maybe we can include the platform too. I will take a look at machines directly to see what do they return.
@mortenvp @jpihl @petya2164 corrections made here. Take a look at this and the buildbot pull request that I left a few minutes ago. I think I'll also remove the cout
prints that I made too. If something is missing or should be corrected, let me know
@nestorjhernandezm I just fixed a segfault on 32-bit Linux. Apparently, it is not safe to only capture 2 registers when we use cpuid with a 32-bit compiler. It is all green now.
Great Péter, actually I wanted to check this on the buildslaves because I saw something similar locally in my laptop. I thought it will be good to remove unnecesary variables but definitively it is better if we replicated the issue on the buildslaves in order to correct this. Good, so are we missing something extra here?
Hi @mortenvp , @jpihl and @petya2164
Here is the number of logical cpus for x86. It was available on
ebx
whencpuid()
input iseax = 1
. However, for the physical cpu count,cpuid()
must be fed with differenteax
values according to the vendor ID (e.g. Intel or AMD). So I will split this functionality a little bit in order to not convolute the constructor. what do you think guys?