steinwurf / cpuid

C++ library for detecting CPU capabilities
https://cpuid.steinwurf.com
Other
105 stars 21 forks source link

Avx2 detection #16

Closed nestorjhernandezm closed 10 years ago

nestorjhernandezm commented 10 years ago

Hi @mortenvp , @petya2164 and @jpihl

These should be the changes on the CPU ID project side for AVX2 detection. Will add the ones regarding the buildbot side to do the remaining. Take a look and let me know when we can check out the changes in the build system :+1:

mortenvp commented 10 years ago

Hi @nestorjhernandezm looks good to me. The only thing I would like to have is in the comments a reference to the place where you information about where in the registers the different features are stored.

nestorjhernandezm commented 10 years ago

@mortenvp merge or shall we wait for the changes in the buildbot in order to not generate false alarms?

mortenvp commented 10 years ago

@nestorjhernandezm wait for the changes

nestorjhernandezm commented 10 years ago

:+1:

petya2164 commented 10 years ago

I tested the detection on the buildbot and we have some discrepancies:

true detected for AVX2 instead of the false (debian2 and debian4 do not have avx2): http://buildbot.steinwurf.dk:12344/builders/cpuid-debian8_64-cxx_clang34_x86-py27/builds/56/steps/test/logs/stdio http://buildbot.steinwurf.dk:12344/builders/cpuid-debian8_64-cxx_clang34_x86-py27/builds/52/steps/test/logs/stdio

false detected instead of the true (debian6 with a Haswell): http://buildbot.steinwurf.dk:12344/builders/cpuid-debian8_64-cxx_gxx48_x86-py27/builds/268/steps/test/logs/stdio

We should double-check the capabilities of our machines, and we can also look at AVX2 detection in other libs.

petya2164 commented 10 years ago

I reintroduced the inline assembly solution which fixed a lot of issues with the 32-bit builds. The __get_cpuid function was giving inconsistent results (especially with 32-bit clang). I also considered Jeppe's -fPIC problem and the current solution should handle that, but we should do a thorough test in fifi-swig.

The AVX2 detection seems to be correct now, except for these builds: http://buildbot.steinwurf.dk:12344/builders/cpuid-debian8_64-cxx_gxx48_x86-py27/builds/275/steps/test/logs/stdio http://buildbot.steinwurf.dk:12344/builders/cpuid-debian8_64-cxx_clang34_x86-py27/builds/61/steps/test/logs/stdio In these 32-bit builds, cpuid indicates that AVX2 is not supported on the machines with Haswell CPUs (but everything is fine in 64-bit builds). It might be that we cannot use AVX2 in 32-bit binaries, but I did not find any sources to confirm that (and we don't have a Haswell machine with a 32-bit OS to test this). If that suspicion is true, then we will need to specify the avx2 CPU cap for the individual mkspecs, for example: cxx_gxx48_x64: { avx2: true }

mortenvp commented 10 years ago

What exactly was the problem with the __get_cpuid? I think Jeppe did not have any issues with it before he merged?

petya2164 commented 10 years ago

@mortenvp __get_cpuid gives false positives for AVX2 with 32-bit clang builds. We did not check for that capability before. The current problem is that we get "false" negatives (on Haswell) for all 32-bit builds on Linux. But I don't know if it is supposed to be negative in 32-bit mode.

nestorjhernandezm commented 10 years ago

@petya2164 It seems that first it must be checked if the CPU supports 7 as an input for eax. It might be possible this is not the case for the 32-bit builds on Clang. Here are some references:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50740 Bug describing the situation

http://gcc.gnu.org/viewcvs/gcc/trunk/gcc/config/i386/driver-i386.c?view=markup&pathrev=180304#l413 Detecting max level of cpuid eax input

http://gcc.gnu.org/viewcvs/gcc/trunk/gcc/config/i386/driver-i386.c?view=markup&pathrev=180304#l454 Once level is known to exist, check the flag

Taking a look at the API seems it can be managed ... At the very end __get_cpuid_max does the job since it checks the 32-bit macro: http://clang.llvm.org/doxygen/cpuid_8h_source.html

What do you think guys? The only remaining would be to check if this does not interfere with the -fPIC flag solution.

petya2164 commented 10 years ago

I found out that AVX2 works for both 64-bit and 32-bit binaries. I also found the inline assembly on Intel's website that produces correct results in all cases: https://software.intel.com/en-us/articles/how-to-detect-new-instruction-support-in-the-4th-generation-intel-core-processor-family Everything is green now after several reruns. Before the merge, it would be nice to put this link into the relevant source file, but it is way too long ;) Is it safe/future-proof to use a URL shortener in your opinion?

nestorjhernandezm commented 10 years ago

Great Péter! Couple of things:

  1. I noticed a small missing double quotes, will add it.
  2. We might use goo.gl as a shortener as this should work for very long time. Actually here (https://groups.google.com/forum/#!topic/google-url-shortener/8pxNEt8EG8U) it is said "forever" but I would like to see that
mortenvp commented 10 years ago

I think the code looks a lot like gcc's implementation of the __cpuid so I guess clang did not update their implementation yet.. Isn't edx missing from the clobber list?

nestorjhernandezm commented 10 years ago

Seems Clang did it: http://clang.llvm.org/doxygen/cpuid_8h_source.html , line 95. There are some little differences. For edx, it should be on the list of output parameters. The only one that was tricky was ebx because it may be the PIC register. Should we consider something else?

mortenvp commented 10 years ago

I agree looks like they also correctly take care of this. Which version of clang is this from?

nestorjhernandezm commented 10 years ago

My guess would be version 3.4 since its in the most recent Doxygen documentation.

petya2164 commented 10 years ago

Thanks for the corrections, Néstor! I would like to point out that the 32-bit detection failed for both gcc and clang when we used their __cpuid. I think it is better to have our inline assembly which will work in all cases. But we should coordinate with Jeppe to test the behavior with -fPIC in the Python bindings. We will merge after we checked that.

petya2164 commented 10 years ago

The Python bindings are working fine with this branch. I will merge now.

mortenvp commented 10 years ago

What I don't understand is what is different with our inline assembly and the stuff implemented in gcc and clang? Since when I looked at the gcc code it looked very similar to what we are doing here - and what nestor linked seems also very similar - so why isn't their implementations working?

petya2164 commented 10 years ago

I think the input/output registers are correctly marked with the + sign. EAX, EBX and ECX are input/output, and the compiler will do some extra precautions to preserve the state of these.

nestorjhernandezm commented 10 years ago

The + sign indicates the inline assembler that those registers are both input and output to the assembler instruction which is particularly important here due to the nature of ebx as the (possible) PIC flag. E.g. Péter's interpretation is likely to be right. I took a look to GCC compiler constraints to check this here: http://gcc.gnu.org/onlinedocs/gcc/Modifiers.html#Modifiers ... I guess the Intel guys maybe detected this earlier and came out with their solution

mortenvp commented 10 years ago

Yes, if you are referring to by comment about the clobber list, I meant whether edi should have been in the clobber list. However for that code it is not needed since we put D as constraint which means that edi should be used for the output so the compiler know we change it.

Btw for future reference I would +1 vote for not interleaving code if the #ifdef stuff .. my brain starts to hurt when I need to read a function and play preprocessor at the same time. Rather make two sections of complete code. So I can first read one and then the other.

nestorjhernandezm commented 10 years ago

Clobber list about edi: Right.

Interleaved code: Actually it took me little time to get it too, however will consider it next time that I'm reviewing code. Readability is a plus and can be subjective :+1:

mortenvp commented 10 years ago

:) yes beauty is in the eye of the beholder ... or something like that ;)