Closed dubek closed 3 years ago
Works for my setup but might not work for a VirtualBox host having AVX but not AVX2? Or put differently, wouldn't the same treatment applied to HAVE_AVX2 also be better applied as well to HAVE_AVX?
Thanks @g2flyer for verifying.
I think that all the FMA operations are called inside a #ifdef HAVE_AVX2
section; that was my idea for a minimal change (but maybe I missed something). You might be right. I'm waiting to hear the maintainers' opinion.
It looks like this should be OK. From what I read online, it seems like this it should have been OK as it was... see https://stackoverflow.com/questions/16348909/how-do-i-know-if-i-can-compile-with-fma-instruction-sets
Indeed, according to that link, with the proposed patch, it looks like it will not detect FMA in Windows C++. But I don't think we are even targeting that platform.
This also raises the question as to whether I should also patch my copy of PGFFT. I'll think about it.
Thanks @g2flyer for verifying.
I think that all the FMA operations are called inside a
#ifdef HAVE_AVX2
section; that was my idea for a minimal change (but maybe I missed something). You might be right. I'm waiting to hear the maintainers' opinion.
@dubek sorry, you are correct. I misread code and wrongly thought the FMA code was conditioned on USE_PD4.
The changes proposed in this PR have passed all regression tests on the Intel Haswell and Ivy Bridge platforms with bare metal Linux (no virtualization environments such as VirtualBox and/or VMWare), and also on Intel Haswell Mac OS. As such, we can only assert that the proposed changes do not cause any regression on the following build&test platforms:
Linux on Intel Haswell and Ivy Bridge:
MacOS on Intel Haswell:
See https://github.com/homenc/HElib/blob/master/INSTALL.md for a full list of the dependencies for each build tested.
The proposed changes do not cause any regression on Mainframe Z14 Bare Metal Linux - Ubuntu 18.04, Compiler GCC.
Successful tests reported on:
When compiling in a VirtualBox VM, AVX2 instructions are supported but FMA instructions are not. In such case, compiling PGFFT.cpp fails with:
This fix narrows the detection so that only when both
__AVX2__
and__FMA__
are defined then the special FMA implementation will be used.Note that I didn't replace
HAVE_AVX2
toHAVE_AVX2_AND_FMA
throughout the file.Fix #426
Signed-off-by: Dov Murik dov.murik1@il.ibm.com