Open valassi opened 2 years ago
Specifically, note that ARM and Power9 code would no longer go through "#if defined __SSE4_2__".
The set of ifdefs however would become a bit more complex, because one should check whether the user-required mode is supported at all. For instance
Note also that one can use 'AVX=sse4' (I would keep this tag, as mentioned), but the code already prints out 'sse4', 'ppcv' or 'neon' depending on the architecture: https://github.com/madgraph5/madgraph4gpu/blob/e2c4c0a3d66b35166bcf89cf73170f05ac872cd1/epochX/cudacpp/gg_tt/SubProcesses/P1_Sigma_sm_gg_ttx/check_sa.cc#L753
// -- SIMD matrix elements?
#if !defined MGONGPU_CPPSIMD
wrkflwtxt += "/none";
#elif defined __AVX512VL__
#ifdef MGONGPU_PVW512
wrkflwtxt += "/512z";
#else
wrkflwtxt += "/512y";
#endif
#elif defined __AVX2__
wrkflwtxt += "/avx2";
#elif defined __SSE4_2__
#ifdef __PPC__
wrkflwtxt += "/ppcv";
#elif defined __ARM_NEON__
wrkflwtxt += "/neon";
#else
wrkflwtxt += "/sse4";
#endif
#else
wrkflwtxt += "/????"; // no path to this statement
#endif
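In the snippet above, Power9 and ARM builds only reach the 'ppcv' and 'neon' branches because `__SSE4_2__` is defined artificially for them. As a rough sketch of the alternative (not code from the repository), the same tags could be selected from macros that the compilers define natively. The helper name `nativeSimdTag` is hypothetical, and the choice of `__VSX__` and `__ARM_NEON` / `__ARM_NEON__` as the native Power and ARM macros is an assumption that would need checking on the supported compilers.

```cpp
#include <string>

// Hypothetical helper, not part of madgraph4gpu: picks the SIMD tag from
// macros that each compiler defines natively for its own architecture,
// so that no artificial __SSE4_2__ define is needed on Power9 or ARM.
inline std::string nativeSimdTag()
{
#if defined __AVX512VL__
#ifdef MGONGPU_PVW512
  return "512z"; // AVX512 with 512-bit registers
#else
  return "512y"; // AVX512 with 256-bit registers
#endif
#elif defined __AVX2__
  return "avx2";
#elif defined __SSE4_2__
  return "sse4";
#elif defined __VSX__ // assumption: native macro for Power VSX
  return "ppcv";
#elif defined __ARM_NEON || defined __ARM_NEON__ // assumption: native macros for ARM Neon
  return "neon";
#else
  return "none";
#endif
}
```

With this layout the x86 branches and the non-x86 branches are siblings rather than nested, which is the main point of the sketch.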
This is related to #585, where I suggest -march based on things like x86-64-v3 https://www.phoronix.com/news/GCC-11-x86-64-Feature-Levels
The SIMD modes, however, should be more generic, and x86-agnostic if possible (but the register size alone is not enough, as both avx2 and 512y use 256-bit registers...)
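One way to picture such x86-agnostic modes, together with a check that the requested mode is supported at all, is the sketch below. Everything in it is hypothetical: the `SimdMode` enum, its enumerator names and `isModeSupported` do not exist in the code, and the mapping of each mode to compiler macros (including `__VSX__` and `__ARM_NEON`) is an assumption made for illustration only.

```cpp
// Hypothetical architecture-neutral mode names (not the project's actual API).
// V256Plus stands for the '512y' case: 256-bit registers with AVX512 instructions.
enum class SimdMode { None, V128, V256, V256Plus, V512 };

// Returns true if the current compilation target supports the given mode.
constexpr bool isModeSupported( SimdMode m )
{
  switch( m )
  {
  case SimdMode::None:
    return true; // scalar code is always available
  case SimdMode::V128:
#if defined __SSE4_2__ || defined __VSX__ || defined __ARM_NEON || defined __ARM_NEON__
    return true;
#else
    return false;
#endif
  case SimdMode::V256:
#if defined __AVX2__
    return true;
#else
    return false;
#endif
  case SimdMode::V256Plus: // same requirement as V512 in this sketch
  case SimdMode::V512:
#if defined __AVX512VL__
    return true;
#else
    return false;
#endif
  }
  return false;
}
```

A build could then fail early with something like `static_assert( isModeSupported( SimdMode::V256 ), "requested SIMD mode is not supported on this target" );` instead of silently falling through a chain of ifdefs.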
This is a followup to #221 and PR #421.
The SIMD modes we are supporting were initially designed for Intel, and use an Intel naming convention, both in their short tag description and especially in their ifdefs in the code. Non-Intel platforms are supported through the ugly hack of adding Intel-like defines in the code instead of using native ones (namely, Power9 VSX and ARM Neon, both 128 bits, are supported by artificially adding an SSE4.2 define).
This should be improved in one or maybe two ways:
About the first point, the part that needs to be changed is especially this one https://github.com/madgraph5/madgraph4gpu/blob/e2c4c0a3d66b35166bcf89cf73170f05ac872cd1/epochX/cudacpp/gg_tt/src/mgOnGpuConfig.h#L125
and correspondingly this one https://github.com/madgraph5/madgraph4gpu/blob/e2c4c0a3d66b35166bcf89cf73170f05ac872cd1/epochX/cudacpp/gg_tt/SubProcesses/Makefile#L221
One possibility could be to use defines like
For the tag names, I would be inclined to keep 'none', 'sse4' etc, precisely because things like '512y' are very difficult to describe in terms of register width (otherwise it would be '256plus' or something similar...).
To be discussed. Not too urgent, anyway.