Sergey found through testing that while the previous patch worked for newer versions of gcc, the version we use for testing HPCX (gcc 4.8.2) still wasn't auto-vectorizing int8 reductions. The problem was that the last patch did not also move the destination vector into the `ucc_ec_cpu_reduce` function's arguments as a `restrict` pointer. This patch fixes that.
In short, char/int8 pointers are special to the compiler: they are allowed to alias memory of any other datatype, so an int8 pointer needs to be marked as `restrict` for the compiler to be confident enough to auto-vectorize. We were doing that previously, but only through a new local variable. GCC ignores the `restrict` keyword on local variables and only honors it on function arguments. I looked around online and found that this is a long-standing bug in gcc, reported 10 years ago: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60712
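To make the difference concrete, here is a minimal sketch of the two patterns described above. It is not the actual UCC code: the function names `reduce_sum_int8_local` and `reduce_sum_int8_args` are hypothetical, and it assumes a C99 (or later) compiler so that `restrict` is a keyword. Both variants compute the same result. Whether either loop is actually vectorized depends on the compiler version and optimization flags.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical illustration only; not the real ucc_ec_cpu_reduce code. */

/* Pattern from the previous patch: restrict is applied to local copies
 * of the pointers. Per the gcc bug linked above, gcc may ignore the
 * qualifier here and assume dst and src can alias. */
static void reduce_sum_int8_local(void *dst_v, const void *src_v, size_t count)
{
    int8_t *restrict       dst = dst_v;
    const int8_t *restrict src = src_v;
    size_t                 i;

    for (i = 0; i < count; i++) {
        dst[i] = (int8_t)(dst[i] + src[i]);
    }
}

/* Pattern used by this patch: restrict is part of the function
 * signature, for the destination as well as the source. */
static void reduce_sum_int8_args(int8_t *restrict dst,
                                 const int8_t *restrict src, size_t count)
{
    size_t i;

    for (i = 0; i < count; i++) {
        dst[i] = (int8_t)(dst[i] + src[i]);
    }
}
```

Note that `restrict` is a promise from the caller: passing overlapping `dst` and `src` buffers to either function is undefined behavior.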
This PR is an extension to https://github.com/openucx/ucc/pull/918.