openucx / ucc

Unified Collective Communication Library
https://openucx.github.io/ucc/
BSD 3-Clause "New" or "Revised" License

EC/CPU: Fix int8 auto-vectorization with gcc 4.8.2 #930

Closed nsarka closed 4 months ago

nsarka commented 4 months ago

This PR is an extension to https://github.com/openucx/ucc/pull/918.

Sergey found through testing that although the previous patch worked for newer versions of gcc, the version we use for testing HPCX (gcc 4.8.2) still was not auto-vectorizing int8 reductions. The problem was that the previous patch did not also move the destination vector into the arguments of the ucc_ec_cpu_reduce function as a restrict pointer. This patch fixes that.

In short, char/int8 datatypes are special to the compiler in that pointers to them are allowed to alias (and therefore modify) memory of any other datatype, so an int8 pointer needs to be marked as restrict for the compiler to be confident enough to auto-vectorize. We were doing that previously, but only by declaring a new local variable with the restrict qualifier. For some reason, GCC ignores the restrict keyword on local variables; in practice it only takes effect on function arguments. I did some looking around online and found that this is a long-standing bug in gcc, reported 10 years ago: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60712
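To illustrate the difference described above, here is a minimal sketch of the two patterns. It is not the actual UCC code: the function names `sum_int8_local_restrict` and `sum_int8_arg_restrict` are hypothetical, and a plain element-wise sum stands in for the real reduction. It assumes C99 or later (for example `-std=c99`), since `restrict` is not a keyword in older default dialects.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the old pattern: restrict appears only on
 * local copies of the pointers. According to the GCC bug cited above,
 * the qualifier may be ignored here, so the loop may stay scalar. */
static void sum_int8_local_restrict(int8_t *dst, const int8_t *src,
                                    size_t count)
{
    int8_t *restrict       d = dst;
    const int8_t *restrict s = src;
    size_t                 i;

    assert(count == 0 || (dst != NULL && src != NULL));
    for (i = 0; i < count; i++) {
        d[i] = (int8_t)(d[i] + s[i]);
    }
}

/* Hypothetical stand-in for the new pattern: restrict is placed on the
 * function parameters, for both source and destination. The caller must
 * guarantee that dst and src do not overlap. */
static void sum_int8_arg_restrict(int8_t *restrict dst,
                                  const int8_t *restrict src,
                                  size_t count)
{
    size_t i;

    assert(count == 0 || (dst != NULL && src != NULL));
    for (i = 0; i < count; i++) {
        dst[i] = (int8_t)(dst[i] + src[i]);
    }
}
```

Both functions compute the same result; the only difference is where the qualifier sits. Whether a given compiler vectorizes either loop can be checked by building with optimization enabled (for example `-O3`) and a vectorization report option such as `-fopt-info-vec`, on GCC versions that support it.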

swx-jenkins3 commented 4 months ago

Can one of the admins verify this patch?

Sergei-Lebedev commented 4 months ago

ok to test