google / highway

Performance-portable, length-agnostic SIMD with runtime dispatch
Apache License 2.0

Add POWER4 / POWER5 #1152

Closed: malaterre closed this issue 1 year ago

malaterre commented 1 year ago

POWER8/POWER9 support was added recently; it would be nice to also have POWER4/POWER5 support.

jan-wassenberg commented 1 year ago

Hi, I'm not familiar with POWER4; it looks like it launched in 2001 :o I'm curious, what is the use case?

malaterre commented 1 year ago

Debian supports two big-endian architectures with SIMD:

ppc32 technically targets the G3, but since it can also run on the G4, I assumed some AltiVec code could be "borrowed" for those systems.

ppc64 is supposed to run on the G5, so again some recent AltiVec code could be "borrowed" for this architecture.

jan-wassenberg commented 1 year ago

Got it. It's plausible that the current code could run on the G5 with, hopefully, only modest modifications. If someone is willing to do that work and update run_tests.sh with the required QEMU setup and compiler flags, I would consider maintaining it, especially if someone actually wants to run Highway on that architecture.

johnplatts commented 1 year ago

There are some limitations on targets that don't support VSX, such as PowerPC G4/PowerPC G5/POWER4/POWER5/POWER6, including the following:

My original port of Highway to PPC included support for targets that have AltiVec but not VSX, but I removed the AltiVec/PPC7 support from the hwy/ops/ppc_vsx-inl.h header because some of the Highway unit tests failed on the AltiVec target (while passing on the little-endian PPC8/PPC9/PPC10 targets).

johnplatts commented 1 year ago

Here is a gist that has implementations of various int64_t, uint64_t, and float vector operations for Altivec: https://gist.github.com/johnplatts/761fc35054eeb2a83b15cebd4b6ef288
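The gist's code is not reproduced here. As a rough illustration of why 64-bit integer lanes need extra work without VSX: AltiVec has 32-bit lane adds and a per-lane carry-out (`vec_addc`), whereas a native 64-bit lane add (`vaddudm`) only arrived with POWER8 (ISA 2.07). The following is a minimal scalar sketch of the carry-propagation technique, one 64-bit lane at a time; the name `Add64Via32` is hypothetical. A real AltiVec version would additionally have to move each carry from the low 32-bit lane into its high neighbor with a shift or permute, and which neighbor that is depends on endianness.

```cpp
#include <cstdint>

// Hypothetical helper: adds two 64-bit values using only 32-bit additions,
// mirroring what an AltiVec emulation must do for each 64-bit lane.
uint64_t Add64Via32(uint64_t a, uint64_t b) {
  const uint32_t a_lo = static_cast<uint32_t>(a);
  const uint32_t a_hi = static_cast<uint32_t>(a >> 32);
  const uint32_t b_lo = static_cast<uint32_t>(b);
  const uint32_t b_hi = static_cast<uint32_t>(b >> 32);

  // Low halves: wrapping 32-bit add (AltiVec: vec_add on 32-bit lanes).
  const uint32_t lo = a_lo + b_lo;
  // Carry out of the low half (AltiVec: vec_addc computes this per lane).
  const uint32_t carry = (lo < a_lo) ? 1u : 0u;
  // High halves plus the carry; overflow past 64 bits wraps, like uint64_t.
  const uint32_t hi = a_hi + b_hi + carry;

  return (static_cast<uint64_t>(hi) << 32) | lo;
}
```

The carry is the only cross-lane dependency, which is why this costs several instructions per 64-bit add instead of one.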

jan-wassenberg commented 1 year ago

Thanks @johnplatts for the list. It looks like nontrivial effort is required, but it might be worthwhile if/when someone actually wants to run on POWER4. Let's wait and see whether such a use case arises?

FYI, it is legitimate for a Highway implementation to set:

```cpp
#define HWY_HAVE_INTEGER64 0
#define HWY_HAVE_FLOAT16 0
#define HWY_HAVE_FLOAT64 0
```

This would take care of many of the missing pieces, but the resulting target would also be less useful for apps that actually do want to use 64-bit operations.
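To make the trade-off concrete, here is a minimal sketch of how app code might react to one of these flags: it accumulates in `double` only where the target has 64-bit float lanes and otherwise falls back to `float`, losing precision. `HWY_HAVE_FLOAT64` is the real Highway macro quoted above; the `#ifndef` fallback, the `AccumT` alias, and `Dot` are hypothetical and exist only so the sketch compiles without Highway. The scalar loop stands in for what would be Highway vector code, where, as we understand it, the macro reflects the target currently being compiled.

```cpp
#include <cstddef>
#include <type_traits>

// Stand-in so this sketch compiles on its own. In real code, Highway's
// headers define HWY_HAVE_FLOAT64 per target; apps should not define it.
#ifndef HWY_HAVE_FLOAT64
#define HWY_HAVE_FLOAT64 0
#endif

// Accumulate in double only where the target supports f64 lanes; otherwise
// fall back to float, trading precision for availability.
using AccumT = std::conditional<HWY_HAVE_FLOAT64 != 0, double, float>::type;

// Scalar placeholder for a vectorized dot product.
AccumT Dot(const float* a, const float* b, size_t n) {
  AccumT sum = 0;
  for (size_t i = 0; i < n; ++i) {
    sum += static_cast<AccumT>(a[i]) * static_cast<AccumT>(b[i]);
  }
  return sum;
}
```

On a target that sets `HWY_HAVE_FLOAT64` to 0, the same source still builds but accumulates in `float`.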

malaterre commented 1 year ago

@jan-wassenberg Would it be acceptable, for the time being, to simply copy/paste x86_128-inl.h for POWER4/POWER5 and rely on GCC's rs6000 x86-intrinsic compatibility headers (I think clang also has some PPC wrappers)?

jan-wassenberg commented 1 year ago

Interesting, I didn't know the compiler ships something like that. sse2neon/neon2sse are indeed useful. Copy-pasting the entire file would be a larger maintenance burden. How about something like this instead: at the end of highway.h, also include x86_128-inl.h #if HWY_TARGET == HWY_PPC4?
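As a minimal sketch of the compatibility-header approach discussed above (this is not Highway code), here is a plain SSE2 function of the kind x86_128-inl.h is built from. On x86 it compiles directly. On 64-bit PowerPC, GCC's rs6000 headers and clang's ppc_wrappers map these intrinsics onto VSX; as far as we know they mainly target little-endian POWER8 or newer (e.g. `-mcpu=power8`) and stop with a diagnostic unless `NO_WARN_X86_INTRINSICS` is defined. The function name `AddInt32x4` is hypothetical.

```cpp
// Required by the PowerPC x86-intrinsic compatibility headers; has no effect
// on x86, where <emmintrin.h> is the native SSE2 header.
#define NO_WARN_X86_INTRINSICS 1

#include <cstdint>
#include <emmintrin.h>

// Adds four int32 lanes with SSE2 intrinsics (wrapping on overflow).
// Unaligned loads/stores, so a, b, and out need no special alignment.
void AddInt32x4(const int32_t* a, const int32_t* b, int32_t* out) {
  const __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a));
  const __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b));
  _mm_storeu_si128(reinterpret_cast<__m128i*>(out), _mm_add_epi32(va, vb));
}
```

Whether this helps depends on the headers covering every intrinsic the including file uses; the build log later in this thread shows `_mm_aesenc_si128` is one they do not provide.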

malaterre commented 1 year ago

> How about we do something like, at the end of highway.h, also including x86_128-inl.h #if HWY_TARGET == HWY_PPC4?

I believe I misread the documentation: these compatibility headers only target powerpc64el, so this approach will never work for POWER4/POWER5.

Technically it even fails for PPC8, with seemingly random inconsistencies (SSE vs. AVX...); I'm not sure what GCC is supposed to support here:

```text
[  1%] Building CXX object CMakeFiles/hwy.dir/hwy/per_target.cc.o
In file included from /home/malat/highway/hwy/highway.h:384,
                 from /home/malat/highway/hwy/per_target.cc:21:
/home/malat/highway/hwy/ops/x86_128-inl.h: In function 'hwy::N_PPC8::Vec128<unsigned char> hwy::N_PPC8::AESRound(Vec128<unsigned char>, Vec128<unsigned char>)':
/home/malat/highway/hwy/ops/x86_128-inl.h:5883:26: error: '_mm_aesenc_si128' was not declared in this scope; did you mean '_mm_testnzc_si128'?
 5883 |   return Vec128<uint8_t>{_mm_aesenc_si128(state.raw, round_key.raw)};
      |                          ^~~~~~~~~~~~~~~~
      |                          _mm_testnzc_si128
```

jan-wassenberg commented 1 year ago

Ah, OK. So we don't yet have a drop-in solution for PPC4.

It's not surprising that they do not support _mm_aesenc_si128; emulating it efficiently in software takes a couple hundred lines of tricky code.
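To give a sense of what such an emulation has to compute, here is a naive scalar reference for one AES encryption round, which as we understand it is the operation behind `_mm_aesenc_si128` and Highway's `AESRound`: ShiftRows, SubBytes, MixColumns, then XOR with the round key, following FIPS-197. It is only a sketch: it computes the S-box on the fly, is very slow, and is not constant-time, so it is unsuitable for real cryptographic use. Making this fast and free of secret-dependent timing is what takes the "couple hundred lines of tricky code". The names `Block`, `GfMul`, `SubByte`, and `AesEncRound` are hypothetical.

```cpp
#include <array>
#include <cstdint>

using Block = std::array<uint8_t, 16>;

// Multiplication in GF(2^8) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1.
uint8_t GfMul(uint8_t a, uint8_t b) {
  uint8_t product = 0;
  while (b != 0) {
    if (b & 1) product ^= a;
    const bool high_bit = (a & 0x80) != 0;
    a = static_cast<uint8_t>(a << 1);
    if (high_bit) a ^= 0x1B;  // Reduce modulo the AES polynomial.
    b >>= 1;
  }
  return product;
}

// AES S-box computed on the fly: multiplicative inverse (x^254, so 0 maps
// to 0) followed by the affine transform. Slow and NOT constant-time.
uint8_t SubByte(uint8_t x) {
  uint8_t inv = 1;
  for (int i = 0; i < 254; ++i) inv = GfMul(inv, x);
  uint8_t s = inv;
  for (int shift = 1; shift <= 4; ++shift) {
    s ^= static_cast<uint8_t>((inv << shift) | (inv >> (8 - shift)));
  }
  return static_cast<uint8_t>(s ^ 0x63);
}

// One AES encryption round. Byte i of the block is row i % 4, column i / 4,
// which we believe matches the byte order used by _mm_aesenc_si128.
Block AesEncRound(const Block& state, const Block& round_key) {
  Block t{};
  // ShiftRows + SubBytes: row r is rotated left by r columns.
  for (int c = 0; c < 4; ++c) {
    for (int r = 0; r < 4; ++r) {
      t[r + 4 * c] = SubByte(state[r + 4 * ((c + r) % 4)]);
    }
  }
  Block out{};
  // MixColumns on each column, then AddRoundKey.
  for (int c = 0; c < 4; ++c) {
    const uint8_t a0 = t[4 * c], a1 = t[4 * c + 1];
    const uint8_t a2 = t[4 * c + 2], a3 = t[4 * c + 3];
    out[4 * c + 0] = static_cast<uint8_t>(GfMul(a0, 2) ^ GfMul(a1, 3) ^ a2 ^ a3 ^ round_key[4 * c + 0]);
    out[4 * c + 1] = static_cast<uint8_t>(a0 ^ GfMul(a1, 2) ^ GfMul(a2, 3) ^ a3 ^ round_key[4 * c + 1]);
    out[4 * c + 2] = static_cast<uint8_t>(a0 ^ a1 ^ GfMul(a2, 2) ^ GfMul(a3, 3) ^ round_key[4 * c + 2]);
    out[4 * c + 3] = static_cast<uint8_t>(GfMul(a0, 3) ^ a1 ^ a2 ^ GfMul(a3, 2) ^ round_key[4 * c + 3]);
  }
  return out;
}
```

Even this readable version needs finite-field arithmetic, a byte permutation, and per-column mixing; a vectorized, constant-time version is considerably more involved.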

jan-wassenberg commented 1 year ago

We haven't yet heard of potential Highway users on POWER4/5, but please feel free to reopen if that changes :)