llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org

[ppc64] SSE/VSX wrapper missing _mm_loadu_si64 function #91247

Open madscientist159 opened 5 months ago

madscientist159 commented 5 months ago

The SSE/VSX wrappers for ppc64[el] are missing the _mm_loadu_si64() function. This function appears to be largely equivalent to calling _mm_set_epi64() with a zero upper element and a 64-bit value loaded from memory, with an explicit unaligned load capability. However, _mm_set_epi64() also allows unaligned loads in practice, and the ppc64[el] wrapper function for _mm_set_epi64() already enables unaligned loads on POWER7+.

Therefore, it appears the needed function is as follows. It was tested in Skia on a Talos II workstation (POWER9) and functions correctly:

/* Load signed 64-bit integer from P into vector element 0.  The address need not be 16-byte aligned.  */
extern __inline __m128i
    __attribute__((__gnu_inline__, __always_inline__, __artificial__))
    _mm_loadu_si64 (void const *__P)
{
  return _mm_set_epi64((__m64)0LL, *(__m64 *)__P);
}

If desired, I can create a merge request to add this function to emmintrin.h.
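For readers without a POWER machine at hand, the intended semantics can be illustrated with a minimal portable sketch. This is not the wrapper itself and does not use emmintrin.h: `vec128_model` and `model_loadu_si64` are hypothetical names introduced only for illustration, and `memcpy` stands in for the unaligned load that the real wrapper would perform with vector instructions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for __m128i: two 64-bit lanes, lane[0] is element 0. */
typedef struct {
  uint64_t lane[2];
} vec128_model;

/* Models the proposed behavior: copy 8 bytes from p (any alignment)
   into element 0 and set element 1 to zero.  */
static vec128_model model_loadu_si64(const void *p) {
  vec128_model v;
  assert(p != NULL);
  /* memcpy makes no alignment assumption about p. */
  memcpy(&v.lane[0], p, sizeof v.lane[0]);
  v.lane[1] = 0;
  return v;
}
```

Calling `model_loadu_si64(buf + 1)` on a byte buffer shows the point of the "u" in the name: the source address is deliberately misaligned, the low lane receives the eight bytes starting there, and the high lane is zero.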

llvmbot commented 5 months ago

@llvm/issue-subscribers-backend-powerpc

Author: Timothy Pearson (madscientist159)
