The SSE/VSX wrappers for ppc64[el] are missing the `_mm_loadu_si64()` function. This function appears to be largely equivalent to `_mm_set_epi64()` with a zero upper element, plus an explicit guarantee that the load may be unaligned. However, `_mm_set_epi64()` also allows unaligned loads in practice, and the ppc64[el] wrapper for `_mm_set_epi64()` already enables unaligned loads on POWER7+.
Therefore, the needed function appears to be the following. It was tested in Skia on a Talos II workstation (POWER9) and works correctly:
```
/* Load a 64-bit integer from P into vector element 0 and zero the upper
   element.  The address need not be 16-byte aligned.  */
extern __inline __m128i
__attribute__((__gnu_inline__, __always_inline__, __artificial__))
_mm_loadu_si64 (void const *__P)
{
  return _mm_set_epi64 ((__m64)0LL, *(__m64 const *)__P);
}
```
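For anyone who wants to check the intended semantics without a POWER machine, here is a minimal portable sketch of what I understand the function should do: copy 8 bytes from an arbitrarily aligned address into element 0 and zero the upper element. The names `model_m128i`, `model_loadu_si64` and `model_check_unaligned` are hypothetical and exist only for this illustration; they are not part of `emmintrin.h` or any other header.

```
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for __m128i: lane[0] models vector element 0. */
typedef struct { uint64_t lane[2]; } model_m128i;

/* Reference model (assumption, not the real wrapper): copy 8 bytes from p,
   at any alignment, into element 0 and zero element 1.  memcpy avoids
   alignment and aliasing assumptions in the model itself.  */
static model_m128i model_loadu_si64(const void *p)
{
    model_m128i v;
    memcpy(&v.lane[0], p, sizeof v.lane[0]);
    v.lane[1] = 0;
    return v;
}

/* Check the model against a load from a deliberately odd byte offset. */
static void model_check_unaligned(void)
{
    unsigned char buf[16];
    uint64_t expected;
    size_t i;

    for (i = 0; i < sizeof buf; i++)
        buf[i] = (unsigned char)(i + 1);
    memcpy(&expected, buf + 3, sizeof expected);

    model_m128i v = model_loadu_si64(buf + 3);
    assert(v.lane[0] == expected);
    assert(v.lane[1] == 0);
}
```

Calling `model_check_unaligned()` from a test harness exercises the odd-offset case; the same comparison could be made against the real `_mm_loadu_si64()` by storing its result back to memory and comparing the two 64-bit halves.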
If desired, I can create a merge request to add this function to `emmintrin.h`.