Open malvanos opened 5 years ago
Probably it would be a good idea, it should be enough to to use the xxpermdi wrapper and add one for the aligned store.
Yes, from what I understand from the code we only need to replace the vec_xxpermdi with xxpermdi:
--- a/common/ppc/ppccommon.h
+++ b/common/ppc/ppccommon.h
@@ -150,7 +150,7 @@ typedef union {
**********************************************************************/
#ifndef __POWER9_VECTOR__
#define VEC_STORE8( v, p ) \
- vec_vsx_st( vec_xxpermdi( v, vec_vsx_ld( 0, p ), 1 ), 0, p )
+ vec_vsx_st( xxpermdi( v, vec_vsx_ld( 0, p ), 1 ), 0, p )
#else
#define VEC_STORE8( v, p ) vec_xst_len( v, p, 8 )
#endif
However, after the change, the pixel_sad_x4 fails. I will investigate it.
One of the sum vectors is not initialized in the PIXEL_SAD_X4_ALTIVEC define. Here is the fix :
--- a/common/ppc/pixel.c
+++ b/common/ppc/pixel.c
@@ -981,6 +981,7 @@ static int name( uint8_t *fenc,
sum0v = vec_splat_s32( 0 ); \
sum1v = vec_splat_s32( 0 ); \
sum2v = vec_splat_s32( 0 ); \
+ sum3v = vec_splat_s32( 0 ); \
\
for( int y = 0; y < ly; y++ ) \
{
I will send a patch next days that fixes both the compilation on non-vsx and initialization in sad_4x.
VEC_STORE8 uses vec_xxpermdi that is only available to processors supporting VSX.