lu-zero / x264

My experiments on the x264 codebase
GNU General Public License v2.0
0 stars 7 forks source link

PPC: VEC_STORE8 does not work on non vsx machines #16

Open malvanos opened 5 years ago

malvanos commented 5 years ago

VEC_STORE8 uses vec_xxpermdi that is only available to processors supporting VSX.

malvanos commented 5 years ago

@lu-zero I saw a patch that included a fix, but it was removed later.

Should we create a fallback in case the CPU does not have VSX support?

lu-zero commented 5 years ago

Probably it would be a good idea, it should be enough to to use the xxpermdi wrapper and add one for the aligned store.

malvanos commented 5 years ago

Yes, from what I understand from the code we only need to replace the vec_xxpermdi with xxpermdi:

--- a/common/ppc/ppccommon.h
+++ b/common/ppc/ppccommon.h
@@ -150,7 +150,7 @@ typedef union {
  **********************************************************************/
 #ifndef __POWER9_VECTOR__
 #define VEC_STORE8( v, p ) \
-    vec_vsx_st( vec_xxpermdi( v, vec_vsx_ld( 0, p ), 1 ), 0, p )
+    vec_vsx_st( xxpermdi( v, vec_vsx_ld( 0, p ), 1 ), 0, p )
 #else
 #define VEC_STORE8( v, p ) vec_xst_len( v, p, 8 )
 #endif

However, after the change, the pixel_sad_x4 fails. I will investigate it.

malvanos commented 5 years ago

One of the sum vectors is not initialized in the PIXEL_SAD_X4_ALTIVEC define. Here is the fix :

--- a/common/ppc/pixel.c
+++ b/common/ppc/pixel.c
@@ -981,6 +981,7 @@ static int name( uint8_t *fenc,
     sum0v = vec_splat_s32( 0 );                                                       \
     sum1v = vec_splat_s32( 0 );                                                       \
     sum2v = vec_splat_s32( 0 );                                                       \
+    sum3v = vec_splat_s32( 0 );                                                       \
                                                                                       \
     for( int y = 0; y < ly; y++ )                                                     \
     {           

I will send a patch next days that fixes both the compilation on non-vsx and initialization in sad_4x.