lu-zero / x264

My experiments on the x264 codebase
GNU General Public License v2.0

SATD improve #12

Open malvanos opened 5 years ago

malvanos commented 5 years ago

Improve satd and intra_satd_x3 by using vec_extract instead of vec_splat and vec_ste:

Power 8:
satd_4x4_altivec: 148 ==> satd_4x4_altivec: 87
satd_4x8_altivec: 186 ==> satd_4x8_altivec: 128
satd_8x4_altivec: 177 ==> satd_8x4_altivec: 114
satd_8x8_altivec: 188 ==> satd_8x8_altivec: 136
satd_8x16_altivec: 300 ==> satd_8x16_altivec: 262
satd_16x8_altivec: 269 ==> satd_16x8_altivec: 271
satd_16x16_altivec: 517 ==> satd_16x16_altivec: 485
intra_satd_x3_4x4_altivec: 528 ==> intra_satd_x3_4x4_altivec: 444
intra_satd_x3_8x8c_altivec: 679 ==> intra_satd_x3_8x8c_altivec: 593
intra_satd_x3_16x16_altivec: 1815 ==> intra_satd_x3_16x16_altivec: 1724

Power 9:
satd_4x4_altivec: 131 ==> satd_4x4_altivec: 113
satd_4x8_altivec: 175 ==> satd_4x8_altivec: 156
satd_8x4_altivec: 150 ==> satd_8x4_altivec: 135
satd_8x8_altivec: 174 ==> satd_8x8_altivec: 161
satd_8x16_altivec: 290 ==> satd_8x16_altivec: 277
satd_16x8_altivec: 272 ==> satd_16x8_altivec: 272
satd_16x16_altivec: 563 ==> satd_16x16_altivec: 566
intra_satd_x3_4x4_altivec: 424 ==> intra_satd_x3_4x4_altivec: 400
intra_satd_x3_8x8c_altivec: 687 ==> intra_satd_x3_8x8c_altivec: 616
intra_satd_x3_16x16_altivec: 2047 ==> intra_satd_x3_16x16_altivec: 2062
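
For readers unfamiliar with the intrinsics named in the description, here is a minimal sketch of the two ways to get a scalar out of a vector lane. It is not the code from this pull request: the helper names (`lane3_splat_store`, `lane3_extract`, `check_lane3_paths`) are hypothetical, and the scalar fallback exists only so the snippet also builds on machines without AltiVec. The assumption illustrated is that the old path broadcasts the wanted lane with vec_splat and writes one element to memory with vec_ste, while the new path reads the lane directly with vec_extract.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#if defined(__ALTIVEC__)
#include <altivec.h>
#endif

/* Hypothetical helper: read lane 3 by broadcasting it to every lane and
 * then storing a single element to memory (vec_splat + vec_ste). */
int32_t lane3_splat_store(const int32_t src[4])
{
#if defined(__ALTIVEC__)
    vector signed int v;
    int32_t out;
    memcpy(&v, src, sizeof(v));   /* lanes follow memory order */
    v = vec_splat(v, 3);          /* every lane now holds lane 3 */
    vec_ste(v, 0, &out);          /* one-element store to a stack slot */
    return out;
#else
    return src[3];                /* scalar fallback, not part of the PR */
#endif
}

/* Hypothetical helper: read lane 3 directly with vec_extract. */
int32_t lane3_extract(const int32_t src[4])
{
#if defined(__ALTIVEC__)
    vector signed int v;
    memcpy(&v, src, sizeof(v));
    return vec_extract(v, 3);     /* no explicit store in the source */
#else
    return src[3];                /* scalar fallback, not part of the PR */
#endif
}

/* Sanity check: both paths must return the same value. */
void check_lane3_paths(const int32_t src[4])
{
    assert(lane3_splat_store(src) == lane3_extract(src));
}
```

Both helpers return the same value. Any speed difference comes from how the compiler lowers each form, which is presumably why the measured gains above vary between Power 8 and Power 9 and between block sizes.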