The SSSE3 alpha blend computes dr * (255 - a) / 256 + sr.
Consider using dr - dr * a / 256 + sr instead,
which avoids inverting the alpha, fixes an off-by-one error, and can use byte math
instead of word math for the sub/add.
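
A rough scalar illustration of the two formulas, to make the difference concrete. This is only a sketch, not the library's code: the helper names blend_current and blend_proposed are hypothetical, sr is assumed to be a premultiplied source channel, dr the destination channel, a the source alpha, and the final add is assumed to saturate at 255.

```c
#include <stdint.h>

/* Hypothetical scalar model of the current form: dr * (255 - a) / 256 + sr.
   The saturating add is an assumption, not stated in the report. */
static uint8_t blend_current(uint8_t sr, uint8_t dr, uint8_t a) {
  unsigned scaled = ((unsigned)dr * (255u - a)) >> 8;
  unsigned v = scaled + sr;
  return (uint8_t)(v > 255u ? 255u : v);
}

/* Hypothetical scalar model of the proposed form: dr - dr * a / 256 + sr.
   (dr * a) >> 8 never exceeds dr, so the subtraction cannot underflow
   and fits in a byte. */
static uint8_t blend_proposed(uint8_t sr, uint8_t dr, uint8_t a) {
  unsigned scaled = ((unsigned)dr * a) >> 8;
  unsigned v = (unsigned)dr - scaled + sr;
  return (uint8_t)(v > 255u ? 255u : v);
}
```

With a = 0, sr = 0 and dr = 200, the first form yields 199 while the second keeps 200, which is the off-by-one the proposal removes.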
The first part of the loop would start with:
movdqa xmm3, [eax] // src argb
lea eax, [eax + 16]
movdqa xmm0, xmm3 // src argb
pshufb xmm3, kShuffleAlpha // alpha
movdqa xmm2, [esi] // dst argb (becomes _r_b)
pand xmm2, xmm6 // _r_b
pmullw xmm2, xmm3 // _r_b * alpha
movdqa xmm1, [esi] // dst argb (becomes _a_g)
lea esi, [esi + 16]
psrlw xmm1, 8 // _a_g
pmullw xmm1, xmm3 // _a_g * alpha
psrlw xmm2, 8 // _r_b convert to 8 bits again
This saves 2 instructions so far.
Original issue reported on code.google.com by fbarch...@google.com on 21 Nov 2012 at 6:52