watery01 / libyuv

Automatically exported from code.google.com/p/libyuv

alpha blend ssse3 #162

Closed: GoogleCodeExporter closed this issue 9 years ago.

GoogleCodeExporter commented 9 years ago
The SSSE3 alpha blend computes dr * (255 - a) / 256 + sr.
Consider using dr - dr * a / 256 + sr instead,
which avoids inverting the alpha, fixes an off-by-one error, and allows the subtract/add
to use byte math instead of word math.

The first part of the loop would start with:

```asm
movdqa     xmm3, [eax]      // src argb
lea        eax, [eax + 16]  // advance src by 4 pixels
movdqa     xmm0, xmm3       // src argb (sr)
pshufb     xmm3, kShuffleAlpha // alpha, one copy per 16-bit lane
movdqa     xmm2, [esi]      // dr argb
pand       xmm2, xmm6       // _r_b (xmm6 assumed to hold a 0x00ff00ff mask)
pmullw     xmm2, xmm3       // _r_b * alpha
movdqa     xmm1, [esi]      // dr argb, for _a_g
lea        esi, [esi + 16]  // advance dr by 4 pixels
psrlw      xmm1, 8          // _a_g
pmullw     xmm1, xmm3       // _a_g * alpha
psrlw      xmm2, 8          // _r_b * alpha / 256, back to 8 bits
```
This saves two instructions so far compared with the current loop.

Original issue reported on code.google.com by fbarch...@google.com on 21 Nov 2012 at 6:52

GoogleCodeExporter commented 9 years ago
Marking as WontFix. If blend efficiency is needed on x86, refocus on an AVX2 version.

Original comment by fbarch...@google.com on 12 Jan 2013 at 9:08