keeplive / libyuv

Automatically exported from code.google.com/p/libyuv

Post Bias Y for YUV to RGB #388

Closed. GoogleCodeExporter closed this issue 9 years ago

GoogleCodeExporter commented 9 years ago
In the C functions

static __inline void YuvPixel(uint8 y, uint8 u, uint8 v,
                              uint8* b, uint8* g, uint8* r) {
  int32 y1 = ((int32)(y) - 16) * YG;
  *b = Clamp((int32)((u * UB + v * VB) - (BB) + y1) >> 6);
  *g = Clamp((int32)((u * UG + v * VG) - (BG) + y1) >> 6);
  *r = Clamp((int32)((u * UR + v * VR) - (BR) + y1) >> 6);
}

The y - 16 subtraction can be reworked into a post bias by computing
  int32 y1 = (int32)(y * YG);
and adding 16 * YG to each of the bias constants BB, BG and BR, which are already subtracted from the sum.
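
A minimal sketch of why the two forms agree, assuming illustrative 6-bit fixed-point constants (the values of YG, UB, BB and so on below are placeholders, not taken from this issue) and hypothetical names YuvPixelPreBias, YuvPixelPostBias and CountMismatches. Both forms produce the same integer before the shift, so the outputs should match for every input:

```c
#include <stdint.h>

/* Assumed, illustrative constants (6 fraction bits); not from the issue. */
#define YG 74                      /* roughly 1.164 * 64 */
#define UB 127
#define VB 0
#define UG (-25)
#define VG (-52)
#define UR 0
#define VR 102
#define BB (UB * 128 + VB * 128)   /* bias for the 128 offset of u and v */
#define BG (UG * 128 + VG * 128)
#define BR (UR * 128 + VR * 128)
#define YGB (16 * YG)              /* extra bias that replaces the y - 16 */

static uint8_t Clamp(int32_t v) {
  return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* Original form: subtract 16 from y before the multiply. */
static void YuvPixelPreBias(uint8_t y, uint8_t u, uint8_t v,
                            uint8_t* b, uint8_t* g, uint8_t* r) {
  int32_t y1 = ((int32_t)y - 16) * YG;
  *b = Clamp((u * UB + v * VB - BB + y1) >> 6);
  *g = Clamp((u * UG + v * VG - BG + y1) >> 6);
  *r = Clamp((u * UR + v * VR - BR + y1) >> 6);
}

/* Post-bias form: multiply y directly and fold 16 * YG into each bias. */
static void YuvPixelPostBias(uint8_t y, uint8_t u, uint8_t v,
                             uint8_t* b, uint8_t* g, uint8_t* r) {
  int32_t y1 = (int32_t)y * YG;
  *b = Clamp((u * UB + v * VB - (BB + YGB) + y1) >> 6);
  *g = Clamp((u * UG + v * VG - (BG + YGB) + y1) >> 6);
  *r = Clamp((u * UR + v * VR - (BR + YGB) + y1) >> 6);
}

/* Counts the (y, u, v) triples for which the two forms disagree.
   Right-shifting a negative int is implementation-defined in C, but both
   forms shift the same value, so they agree either way. */
int CountMismatches(void) {
  int mismatches = 0;
  for (int y = 0; y < 256; ++y)
    for (int u = 0; u < 256; ++u)
      for (int v = 0; v < 256; ++v) {
        uint8_t b0, g0, r0, b1, g1, r1;
        YuvPixelPreBias((uint8_t)y, (uint8_t)u, (uint8_t)v, &b0, &g0, &r0);
        YuvPixelPostBias((uint8_t)y, (uint8_t)u, (uint8_t)v, &b1, &g1, &r1);
        if (b0 != b1 || g0 != g1 || r0 != r1) ++mismatches;
      }
  return mismatches;
}
```

In a real build the combined bias (BB + YGB and so on) would be a single compile-time constant, so the post-bias form saves the per-pixel subtract on y.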

Check Neon and AVX2 code for similar optimization.

Original issue reported on code.google.com by fbarch...@google.com on 29 Dec 2014 at 9:43

GoogleCodeExporter commented 9 years ago
OSX
Was
45541 - [       OK ] libyuvTest.I420ToARGB_Opt (45541 ms)
43062 - [       OK ] libyuvTest.J420ToARGB_Opt (43062 ms)

Now
44052 - [       OK ] libyuvTest.I420ToARGB_Opt (44052 ms)
43110 - [       OK ] libyuvTest.J420ToARGB_Opt (43110 ms)

Original comment by fbarch...@google.com on 29 Dec 2014 at 11:05

GoogleCodeExporter commented 9 years ago
SSSE3 was
I420ToARGB_Opt (5169 ms)

now
I420ToARGB_Opt (4830 ms)

Original comment by fbarch...@google.com on 30 Dec 2014 at 4:08

GoogleCodeExporter commented 9 years ago
This is complete for x86 code.

ARM uses signed math, so there is no existing bias for the -128 into which the Y bias could be folded for free.
Also, the shift on ARM can round, so there is no add for rounding, unlike x86, 
which requires an add to do rounding.
It's unclear whether the bias could be done; it would need unsigned versions of the 
multiplies/adds. Likely it could. It's likely not a performance benefit, so it would 
mainly ensure the code exactly mimics the x86/C code.
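
The rounding difference mentioned above can be sketched in scalar C. This is only an illustration with hypothetical helper names (ShiftTruncate6, ShiftRound6), not code from libyuv: a plain shift truncates, so rounding has to be emulated by adding half the divisor first, whereas a rounding shift instruction does both in one step.

```c
#include <stdint.h>

/* Truncating shift: drops the low 6 fraction bits. */
static inline int32_t ShiftTruncate6(int32_t x) {
  return x >> 6;
}

/* Rounding emulated with an explicit add of half the divisor (32 = 1 << 5)
   before the shift, as a plain shift requires. */
static inline int32_t ShiftRound6(int32_t x) {
  return (x + 32) >> 6;
}
```

For example, 96 represents 1.5 with 6 fraction bits: the truncating shift gives 1, while the add-then-shift form gives 2.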

Closing as fixed.  Follow-up improvements to the unit tests would be good, as would 
re-examining the code for ARM or x86 performance improvements.

Original comment by fbarch...@google.com on 5 Jan 2015 at 6:36