jaqquery / libyuv

Automatically exported from code.google.com/p/libyuv
BSD 3-Clause "New" or "Revised" License
0 stars 0 forks source link

YUV -> RGB conversion is inaccurate? #324

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
What steps will reproduce the problem?
1. Patch Chromium locally with: https://codereview.chromium.org/183973019/.
2. Run browser tests: MediaTest.*

What is the expected output?
Tests pass.

What do you see instead?
Tests fail because of output too different from the reference image.

It is easier to debug if you run in the patched Chromium:
file:///.../src/content/test/data/media/blackwhite.html

which allows to run the tests manually and see results, together with the error.

More details, examples and initial investigation here:
https://code.google.com/p/chromium/issues/detail?id=347422

This issue is blocking enabling libyuv on Chromium, which would drastically 
boost performance of WebRTC on ARM devices.

Original issue reported on code.google.com by mtomasz@chromium.org on 11 Apr 2014 at 12:51

GoogleCodeExporter commented 8 years ago
I just checked with a file encoded with ffmpeg. media produces an output much 
closer to the original one than libyuv.

Original comment by mtomasz@chromium.org on 11 Apr 2014 at 2:53

GoogleCodeExporter commented 8 years ago
I'm not able to build chrome (help needed!). Would you be able to provide a way 
to test this with libyuv?

It sounds like the nature of the issue is that I420ToARGB does match ffmpeg?

Note that this function produces slightly different results on Neon vs SSSE3.  
The C version matches the SSSE3 version.  The libyuv unittests allow some 
tolerance for functions like this.

Original comment by fbarch...@chromium.org on 11 Apr 2014 at 10:51

GoogleCodeExporter commented 8 years ago
> I'm not able to build chrome (help needed!). Would you be able to provide a 
way to test this with libyuv?

Please follow 
https://code.google.com/p/chromium/wiki/LinuxFasterBuilds#Use_goma. Let me know 
if you have any problems.

> It sounds like the nature of the issue is that I420ToARGB does match ffmpeg?

I think you mean't *doesn't*. I420ToARGB produces high error for an input image 
encoded with ffmpeg to ogv. I'm sceptical, if the error is smaller for any 
other encoder, so it shouldn't be related to ffmpeg.

> Note that this function produces slightly different results on Neon vs SSSE3.
> The C version matches the SSSE3 version.  The libyuv unittests allow some 
tolerance
> for functions like this.

The unit tests allow 1 pixel of error. libyuv conversion has an error equal to 
2.

Original comment by mtomasz@chromium.org on 12 Apr 2014 at 1:29

GoogleCodeExporter commented 8 years ago
First step needs to be a reproducible error.
Or if you have an idea of what the bug is?

Original comment by fbarch...@google.com on 30 Apr 2014 at 12:25

GoogleCodeExporter commented 8 years ago
I already mentioned it in the bug entry on crbug.com. I think the constants are 
wrong, since they do not match the formulas in comments.

Original comment by mtomasz@chromium.org on 30 Apr 2014 at 12:32

GoogleCodeExporter commented 8 years ago
Posting mtomasz@ comments here:

in the fallback C code only:
source/row_common.cc

Snippet:
#define YG 74 /* (int8)(1.164 * 64 + 0.5) */

#define UB 127 /* min(63,(int8)(2.018 * 64)) */
#define UG -25 /* (int8)(-0.391 * 64 - 0.5) */
#define UR 0

#define VB 0
#define VG -52 /* (int8)(-0.813 * 64 - 0.5) */
#define VR 102 /* (int8)(1.596 * 64 + 0.5) */

However, these constants do not match the formulas on the left.

|    |   real   |   rounded   |   libyuv   | difference
| YG | 74.996   | 75          | 74         | (75 vs. 74)

| UB*| 129.152  | 129         | 127        | (129 vs. 127, or even 63 vs. 127!)
| UG | -25.524  | -26         | -25        | (-25 vs. -26)
| UR | 0        | 0           | 0          | OK

| VB | 0        | 0           | 0          | OK
| VG | -52.532  | -53         | -52        | (-52 vs. -53)
| VR | 102.644  | 103         | 102        | (103 vs. 102)

* I guess the formula in the comment is wrong.

So, first of all we are flooring instead of rounding. I'm not familiar with the 
algorithm, but intuitively, rounding should give more accurate results than 
flooring.

Also, if we really want a hard cast over rounding, then the UB value is 
significantly wrong.

I've updated these constants to the rounded values, and the accuracy on the 
test image increased drastically!

Original comment by fbarch...@chromium.org on 30 Apr 2014 at 1:00

GoogleCodeExporter commented 8 years ago
To test the new jpeg color space J420 I wrote a test to convert from ARGB to 
J420 and back to ARGB and measure the error.  Its off by a max of 3.

But for I420 its off by 6?  Its a compressed color space, so that may explain 
some error, but it worth looking into.

I've written a color_test module to test just the color matrix aspect of yuv 
to/from rgb.
r1229 tests against a float version of YUV to RGB.

YG = 1.164 * 64 = 74.496 which rounded is 74.
On unsigned values, floor(value + 0.5) was used for rounding.

UBg-25.024

Original comment by fbarch...@chromium.org on 15 Jan 2015 at 6:36

GoogleCodeExporter commented 8 years ago
A new set of tests has isolated 2 issues

Y channel has error due to 2.6 fixed point.

the B channel coefficient is >2 and if clamped to 2 is off.

It may be possible to use 1.7 fixed point, but it will depend on the nuances of 
pmaddubsw.
This C code shows 7 bit fixed point values

// C reference code that mimics the YUV assembly.

#define YG 149 /* round(1.164 * 128) */

#define UB 255 /* min(127, round(2.018 * 128)) */
#define UG -50 /* round(-0.391 * 128) */
#define UR 0

#define VB 0
#define VG -104 /* round(-0.813 * 128) */
#define VR 204 /* round(1.596 * 128) */

// Bias
#define BB (UB * 128 + VB * 128 + YG * 16)
#define BG (UG * 128 + VG * 128 + YG * 16)
#define BR (UR * 128 + VR * 128 + YG * 16)

void TestYUVToRGBInt(int y, int u, int v, int &r, int &g, int &b) {
//  uint32 y1 = (uint32)round(y * 1.164 * 128);
  uint32 y1 = (uint32)(y * YG);
  b = Clamp((int32)(u * UB + v * VB + y1 - BB) >> 7);
  g = Clamp((int32)(u * UG + v * VG + y1 - BG) >> 7);
  r = Clamp((int32)(u * UR + v * VR + y1 - BR) >> 7);
}

and a test shows the tolerances vs float improve:

  for (int i = 0; i < 1000; ++i) {
    int yr = random() & 255;
    int ur = random() & 255;
    int vr = random() & 255;

    TestYUVToRGBReference(yr, ur, vr, r0, g0, b0);
    TestYUVToRGB(yr, ur, vr, r1, g1, b1, benchmark_width_, benchmark_height_);
    TestYUVToRGBInt(yr, ur, vr, r2, g2, b2);
    EXPECT_NEAR(r0, r1, 3);
    EXPECT_NEAR(g0, g1, 3);
    EXPECT_NEAR(b0, b1, 5);
    EXPECT_NEAR(r0, r2, 1);
    EXPECT_NEAR(g0, g2, 1);
    EXPECT_NEAR(b0, b2, 4);
  }

The other approach would stay with 6 bit fixed point on U and V, which dont 
have a problem, but use 2.14 for Y channel.  This works out nicely on intel

The current code takes the Y channel and multiplies by 2.6 fixed point (74)

movq       xmm3, qword ptr [eax]                 
lea        eax, [eax + 8]                        
punpcklbw  xmm3, xmm4                            
pmullw     xmm3, kYToRgb                         
paddsw     xmm0, xmm3           /* B += Y */     
paddsw     xmm1, xmm3           /* G += Y */     
paddsw     xmm2, xmm3           /* R += Y */     
psraw      xmm0, 6                               
psraw      xmm1, 6                               
psraw      xmm2, 6                               
packuswb   xmm0, xmm0           /* B */          
packuswb   xmm1, xmm1           /* G */          
packuswb   xmm2, xmm2           /* R */          

The first 4 instructions could be changed to:
movq       xmm3, qword ptr [eax]  
lea        eax, [eax + 8]         
punpcklbw  xmm3, xmm3             
pmulhw     xmm3, kYToRgb          

saving a register (xmm4) and gaining precision.
downside is it doesnt port well to C or arm.

Original comment by fbarch...@chromium.org on 17 Jan 2015 at 3:00

GoogleCodeExporter commented 8 years ago
7 bits wont work with pmaddubsw.
pmulhw will work, and can be done in C efficiently by including Y replication 
in the multiplier constant.  e.g. Y * 0x0101 * coefficient

so 6 bits becomes 14 bits accurate
1.164 * 64 * 256 = 19070.976 which rounds to 19071 with small error.
another problem is the bias, so I compute that with the original float:
#define YGB 1192  /* round(1.164 * 64 * 16 ) */

With that the R and G channels are accurate compared to float, with error of 1 
at most.

B channel has issue with 2.018 being beyond range of signed 2.6 bit fixed point.
min(127, round(2.018 * 64) = 127.
127/64 = 1.984375
B channel is off by max of 5 with this value.

changing the cofficients to negatives, allows -128 which is 2.0
then subtract the chroma contributions instead of add.

This is proposed replacement code in C

// C prototype code

#define YG 4901241 /* round(1.164 * 64 * 256 * 257) */
#define YGB 1192  /* round(1.164 * 64 * 16 ) */

#define UB -128 /* -min(128, round(2.018 * 64)) */
#define UG 25 /* -round(-0.391 * 64) */
#define UR 0

#define VB 0
#define VG 52 /* -round(-0.813 * 64) */
#define VR -102 /* -round(1.596 * 64) */

// Bias
#define BB (UB * 128 + VB * 128 - YGB)
#define BG (UG * 128 + VG * 128 - YGB)
#define BR (UR * 128 + VR * 128 - YGB)

void TestYUVToRGBInt(int y, int u, int v, int &r, int &g, int &b) {
  uint32 y1 = (uint32)(y * YG) >> 16;
  b = Clamp((int32)(y1 - (v * VB + u * UB) + BB) >> 6);
  g = Clamp((int32)(y1 - (v * VG + u * UG) + BG) >> 6);
  r = Clamp((int32)(y1 - (v * VR + u * UR) + BR) >> 6);
}

// C prototype code

#define YG 4901241 /* round(1.164 * 64 * 256 * 257) */
#define YGB 1192  /* round(1.164 * 64 * 16 )*/

#define UB -128 /* -min(128, round(2.018 * 64)) */
#define UG 25 /* -round(-0.391 * 64) */
#define UR 0

#define VB 0
#define VG 52 /* -round(-0.813 * 64) */
#define VR -102 /* -round(1.596 * 64) */

// Bias
#define BB (UB * 128 + VB * 128 - YGB)
#define BG (UG * 128 + VG * 128 - YGB)
#define BR (UR * 128 + VR * 128 - YGB)

void TestYUVToRGBInt(int y, int u, int v, int &r, int &g, int &b) {
//  uint32 y1 = (uint32)round(y * 1.164 * 128);

  uint32 y1 = (uint32)(y * YG) >> 16;
  b = Clamp((int32)(y1 - (v * VB + u * UB) + BB) >> 6);
  g = Clamp((int32)(y1 - (v * VG + u * UG) + BG) >> 6);
  r = Clamp((int32)(y1 - (v * VR + u * UR) + BR) >> 6);
}

Original comment by fbarch...@chromium.org on 17 Jan 2015 at 7:17

GoogleCodeExporter commented 8 years ago
r1236 fixes luma channel for intel (avx2) and C.
arm code will need an update to match.

Original comment by fbarch...@chromium.org on 21 Jan 2015 at 1:16

GoogleCodeExporter commented 8 years ago
As of r1241 the chroma channel using negative matrix change is in for intel and 
C on Windows.
A value of -128 is used, which represents 2.0
The value really wants to be 2.018 which rounds to 129.  So its off by 1.
129/64 = 2.015625.  So off by 0.015.
But previously it was 127, which is 1.984375.
Blue channel error is now 3 down from 5.
Will consider rounding and centered bias.
rounding is free, but testing showed error of 2 instead of 1 when compared to 
float.
centered bias should reduce error on large values and increase error on small 
values, for overall less maximum error and better centering of error.
bias on y channel uses float rather than int, but is not centered.
To subtract the negative matrix took 3 extra movdqa instructions and in-order 
sequence to subtract the matrix from the bias values instead of add.
2% slower overall.
Accuracy is roughly complete on Windows and can be tested.
todo: posix and arm ports

Original comment by fbarch...@chromium.org on 21 Jan 2015 at 8:14

GoogleCodeExporter commented 8 years ago
syncing the original YUV to RGB
svn  up row_posix.cc  -r 1235

and using TestFullYUV which does a complete histogram, we can see the errors

[ RUN      ] libyuvTest.TestFullYUV
hist        -5  -4  -3  -2  -1  0   1   2   3
red     0   0   3526    142932  198402  291196  0   0   0
green       0   0   7464    180167  243435  204990  0   0   0
blue        3354    23392   47730   69918   72240   378486  32508   8342    86

Original comment by fbarch...@google.com on 22 Jan 2015 at 8:38

GoogleCodeExporter commented 8 years ago
or with LIBYUV_WIDTH=640

hist        -5  -4  -3  -2  -1  0   1   2   3
red     0   0   89600   3794688 5278976 7613952 0   0   0
green       0   0   191004  4781139 6472567 5332506 0   0   0
blue        78592   610560  1289728 1849088 1944832 9927424 860672  216064  256

Original comment by fbarch...@google.com on 22 Jan 2015 at 9:58

GoogleCodeExporter commented 8 years ago
with improved luma precision and negated chroma coefficients, and ybias to 
center error:

hist            -3      -2      -1      0       1       2       3
red             0       0       1809408 13140736        1827072 0       0
green           0       0       1679912 13471329        1625975 0       0
blue            158464  985344  1872384 10651392        1906688 1017344 185600

Original comment by fbarch...@chromium.org on 23 Jan 2015 at 4:24

GoogleCodeExporter commented 8 years ago
OSX requires a port 

[  FAILED  ] 13 tests, listed below:
[  FAILED  ] libyuvTest.I411ToARGB_Any
[  FAILED  ] libyuvTest.I411ToARGB_Unaligned
[  FAILED  ] libyuvTest.I411ToARGB_Invert
[  FAILED  ] libyuvTest.I411ToARGB_Opt
[  FAILED  ] libyuvTest.I444ToARGB_Unaligned
[  FAILED  ] libyuvTest.I444ToARGB_Invert
[  FAILED  ] libyuvTest.I444ToARGB_Opt
[  FAILED  ] libyuvTest.NV21ToARGB_Any
[  FAILED  ] libyuvTest.YToARGB_Any
[  FAILED  ] libyuvTest.YToARGB_Unaligned
[  FAILED  ] libyuvTest.YToARGB_Invert
[  FAILED  ] libyuvTest.YToARGB_Opt
[  FAILED  ] libyuvTest.YToARGB_Random

13 FAILED TESTS
  YOU HAVE 1 DISABLED TEST

Original comment by fbarch...@google.com on 26 Jan 2015 at 4:25

GoogleCodeExporter commented 8 years ago

Original comment by fbarch...@chromium.org on 27 Jan 2015 at 3:14