Closed GoogleCodeExporter closed 8 years ago
I just checked with a file encoded with ffmpeg. media produces an output much
closer to the original one than libyuv.
Original comment by mtomasz@chromium.org
on 11 Apr 2014 at 2:53
I'm not able to build chrome (help needed!). Would you be able to provide a way
to test this with libyuv?
It sounds like the nature of the issue is that I420ToARGB does match ffmpeg?
Note that this function produces slightly different results on Neon vs SSSE3.
The C version matches the SSSE3 version. The libyuv unittests allow some
tolerance for functions like this.
Original comment by fbarch...@chromium.org
on 11 Apr 2014 at 10:51
> I'm not able to build chrome (help needed!). Would you be able to provide a
way to test this with libyuv?
Please follow
https://code.google.com/p/chromium/wiki/LinuxFasterBuilds#Use_goma. Let me know
if you have any problems.
> It sounds like the nature of the issue is that I420ToARGB does match ffmpeg?
I think you mean't *doesn't*. I420ToARGB produces high error for an input image
encoded with ffmpeg to ogv. I'm sceptical, if the error is smaller for any
other encoder, so it shouldn't be related to ffmpeg.
> Note that this function produces slightly different results on Neon vs SSSE3.
> The C version matches the SSSE3 version. The libyuv unittests allow some
tolerance
> for functions like this.
The unit tests allow 1 pixel of error. libyuv conversion has an error equal to
2.
Original comment by mtomasz@chromium.org
on 12 Apr 2014 at 1:29
First step needs to be a reproducible error.
Or if you have an idea of what the bug is?
Original comment by fbarch...@google.com
on 30 Apr 2014 at 12:25
I already mentioned it in the bug entry on crbug.com. I think the constants are
wrong, since they do not match the formulas in comments.
Original comment by mtomasz@chromium.org
on 30 Apr 2014 at 12:32
Posting mtomasz@ comments here:
in the fallback C code only:
source/row_common.cc
Snippet:
#define YG 74 /* (int8)(1.164 * 64 + 0.5) */
#define UB 127 /* min(63,(int8)(2.018 * 64)) */
#define UG -25 /* (int8)(-0.391 * 64 - 0.5) */
#define UR 0
#define VB 0
#define VG -52 /* (int8)(-0.813 * 64 - 0.5) */
#define VR 102 /* (int8)(1.596 * 64 + 0.5) */
However, these constants do not match the formulas on the left.
| | real | rounded | libyuv | difference
| YG | 74.996 | 75 | 74 | (75 vs. 74)
| UB*| 129.152 | 129 | 127 | (129 vs. 127, or even 63 vs. 127!)
| UG | -25.524 | -26 | -25 | (-25 vs. -26)
| UR | 0 | 0 | 0 | OK
| VB | 0 | 0 | 0 | OK
| VG | -52.532 | -53 | -52 | (-52 vs. -53)
| VR | 102.644 | 103 | 102 | (103 vs. 102)
* I guess the formula in the comment is wrong.
So, first of all we are flooring instead of rounding. I'm not familiar with the
algorithm, but intuitively, rounding should give more accurate results than
flooring.
Also, if we really want a hard cast over rounding, then the UB value is
significantly wrong.
I've updated these constants to the rounded values, and the accuracy on the
test image increased drastically!
Original comment by fbarch...@chromium.org
on 30 Apr 2014 at 1:00
To test the new jpeg color space J420 I wrote a test to convert from ARGB to
J420 and back to ARGB and measure the error. Its off by a max of 3.
But for I420 its off by 6? Its a compressed color space, so that may explain
some error, but it worth looking into.
I've written a color_test module to test just the color matrix aspect of yuv
to/from rgb.
r1229 tests against a float version of YUV to RGB.
YG = 1.164 * 64 = 74.496 which rounded is 74.
On unsigned values, floor(value + 0.5) was used for rounding.
UBg-25.024
Original comment by fbarch...@chromium.org
on 15 Jan 2015 at 6:36
A new set of tests has isolated 2 issues
Y channel has error due to 2.6 fixed point.
the B channel coefficient is >2 and if clamped to 2 is off.
It may be possible to use 1.7 fixed point, but it will depend on the nuances of
pmaddubsw.
This C code shows 7 bit fixed point values
// C reference code that mimics the YUV assembly.
#define YG 149 /* round(1.164 * 128) */
#define UB 255 /* min(127, round(2.018 * 128)) */
#define UG -50 /* round(-0.391 * 128) */
#define UR 0
#define VB 0
#define VG -104 /* round(-0.813 * 128) */
#define VR 204 /* round(1.596 * 128) */
// Bias
#define BB (UB * 128 + VB * 128 + YG * 16)
#define BG (UG * 128 + VG * 128 + YG * 16)
#define BR (UR * 128 + VR * 128 + YG * 16)
void TestYUVToRGBInt(int y, int u, int v, int &r, int &g, int &b) {
// uint32 y1 = (uint32)round(y * 1.164 * 128);
uint32 y1 = (uint32)(y * YG);
b = Clamp((int32)(u * UB + v * VB + y1 - BB) >> 7);
g = Clamp((int32)(u * UG + v * VG + y1 - BG) >> 7);
r = Clamp((int32)(u * UR + v * VR + y1 - BR) >> 7);
}
and a test shows the tolerances vs float improve:
for (int i = 0; i < 1000; ++i) {
int yr = random() & 255;
int ur = random() & 255;
int vr = random() & 255;
TestYUVToRGBReference(yr, ur, vr, r0, g0, b0);
TestYUVToRGB(yr, ur, vr, r1, g1, b1, benchmark_width_, benchmark_height_);
TestYUVToRGBInt(yr, ur, vr, r2, g2, b2);
EXPECT_NEAR(r0, r1, 3);
EXPECT_NEAR(g0, g1, 3);
EXPECT_NEAR(b0, b1, 5);
EXPECT_NEAR(r0, r2, 1);
EXPECT_NEAR(g0, g2, 1);
EXPECT_NEAR(b0, b2, 4);
}
The other approach would stay with 6 bit fixed point on U and V, which dont
have a problem, but use 2.14 for Y channel. This works out nicely on intel
The current code takes the Y channel and multiplies by 2.6 fixed point (74)
movq xmm3, qword ptr [eax]
lea eax, [eax + 8]
punpcklbw xmm3, xmm4
pmullw xmm3, kYToRgb
paddsw xmm0, xmm3 /* B += Y */
paddsw xmm1, xmm3 /* G += Y */
paddsw xmm2, xmm3 /* R += Y */
psraw xmm0, 6
psraw xmm1, 6
psraw xmm2, 6
packuswb xmm0, xmm0 /* B */
packuswb xmm1, xmm1 /* G */
packuswb xmm2, xmm2 /* R */
The first 4 instructions could be changed to:
movq xmm3, qword ptr [eax]
lea eax, [eax + 8]
punpcklbw xmm3, xmm3
pmulhw xmm3, kYToRgb
saving a register (xmm4) and gaining precision.
downside is it doesnt port well to C or arm.
Original comment by fbarch...@chromium.org
on 17 Jan 2015 at 3:00
7 bits wont work with pmaddubsw.
pmulhw will work, and can be done in C efficiently by including Y replication
in the multiplier constant. e.g. Y * 0x0101 * coefficient
so 6 bits becomes 14 bits accurate
1.164 * 64 * 256 = 19070.976 which rounds to 19071 with small error.
another problem is the bias, so I compute that with the original float:
#define YGB 1192 /* round(1.164 * 64 * 16 ) */
With that the R and G channels are accurate compared to float, with error of 1
at most.
B channel has issue with 2.018 being beyond range of signed 2.6 bit fixed point.
min(127, round(2.018 * 64) = 127.
127/64 = 1.984375
B channel is off by max of 5 with this value.
changing the cofficients to negatives, allows -128 which is 2.0
then subtract the chroma contributions instead of add.
This is proposed replacement code in C
// C prototype code
#define YG 4901241 /* round(1.164 * 64 * 256 * 257) */
#define YGB 1192 /* round(1.164 * 64 * 16 ) */
#define UB -128 /* -min(128, round(2.018 * 64)) */
#define UG 25 /* -round(-0.391 * 64) */
#define UR 0
#define VB 0
#define VG 52 /* -round(-0.813 * 64) */
#define VR -102 /* -round(1.596 * 64) */
// Bias
#define BB (UB * 128 + VB * 128 - YGB)
#define BG (UG * 128 + VG * 128 - YGB)
#define BR (UR * 128 + VR * 128 - YGB)
void TestYUVToRGBInt(int y, int u, int v, int &r, int &g, int &b) {
uint32 y1 = (uint32)(y * YG) >> 16;
b = Clamp((int32)(y1 - (v * VB + u * UB) + BB) >> 6);
g = Clamp((int32)(y1 - (v * VG + u * UG) + BG) >> 6);
r = Clamp((int32)(y1 - (v * VR + u * UR) + BR) >> 6);
}
// C prototype code
#define YG 4901241 /* round(1.164 * 64 * 256 * 257) */
#define YGB 1192 /* round(1.164 * 64 * 16 )*/
#define UB -128 /* -min(128, round(2.018 * 64)) */
#define UG 25 /* -round(-0.391 * 64) */
#define UR 0
#define VB 0
#define VG 52 /* -round(-0.813 * 64) */
#define VR -102 /* -round(1.596 * 64) */
// Bias
#define BB (UB * 128 + VB * 128 - YGB)
#define BG (UG * 128 + VG * 128 - YGB)
#define BR (UR * 128 + VR * 128 - YGB)
void TestYUVToRGBInt(int y, int u, int v, int &r, int &g, int &b) {
// uint32 y1 = (uint32)round(y * 1.164 * 128);
uint32 y1 = (uint32)(y * YG) >> 16;
b = Clamp((int32)(y1 - (v * VB + u * UB) + BB) >> 6);
g = Clamp((int32)(y1 - (v * VG + u * UG) + BG) >> 6);
r = Clamp((int32)(y1 - (v * VR + u * UR) + BR) >> 6);
}
Original comment by fbarch...@chromium.org
on 17 Jan 2015 at 7:17
r1236 fixes luma channel for intel (avx2) and C.
arm code will need an update to match.
Original comment by fbarch...@chromium.org
on 21 Jan 2015 at 1:16
As of r1241 the chroma channel using negative matrix change is in for intel and
C on Windows.
A value of -128 is used, which represents 2.0
The value really wants to be 2.018 which rounds to 129. So its off by 1.
129/64 = 2.015625. So off by 0.015.
But previously it was 127, which is 1.984375.
Blue channel error is now 3 down from 5.
Will consider rounding and centered bias.
rounding is free, but testing showed error of 2 instead of 1 when compared to
float.
centered bias should reduce error on large values and increase error on small
values, for overall less maximum error and better centering of error.
bias on y channel uses float rather than int, but is not centered.
To subtract the negative matrix took 3 extra movdqa instructions and in-order
sequence to subtract the matrix from the bias values instead of add.
2% slower overall.
Accuracy is roughly complete on Windows and can be tested.
todo: posix and arm ports
Original comment by fbarch...@chromium.org
on 21 Jan 2015 at 8:14
syncing the original YUV to RGB
svn up row_posix.cc -r 1235
and using TestFullYUV which does a complete histogram, we can see the errors
[ RUN ] libyuvTest.TestFullYUV
hist -5 -4 -3 -2 -1 0 1 2 3
red 0 0 3526 142932 198402 291196 0 0 0
green 0 0 7464 180167 243435 204990 0 0 0
blue 3354 23392 47730 69918 72240 378486 32508 8342 86
Original comment by fbarch...@google.com
on 22 Jan 2015 at 8:38
or with LIBYUV_WIDTH=640
hist -5 -4 -3 -2 -1 0 1 2 3
red 0 0 89600 3794688 5278976 7613952 0 0 0
green 0 0 191004 4781139 6472567 5332506 0 0 0
blue 78592 610560 1289728 1849088 1944832 9927424 860672 216064 256
Original comment by fbarch...@google.com
on 22 Jan 2015 at 9:58
with improved luma precision and negated chroma coefficients, and ybias to
center error:
hist -3 -2 -1 0 1 2 3
red 0 0 1809408 13140736 1827072 0 0
green 0 0 1679912 13471329 1625975 0 0
blue 158464 985344 1872384 10651392 1906688 1017344 185600
Original comment by fbarch...@chromium.org
on 23 Jan 2015 at 4:24
OSX requires a port
[ FAILED ] 13 tests, listed below:
[ FAILED ] libyuvTest.I411ToARGB_Any
[ FAILED ] libyuvTest.I411ToARGB_Unaligned
[ FAILED ] libyuvTest.I411ToARGB_Invert
[ FAILED ] libyuvTest.I411ToARGB_Opt
[ FAILED ] libyuvTest.I444ToARGB_Unaligned
[ FAILED ] libyuvTest.I444ToARGB_Invert
[ FAILED ] libyuvTest.I444ToARGB_Opt
[ FAILED ] libyuvTest.NV21ToARGB_Any
[ FAILED ] libyuvTest.YToARGB_Any
[ FAILED ] libyuvTest.YToARGB_Unaligned
[ FAILED ] libyuvTest.YToARGB_Invert
[ FAILED ] libyuvTest.YToARGB_Opt
[ FAILED ] libyuvTest.YToARGB_Random
13 FAILED TESTS
YOU HAVE 1 DISABLED TEST
Original comment by fbarch...@google.com
on 26 Jan 2015 at 4:25
Original comment by fbarch...@chromium.org
on 27 Jan 2015 at 3:14
Original issue reported on code.google.com by
mtomasz@chromium.org
on 11 Apr 2014 at 12:51