zzxxpp1011239740 / webp

Automatically exported from code.google.com/p/webp

[PATCH] NEON optimised yuv to rgb conversion #134

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
The attached patch adds NEON optimised yuv to rgb conversion along the lines of 
the SSE chroma upsampling. Total speedup is ~35%.

Original issue reported on code.google.com by mrullgard@gmail.com on 17 Jan 2013 at 9:30

GoogleCodeExporter commented 9 years ago
Great, thanks Mans!

Attached is a slightly modified version of the patch (added some declarations in dsp.h and updated makefile.unix).

Original comment by pascal.m...@gmail.com on 17 Jan 2013 at 10:55

GoogleCodeExporter commented 9 years ago
Thanks for fixing that.

Original comment by mrullgard@gmail.com on 18 Jan 2013 at 1:16

GoogleCodeExporter commented 9 years ago
https://gerrit.chromium.org/gerrit/41610

I'm currently getting different output for -ppm with -noasm. Will look into it 
tomorrow.

James, you can assign this to me.

Original comment by johannko...@google.com on 18 Jan 2013 at 2:18

GoogleCodeExporter commented 9 years ago

Original comment by jz...@google.com on 18 Jan 2013 at 3:30

GoogleCodeExporter commented 9 years ago
Yes, there are a few off-by-ones in the colourspace conversion caused by 
rounding differences compared to the lookup tables the C code uses.  Neither is 
more correct than the other.

Original comment by mrullgard@gmail.com on 18 Jan 2013 at 12:13
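To illustrate how two equally defensible fixed-point implementations can disagree by one, here is a minimal sketch. It is not taken from the NEON patch or from yuv.c; the names `FRAC_BITS`, `fix_truncate` and `fix_round` are hypothetical, and the only assumption is a 16-bit fractional part.

```c
#include <stdint.h>

/* Number of fractional bits in the fixed-point value (assumed). */
enum { FRAC_BITS = 16 };

/* Drop the fractional bits: rounds a non-negative value down. */
static uint32_t fix_truncate(uint32_t x) {
  return x >> FRAC_BITS;
}

/* Add one half before dropping the fractional bits: rounds to nearest. */
static uint32_t fix_round(uint32_t x) {
  return (x + (1u << (FRAC_BITS - 1))) >> FRAC_BITS;
}
```

For an input such as 449290 (about 6.86 in this format) `fix_truncate` gives 6 and `fix_round` gives 7, while for 400000 (about 6.10) both give 6. A table built with one convention and SIMD code using the other would therefore differ by at most one on some pixels.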

GoogleCodeExporter commented 9 years ago
Are you referring to yuv2r() and the like?
If so, the off-by-one seems easily fixable.
It's important to have bitwise-identical output on all platforms, so we can 
track (real) bugs easily.

Original comment by pascal.m...@gmail.com on 18 Jan 2013 at 12:20

GoogleCodeExporter commented 9 years ago
How are the constants used for the table calculations in yuv.c derived?

Original comment by mrullgard@gmail.com on 18 Jan 2013 at 1:11

GoogleCodeExporter commented 9 years ago
The constants correspond to the BT.601 conversion:

R = 1.164(Y - 16) + 1.596(V - 128)
G = 1.164(Y - 16) - 0.813(V - 128) - 0.391(U - 128)
B = 1.164(Y - 16)                   + 2.018(U - 128)

(http://www.fourcc.org/fccyvrgb.php)

But since we use an offset in the lookup table VP8kClip[] to take care of the 
common 1.164 * (Y-16) term, the constants 1.596, -0.813, -0.391 and 2.018 are 
pre-divided by 1.164 before being turned into 16-bit fixed-point constants.

Original comment by pascal.m...@gmail.com on 18 Jan 2013 at 1:45
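The pre-division described above can be sketched for the R channel as follows. This is an illustration only, not the code in yuv.c: all names (`YUV_FRAC_BITS`, `to_fixed`, `clip8`, `ref_r`, `fixed_r`, `max_abs_diff_r`) are hypothetical, and the final multiplication by 1.164 stands in for what the comment says the VP8kClip[] table handles.

```c
#include <stdint.h>

/* Number of fractional bits (assumed 16, as in the comment above). */
enum { YUV_FRAC_BITS = 16 };

/* Pre-divide a BT.601 coefficient by 1.164 and convert it to fixed point
 * (truncating toward zero). */
static int32_t to_fixed(double coeff) {
  return (int32_t)(coeff / 1.164 * (1 << YUV_FRAC_BITS));
}

/* Round to nearest and clamp to [0, 255]. */
static int clip8(double v) {
  if (v < 0.0) return 0;
  if (v > 255.0) return 255;
  return (int)(v + 0.5);
}

/* Reference: R = 1.164(Y - 16) + 1.596(V - 128), in floating point. */
static int ref_r(int y, int v) {
  return clip8(1.164 * (y - 16) + 1.596 * (v - 128));
}

/* Factored form: R = 1.164 * ((Y - 16) + (1.596 / 1.164)(V - 128)).
 * The inner sum is accumulated in fixed point; the common 1.164 factor
 * is applied once at the end. */
static int fixed_r(int y, int v) {
  const int32_t v_to_r = to_fixed(1.596);
  const int32_t acc = (y - 16) * (1 << YUV_FRAC_BITS) + v_to_r * (v - 128);
  return clip8(1.164 * acc / (1 << YUV_FRAC_BITS));
}

/* Largest absolute difference between the two forms over all 8-bit inputs. */
static int max_abs_diff_r(void) {
  int max_diff = 0;
  for (int y = 0; y < 256; ++y) {
    for (int v = 0; v < 256; ++v) {
      int d = ref_r(y, v) - fixed_r(y, v);
      if (d < 0) d = -d;
      if (d > max_diff) max_diff = d;
    }
  }
  return max_diff;
}
```

Calling `max_abs_diff_r()` checks that the factored fixed-point form stays within one code value of the floating-point reference for every (Y, V) pair. G and B follow the same pattern with their own pre-divided coefficients.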

GoogleCodeExporter commented 9 years ago
added some comments in the code here
  =>  https://gerrit.chromium.org/gerrit/41634

Original comment by pascal.m...@gmail.com on 18 Jan 2013 at 2:09

GoogleCodeExporter commented 9 years ago
Here's a version that matches the C code exactly. It's about 4% slower than the 
original patch.

Original comment by mrullgard@gmail.com on 19 Jan 2013 at 3:34

GoogleCodeExporter commented 9 years ago
Great! I uploaded a new patch to https://gerrit.chromium.org/gerrit/#/c/41610/
There were some leftovers in upsampling_neon.c (CY, CVR, ..., coef[], 
cf16, cf32, u16, u128) that I removed.

Maybe the x86 version could be made faster without tables, similarly to the ARM 
one.
That needs investigating later. For now, having bitwise-identical output on all 
platforms is preferable.

Original comment by pascal.m...@gmail.com on 21 Jan 2013 at 3:57

GoogleCodeExporter commented 9 years ago
Those are very much used. Your modified patch will not compile.

Original comment by mrullgard@gmail.com on 21 Jan 2013 at 4:18

GoogleCodeExporter commented 9 years ago
ah! my bad. Fixed. I'll let Johann try it...

Original comment by pascal.m...@gmail.com on 21 Jan 2013 at 4:37

GoogleCodeExporter commented 9 years ago
This was merged, thanks Mans.

1de3e25 Merge "NEON optimised yuv to rgb conversion"
090b708 NEON optimised yuv to rgb conversion

Original comment by jz...@google.com on 7 Feb 2013 at 8:19