watery01 / libyuv

Automatically exported from code.google.com/p/libyuv

Requirements/desired changes for switching to libyuv in Firefox #92

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Frank asked for a list of requirements/changes needed to use libyuv for 
YUV->RGB+scaling in Firefox (tracking at 
<https://bugzilla.mozilla.org/show_bug.cgi?id=791941>), so here it is.

(1) libyuv doesn't, as far as I can tell, support scaling+cropping as a single 
operation. Neither did the original Chromium code. This is 
<https://bugzilla.mozilla.org/show_bug.cgi?id=639415>. As a "temporary" 
work-around (i.e., for the last year and a half), we've been doing YUV->RGB and 
scaling in separate steps whenever there's a cropping rectangle with a non-zero 
offset. Supporting this just requires starting the scaling at a specified 
offset, so conceptually this should be simple to fix.
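To illustrate why starting the scaling at an offset is conceptually simple, here is a minimal sketch in plain C of a nearest-neighbor scaler for one 8-bit plane that reads from a crop rectangle. This is not libyuv or Firefox code: `ScalePlaneCropped` and its parameters are hypothetical names, and the 16.16 fixed-point stepping is an assumption about how such a scaler might be written.

```c
#include <stdint.h>

/* Hypothetical helper (not a libyuv API): nearest-neighbor scale of one
 * 8-bit plane, sampling only from the crop rectangle
 * (crop_x, crop_y, crop_w, crop_h) of the source.
 * Assumes all dimensions are positive and below 32768. */
void ScalePlaneCropped(const uint8_t* src, int src_stride,
                       int crop_x, int crop_y, int crop_w, int crop_h,
                       uint8_t* dst, int dst_stride, int dst_w, int dst_h) {
  /* Source step per destination pixel, in 16.16 fixed point. */
  int32_t dx = (int32_t)(((int64_t)crop_w << 16) / dst_w);
  int32_t dy = (int32_t)(((int64_t)crop_h << 16) / dst_h);
  /* The only crop-specific part: the start position includes the crop
   * origin. Half a step is added so samples are taken at pixel centers. */
  int32_t y = (crop_y << 16) + (dy >> 1);
  for (int j = 0; j < dst_h; ++j) {
    const uint8_t* src_row = src + (y >> 16) * src_stride;
    int32_t x = (crop_x << 16) + (dx >> 1);
    for (int i = 0; i < dst_w; ++i) {
      dst[i] = src_row[x >> 16];
      x += dx;
    }
    dst += dst_stride;
    y += dy;
  }
}
```

Compared with an uncropped scaler, only the two start positions change; the inner loop is identical, which is why the cost should be negligible.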

(2) libyuv is also missing support for 4:2:2 and 4:4:4 input. I believe 
Chris Double modified the original code to support 4:2:2, since this did not 
require rewriting any of the row-based asm, but of course this does not work 
for 4:4:4. That is <https://bugzilla.mozilla.org/show_bug.cgi?id=640073>.

(3) For mobile devices with screens that run in RGB565 format, it is often 
better to scale directly to RGB565 instead of first scaling to RGB24 and then 
converting to RGB565. We have a set of NEON routines that support this in some 
cases, but they have some limitations 
<https://bugzilla.mozilla.org/show_bug.cgi?id=787886>, only cover some of the 
cases (scale factors near 1.0), and I have not personally had the time to 
maintain them.

(4) We support many ARM devices without NEON (ARMv6 builds are slated to be 
released with Fx 16). It would be nice to get converters that use the ARMv6 
media instructions instead of NEON.

(5) Finally, I'd like to know what rules libyuv applies for handling alignment 
and chroma offsets when scaling. Ideally, we'd like to use the "align corners" 
image size convention to avoid introducing shifts as the size changes. From 
<http://code.google.com/p/libyuv/issues/detail?id=86>, it appears that this is 
not what libyuv is doing. We would also prefer to use the JPEG cositing rules 
(chroma centered at the average location of the corresponding luma pixels). 
That is required for Theora, and while VP8 does not define the chroma cositing 
used (last I checked), this matches what's done by basically all non-scaling 
software YUV->RGB conversion code I've ever seen, so it makes sense to use it 
there, too. See the comment at 
<http://mxr.mozilla.org/mozilla-central/source/gfx/ycbcr/ycbcr_to_rgb565.cpp#349> 
for details on these two points. Both of these issues should come down to 
correctly computing an initial offset for the first pixel, and so should have 
essentially no impact on speed.
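As a rough sketch of what computing the initial offset could look like, the C below derives the source position of the first destination pixel in 16.16 fixed point. It rests on two assumptions that are not confirmed by libyuv or by the linked Mozilla comment: that "align corners" means the outer edges of the source and destination images coincide, and that the input is 4:2:0 with chroma sample c centered at luma position 2c + 0.5 (the average of luma samples 2c and 2c + 1). All function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: source step per destination pixel, 16.16 fixed point. */
int32_t ScaleStep(int src_size, int dst_size) {
  return (int32_t)(((int64_t)src_size << 16) / dst_size);
}

/* Source sample position (16.16) of destination pixel 0, assuming the image
 * edges coincide: the destination pixel center 0.5 maps to 0.5 * step in
 * source units, and subtracting 0.5 converts that to a sample index. */
int32_t FirstPixelOffset(int32_t step) {
  return (step >> 1) - 0x8000;
}

/* Chroma sample position (16.16) of destination pixel 0 for 4:2:0 input,
 * assuming chroma sample c sits at luma position 2c + 0.5.
 * A luma position p therefore corresponds to chroma position (p - 0.5) / 2. */
int32_t ChromaFirstPixelOffset(int32_t luma_step) {
  int32_t luma_pos = FirstPixelOffset(luma_step);
  return (luma_pos - 0x8000) / 2;
}
```

The offsets can be negative when upsampling, so a real scaler would need to clamp or replicate edge samples. After the first pixel, the per-pixel step is unchanged, which is consistent with the claim that this has essentially no impact on speed.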

Obviously the level of importance and level of effort for these varies 
significantly. At a minimum, we'd want (2) fixed to avoid a regression. (1) and 
(5) should also be easy, and would let us get rid of a bunch of special cases 
(see GetYCbCrToRGBDestFormatAndSize() at 
<http://mxr.mozilla.org/mozilla-central/source/gfx/thebes/gfxUtils.cpp#632>). 
This would make going through the effort to switch to libyuv more attractive. 
(3) and (4) are squarely in the "nice to have" category.

Original issue reported on code.google.com by tterr...@webrtc.org on 18 Sep 2012 at 2:01

GoogleCodeExporter commented 9 years ago
Re (2) - 422.  Conversion supports 420, 422, 444, 411, and 400 being converted 
directly (efficiently) to ARGB.

http://code.google.com/p/libyuv/source/search?q=i422toargb&origq=i422toargb&btnG=Search+Trunk

Original comment by fbarch...@google.com on 18 Sep 2012 at 7:51

GoogleCodeExporter commented 9 years ago
(1) Correct.  libyuv could support the 2-step method: convert anything to 
ARGB, then scale ARGB with clipping.

All functions support stride, which can be used to crop, but only to a 2 pixel 
boundary on top/left.
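A minimal sketch of cropping by stride, assuming I420 (4:2:0) planes: the crop is expressed by advancing the plane pointers and keeping the original strides. The `I420View` struct and `CropI420` function are hypothetical and not part of libyuv. Rounding odd offsets down to even is one possible way to respect the 2 pixel boundary, chosen here only for illustration.

```c
#include <stdint.h>

/* Hypothetical description of an I420 image: three planes plus strides. */
typedef struct {
  const uint8_t* y;
  const uint8_t* u;
  const uint8_t* v;
  int stride_y;
  int stride_u;
  int stride_v;
} I420View;

/* Returns a view whose top-left corner is at (crop_x, crop_y) of src.
 * Offsets are rounded down to even values because each chroma sample
 * covers a 2x2 block of luma pixels. */
I420View CropI420(I420View src, int crop_x, int crop_y) {
  I420View out = src;  /* strides are unchanged */
  crop_x &= ~1;
  crop_y &= ~1;
  out.y = src.y + crop_y * src.stride_y + crop_x;
  out.u = src.u + (crop_y / 2) * src.stride_u + crop_x / 2;
  out.v = src.v + (crop_y / 2) * src.stride_v + crop_x / 2;
  return out;
}
```

The cropped view can then be passed to a conversion or scaling call together with the smaller width and height of the crop rectangle.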

(2) conversions support 420, 422, 444, 400, 411.

(3) conversion supports I420 to RGB565 and ARGB to RGB565.  As the internals 
produce multiple pixels this is hard to natively support efficiently.  A 
compromise would be to produce a row buffer of ARGB and then ARGB to RGB565 
convert.  The ARGB scaler, which turns out to filter more efficiently than YUV 
since it's just one plane, could be adapted to scale an ARGB source (any 32 bit 
format) and convert to any packed pixel format.
The most similar code to that now is NV12ToRGB565, which achieves a pretty good 
performance by calling several efficient functions row by row.
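The second half of the row-buffer compromise can be sketched in plain C as below. This is not libyuv's implementation: `PackRowRGB565` is a hypothetical name, the input row is assumed to be stored as B, G, R, A bytes per pixel, and the conversion simply truncates the low bits with no dithering.

```c
#include <stdint.h>

/* Hypothetical helper: pack one row of 32-bit pixels (assumed byte order
 * B, G, R, A) into RGB565 (5 bits red, 6 bits green, 5 bits blue). */
void PackRowRGB565(const uint8_t* src_argb, uint16_t* dst_rgb565, int width) {
  for (int i = 0; i < width; ++i) {
    uint8_t b = src_argb[0];
    uint8_t g = src_argb[1];
    uint8_t r = src_argb[2];
    /* Keep the top 5/6/5 bits of each channel; alpha is dropped. */
    dst_rgb565[i] = (uint16_t)(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
    src_argb += 4;
  }
}
```

A scaler built this way would write each scaled row into a temporary ARGB buffer and call a routine like this once per row, so only one row of 32-bit pixels is held at a time.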

(4) Tegra2 was the most popular recent ARM chip that lacked Neon, but Tegra3 has it.
libyuv is used on non-ARM/non-x86 hardware as well (mips, sparc, ppc); there it's 
just the C version.
Priority on ARM is to further optimize the Neon code.
Compiler vendors (including Google, who support Android's compiler) should be 
encouraged to generate code good enough that hand coding is not worthwhile.
The low levels of libyuv are intended to do 16 pixels at a time in general.  
The C versions can at least unroll.  I don't mind tuning the C code to help ARM 
produce better code.

(5) You must pass 16-byte aligned buffers for full performance on most functions.
For scaling with a bilinear filter, I center on downsampling, but it's upper left 
on upsampling (performance concern). I'll fix that to avoid a 1/2 pixel shift.
http://code.google.com/p/libyuv/issues/detail?id=86
And I'll do a specialized 2x upsampler, since this is a common case for spatial 
layers and subsampling.

In GetYCbCrToRGBDestFormatAndSize I see a few other functions libyuv could help 
with.
ARGB swizzling, preattenuate/unattenuate.
http://code.google.com/p/libyuv/source/search?q=ARGBAttenuate&origq=ARGBAttenuate&btnG=Search+Trunk

Original comment by fbarch...@google.com on 19 Sep 2012 at 12:18

GoogleCodeExporter commented 9 years ago
(2) I've added I422ToARGB, I422ToBGRA, I422ToRGBA, I422ToABGR.

(3) For 565, the first step will be faster conversions - 1 step NEON.  
Currently it's 2 steps:
NEON+C: I420ToARGB_NEON and ARGBToRGB565_C
I420ToRGB565_OptVsC (6020 ms)
C+C: I420ToARGB_C and ARGBToRGB565_C
I420ToRGB565_OptVsC (31305 ms)
That's 6 ms for 720p.  Not very good. Converting to ARGB is fully Neon and better:
I420ToARGB_OptVsC (2721 ms)
Looking at your scaler to 565, you've got dithering, which I didn't plan to 
do... it doesn't lend itself to Neon.

Original comment by fbarch...@google.com on 8 Oct 2012 at 3:26

GoogleCodeExporter commented 9 years ago

Original comment by fbarch...@google.com on 8 Oct 2012 at 10:27

GoogleCodeExporter commented 9 years ago
Conversion I420ToRGB565 and NV12ToRGB565 optimized for Neon.  (and SSSE3)

Original comment by fbarch...@chromium.org on 2 Nov 2012 at 6:52

GoogleCodeExporter commented 9 years ago
Quick recap of issues re yuv to rgb scaling

(1) libyuv doesn't, as far as I can tell, support scaling+cropping as a single 
operation. 
Done

(2) libyuv is also missing support for 4:2:2 and 4:4:4 input. 
Not started, but increasingly important, and doable.

(3) For mobile devices with screens that run in RGB565 format.
Efficient ARGBToRGB565 conversion implemented for Neon and SSSE3, but not 
integrated into scaler.

(4) We support many ARM devices without NEON
Although we have users that go back to ARMv5, there are no plans to optimize 
for anything but Neon and, in future, ARMv8.  Older CPUs are supported, but not 
optimized for.

(5) Finally, I'd like to know what rules libyuv applies for handling alignment 
and chroma offsets when scaling.
Work in progress.
More functions support unaligned pointers.
Intent is currently symmetry - up and down sample should be orthogonal.

Original comment by fbarch...@google.com on 13 Oct 2013 at 2:59

GoogleCodeExporter commented 9 years ago
No further changes for now, but open specific bugs if there is a request.

Original comment by fbarch...@google.com on 9 Feb 2015 at 7:03