myrao / libyuv

Automatically exported from code.google.com/p/libyuv
BSD 3-Clause "New" or "Revised" License

row coalesce #197

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
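When an image has contiguous rows (stride equal to the row width), combine them into one long row to reduce per-row overhead.
This mainly applies to conversions and effects that work on one plane.
Benefits are:
A small performance win, especially on small or narrow images.
Odd widths can use SIMD more efficiently, since only the end of the combined row is left over instead of the end of every row.
Less calling overhead for profilers.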
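
A minimal sketch of the idea for a one-plane copy, assuming 8-bit samples and strides measured in bytes. `CopyPlaneCoalesced` is a hypothetical name used for illustration, not a libyuv function, and the returned row count exists only to make the reduced overhead visible.

```cpp
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Hypothetical helper, not the libyuv API. Copies a width x height plane of
// bytes and returns how many row copies were issued.
int CopyPlaneCoalesced(const uint8_t* src, int src_stride,
                       uint8_t* dst, int dst_stride,
                       int width, int height) {
  assert(width >= 0 && height >= 0);
  // Coalesce: with no padding between rows in either buffer, the image is
  // one contiguous run of width * height bytes, so treat it as a single row.
  if (src_stride == width && dst_stride == width) {
    width *= height;
    height = 1;
    src_stride = dst_stride = 0;  // Strides are unused with one row.
  }
  int row_calls = 0;
  for (int y = 0; y < height; ++y) {
    memcpy(dst, src, static_cast<size_t>(width));
    src += src_stride;
    dst += dst_stride;
    ++row_calls;
  }
  return row_calls;
}
```

With a 4 x 3 plane and both strides equal to 4, the loop body runs once over 12 bytes. With width 3 and stride 4 the rows are padded, so the function falls back to three separate row copies.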

Original issue reported on code.google.com by fbarch...@chromium.org on 7 Mar 2013 at 9:31

GoogleCodeExporter commented 9 years ago
These files contain functions that potentially benefit from row coalescing:
compare.cc
convert.cc
convert_argb.cc
convert_from.cc
convert_from_argb.cc
planar_functions.cc

Original comment by fbarch...@chromium.org on 10 Mar 2013 at 3:04

GoogleCodeExporter commented 9 years ago
r598 completes these 2 files:
convert_from_argb.cc
planar_functions.cc

Original comment by fbarch...@google.com on 11 Mar 2013 at 9:48

GoogleCodeExporter commented 9 years ago
r601 completes convert_argb.cc

Rules for coalescing:
Vertical subsampling needs to match.
Horizontal subsampling needs to be aligned, e.g. width is a multiple of 2.
If a row buffer is used, width * height needs to fit in the row buffer (kMaxStride).
All rows need to be treated the same (not Bayer).
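
The rules above can be sketched as a single predicate. This is an illustration only: `CoalesceCheck` and `CanCoalesce` are hypothetical names, the row buffer capacity is passed in rather than read from kMaxStride, and it is assumed to be in the same units as width.

```cpp
#include <assert.h>

// Hypothetical description of one conversion; none of these names are libyuv's.
struct CoalesceCheck {
  int width;               // Samples per row.
  int height;              // Number of rows.
  bool rows_contiguous;    // Every plane's stride equals its row size.
  int src_vert_subsample;  // Luma rows per chroma row in the source (1 or 2).
  int dst_vert_subsample;  // Same for the destination.
  int horiz_subsample;     // Luma samples per chroma sample in a row (1 or 2).
  bool uses_row_buffer;    // Conversion stages data in a temporary row.
  int row_buffer_capacity; // Stand-in for kMaxStride (assumed same units as width).
  bool rows_uniform;       // False when rows differ, as in Bayer.
};

bool CanCoalesce(const CoalesceCheck& c) {
  assert(c.width > 0 && c.height > 0 && c.horiz_subsample > 0);
  if (!c.rows_contiguous) return false;
  // Vertical subsampling needs to match.
  if (c.src_vert_subsample != c.dst_vert_subsample) return false;
  // Horizontal subsampling needs to be aligned.
  if (c.width % c.horiz_subsample != 0) return false;
  // The coalesced row must fit in the temporary row, if one is used.
  if (c.uses_row_buffer &&
      static_cast<long long>(c.width) * c.height > c.row_buffer_capacity) {
    return false;
  }
  // All rows need to be treated the same.
  return c.rows_uniform;
}
```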

Original comment by fbarch...@chromium.org on 12 Mar 2013 at 9:52

GoogleCodeExporter commented 9 years ago
convert and convert_from only have potential with 420 formats: I420, M420, Q420, NV12, NV21.
M420 and Q420 are row planar, so they cannot be coalesced.
NV12 and NV21 can be done, but require a refactor of the X420 code.

On 2-pass functions like YUY2ToI422, row coalescing is potentially slower and less cache friendly than processing a row at a time.

Original comment by fbarch...@chromium.org on 13 Mar 2013 at 6:02

GoogleCodeExporter commented 9 years ago
r605 does convert_from.cc. compare.cc is also complete.

Before
I422ToYUY2_Unaligned (768 ms)
I422ToYUY2_Any (768 ms)
I422ToYUY2_Invert (144 ms)
I422ToYUY2_Opt (141 ms)

After
I422ToYUY2_Unaligned (759 ms)
I422ToYUY2_Any (143 ms)
I422ToYUY2_Invert (141 ms)
I422ToYUY2_Opt (138 ms)

Before
I420ToNV12_Any (119 ms)
I420ToNV12_Invert (113 ms)
I420ToNV12_Opt (105 ms)
I420ToNV12_Unaligned (100 ms)

After
I420ToNV12_Invert (107 ms)
I420ToNV12_Opt (97 ms)
I420ToNV12_Any (97 ms)
I420ToNV12_Unaligned (95 ms)

Original comment by fbarch...@chromium.org on 13 Mar 2013 at 6:14

GoogleCodeExporter commented 9 years ago
Should coalescing check height > 1?
If not, there is a risk of infinite recursion. It shouldn't happen when a stride of 0 is passed.
Some effects functions pass the original stride due to x/y lookup. If the new width == old stride, a loop would occur.
Instead of recursing, the variables can be updated and the function can continue, like I420ToNV12 does.
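
A minimal sketch of the update-and-continue form, assuming a one-plane, in-place effect on 8-bit samples. `InvertRow` and `InvertPlane` are hypothetical names for illustration, not libyuv functions. Because the variables are updated in place, there is no call that could re-enter the coalescing branch, and the `height > 1` check simply skips the update for single-row input.

```cpp
#include <assert.h>
#include <stdint.h>

// Hypothetical row function: inverts `width` bytes in place.
static void InvertRow(uint8_t* row, int width) {
  for (int x = 0; x < width; ++x) {
    row[x] = static_cast<uint8_t>(255 - row[x]);
  }
}

// Hypothetical one-plane effect, not the libyuv API.
void InvertPlane(uint8_t* buf, int stride, int width, int height) {
  assert(width >= 0 && height >= 0);
  // Coalesce by rewriting the local variables rather than calling
  // InvertPlane again, so the single-row case cannot loop.
  if (height > 1 && stride == width) {
    width *= height;
    height = 1;
    stride = 0;
  }
  for (int y = 0; y < height; ++y) {
    InvertRow(buf, width);
    buf += stride;
  }
}
```

A recursive variant that passed the original stride would see width == stride again whenever height is 1, which is the loop described above.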

Only 1 file left: convert.cc. Likely just NV12ToI420.

Original comment by fbarch...@chromium.org on 13 Mar 2013 at 6:42

GoogleCodeExporter commented 9 years ago
r607 implements NV12ToI420 and reimplements I420ToNV12, so Y and UV planes are 
coalesced independently.
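
A sketch of what coalescing the Y and UV planes independently could look like for an I420 to NV12 conversion. This is not the code from r607: `I420ToNV12Sketch` and `MergeUVRowSketch` are hypothetical names, and the sketch assumes 8-bit samples and strides in bytes.

```cpp
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Hypothetical row function: interleaves `width` U and V samples into UV pairs.
static void MergeUVRowSketch(const uint8_t* u, const uint8_t* v,
                             uint8_t* uv, int width) {
  for (int x = 0; x < width; ++x) {
    uv[2 * x] = u[x];
    uv[2 * x + 1] = v[x];
  }
}

// Hypothetical conversion, not the libyuv API.
void I420ToNV12Sketch(const uint8_t* src_y, int src_stride_y,
                      const uint8_t* src_u, int src_stride_u,
                      const uint8_t* src_v, int src_stride_v,
                      uint8_t* dst_y, int dst_stride_y,
                      uint8_t* dst_uv, int dst_stride_uv,
                      int width, int height) {
  assert(width > 0 && height > 0);
  int halfwidth = (width + 1) / 2;
  int halfheight = (height + 1) / 2;

  // Y plane: coalesce only if both Y buffers are contiguous.
  int y_width = width;
  int y_height = height;
  if (src_stride_y == y_width && dst_stride_y == y_width) {
    y_width *= y_height;
    y_height = 1;
    src_stride_y = dst_stride_y = 0;
  }
  for (int y = 0; y < y_height; ++y) {
    memcpy(dst_y, src_y, static_cast<size_t>(y_width));
    src_y += src_stride_y;
    dst_y += dst_stride_y;
  }

  // UV plane: a separate decision, so padded Y rows do not block it.
  if (src_stride_u == halfwidth && src_stride_v == halfwidth &&
      dst_stride_uv == halfwidth * 2) {
    halfwidth *= halfheight;
    halfheight = 1;
    src_stride_u = src_stride_v = dst_stride_uv = 0;
  }
  for (int y = 0; y < halfheight; ++y) {
    MergeUVRowSketch(src_u, src_v, dst_uv, halfwidth);
    src_u += src_stride_u;
    src_v += src_stride_v;
    dst_uv += dst_stride_uv;
  }
}
```

Keeping the two checks separate means a destination with padded Y rows can still have its chroma merged in a single pass, and the reverse.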

Original comment by phthor...@gmail.com on 14 Mar 2013 at 7:03

GoogleCodeExporter commented 9 years ago

Original comment by fbarch...@chromium.org on 14 Mar 2013 at 7:04