Closed by GoogleCodeExporter 8 years ago
The following revision refers to this bug:
https://chromium.googlesource.com/libyuv/libyuv.git/+/8af0ebf8166a141e0d8798cfee1ac6b3b9365511
commit 8af0ebf8166a141e0d8798cfee1ac6b3b9365511
Author: Frank Barchard <fbarchard@google.com>
Date: Wed Dec 02 22:20:17 2015
planar blend use signed images
R=dhrosa@google.com, harryjin@google.com, jzern@chromium.org
BUG=libyuv:527
Review URL: https://codereview.chromium.org/1491533002 .
[modify]
http://crrev.com/8af0ebf8166a141e0d8798cfee1ac6b3b9365511/README.chromium
[modify]
http://crrev.com/8af0ebf8166a141e0d8798cfee1ac6b3b9365511/include/libyuv/row.h
[modify]
http://crrev.com/8af0ebf8166a141e0d8798cfee1ac6b3b9365511/include/libyuv/version.h
[modify]
http://crrev.com/8af0ebf8166a141e0d8798cfee1ac6b3b9365511/source/row_common.cc
[modify]
http://crrev.com/8af0ebf8166a141e0d8798cfee1ac6b3b9365511/source/row_win.cc
[modify]
http://crrev.com/8af0ebf8166a141e0d8798cfee1ac6b3b9365511/unit_test/planar_test.cc
Original comment by bugdroid1@chromium.org
on 2 Dec 2015 at 10:21
The following revision refers to this bug:
https://chromium.googlesource.com/libyuv/libyuv.git/+/fa2618ee267642719ec51add88d9d60233cf9bfe
commit fa2618ee267642719ec51add88d9d60233cf9bfe
Author: Frank Barchard <fbarchard@google.com>
Date: Fri Dec 04 19:19:41 2015
Port BlendPlaneRow_SSSE3 to GCC
R=dhrosa@google.com, harryjin@google.com
BUG=libyuv:527
Review URL: https://codereview.chromium.org/1490273006 .
[modify]
http://crrev.com/fa2618ee267642719ec51add88d9d60233cf9bfe/include/libyuv/row.h
[modify]
http://crrev.com/fa2618ee267642719ec51add88d9d60233cf9bfe/source/row_gcc.cc
Original comment by bugdroid1@chromium.org
on 4 Dec 2015 at 7:20
The following revision refers to this bug:
https://chromium.googlesource.com/libyuv/libyuv.git/+/bea690b3e03d24f77fea45c9a8592ea480a4acd8
commit bea690b3e03d24f77fea45c9a8592ea480a4acd8
Author: Frank Barchard <fbarchard@google.com>
Date: Sun Dec 06 06:23:29 2015
AVX2 YUV alpha blender and improved unittests
AVX2 version can process 16 pixels at a time for improved memory bandwidth and
fewer instructions.
unittests improved to test unaligned memory, and test exactness when alpha is 0
or 255.
R=dhrosa@google.com, harryjin@google.com
BUG=libyuv:527
Review URL: https://codereview.chromium.org/1505433002 .
[modify]
http://crrev.com/bea690b3e03d24f77fea45c9a8592ea480a4acd8/README.chromium
[modify]
http://crrev.com/bea690b3e03d24f77fea45c9a8592ea480a4acd8/include/libyuv/planar_functions.h
[modify]
http://crrev.com/bea690b3e03d24f77fea45c9a8592ea480a4acd8/include/libyuv/row.h
[modify]
http://crrev.com/bea690b3e03d24f77fea45c9a8592ea480a4acd8/include/libyuv/version.h
[modify]
http://crrev.com/bea690b3e03d24f77fea45c9a8592ea480a4acd8/source/planar_functions.cc
[modify]
http://crrev.com/bea690b3e03d24f77fea45c9a8592ea480a4acd8/source/row_gcc.cc
[modify]
http://crrev.com/bea690b3e03d24f77fea45c9a8592ea480a4acd8/source/row_win.cc
[modify]
http://crrev.com/bea690b3e03d24f77fea45c9a8592ea480a4acd8/unit_test/planar_test.cc
Original comment by bugdroid1@chromium.org
on 6 Dec 2015 at 6:23
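The exactness requirement in the commit above (the output must equal one of the sources exactly when alpha is 0 or 255) can be illustrated with a scalar sketch. The names `BlendPixelSketch` and `BlendEndpointsExact` are hypothetical, and the formula is only one 8-bit blend that has this property; it is an assumption for illustration, not a copy of the libyuv row function.

```c
#include <stdint.h>

/* Hypothetical scalar blend of one 8-bit sample (not the libyuv code).
 * alpha = 255 selects src0, alpha = 0 selects src1.
 * Adding 255 before the shift makes both endpoints exact, because
 * (s * 255 + 255) >> 8 == s for every s in 0..255. */
static uint8_t BlendPixelSketch(uint8_t src0, uint8_t src1, uint8_t alpha) {
  uint32_t sum =
      (uint32_t)src0 * alpha + (uint32_t)src1 * (255u - alpha) + 255u;
  return (uint8_t)(sum >> 8);
}

/* Returns 1 if the blend is exact at alpha 0 and 255 for all sample values. */
static int BlendEndpointsExact(void) {
  for (int a = 0; a < 256; ++a) {
    for (int b = 0; b < 256; ++b) {
      if (BlendPixelSketch((uint8_t)a, (uint8_t)b, 255) != a) return 0;
      if (BlendPixelSketch((uint8_t)a, (uint8_t)b, 0) != b) return 0;
    }
  }
  return 1;
}
```

An exhaustive check like `BlendEndpointsExact` is cheap for 8-bit data, which is presumably why a unit test can assert exactness rather than a tolerance at the two alpha extremes.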
Support for images with odd height is needed. As written, the code reads past the end of the alpha plane.
source/planar_functions.cc:725: for (y = 0; y < height; ++y) {
For example, if the height is 5, then the chroma channel has height 3.
The first iteration uses alpha rows 0 + 1,
the next iteration uses alpha rows 2 + 3,
and the final iteration uses alpha rows 4 + 5, but the alpha channel doesn't have a row 5.
Original comment by fbarch...@chromium.org
on 6 Dec 2015 at 8:51
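The overread described above can be avoided by reusing the last alpha row when the height is odd. The following is a minimal sketch under that assumption; `HalveAlphaSketch` and `OddHeightDemo` are hypothetical names, and the clamping strategy is one possible fix rather than the one the library necessarily uses.

```c
#include <stdint.h>

/* Hypothetical helper (not the libyuv code): average the full-resolution
 * alpha plane down to chroma resolution with a 2x2 box filter.
 * For an odd height or width the last row/column is reused instead of
 * reading one past the end of the plane. */
static void HalveAlphaSketch(const uint8_t* alpha, int alpha_stride,
                             int width, int height,
                             uint8_t* dst, int dst_stride) {
  int halfwidth = (width + 1) / 2;
  int halfheight = (height + 1) / 2;
  for (int y = 0; y < halfheight; ++y) {
    int y0 = 2 * y;
    int y1 = (y0 + 1 < height) ? y0 + 1 : y0; /* clamp on odd height */
    const uint8_t* row0 = alpha + y0 * alpha_stride;
    const uint8_t* row1 = alpha + y1 * alpha_stride;
    for (int x = 0; x < halfwidth; ++x) {
      int x0 = 2 * x;
      int x1 = (x0 + 1 < width) ? x0 + 1 : x0; /* clamp on odd width */
      dst[y * dst_stride + x] =
          (uint8_t)((row0[x0] + row0[x1] + row1[x0] + row1[x1] + 2) >> 2);
    }
  }
}

/* Demo: 2x5 alpha plane whose rows hold 10, 20, 30, 40, 50.
 * Returns 1 when the three chroma-resolution rows are 15, 35, 50. */
static int OddHeightDemo(void) {
  uint8_t alpha[5 * 2];
  uint8_t dst[3 * 1];
  for (int y = 0; y < 5; ++y) {
    alpha[y * 2 + 0] = (uint8_t)(10 * (y + 1));
    alpha[y * 2 + 1] = (uint8_t)(10 * (y + 1));
  }
  HalveAlphaSketch(alpha, 2, 2, 5, dst, 1);
  return dst[0] == 15 && dst[1] == 35 && dst[2] == 50;
}
```

With height 5, the third output row is built from alpha row 4 alone, so no row 5 is ever touched.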
The following revision refers to this bug:
https://chromium.googlesource.com/libyuv/libyuv.git/+/b0b22f88b9f1557dd0a82260ce70748c65e54ad1
commit b0b22f88b9f1557dd0a82260ce70748c65e54ad1
Author: Frank Barchard <fbarchard@google.com>
Date: Mon Dec 07 20:02:45 2015
Unroll C version of YUV blender for improved performance.
R=dhrosa@google.com, harryjin@google.com
BUG=libyuv:527
Review URL: https://codereview.chromium.org/1502343003 .
[modify]
http://crrev.com/b0b22f88b9f1557dd0a82260ce70748c65e54ad1/source/row_common.cc
Original comment by bugdroid1@chromium.org
on 7 Dec 2015 at 8:03
The following revision refers to this bug:
https://chromium.googlesource.com/libyuv/libyuv.git/+/2657688e701709a5af935e6ea27f4f8967208f2d
commit 2657688e701709a5af935e6ea27f4f8967208f2d
Author: Frank Barchard <fbarchard@google.com>
Date: Mon Dec 07 20:03:20 2015
Add support for odd height YUVA alpha blending.
R=dhrosa@google.com, harryjin@google.com
BUG=libyuv:527
Review URL: https://codereview.chromium.org/1507683003 .
[modify]
http://crrev.com/2657688e701709a5af935e6ea27f4f8967208f2d/README.chromium
[modify]
http://crrev.com/2657688e701709a5af935e6ea27f4f8967208f2d/include/libyuv/version.h
[modify]
http://crrev.com/2657688e701709a5af935e6ea27f4f8967208f2d/source/planar_functions.cc
[modify]
http://crrev.com/2657688e701709a5af935e6ea27f4f8967208f2d/unit_test/planar_test.cc
Original comment by bugdroid1@chromium.org
on 7 Dec 2015 at 8:03
The following revision refers to this bug:
https://chromium.googlesource.com/libyuv/libyuv.git/+/dee77a4ebeaebc781cb3acd80aa6627fd1c7c825
commit dee77a4ebeaebc781cb3acd80aa6627fd1c7c825
Author: Frank Barchard <fbarchard@google.com>
Date: Wed Dec 09 02:20:30 2015
Optimize yuv alpha blend AVX2 code to do 32 pixels at time.
out/Release/libyuv_unittest --libyuv_width=1280 --libyuv_height=720 --libyuv_repeat=9999 --libyuv_flags=-1 --gtest_filter=*I420Blend_Opt
Was LibYUVPlanarTest.I420Blend_Opt (2335 ms)
Now LibYUVPlanarTest.I420Blend_Opt (1937 ms)
vs SSSE3
LibYUVPlanarTest.I420Blend_Opt (2599 ms)
BUG=libyuv:527
R=dhrosa@google.com
Review URL: https://codereview.chromium.org/1505673003 .
[modify]
http://crrev.com/dee77a4ebeaebc781cb3acd80aa6627fd1c7c825/source/cpu_id.cc
[modify]
http://crrev.com/dee77a4ebeaebc781cb3acd80aa6627fd1c7c825/source/planar_functions.cc
[modify]
http://crrev.com/dee77a4ebeaebc781cb3acd80aa6627fd1c7c825/source/row_gcc.cc
[modify]
http://crrev.com/dee77a4ebeaebc781cb3acd80aa6627fd1c7c825/source/row_win.cc
Original comment by bugdroid1@chromium.org
on 9 Dec 2015 at 2:21
The following revision refers to this bug:
https://chromium.googlesource.com/libyuv/libyuv.git/+/a2ea90567998b1ab93ce7fe3acc25922862e4c9c
commit a2ea90567998b1ab93ce7fe3acc25922862e4c9c
Author: Frank Barchard <fbarchard@google.com>
Date: Wed Dec 09 02:59:48 2015
BlendPlane any width.
Benchmark
out\release\libyuv_unittest --libyuv_width=1279 --libyuv_height=719 --libyuv_repeat=999 --libyuv_flags=-1 --gtest_filter=*Blend* | sortms
Was
I420Blend_Any (2321 ms)
I420Blend_Unaligned (1684 ms)
I420Blend_Opt (1675 ms)
I420Blend_Invert (1653 ms)
BlendPlane_Invert (1556 ms)
BlendPlane_Any (1552 ms)
BlendPlane_Unaligned (1548 ms)
BlendPlane_Opt (1535 ms)
ARGBBlend_Unaligned (659 ms)
ARGBBlend_Any (596 ms)
ARGBBlend_Invert (591 ms)
ARGBBlend_Opt (508 ms)
BlendPlaneRow_Unaligned (186 ms)
BlendPlaneRow_Opt (171 ms)
Now
ARGBBlend_Any (621 ms)
ARGBBlend_Unaligned (585 ms)
ARGBBlend_Invert (564 ms)
ARGBBlend_Opt (512 ms)
I420Blend_Unaligned (347 ms)
I420Blend_Invert (345 ms)
I420Blend_Any (337 ms)
I420Blend_Opt (327 ms)
BlendPlane_Unaligned (187 ms)
BlendPlaneRow_Unaligned (187 ms)
BlendPlane_Invert (186 ms)
BlendPlane_Any (186 ms)
BlendPlaneRow_Opt (173 ms)
BlendPlane_Opt (171 ms)
which is comparable to aligned case
out\release\libyuv_unittest --libyuv_width=1280 --libyuv_height=720 --libyuv_repeat=999 --libyuv_flags=-1 --gtest_filter=*Blend* | sortms
ARGBBlend_Any (625 ms)
ARGBBlend_Unaligned (602 ms)
ARGBBlend_Invert (508 ms)
ARGBBlend_Opt (506 ms)
I420Blend_Any (353 ms)
I420Blend_Unaligned (322 ms)
I420Blend_Invert (304 ms)
I420Blend_Opt (301 ms)
BlendPlaneRow_Unaligned (188 ms)
BlendPlane_Unaligned (186 ms)
BlendPlane_Invert (185 ms)
BlendPlane_Any (184 ms)
BlendPlaneRow_Opt (173 ms)
BlendPlane_Opt (169 ms)
R=dhrosa@google.com, harryjin@google.com
BUG=libyuv:527
Review URL: https://codereview.chromium.org/1513443002 .
[modify]
http://crrev.com/a2ea90567998b1ab93ce7fe3acc25922862e4c9c/README.chromium
[modify]
http://crrev.com/a2ea90567998b1ab93ce7fe3acc25922862e4c9c/include/libyuv/version.h
[modify]
http://crrev.com/a2ea90567998b1ab93ce7fe3acc25922862e4c9c/source/planar_functions.cc
[modify]
http://crrev.com/a2ea90567998b1ab93ce7fe3acc25922862e4c9c/source/row_any.cc
Original comment by bugdroid1@chromium.org
on 9 Dec 2015 at 3:00
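The "any width" change above lets an optimized row kernel serve widths that are not a multiple of its step. One common way to do this is sketched below: run the kernel on the aligned bulk, then run it once more on a padded copy of the tail. The names `BlendRowStep`, `BlendRowAnySketch`, `AnyWidthDemo`, the step of 8, and the blend formula are all assumptions for illustration; the actual wrapper in source/row_any.cc may be structured differently.

```c
#include <stdint.h>
#include <string.h>

enum { kStep = 8 }; /* hypothetical kernel width; real SIMD kernels differ */

/* Stand-in for a SIMD row kernel that expects width % kStep == 0. */
static void BlendRowStep(const uint8_t* src0, const uint8_t* src1,
                         const uint8_t* alpha, uint8_t* dst, int width) {
  for (int x = 0; x < width; ++x) {
    dst[x] = (uint8_t)((src0[x] * alpha[x] +
                        src1[x] * (255 - alpha[x]) + 255) >> 8);
  }
}

/* Any-width wrapper: kernel on the bulk, then on a padded copy of the tail. */
static void BlendRowAnySketch(const uint8_t* src0, const uint8_t* src1,
                              const uint8_t* alpha, uint8_t* dst, int width) {
  int tail = width % kStep;
  int bulk = width - tail;
  if (bulk > 0) BlendRowStep(src0, src1, alpha, dst, bulk);
  if (tail > 0) {
    uint8_t t0[kStep] = {0}, t1[kStep] = {0}, ta[kStep] = {0}, td[kStep];
    memcpy(t0, src0 + bulk, (size_t)tail);
    memcpy(t1, src1 + bulk, (size_t)tail);
    memcpy(ta, alpha + bulk, (size_t)tail);
    BlendRowStep(t0, t1, ta, td, kStep); /* full step on padded buffers */
    memcpy(dst + bulk, td, (size_t)tail); /* keep only the valid pixels */
  }
}

/* Demo: width 13 (one full step plus a tail of 5) against a scalar reference.
 * Returns 1 when every output pixel matches. */
static int AnyWidthDemo(void) {
  enum { kWidth = 13 };
  uint8_t s0[kWidth], s1[kWidth], a[kWidth], got[kWidth];
  for (int x = 0; x < kWidth; ++x) {
    s0[x] = (uint8_t)(x * 19);
    s1[x] = (uint8_t)(255 - x * 7);
    a[x] = (uint8_t)(x * 21);
  }
  BlendRowAnySketch(s0, s1, a, got, kWidth);
  for (int x = 0; x < kWidth; ++x) {
    uint8_t want =
        (uint8_t)((s0[x] * a[x] + s1[x] * (255 - a[x]) + 255) >> 8);
    if (got[x] != want) return 0;
  }
  return 1;
}
```

Because almost all pixels still go through the fast kernel, a wrapper of this shape is consistent with the odd-width timings above landing close to the aligned ones.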
The blend tests fail on the GCC version:
[==========] 986 tests from 6 test cases ran. (54351 ms total)
[ PASSED ] 976 tests.
[ FAILED ] 10 tests, listed below:
[ FAILED ] LibYUVPlanarTest.BlendPlaneRow_Opt
[ FAILED ] LibYUVPlanarTest.BlendPlaneRow_Unaligned
[ FAILED ] LibYUVPlanarTest.BlendPlane_Opt
[ FAILED ] LibYUVPlanarTest.BlendPlane_Unaligned
[ FAILED ] LibYUVPlanarTest.BlendPlane_Any
[ FAILED ] LibYUVPlanarTest.BlendPlane_Invert
[ FAILED ] LibYUVPlanarTest.I420Blend_Opt
[ FAILED ] LibYUVPlanarTest.I420Blend_Unaligned
[ FAILED ] LibYUVPlanarTest.I420Blend_Any
[ FAILED ] LibYUVPlanarTest.I420Blend_Invert
10 FAILED TESTS
YOU HAVE 74 DISABLED TESTS
Only the AVX2 version fails.
Original comment by fbarch...@chromium.org
on 9 Dec 2015 at 6:35
The following revision refers to this bug:
https://chromium.googlesource.com/libyuv/libyuv.git/+/cb44936403fbc72e200ad966ae8e087f30dd535d
commit cb44936403fbc72e200ad966ae8e087f30dd535d
Author: Frank Barchard <fbarchard@google.com>
Date: Wed Dec 09 18:38:46 2015
fix typo in avx2 gcc blend.
was using wrong register on 32 pixel version.
R=harryjin@google.com, dhrosa@google.com
BUG=libyuv:527
Review URL: https://codereview.chromium.org/1511433006 .
[modify]
http://crrev.com/cb44936403fbc72e200ad966ae8e087f30dd535d/source/row_gcc.cc
Original comment by bugdroid1@chromium.org
on 9 Dec 2015 at 6:39
The following revision refers to this bug:
https://chromium.googlesource.com/libyuv/libyuv.git/+/ae55e418517651548638b27be31d1b2abaed22bb
commit ae55e418517651548638b27be31d1b2abaed22bb
Author: Frank Barchard <fbarchard@google.com>
Date: Tue Dec 15 01:25:36 2015
use rounding in scaledown by 2
When scaling down by 2 the formula should round consistently.
(a+b+c+d+2)/4
The C version did but the SSE2 version was doing 2 averages.
avg(avg(a,b),avg(c,d))
This change uses a sum, then rounds.
R=dhrosa@google.com, harryjin@google.com
BUG=libyuv:447,libyuv:527
Review URL: https://codereview.chromium.org/1513183004 .
[modify]
http://crrev.com/ae55e418517651548638b27be31d1b2abaed22bb/README.chromium
[modify]
http://crrev.com/ae55e418517651548638b27be31d1b2abaed22bb/include/libyuv/scale_row.h
[modify]
http://crrev.com/ae55e418517651548638b27be31d1b2abaed22bb/include/libyuv/version.h
[modify]
http://crrev.com/ae55e418517651548638b27be31d1b2abaed22bb/source/planar_functions.cc
[modify]
http://crrev.com/ae55e418517651548638b27be31d1b2abaed22bb/source/scale.cc
[modify]
http://crrev.com/ae55e418517651548638b27be31d1b2abaed22bb/source/scale_any.cc
[modify]
http://crrev.com/ae55e418517651548638b27be31d1b2abaed22bb/source/scale_gcc.cc
[modify]
http://crrev.com/ae55e418517651548638b27be31d1b2abaed22bb/source/scale_win.cc
[modify]
http://crrev.com/ae55e418517651548638b27be31d1b2abaed22bb/unit_test/planar_test.cc
Original comment by bugdroid1@chromium.org
on 15 Dec 2015 at 1:26
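The rounding difference described in the commit above can be reproduced with a few lines of scalar code. The function names are hypothetical; `Avg2` models a per-byte rounded average, (a + b + 1) >> 1, which is what a byte-average instruction such as SSE2 `pavgb` computes per lane.

```c
#include <stdint.h>

/* Rounded average of two bytes: (a + b + 1) >> 1. */
static uint8_t Avg2(uint8_t a, uint8_t b) {
  return (uint8_t)((a + b + 1) >> 1);
}

/* Two-stage form, avg(avg(a,b), avg(c,d)): rounds up at each stage. */
static uint8_t BoxTwoAverages(uint8_t a, uint8_t b, uint8_t c, uint8_t d) {
  return Avg2(Avg2(a, b), Avg2(c, d));
}

/* Sum first, then round once: (a + b + c + d + 2) / 4. */
static uint8_t BoxSumThenRound(uint8_t a, uint8_t b, uint8_t c, uint8_t d) {
  return (uint8_t)((a + b + c + d + 2) >> 2);
}
```

For the 2x2 block (0, 0, 0, 1), `BoxSumThenRound` gives 0 while `BoxTwoAverages` gives 1, because the two-stage form rounds up twice. For (1, 2, 3, 4) both give 3. The two-stage result is never lower than the single-rounding result, so it is biased upward.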
The following revision refers to this bug:
https://chromium.googlesource.com/libyuv/libyuv.git/+/70445ef2efb4365928ae13e6776b229379517c54
commit 70445ef2efb4365928ae13e6776b229379517c54
Author: Frank Barchard <fbarchard@google.com>
Date: Tue Dec 15 18:59:20 2015
avx2 scale down by 2 for gcc
R=dhrosa@google.com, harryjin@google.com
BUG=libyuv:527
Review URL: https://codereview.chromium.org/1520423003 .
[modify]
http://crrev.com/70445ef2efb4365928ae13e6776b229379517c54/README.chromium
[modify]
http://crrev.com/70445ef2efb4365928ae13e6776b229379517c54/include/libyuv/scale_row.h
[modify]
http://crrev.com/70445ef2efb4365928ae13e6776b229379517c54/include/libyuv/version.h
[modify]
http://crrev.com/70445ef2efb4365928ae13e6776b229379517c54/source/scale_gcc.cc
[modify]
http://crrev.com/70445ef2efb4365928ae13e6776b229379517c54/unit_test/scale_test.cc
Original comment by bugdroid1@chromium.org
on 15 Dec 2015 at 6:59
r1555 ports the AVX2 scaler to GCC. I420Blend is complete.
Follow-up work:
Port to Neon.
Make the other scalers round consistently.
Combine the UV scale and BlendPlane into a single BlendUV step.
LIBYUV_WIDTH=1280 LIBYUV_HEIGHT=720 LIBYUV_REPEAT=999 LIBYUV_FLAGS=-1 out/Release/libyuv_unittest --gtest_filter=*Blend* | sed 's/\(.*(\)\([0-9]*\)\( ms)\)/\2 - \1\2\3/g' | sort -rn | grep ms
517 - [ OK ] LibYUVPlanarTest.ARGBBlend_Unaligned (517 ms)
509 - [ OK ] LibYUVPlanarTest.ARGBBlend_Any (509 ms)
472 - [ OK ] LibYUVPlanarTest.ARGBBlend_Opt (472 ms)
453 - [ OK ] LibYUVPlanarTest.ARGBBlend_Invert (453 ms)
215 - [ OK ] LibYUVPlanarTest.I420Blend_Any (215 ms)
196 - [ OK ] LibYUVPlanarTest.I420Blend_Unaligned (196 ms)
186 - [ OK ] LibYUVPlanarTest.I420Blend_Invert (186 ms)
184 - [ OK ] LibYUVPlanarTest.I420Blend_Opt (184 ms)
132 - [ OK ] LibYUVPlanarTest.BlendPlaneRow_Unaligned (132 ms)
131 - [ OK ] LibYUVPlanarTest.BlendPlane_Unaligned (131 ms)
131 - [ OK ] LibYUVPlanarTest.BlendPlane_Invert (131 ms)
123 - [ OK ] LibYUVPlanarTest.BlendPlaneRow_Opt (123 ms)
121 - [ OK ] LibYUVPlanarTest.BlendPlane_Any (121 ms)
119 - [ OK ] LibYUVPlanarTest.BlendPlane_Opt (119 ms)
I420Blend_Opt (184 ms) is about 2.5x faster than ARGBBlend_Opt (472 ms).
Original comment by fbarch...@chromium.org
on 15 Dec 2015 at 7:57
Original issue reported on code.google.com by
fbarch...@google.com
on 1 Dec 2015 at 3:27