Thanks. Most functions are optimized for the aligned case, as that helps Core
and Atom performance, and there's not much good reason to allocate memory
unaligned.
ARGB is often used for screencasting, however, so odd sizes are more common.
Is this coming up in practice or just something noticed in the tests?
The downside is that the additional alignment checks slightly hurt
performance in the aligned case.
I'd prefer to do all the ARGB functions at once.
Off the top of my head, there are 4 functions needed to complete ARGB to/from
BGRA, ABGR and RGBA:
ARGBToBGRA
ARGBToABGR
ARGBToRGBA
RGBAToARGB
And there are posix versions. Overall there are 9 core RGB formats.
RGB24
RAW
RGB565
ARGB1555
ARGB4444
Once a function has an Unaligned version, it makes sense to add Any variations
to handle odd widths. Odd widths may still have aligned rows via stride, but
typically they don't.
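A minimal sketch of what an "Any" variation does, assuming a fast kernel that only accepts widths that are a multiple of 8 pixels. The names `ShuffleRow_C`, `ShuffleRow_Simd8` and `ShuffleRow_Any` are hypothetical, and the "SIMD" kernel is a scalar stand-in so the example runs anywhere. The point is the split: the fast kernel handles `width & ~7` pixels and the C version finishes the 0 to 7 leftover pixels.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Portable reference: dst byte i of each pixel = src byte shuffler[i]. */
static void ShuffleRow_C(const uint8_t* src, uint8_t* dst,
                         const uint8_t* shuffler, int width) {
  for (int x = 0; x < width; ++x) {
    uint8_t p[4];
    memcpy(p, src + x * 4, 4);  /* copy first so in-place use is safe */
    for (int i = 0; i < 4; ++i) dst[x * 4 + i] = p[shuffler[i]];
  }
}

/* Hypothetical stand-in for a SIMD kernel that requires width % 8 == 0.
   A real one would use SSSE3, AVX2 or Neon. */
static void ShuffleRow_Simd8(const uint8_t* src, uint8_t* dst,
                             const uint8_t* shuffler, int width) {
  assert(width % 8 == 0);
  ShuffleRow_C(src, dst, shuffler, width);
}

/* "Any" variation: fast kernel on the largest multiple of 8 pixels,
   C code on the remainder, so any width is accepted. */
static void ShuffleRow_Any(const uint8_t* src, uint8_t* dst,
                           const uint8_t* shuffler, int width) {
  int n = width & ~7;  /* pixels handled by the fast kernel */
  if (n > 0) ShuffleRow_Simd8(src, dst, shuffler, n);
  ShuffleRow_C(src + n * 4, dst + n * 4, shuffler, width & 7);
}
```

With a width of 13, for example, the fast kernel converts 8 pixels and the C tail converts the remaining 5.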
I'd prefer to add row coalescing at the same time. If the stride equals the
row width in bytes, you can treat the image as a single row, which tends to be
aligned.
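A minimal sketch of row coalescing for a 4-byte-per-pixel format, assuming strides are given in bytes. `CopyRow` and `CoalescedCopy` are hypothetical names; `CopyRow` stands in for any ARGB row kernel. When both strides equal `width * 4`, the rows are contiguous, so the plane is processed as one row of `width * height` pixels. The function returns how many row calls it made, to make the effect visible.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-in row kernel (hypothetical): copies width ARGB pixels. */
static void CopyRow(const uint8_t* src, uint8_t* dst, int width) {
  memcpy(dst, src, (size_t)width * 4);
}

/* Plane wrapper with row coalescing. Strides are in bytes.
   Returns the number of row-kernel calls made. */
static int CoalescedCopy(const uint8_t* src, int src_stride,
                         uint8_t* dst, int dst_stride,
                         int width, int height) {
  assert(src_stride >= width * 4 && dst_stride >= width * 4);
  if (src_stride == width * 4 && dst_stride == width * 4) {
    width *= height;  /* contiguous: one long row */
    height = 1;
    src_stride = dst_stride = 0;
  }
  int calls = 0;
  for (int y = 0; y < height; ++y) {
    CopyRow(src, dst, width);
    src += src_stride;
    dst += dst_stride;
    ++calls;
  }
  return calls;
}
```

Besides fewer calls, the single long row also lets the fast kernel cover more pixels, leaving at most one short tail instead of one per row.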
I'd also prefer AVX2 versions of these, since AVX2 has free unaligned access.
The code for all of these is identical aside from the shuffle constant, so a
common function or macro seems like it would help.
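A minimal sketch of such a common function plus a macro, assuming libyuv's little-endian naming in which "ARGB" is stored in memory as the bytes B,G,R,A (so ABGR is R,G,B,A, BGRA is A,R,G,B and RGBA is A,B,G,R). Each 4-byte mask says which source byte lands in each destination byte. The mask names, `ShuffleRow` and `DEFINE_SHUFFLE_ROW` are hypothetical, not libyuv's actual identifiers.

```c
#include <assert.h>
#include <stdint.h>

/* dst byte i = src byte mask[i], under the memory-order assumption above. */
static const uint8_t kShuffleBGRAToARGB[4] = {3, 2, 1, 0};
static const uint8_t kShuffleABGRToARGB[4] = {2, 1, 0, 3};
static const uint8_t kShuffleRGBAToARGB[4] = {1, 2, 3, 0};
static const uint8_t kShuffleARGBToRGBA[4] = {3, 0, 1, 2};

/* One row routine shared by all conversions; only the mask differs. */
static void ShuffleRow(const uint8_t* src, uint8_t* dst,
                       const uint8_t* shuffler, int width) {
  for (int x = 0; x < width; ++x) {
    uint8_t p[4];
    for (int i = 0; i < 4; ++i) p[i] = src[x * 4 + i];  /* in-place safe */
    for (int i = 0; i < 4; ++i) {
      assert(shuffler[i] < 4);
      dst[x * 4 + i] = p[shuffler[i]];
    }
  }
}

/* Each named conversion is just the shared routine plus a constant. */
#define DEFINE_SHUFFLE_ROW(NAME, MASK)                            \
  static void NAME(const uint8_t* src, uint8_t* dst, int width) { \
    ShuffleRow(src, dst, MASK, width);                            \
  }

DEFINE_SHUFFLE_ROW(BGRAToARGBRow, kShuffleBGRAToARGB)
DEFINE_SHUFFLE_ROW(ABGRToARGBRow, kShuffleABGRToARGB)
DEFINE_SHUFFLE_ROW(RGBAToARGBRow, kShuffleRGBAToARGB)
DEFINE_SHUFFLE_ROW(ARGBToRGBARow, kShuffleARGBToRGBA)
```

ARGBToRGBA followed by RGBAToARGB composes to the identity, which is a cheap sanity check on the two masks.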
Original comment by fbarch...@chromium.org
on 7 Mar 2013 at 7:54
Done in r595.
Before:
>out\release\libyuv_unittest --gtest_filter=*ABGRToARGB* | sed "s/\(.*(\)\([0-9]*\)\( ms)\)/\2 - \1\2\3/g" | c:\cygwin\bin\sort -rn | grep ms
1881 - [ OK ] libyuvTest.ABGRToARGB_Unaligned (1881 ms)
293 - [ OK ] libyuvTest.ABGRToARGB_Any (293 ms)
290 - [ OK ] libyuvTest.ABGRToARGB_Invert (290 ms)
289 - [ OK ] libyuvTest.ABGRToARGB_Opt (289 ms)
281 - [ OK ] libyuvTest.ABGRToARGB_Random (281 ms)
[==========] 5 tests from 1 test case ran. (3034 ms total)
After:
>out\release\libyuv_unittest --gtest_filter=*ABGRToARGB* | sed "s/\(.*(\)\([0-9]*\)\( ms)\)/\2 - \1\2\3/g" | c:\cygwin\bin\sort -rn | grep ms
306 - [ OK ] libyuvTest.ABGRToARGB_Unaligned (306 ms)
295 - [ OK ] libyuvTest.ABGRToARGB_Invert (295 ms)
291 - [ OK ] libyuvTest.ABGRToARGB_Any (291 ms)
290 - [ OK ] libyuvTest.ABGRToARGB_Opt (290 ms)
273 - [ OK ] libyuvTest.ABGRToARGB_Random (273 ms)
[==========] 5 tests from 1 test case ran. (1455 ms total)
Original comment by fbarch...@chromium.org
on 8 Mar 2013 at 1:40
Thanks for the quick merge.
The motivation came from the tests: the ARGB/ABGR Unaligned cases were far
slower than the others.
AVX2 would be the next step.
Original comment by changjun...@intel.com
on 8 Mar 2013 at 1:53
I've written a more general ARGBShuffler for AVX2
https://webrtc-codereview.appspot.com/1171006
Original comment by fbarch...@chromium.org
on 8 Mar 2013 at 3:51
Fixed in r596.
Rewrote BGRAToARGB, ABGRToARGB, RGBAToARGB and ARGBToRGBA to use ARGBShuffle:
less code, more variations.
Added AVX2
Added Unaligned_SSSE3
Any variations for SSSE3, AVX2 and Neon
Row coalescing: contiguous rows are treated as a single row (width * height by 1).
Unrolled to do 2 at a time.
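For readers unfamiliar with the SIMD side, here is a minimal SSSE3 sketch of a 4-byte shuffle, assuming GCC or Clang on x86 (it falls back to C elsewhere). It expands the 4-byte shuffler into a 16-byte `pshufb` mask covering 4 pixels, uses unaligned loads and stores so no alignment is required, and finishes the leftover 0 to 3 pixels in C, as an "Any" variation would. This is an illustration, not libyuv's ARGBShuffleRow code; all names are hypothetical. An AVX2 version could use `_mm256_shuffle_epi8`, which shuffles within each 128-bit lane, so the same 16-byte pattern repeated in both lanes would work.

```c
#include <stdint.h>
#include <string.h>
#if defined(__x86_64__) || defined(__i386__)
#include <tmmintrin.h> /* SSSE3 intrinsics, incl. _mm_shuffle_epi8 */
#define SKETCH_HAS_X86 1
#endif

/* Scalar reference; also handles the leftover pixels. */
static void ShuffleRow_C(const uint8_t* src, uint8_t* dst,
                         const uint8_t* shuffler, int width) {
  for (int x = 0; x < width; ++x) {
    uint8_t p[4];
    memcpy(p, src + x * 4, 4);
    for (int i = 0; i < 4; ++i) dst[x * 4 + i] = p[shuffler[i]];
  }
}

#ifdef SKETCH_HAS_X86
/* 4 pixels (16 bytes) per step. The target attribute lets GCC/Clang emit
   SSSE3 here without compiling the whole file with -mssse3. */
__attribute__((target("ssse3")))
static void ShuffleRow_SSSE3(const uint8_t* src, uint8_t* dst,
                             const uint8_t* shuffler, int width) {
  uint8_t m[16];
  /* Byte i of the mask picks byte shuffler[i % 4] of the same pixel. */
  for (int i = 0; i < 16; ++i) m[i] = (uint8_t)((i & ~3) + shuffler[i & 3]);
  const __m128i mask = _mm_loadu_si128((const __m128i*)m);
  int x = 0;
  for (; x + 4 <= width; x += 4) {
    __m128i v = _mm_loadu_si128((const __m128i*)(src + x * 4));
    _mm_storeu_si128((__m128i*)(dst + x * 4), _mm_shuffle_epi8(v, mask));
  }
  ShuffleRow_C(src + x * 4, dst + x * 4, shuffler, width - x);
}
#endif

/* Dispatch: SSSE3 when the CPU supports it, otherwise C. */
static void ShuffleRow(const uint8_t* src, uint8_t* dst,
                       const uint8_t* shuffler, int width) {
#ifdef SKETCH_HAS_X86
  if (__builtin_cpu_supports("ssse3")) {
    ShuffleRow_SSSE3(src, dst, shuffler, width);
    return;
  }
#endif
  ShuffleRow_C(src, dst, shuffler, width);
}
```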
Sandy Bridge performance (ms):
Function      Any  Unaligned  Invert  Opt  Random
BGRAToARGB    272  281        283     272  279
ABGRToARGB    264  268        265     252  264
RGBAToARGB    268  280        275     260  257
ARGBToRGBA    258  265        267     260  265
Original comment by fbarch...@chromium.org
on 8 Mar 2013 at 11:37
Doh! iOS build error:
row_neon.cc:1190:3: error: expected string literal
: "cc", "memory", "q0", "d2" // Clobber List
^
1 error generated.
Original comment by fbarch...@chromium.org
on 9 Mar 2013 at 12:27
Fixed in r597.
It was a stray ',' in ARGBToBayer, which previously declared the shuffler wrong.
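One plausible reading of the r597 fix, assuming "a ," means a trailing comma in an inline-asm operand list: in GCC/Clang extended asm, sections are separated by ':' and operands within a section by ','. A comma after the last input makes the compiler expect another constraint string, but it finds the ':' that starts the clobber list, which matches an "expected string literal" error pointing at that ':'. The sketch below uses an empty template so it compiles on any GCC/Clang target; `PassThrough` is a hypothetical name.

```c
#include <stdint.h>

/* Empty-template extended asm: emits no instructions, but shows the
   operand sections. Returns value unchanged. */
static int PassThrough(int value, const uint8_t* shuffler) {
  __asm__ volatile(
      ""                   /* template (empty) */
      : "+r"(value)        /* outputs */
      : "r"(shuffler)      /* inputs: no trailing ',' after the last one */
      : "cc", "memory");   /* clobbers */
  /* Writing  : "r"(shuffler),  before  : "cc", "memory"  would make the
     compiler expect another "constraint" string where it finds ':';
     Clang reports this as an "expected string literal" diagnostic
     (exact wording varies by version). */
  return value;
}
```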
Original comment by fbarch...@chromium.org
on 9 Mar 2013 at 12:33
Original issue reported on code.google.com by
changjun...@intel.com
on 6 Mar 2013 at 9:01