Closed GoogleCodeExporter closed 9 years ago
Sweet! Could you do I420ToARGB?
Original comment by fbarch...@chromium.org
on 5 Feb 2013 at 9:32
Sure. I'll try that one if this patch looks good.
Original comment by changjun...@intel.com
on 6 Feb 2013 at 1:31
Put up for review. It's generally good in form, but minor changes are preferred:
use macro for ifdef, not compiler version.
conditionally define macro.
unconditionally prototype
https://webrtc-codereview.appspot.com/1090005
types should be lower case
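A minimal sketch of how the first three notes could look in C, for illustration only. Every name here (the SKETCH_ macros and the sketch_ functions) is hypothetical and not taken from the patch, the AVX2 body is a placeholder that just calls the C row, and the luma formula is the common BT.601 integer approximation with B, G, R, A byte order assumed, which may differ from the coefficients libyuv actually uses.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical feature macro: defined only when this build can use AVX2.
 * Code below tests this macro instead of testing a compiler version. */
#if !defined(SKETCH_DISABLE_X86) && defined(__AVX2__)
#define SKETCH_HAS_ARGBTOYROW_AVX2
#endif

/* Prototypes are declared unconditionally; only the definition and the
 * call sites are guarded by the feature macro. */
void sketch_argb_to_y_row_c(const uint8_t* src_argb, uint8_t* dst_y, int width);
void sketch_argb_to_y_row_avx2(const uint8_t* src_argb, uint8_t* dst_y, int width);

/* Portable reference row. Assumes B, G, R, A byte order per pixel. */
void sketch_argb_to_y_row_c(const uint8_t* src_argb, uint8_t* dst_y, int width) {
  assert(width >= 0);
  for (int x = 0; x < width; ++x) {
    int b = src_argb[4 * x + 0];
    int g = src_argb[4 * x + 1];
    int r = src_argb[4 * x + 2];
    /* Assumed BT.601 studio-range approximation, not libyuv's exact math. */
    dst_y[x] = (uint8_t)(((66 * r + 129 * g + 25 * b + 128) >> 8) + 16);
  }
}

#ifdef SKETCH_HAS_ARGBTOYROW_AVX2
/* Placeholder: a real version would use AVX2 intrinsics or assembly. */
void sketch_argb_to_y_row_avx2(const uint8_t* src_argb, uint8_t* dst_y, int width) {
  sketch_argb_to_y_row_c(src_argb, dst_y, width);
}
#endif

typedef void (*sketch_row_fn)(const uint8_t*, uint8_t*, int);

/* Call sites select the row function through the same feature macro. */
sketch_row_fn sketch_pick_argb_to_y_row(void) {
#ifdef SKETCH_HAS_ARGBTOYROW_AVX2
  return sketch_argb_to_y_row_avx2;
#else
  return sketch_argb_to_y_row_c;
#endif
}
```

With this layout, a platform that lacks AVX2 never defines the macro, so it compiles only the C row while the header stays identical everywhere.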
Original comment by fbarch...@chromium.org
on 6 Feb 2013 at 9:49
In the code review I have some changes and questions, if you don't mind.
For the SSSE3 version, because you didn't clear the upper vectors, your timings
may be wrong.
I get:
d:\src\libyuv\trunk>out\release\libyuv_unittest --gtest_filter=*ARGBToI420* |
sed "s/\(.*(\)\([0-9]*\)\( ms)\)/\2 - \1\2\3/g" |
c:\cygwin\bin\sort -rn | grep ms
424 - [ OK ] libyuvTest.ARGBToI420_Unaligned (424 ms)
406 - [ OK ] libyuvTest.ARGBToI420_Any (406 ms)
383 - [ OK ] libyuvTest.ARGBToI420_Opt (383 ms)
380 - [ OK ] libyuvTest.ARGBToI420_Invert (380 ms)
[==========] 4 tests from 1 test case ran. (1593 ms total)
On an HP Z620 with an E5-2690 (Sandy Bridge). So I'd hope Haswell can beat
that.
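On the point about clearing the upper vectors: this presumably refers to the AVX to SSE transition penalty, where legacy SSE code can run slower if the upper halves of the YMM registers are left dirty by earlier AVX code. A rough sketch of the usual remedy follows, assuming a hypothetical row function (sketch_add_one_row is not part of libyuv) written with intrinsics. Compilers often emit vzeroupper on their own for intrinsics code, whereas hand-written assembly has to do it explicitly.

```c
#include <assert.h>
#include <stdint.h>
#if defined(__AVX2__)
#include <immintrin.h>
#endif

/* Hypothetical row function: dst[x] = src[x] + 1 (wrapping), 32 bytes per
 * AVX2 step, with a scalar loop for the tail or for non-AVX2 builds. */
void sketch_add_one_row(const uint8_t* src, uint8_t* dst, int width) {
  assert(width >= 0);
  int x = 0;
#if defined(__AVX2__)
  const __m256i one = _mm256_set1_epi8(1);
  for (; x + 32 <= width; x += 32) {
    __m256i v = _mm256_loadu_si256((const __m256i*)(src + x));
    _mm256_storeu_si256((__m256i*)(dst + x), _mm256_add_epi8(v, one));
  }
  /* Clear the upper 128 bits of all YMM registers before returning, so
   * SSE code that runs afterwards does not pay a transition penalty. */
  _mm256_zeroupper();
#endif
  for (; x < width; ++x) {
    dst[x] = (uint8_t)(src[x] + 1);
  }
}
```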
Original comment by fbarch...@chromium.org
on 6 Feb 2013 at 10:23
before
d:\src\libyuv\trunk>more noavx.txt
[ OK ] libyuvTest.ARGBToI420_Any (515 ms)
[ OK ] libyuvTest.ARGBToI420_Unaligned (530 ms)
[ OK ] libyuvTest.ARGBToI420_Invert (499 ms)
[ OK ] libyuvTest.ARGBToI420_Opt (500 ms)
[----------] 4 tests from libyuvTest (2044 ms total)
after
[ OK ] libyuvTest.ARGBToI420_Any (468 ms)
[ OK ] libyuvTest.ARGBToI420_Unaligned (421 ms)
[ OK ] libyuvTest.ARGBToI420_Invert (406 ms)
[ OK ] libyuvTest.ARGBToI420_Opt (405 ms)
[----------] 4 tests from libyuvTest (1700 ms total)
20% faster overall. Not a big win?
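For reference, the totals above (2044 ms before, 1700 ms after) work out to about 16.8% less elapsed time, or about a 20.2% gain in throughput terms, so the 20% figure matches the second reading. A small sketch of both calculations, with hypothetical helper names:

```c
#include <assert.h>

/* Percent reduction in elapsed time: (before - after) / before. */
double sketch_time_saved_pct(double before_ms, double after_ms) {
  assert(before_ms > 0.0);
  return 100.0 * (before_ms - after_ms) / before_ms;
}

/* Percent gain in throughput: before / after - 1. */
double sketch_speedup_pct(double before_ms, double after_ms) {
  assert(after_ms > 0.0);
  return 100.0 * (before_ms / after_ms - 1.0);
}
```

The two numbers diverge as the gain grows, so it helps to say which one is being quoted when comparing SSSE3 and AVX2 runs.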
Original comment by fbarch...@chromium.org
on 7 Feb 2013 at 7:01
Try bots say Android breaks with this patch. Needs a little more work.
Original comment by fbarch...@chromium.org
on 8 Feb 2013 at 6:19
r566 checks in the initial code.
ARGBToY is complete. ARGBToUV needs more work.
Original comment by fbarch...@chromium.org
on 8 Feb 2013 at 11:05
r567 removes vmovdqa from ARGBToUV.
Original comment by fbarch...@chromium.org
on 8 Feb 2013 at 11:27
r575 removes excess vpermq's. 5% faster.
SSSE3 4212 ms
AVX2 2964 ms
Original comment by fbarch...@google.com
on 15 Feb 2013 at 6:59
Original issue reported on code.google.com by
changjun...@intel.com
on 5 Feb 2013 at 7:13