koxiong / libyuv

Automatically exported from code.google.com/p/libyuv
BSD 3-Clause "New" or "Revised" License

I420ToBGRA is slow #386

Closed. GoogleCodeExporter closed this issue 8 years ago

GoogleCodeExporter commented 8 years ago
fbarchard-macbookair2:yuv fbarchard$ LIBYUV_DISABLE_AVX2=0 LIBYUV_REPEAT=10000 out/Release/libyuv_unittest --gtest_filter=*I420ToBGRA_* | grep ms
[       OK ] libyuvTest.I420ToBGRA_Any (4337 ms)
[       OK ] libyuvTest.I420ToBGRA_Unaligned (4023 ms)
[       OK ] libyuvTest.I420ToBGRA_Invert (3938 ms)
[       OK ] libyuvTest.I420ToBGRA_Opt (3912 ms)
[----------] 4 tests from libyuvTest (16210 ms total)
[==========] 4 tests from 1 test case ran. (16210 ms total)

fbarchard-macbookair2:yuv fbarchard$ LIBYUV_DISABLE_AVX2=0 LIBYUV_REPEAT=10000 out/Release/libyuv_unittest --gtest_filter=*I420ToARGB_* | grep ms
[       OK ] libyuvTest.I420ToARGB_Any (3925 ms)
[       OK ] libyuvTest.I420ToARGB_Unaligned (3764 ms)
[       OK ] libyuvTest.I420ToARGB_Invert (3586 ms)
[       OK ] libyuvTest.I420ToARGB_Opt (3685 ms)
[----------] 4 tests from libyuvTest (14960 ms total)
[==========] 4 tests from 1 test case ran. (14960 ms total)

Original issue reported on code.google.com by fbarch...@google.com on 16 Dec 2014 at 6:47

GoogleCodeExporter commented 8 years ago
With AVX2 disabled, these are the SSSE3 results:

fbarchard-macbookair2:yuv fbarchard$ LIBYUV_DISABLE_AVX2=1 LIBYUV_REPEAT=10000 out/Release/libyuv_unittest --gtest_filter=*I420ToBGRA_* | grep ms
[       OK ] libyuvTest.I420ToBGRA_Any (5081 ms)
[       OK ] libyuvTest.I420ToBGRA_Unaligned (5060 ms)
[       OK ] libyuvTest.I420ToBGRA_Invert (4930 ms)
[       OK ] libyuvTest.I420ToBGRA_Opt (5116 ms)
[----------] 4 tests from libyuvTest (20187 ms total)
[==========] 4 tests from 1 test case ran. (20187 ms total)
fbarchard-macbookair2:yuv fbarchard$ LIBYUV_DISABLE_AVX2=1 LIBYUV_REPEAT=10000 out/Release/libyuv_unittest --gtest_filter=*I420ToARGB_* | grep ms
[       OK ] libyuvTest.I420ToARGB_Any (4998 ms)
[       OK ] libyuvTest.I420ToARGB_Unaligned (4957 ms)
[       OK ] libyuvTest.I420ToARGB_Invert (4814 ms)
[       OK ] libyuvTest.I420ToARGB_Opt (4798 ms)
[----------] 4 tests from libyuvTest (19568 ms total)
[==========] 4 tests from 1 test case ran. (19568 ms total)

Original comment by fbarch...@google.com on 16 Dec 2014 at 7:01

GoogleCodeExporter commented 8 years ago
Also, the old function processed 32 pixels at a time, but the calling code assumed a multiple of 16.
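
The mismatch can be sketched as follows. This is an illustration under stated assumptions, not libyuv's actual code: `row_simd32`, `row_c`, `simd_width` and `row_any` are hypothetical names, and the per-pixel work is a placeholder. The point is that when a row function consumes 32 pixels per iteration, the caller has to round the width down to a multiple of 32 (not 16) and send the remainder through a scalar path.

```c
#include <assert.h>
#include <stdint.h>

#define SIMD_STEP 32 /* pixels per iteration of the hypothetical SIMD row */

/* Hypothetical stand-in for a SIMD row function. Like a real vector loop,
 * it is only correct when width is a multiple of SIMD_STEP. */
static void row_simd32(const uint8_t* src, uint8_t* dst, int width) {
  assert(width % SIMD_STEP == 0);
  for (int x = 0; x < width; x += SIMD_STEP) {
    for (int i = 0; i < SIMD_STEP; ++i) {
      dst[x + i] = (uint8_t)(255 - src[x + i]); /* placeholder pixel work */
    }
  }
}

/* Scalar fallback that accepts any width. */
static void row_c(const uint8_t* src, uint8_t* dst, int width) {
  for (int x = 0; x < width; ++x) {
    dst[x] = (uint8_t)(255 - src[x]);
  }
}

/* Largest multiple of SIMD_STEP that is <= width (width must be >= 0). */
static int simd_width(int width) {
  return width & ~(SIMD_STEP - 1);
}

/* "Any width" wrapper: SIMD for the aligned part, scalar for the rest. */
static void row_any(const uint8_t* src, uint8_t* dst, int width) {
  int n = simd_width(width);
  if (n > 0) {
    row_simd32(src, dst, n);
  }
  if (width > n) {
    row_c(src + n, dst + n, width - n);
  }
}
```

With a width of 48, `row_any` sends 32 pixels to `row_simd32` and the remaining 16 to `row_c`. A caller that only checked for a multiple of 16 would pass all 48 pixels to the 32-pixel function, which is the kind of mismatch described above.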

old 
fbarchard-macbookair2:yuv fbarchard$ LIBYUV_DISABLE_AVX2=0 LIBYUV_WIDTH=640 LIBYUV_HEIGHT=360 LIBYUV_REPEAT=40000 out/Release/libyuv_unittest_old --gtest_filter=*I420ToBGRA_Opt | grep ms
[       OK ] libyuvTest.I420ToBGRA_Opt (3759 ms)
[----------] 1 test from libyuvTest (3759 ms total)
[==========] 1 test from 1 test case ran. (3759 ms total)

new
fbarchard-macbookair2:yuv fbarchard$ LIBYUV_DISABLE_AVX2=0 LIBYUV_WIDTH=640 LIBYUV_HEIGHT=360 LIBYUV_REPEAT=40000 out/Release/libyuv_unittest --gtest_filter=*I420ToBGRA_Opt | grep ms
[       OK ] libyuvTest.I420ToBGRA_Opt (3325 ms)
[----------] 1 test from libyuvTest (3325 ms total)
[==========] 1 test from 1 test case ran. (3325 ms total)
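
To put the old and new timings side by side, here is a small sketch of the arithmetic. The helper names are made up for illustration, and the per-repeat figure is only approximate because the reported time may include test setup and not just the conversion loop.

```c
#include <assert.h>

/* Percent reduction in elapsed time when going from old_ms to new_ms. */
static double percent_faster(double old_ms, double new_ms) {
  assert(old_ms > 0.0);
  return 100.0 * (old_ms - new_ms) / old_ms;
}

/* Speedup factor: how many times faster the new run is than the old one. */
static double speedup(double old_ms, double new_ms) {
  assert(new_ms > 0.0);
  return old_ms / new_ms;
}

/* Approximate time per repeat, assuming the total is dominated by the
 * repeated conversion (an assumption, not something the log confirms). */
static double ms_per_repeat(double total_ms, int repeat) {
  assert(repeat > 0);
  return total_ms / (double)repeat;
}
```

For the I420ToBGRA_Opt runs above, `percent_faster(3759, 3325)` is about 11.5 percent, `speedup(3759, 3325)` is about 1.13, and `ms_per_repeat(3325, 40000)` is roughly 0.083 ms per 640x360 conversion.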

Original comment by fbarch...@google.com on 16 Dec 2014 at 11:39

GoogleCodeExporter commented 8 years ago
Fixed in r1207

Original comment by fbarch...@google.com on 16 Dec 2014 at 11:56
