myrao / libyuv

Automatically exported from code.google.com/p/libyuv
BSD 3-Clause "New" or "Revised" License

I420AlphaToARGB performance #496

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
The original was MMX code in Chromium. It has been replaced with 3-step AVX2 code.
LIBYUV_FLAGS=-1 LIBYUV_WIDTH=1280 LIBYUV_HEIGHT=720 LIBYUV_REPEAT=999 perf record out/Release/libyuv_unittest --gtest_filter=*I420AlphaToARGB*

C
libyuvTest.I420AlphaToARGB_Opt (7145 ms)
    69.49%  libyuv_unittest  libyuv_unittest      [.] I422ToARGBRow_C                                                 
    26.57%  libyuv_unittest  libyuv_unittest      [.] ARGBAttenuateRow_C                                              
     3.63%  libyuv_unittest  libyuv_unittest      [.] ARGBCopyYToAlphaRow_C  

SSSE3
I420AlphaToARGB_Opt (1380 ms)
    43.37%  libyuv_unittest  libyuv_unittest    [.] I422ToARGBRow_SSSE3                                                                          
    36.17%  libyuv_unittest  libyuv_unittest    [.] ARGBAttenuateRow_SSSE3                                                                       
    17.71%  libyuv_unittest  libyuv_unittest    [.] ARGBCopyYToAlphaRow_SSE2                                                                     
     0.60%  libyuv_unittest  libyuv_unittest    [.] I420AlphaToARGB             

AVX2
I420AlphaToARGB_Opt (591 ms)
    50.20%  libyuv_unittest  libyuv_unittest      [.] I422ToARGBRow_AVX2                                                                                          
    27.94%  libyuv_unittest  libyuv_unittest      [.] ARGBAttenuateRow_AVX2                                                                                       
    17.66%  libyuv_unittest  libyuv_unittest      [.] ARGBCopyYToAlphaRow_AVX2                                                                                    
     0.58%  libyuv_unittest  libyuv_unittest      [.] I420AlphaToARGB              

without alpha
I420ToARGB_Opt (322 ms)
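
For readers unfamiliar with the three row functions in the profiles above, here is a minimal scalar sketch of the 3-step idea: convert YUV to ARGB, copy the alpha plane into the A channel, then optionally premultiply. The `Sketch*` names, the BT.601 fixed-point coefficients, and the rounding in the attenuate step are assumptions for illustration only; they are not libyuv's actual code or exact arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/* Clamp an int to the 0..255 range of a byte. */
static uint8_t Clamp255(int v) {
  return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* Step 1: convert one row of YUV (one U,V pair per two pixels) to ARGB.
 * Bytes are stored as B, G, R, A with alpha forced to 255.
 * Coefficients are a common BT.601 approximation (assumed here). */
static void SketchI422ToARGBRow(const uint8_t* y, const uint8_t* u,
                                const uint8_t* v, uint8_t* dst, int width) {
  for (int x = 0; x < width; ++x) {
    int c = 298 * (y[x] - 16);
    int d = u[x / 2] - 128;
    int e = v[x / 2] - 128;
    dst[4 * x + 0] = Clamp255((c + 516 * d + 128) / 256);           /* B */
    dst[4 * x + 1] = Clamp255((c - 100 * d - 208 * e + 128) / 256); /* G */
    dst[4 * x + 2] = Clamp255((c + 409 * e + 128) / 256);           /* R */
    dst[4 * x + 3] = 255;                                           /* A */
  }
}

/* Step 2: copy the separate alpha plane into the A channel. */
static void SketchCopyYToAlphaRow(const uint8_t* a, uint8_t* dst, int width) {
  for (int x = 0; x < width; ++x) {
    dst[4 * x + 3] = a[x];
  }
}

/* Step 3: premultiply B, G, R by alpha (rounding chosen for clarity). */
static void SketchAttenuateRow(uint8_t* argb, int width) {
  for (int x = 0; x < width; ++x) {
    int a = argb[4 * x + 3];
    for (int ch = 0; ch < 3; ++ch) {
      argb[4 * x + ch] = (uint8_t)((argb[4 * x + ch] * a + 127) / 255);
    }
  }
}

/* One row of the 3-step pipeline; `attenuate` selects the premultiply pass. */
static void SketchI420AlphaToARGBRow(const uint8_t* y, const uint8_t* u,
                                     const uint8_t* v, const uint8_t* a,
                                     uint8_t* dst, int width, int attenuate) {
  assert(width > 0);
  SketchI422ToARGBRow(y, u, v, dst, width);
  SketchCopyYToAlphaRow(a, dst, width);
  if (attenuate) {
    SketchAttenuateRow(dst, width);
  }
}
```

Each pass walks the destination row again, which is consistent with the profiles showing three separate hot functions rather than one fused loop.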

Original issue reported on code.google.com by fbarch...@google.com on 25 Sep 2015 at 1:05

GoogleCodeExporter commented 8 years ago
clang on OS X runs out of registers:
FAILED: /Volumes/data/b/build/goma/gomacc 
../../third_party/llvm-build/Release+Asserts/bin/clang++ -MMD -MF 
obj/source/libyuv.row_gcc.o.d -DV8_DEPRECATION_WARNINGS 
-D__ASSERT_MACROS_DEFINE_VERSIONS_WITHOUT_UNDERSCORE=0 -DCHROMIUM_BUILD 
-DCR_CLANG_REVISION=242792-1 -DUSE_LIBJPEG_TURBO=1 -DENABLE_ONE_CLICK_SIGNIN 
-DENABLE_PRE_SYNC_BACKUP -DENABLE_REMOTING=1 -DENABLE_WEBRTC=1 
-DENABLE_MEDIA_ROUTER=1 -DENABLE_PEPPER_CDMS -DENABLE_CONFIGURATION_POLICY 
-DENABLE_NOTIFICATIONS -DENABLE_HIDPI=1 
-DSYSTEM_NATIVELY_SIGNALS_MEMORY_PRESSURE -DDONT_EMBED_BUILD_METADATA 
-DENABLE_TASK_MANAGER=1 -DENABLE_EXTENSIONS=1 -DENABLE_PLUGIN_INSTALLATION=1 
-DENABLE_PLUGINS=1 -DENABLE_SESSION_SERVICE=1 -DENABLE_THEMES=1 
-DENABLE_AUTOFILL_DIALOG=1 -DENABLE_BACKGROUND=1 -DENABLE_GOOGLE_NOW=1 
-DCLD_VERSION=2 -DENABLE_PRINTING=1 -DENABLE_BASIC_PRINTING=1 
-DENABLE_PRINT_PREVIEW=1 -DENABLE_SPELLCHECK=1 -DUSE_PLATFORM_SPELLCHECKER=1 
-DENABLE_CAPTIVE_PORTAL_DETECTION=1 -DENABLE_APP_LIST=1 -DENABLE_SETTINGS_APP=1 
-DENABLE_SUPERVISED_USERS=1 -DENABLE_SERVICE_DISCOVERY=1 
-DENABLE_WIFI_BOOTSTRAPPING=1 -DV8_USE_EXTERNAL_STARTUP_DATA 
-DFULL_SAFE_BROWSING -DSAFE_BROWSING_CSD -DSAFE_BROWSING_DB_LOCAL 
-DSAFE_BROWSING_SERVICE -DHAVE_JPEG -DUSE_LIBPCI=1 -DUSE_OPENSSL=1 
-D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -DNDEBUG -DNVALGRIND 
-DDYNAMIC_ANNOTATIONS_ENABLED=0 -D_FORTIFY_SOURCE=2 -Igen -I../../include 
-I../.. -I../../chromium/src/third_party/libjpeg_turbo -isysroot 
/Applications/Xcode6.1.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.6.sdk 
-O2 -gdwarf-2 -fvisibility=hidden -Werror -Wnewline-eof 
-mmacosx-version-min=10.6 -arch i386 -Wall -Wendif-labels -Wextra 
-Wno-unused-parameter -Wno-missing-field-initializers 
-Wno-selector-type-mismatch -Wpartial-availability -Wheader-hygiene 
-Wno-char-subscripts -Wno-unneeded-internal-declaration 
-Wno-covered-switch-default -Wstring-conversion -Wno-c++11-narrowing 
-Wno-deprecated-register -Wno-inconsistent-missing-override 
-Wno-shift-negative-value -std=c++11 -fno-rtti -fno-exceptions 
-fvisibility-inlines-hidden -fno-threadsafe-statics -Xclang -load -Xclang 
/Volumes/data/b/build/slave/mac32/build/src/third_party/llvm-build/Release+Asserts/lib/libFindBadConstructs.dylib 
-Xclang -add-plugin -Xclang 
find-bad-constructs -Xclang -plugin-arg-find-bad-constructs -Xclang 
check-templates -fcolor-diagnostics -fno-strict-aliasing  -c 
../../source/row_gcc.cc -o obj/source/libyuv.row_gcc.o
../../source/row_gcc.cc:1667:5: error: inline assembly requires more registers than available
    "sub       %[u_buf],%[v_buf]               \n"
    ^
../../source/row_gcc.cc:1695:5: error: inline assembly requires more registers than available
    "sub       %[u_buf],%[v_buf]               \n"
    ^
../../source/row_gcc.cc:2085:5: error: inline assembly requires more registers than available
    "sub       %[u_buf],%[v_buf]               \n"
    ^
../../source/row_gcc.cc:2118:5: error: inline assembly requires more registers than available
    "sub       %[u_buf],%[v_buf]               \n"
    ^
4 errors generated.
ninja: build stopped: subcommand failed.
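
For context on the flagged line: `sub %[u_buf],%[v_buf]` appears to turn `v_buf` into an offset from `u_buf`, so that advancing one pointer advances both chroma planes. The build above targets `-arch i386`, where only 8 general-purpose registers exist, so an asm block with many pointer operands can exhaust them. Below is a rough C sketch of the offset idea only. `SketchSumUV` is a hypothetical helper, and it assumes both planes live in one allocation so the pointer difference is well defined in C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sums `count` U samples and `count` V samples while moving only u_buf.
 * v is reached as u_buf[v_off], mirroring base+index addressing in asm.
 * Assumption: u_buf and v_buf point into the same array. */
static unsigned SketchSumUV(const uint8_t* u_buf, const uint8_t* v_buf,
                            int count) {
  assert(count >= 0);
  ptrdiff_t v_off = v_buf - u_buf; /* the "sub u_buf, v_buf" step */
  unsigned sum = 0;
  for (int i = 0; i < count; ++i) {
    sum += (unsigned)u_buf[0] + (unsigned)u_buf[v_off];
    ++u_buf; /* a single increment advances both planes */
  }
  return sum;
}
```

Note that this trick saves a pointer increment per iteration, not a register: both values still have to be live, which is why it does not help with the error above.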

Original comment by fbarch...@google.com on 25 Sep 2015 at 9:43

GoogleCodeExporter commented 8 years ago
C
I420AlphaToARGB_Any (5323 ms)
I420AlphaToARGB_Unaligned (5321 ms)
I420AlphaToARGB_Invert (5332 ms)
I420AlphaToARGB_Opt (5293 ms)
I420AlphaToARGB_Premult (7206 ms)

SSSE3
I420AlphaToARGB_Any (454 ms)
I420AlphaToARGB_Unaligned (425 ms)
I420AlphaToARGB_Invert (411 ms)
I420AlphaToARGB_Opt (416 ms)
I420AlphaToARGB_Premult (730 ms)

AVX2
I420AlphaToARGB_Any (377 ms)
I420AlphaToARGB_Unaligned (329 ms)
I420AlphaToARGB_Invert (324 ms)
I420AlphaToARGB_Opt (323 ms)
I420AlphaToARGB_Premult (483 ms)
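
To put the timings above in relative terms, a small helper can compute the ratios. From the `_Opt` rows, C to AVX2 is roughly a 16x speedup and SSSE3 to AVX2 roughly 1.3x, while `_Premult` costs about 1.5x `_Opt` on AVX2. The helper names are hypothetical, and the ratios only describe this one run on the reporter's machine.

```c
#include <assert.h>
#include <stdio.h>

/* Ratio of two wall-clock times; > 1.0 means slow_ms took longer. */
static double Speedup(double slow_ms, double fast_ms) {
  assert(fast_ms > 0.0);
  return slow_ms / fast_ms;
}

/* Prints ratios for the _Opt and _Premult timings quoted in this comment. */
static void PrintSpeedups(void) {
  static const struct {
    const char* name;
    double opt_ms;
    double premult_ms;
  } kRuns[] = {{"C", 5293, 7206}, {"SSSE3", 416, 730}, {"AVX2", 323, 483}};
  const double avx2_opt_ms = kRuns[2].opt_ms;
  for (int i = 0; i < 3; ++i) {
    printf("%-6s opt is %.2fx the AVX2 time, premult/opt = %.2f\n",
           kRuns[i].name, Speedup(kRuns[i].opt_ms, avx2_opt_ms),
           Speedup(kRuns[i].premult_ms, kRuns[i].opt_ms));
  }
}
```

Calling `PrintSpeedups()` from a `main` prints one line per code path.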

Original comment by fbarch...@google.com on 25 Sep 2015 at 11:20