Closed GoogleCodeExporter closed 9 years ago
r797 adds gcc 4.7 AVX2 Polynomial
Original comment by fbarch...@google.com
on 24 Sep 2013 at 1:21
ARGBShuffle ported to gcc.
Original comment by fbarch...@google.com
on 1 Oct 2013 at 3:57
r1035 makes a fix for AVX2 detection
Original comment by fbarch...@chromium.org
on 15 Jul 2014 at 12:50
r1122 changed the #elif's to #endif/#if so that AVX2 can be added, but this
introduced a build error:
[66/267 | 7.503] LINK genmacro, POSTBUILDS
FAILED: /Volumes/data/b/build/goma/gomacc
../../third_party/llvm-build/Release+Asserts/bin/clang++ -MMD -MF
obj/source/libyuv.rotate.o.d -DV8_DEPRECATION_WARNINGS
-D__ASSERT_MACROS_DEFINE_VERSIONS_WITHOUT_UNDERSCORE=0 -DCHROMIUM_BUILD
-DCR_CLANG_REVISION=217949 -DCOMPONENT_BUILD -DUSE_LIBJPEG_TURBO=1
-DENABLE_ONE_CLICK_SIGNIN -DENABLE_PRE_SYNC_BACKUP -DENABLE_REMOTING=1
-DENABLE_WEBRTC=1 -DENABLE_PEPPER_CDMS -DENABLE_CONFIGURATION_POLICY
-DENABLE_NOTIFICATIONS -DENABLE_HIDPI=1
-DDISCARDABLE_MEMORY_ALWAYS_SUPPORTED_NATIVELY
-DSYSTEM_NATIVELY_SIGNALS_MEMORY_PRESSURE -DDCHECK_ALWAYS_ON=1
-DENABLE_EGLIMAGE=1 -DENABLE_TASK_MANAGER=1 -DENABLE_EXTENSIONS=1
-DENABLE_PLUGIN_INSTALLATION=1 -DENABLE_PLUGINS=1 -DENABLE_SESSION_SERVICE=1
-DENABLE_THEMES=1 -DENABLE_AUTOFILL_DIALOG=1 -DENABLE_BACKGROUND=1
-DENABLE_GOOGLE_NOW=1 -DCLD_VERSION=2 -DCLD2_DATA_SOURCE=static
-DENABLE_FULL_PRINTING=1 -DENABLE_PRINTING=1 -DENABLE_SPELLCHECK=1
-DENABLE_CAPTIVE_PORTAL_DETECTION=1 -DENABLE_APP_LIST=1 -DENABLE_SETTINGS_APP=1
-DENABLE_MANAGED_USERS=1 -DENABLE_SERVICE_DISCOVERY=1
-DENABLE_WIFI_BOOTSTRAPPING=1 -DENABLE_LOAD_COMPLETION_HACKS=1 -DHAVE_JPEG
-DUSE_OPENSSL=1 -DDYNAMIC_ANNOTATIONS_ENABLED=1 -DWTF_USE_DYNAMIC_ANNOTATIONS=1
-Igen -I../../include -I../.. -I../../chromium/src/third_party/libjpeg_turbo
-isysroot
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.6.sdk
-O0 -fvisibility=hidden -Werror -Wnewline-eof
-mmacosx-version-min=10.6 -arch i386 -Wendif-labels -Wno-unused-parameter
-Wno-missing-field-initializers -Wno-selector-type-mismatch -Wheader-hygiene
-Wno-char-subscripts -Wno-unneeded-internal-declaration
-Wno-covered-switch-default -Wstring-conversion -Wno-c++11-narrowing
-Wno-deprecated-register -Wno-unused-local-typedef -std=gnu++11 -fno-rtti
-fno-exceptions -fvisibility-inlines-hidden -fno-threadsafe-statics -Xclang
-load -Xclang
/Volumes/data/b/build/slave/mac32/build/src/third_party/llvm-build/Release+Asserts/lib/libFindBadConstructs.dylib
-Xclang -add-plugin -Xclang
find-bad-constructs -fcolor-diagnostics -fno-strict-aliasing
-fstack-protector-all -Wno-undefined-bool-conversion
-Wno-tautological-undefined-compare -c ../../source/rotate.cc -o
obj/source/libyuv.rotate.o
../../source/rotate.cc:38:9: error: 'DECLARE_FUNCTION' macro redefined
[-Werror,-Wmacro-redefined]
#define DECLARE_FUNCTION(name) \
^
../../source/rotate.cc:26:9: note: previous definition is here
#define DECLARE_FUNCTION(name) \
^
1 error generated.
ninja: build stopped: subcommand failed.
/Volumes/data/b/build/goma/goma_ctl.sh stat
Original comment by fbarch...@google.com
on 16 Oct 2014 at 9:21
r1127 ports I420ToBGRA to AVX2 for Windows.
Original comment by fbarch...@google.com
on 20 Oct 2014 at 9:28
r1131 ports I420ToBGRA to gcc
SSSE3 480.2 ms
AVX2 385.5 ms
Original comment by fbarch...@google.com
on 21 Oct 2014 at 4:44
Estimation of completeness. NEON considered complete
NEON
findstr Row_NEON *.h | wc -l
86 functions
SSE2/SSSE3...
findstr Row_SS *.h | wc -l
90 functions. Some are duplicates (SSE2 + SSSE3).
AVX2
findstr Row_AVX2 *.h | wc -l
30
30/86 = 34.9%
findstr Row_AVX2.*\( *_win.cc | wc -l
28
findstr Row_AVX2.*\( *_posix.cc | wc -l
5
Original comment by fbarch...@google.com
on 29 Oct 2014 at 6:46
On OSX these are the slowest 'Opt' functions:
LIBYUV_WIDTH=640 LIBYUV_HEIGHT=360 LIBYUV_REPEAT=4000 \
out/Release/libyuv_unittest --gtest_filter=**Opt | \
sed 's/\(.*(\)\([0-9]*\)\( ms)\)/\2 - \1\2\3/g' | sort -rn | grep ms
3567 - [ OK ] libyuvTest.TestFixedDiv1_Opt (3567 ms)
3028 - [ OK ] libyuvTest.TestFixedDiv_Opt (3028 ms)
2429 - [ OK ] libyuvTest.ARGBBlur_Opt (2429 ms)
1630 - [ OK ] libyuvTest.BayerBGGRToI420_Opt (1630 ms)
1627 - [ OK ] libyuvTest.BayerGRBGToI420_Opt (1627 ms)
1592 - [ OK ] libyuvTest.BayerRGGBToI420_Opt (1592 ms)
1582 - [ OK ] libyuvTest.BayerGBRGToI420_Opt (1582 ms)
1510 - [ OK ] libyuvTest.ARGBBlurSmall_Opt (1510 ms)
1378 - [ OK ] libyuvTest.BayerGRBGToARGB_Opt (1378 ms)
1378 - [ OK ] libyuvTest.BayerBGGRToARGB_Opt (1378 ms)
1337 - [ OK ] libyuvTest.BayerGBRGToARGB_Opt (1337 ms)
1336 - [ OK ] libyuvTest.BayerRGGBToARGB_Opt (1336 ms)
1168 - [ OK ] libyuvTest.ARGBToI411_Opt (1168 ms)
1099 - [ OK ] libyuvTest.I420ToI444_Opt (1099 ms)
976 - [ OK ] libyuvTest.I420ToARGB1555_Opt (976 ms)
928 - [ OK ] libyuvTest.I420ToRGB565_Opt (928 ms)
888 - [ OK ] libyuvTest.NV12ToRGB565_Opt (888 ms)
886 - [ OK ] libyuvTest.NV21ToRGB565_Opt (886 ms)
873 - [ OK ] libyuvTest.ARGBSobel_Opt (873 ms)
Original comment by fbarch...@chromium.org
on 10 Nov 2014 at 6:57
The initial port is complete but does not pass the unit tests. Installing the
Intel SDE emulator:
cd ia32
sudo chgrp procmod pinbin
chmod g+s pinbin
cd ../intel64
sudo chgrp procmod pinbin
chmod g+s pinbin
.../sde-external-7.8.0-2014-10-02-mac/sde -ast -hsw --
out/Release/libyuv_unittest
[----------] Global test environment tear-down
[==========] 887 tests from 1 test case ran. (79889 ms total)
[ PASSED ] 833 tests.
[ FAILED ] 54 tests, listed below:
[ FAILED ] libyuvTest.I420ToARGB_Any
[ FAILED ] libyuvTest.I420ToARGB_Unaligned
[ FAILED ] libyuvTest.I420ToARGB_Invert
[ FAILED ] libyuvTest.I420ToARGB_Opt
[ FAILED ] libyuvTest.I422ToARGB_Any
[ FAILED ] libyuvTest.I422ToARGB_Unaligned
[ FAILED ] libyuvTest.I422ToARGB_Invert
[ FAILED ] libyuvTest.I422ToARGB_Opt
[ FAILED ] libyuvTest.I420ToBayerBGGR_Any
[ FAILED ] libyuvTest.I420ToBayerBGGR_Unaligned
[ FAILED ] libyuvTest.I420ToBayerBGGR_Invert
[ FAILED ] libyuvTest.I420ToBayerBGGR_Opt
[ FAILED ] libyuvTest.I420ToBayerRGGB_Any
[ FAILED ] libyuvTest.I420ToBayerRGGB_Unaligned
[ FAILED ] libyuvTest.I420ToBayerRGGB_Invert
[ FAILED ] libyuvTest.I420ToBayerRGGB_Opt
[ FAILED ] libyuvTest.I420ToBayerGBRG_Any
[ FAILED ] libyuvTest.I420ToBayerGBRG_Unaligned
[ FAILED ] libyuvTest.I420ToBayerGBRG_Invert
[ FAILED ] libyuvTest.I420ToBayerGBRG_Opt
[ FAILED ] libyuvTest.I420ToBayerGRBG_Any
[ FAILED ] libyuvTest.I420ToBayerGRBG_Unaligned
[ FAILED ] libyuvTest.I420ToBayerGRBG_Invert
[ FAILED ] libyuvTest.I420ToBayerGRBG_Opt
[ FAILED ] libyuvTest.ARGBToI420_Any
[ FAILED ] libyuvTest.ARGBToI420_Unaligned
[ FAILED ] libyuvTest.ARGBToI420_Invert
[ FAILED ] libyuvTest.ARGBToI420_Opt
[ FAILED ] libyuvTest.ARGBToI411_Any
[ FAILED ] libyuvTest.ARGBToI411_Unaligned
[ FAILED ] libyuvTest.ARGBToI411_Invert
[ FAILED ] libyuvTest.ARGBToI411_Opt
[ FAILED ] libyuvTest.UYVYToI422_Any
[ FAILED ] libyuvTest.UYVYToI422_Unaligned
[ FAILED ] libyuvTest.UYVYToI422_Invert
[ FAILED ] libyuvTest.UYVYToI422_Opt
[ FAILED ] libyuvTest.ARGBToI400_Any
[ FAILED ] libyuvTest.ARGBToI400_Unaligned
[ FAILED ] libyuvTest.ARGBToI400_Invert
[ FAILED ] libyuvTest.ARGBToI400_Opt
[ FAILED ] libyuvTest.ARGBToI400_Random
[ FAILED ] libyuvTest.ARGBToJ400_Any
[ FAILED ] libyuvTest.ARGBToJ400_Unaligned
[ FAILED ] libyuvTest.ARGBToJ400_Invert
[ FAILED ] libyuvTest.ARGBToJ400_Opt
[ FAILED ] libyuvTest.ARGBToJ400_Random
[ FAILED ] libyuvTest.ARGBToARGBMirror_Any
[ FAILED ] libyuvTest.ARGBToARGBMirror_Unaligned
[ FAILED ] libyuvTest.ARGBToARGBMirror_Invert
[ FAILED ] libyuvTest.ARGBToARGBMirror_Opt
[ FAILED ] libyuvTest.ARGBToARGBMirror_Random
[ FAILED ] libyuvTest.TestARGBMirror
[ FAILED ] libyuvTest.ARGBRotate180
[ FAILED ] libyuvTest.ARGBRotate180_Odd
54 FAILED TESTS
YOU HAVE 1 DISABLED TEST
Original comment by phthor...@gmail.com
on 12 Dec 2014 at 5:54
r1195 disables the affected AVX2 functions. All tests pass.
Original comment by fbarch...@google.com
on 12 Dec 2014 at 7:32
Fixed in r1207
All Windows functions are ported to GCC / NaCl.
Original comment by fbarch...@google.com
on 17 Dec 2014 at 12:08
Original issue reported on code.google.com by
fbarch...@google.com
on 13 Sep 2013 at 10:54