Running perf shows it's spending too much time filtering the rows.
Events: 9K cycles
79.39% libyuv_unittest libyuv_unittest [.] ARGBInterpolateRow_SSSE3
9.94% libyuv_unittest libyuv_unittest [.] ScaleARGBFilterCols_SSSE3
5.44% libyuv_unittest libyuv_unittest [.] ScaleARGBCols_SSE2
1.37% libyuv_unittest libyuv_unittest [.] ScaleARGBBilinearDown
1.19% libyuv_unittest libyuv_unittest [.] ScaleARGB
1.03% libyuv_unittest libc-2.15.so [.] getenv
0.79% libyuv_unittest libc-2.15.so [.] __strncmp_sse2
0.19% libyuv_unittest libyuv_unittest [.] ARGBScaleClip
0.18% libyuv_unittest libc-2.15.so [.] __random_r
0.17% libyuv_unittest libyuv_unittest [.] libyuv::ARGBClipTestFilter(int, int, int, int, libyuv::FilterMode, int)
0.07% libyuv_unittest libc-2.15.so [.] __random
0.06% libyuv_unittest [kernel.kallsyms] [k] 0xffffffff8103b51a
0.06% libyuv_unittest libc-2.15.so [.] __memset_sse2
0.04% libyuv_unittest libc-2.15.so [.] __strlen_sse2
0.03% libyuv_unittest libyuv_unittest [.] ScaleARGBFilterCols_C
0.02% libyuv_unittest libyuv_unittest [.] libyuv::ARGBTestFilter(int, int, int, int, libyuv::FilterMode, int)
0.02% libyuv_unittest ld-2.15.so [.] _dl_relocate_object
0.01% libyuv_unittest libyuv_unittest [.] random@plt
0.01% libyuv_unittest libyuv_unittest [.] ARGBScale
The original intent was to clip the source region.
On Windows, before:
d:\src\libyuv2\trunk>out\release\libyuv_unittest --gtest_filter=*ARGBScale*DownBy34_Bilinear | sed "s/\(.*(\)\([0-9]*\)\( ms)\)/\2 - \1\2\3/g" | c:\cygwin\bin\sort -rn | grep ms
11842 - [ OK ] libyuvTest.ARGBScaleClipDownBy34_Bilinear (11842 ms)
726 - [ OK ] libyuvTest.ARGBScaleDownBy34_Bilinear (726 ms)
[==========] 2 tests from 1 test case ran. (12568 ms total)
[----------] 2 tests from libyuvTest (12568 ms total)
After:
d:\src\libyuv\trunk>out\release\libyuv_unittest --gtest_filter=*ARGBScale*DownBy34_Bilinear | sed "s/\(.*(\)\([0-9]*\)\( ms)\)/\2 - \1\2\3/g" | c:\cygwin\bin\sort -rn | grep ms
5293 - [ FAILED ] libyuvTest.ARGBScaleClipDownBy34_Bilinear (5293 ms)
794 - [ OK ] libyuvTest.ARGBScaleDownBy34_Bilinear (794 ms)
[==========] 2 tests from 1 test case ran. (6087 ms total)
The bug causing the failure needs to be resolved. Clipping is now about 2x faster (11842 ms down to 5293 ms), but still slow.
Original comment by fbarch...@chromium.org
on 17 May 2013 at 3:52
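For anyone without sed and Cygwin sort on hand, the timing pipeline used above can be approximated in Python. This is only a sketch: `slowest_first` and `RESULT_RE` are hypothetical names, not part of libyuv or gtest, and unlike the `grep ms` version it keeps only per-test result lines and drops the summary totals.

```python
import re

# Matches gtest result lines such as "[ OK ] libyuvTest.Foo (726 ms)".
# Extra padding inside the brackets (as gtest normally prints) is tolerated.
RESULT_RE = re.compile(r"\[\s*(OK|FAILED)\s*\]\s+(\S+)\s+\((\d+) ms\)")


def slowest_first(gtest_output):
    """Return (ms, status, test_name) tuples sorted by time, slowest first."""
    rows = []
    for line in gtest_output.splitlines():
        match = RESULT_RE.search(line)
        if match:
            status, name, ms = match.group(1), match.group(2), int(match.group(3))
            rows.append((ms, status, name))
    return sorted(rows, reverse=True)


# Sample input copied from the "before" run in this issue.
sample = """\
[ OK ] libyuvTest.ARGBScaleClipDownBy34_Bilinear (11842 ms)
[ OK ] libyuvTest.ARGBScaleDownBy34_Bilinear (726 ms)
[==========] 2 tests from 1 test case ran. (12568 ms total)
"""

for ms, status, name in slowest_first(sample):
    print(f"{ms} - [ {status} ] {name} ({ms} ms)")
```

Capturing the status alongside the time makes a FAILED run, like the 5293 ms one, easy to spot when comparing before and after.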
r696 removes getenv
Before
11842 - [ OK ] libyuvTest.ARGBScaleClipDownBy34_Bilinear (11842 ms)
726 - [ OK ] libyuvTest.ARGBScaleDownBy34_Bilinear (726 ms)
After
8540 - [ OK ] libyuvTest.ARGBScaleClipDownBy34_Bilinear (8540 ms)
716 - [ OK ] libyuvTest.ARGBScaleDownBy34_Bilinear (716 ms)
Events: 9K cycles
81.53% libyuv_unittest libyuv_unittest [.] ARGBInterpolateRow_SSSE3
9.85% libyuv_unittest libyuv_unittest [.] ScaleARGBFilterCols_SSSE3
5.39% libyuv_unittest libyuv_unittest [.] ScaleARGBCols_SSE2
1.16% libyuv_unittest libyuv_unittest [.] ScaleARGB
1.09% libyuv_unittest libyuv_unittest [.] ScaleARGBBilinearDown
0.29% libyuv_unittest libyuv_unittest [.] ARGBScaleClip
0.15% libyuv_unittest [kernel.kallsyms] [k] 0xffffffff8103b51a
0.14% libyuv_unittest libyuv_unittest [.] libyuv::ARGBClipTestFilter(int, int, int, int, libyuv::FilterMode, int)
0.14% libyuv_unittest libc-2.15.so [.] __random_r
0.12% libyuv_unittest libc-2.15.so [.] __random
0.05% libyuv_unittest libc-2.15.so [.] __memset_sse2
0.04% libyuv_unittest libyuv_unittest [.] libyuv::ARGBTestFilter(int, int, int, int, libyuv::FilterMode, int)
0.02% libyuv_unittest libyuv_unittest [.] ScaleARGBFilterCols_C
0.01% libyuv_unittest libyuv_unittest [.] random@plt
0.01% libyuv_unittest libc-2.15.so [.] _int_malloc
Original comment by fbarch...@chromium.org
on 17 May 2013 at 9:20
Events: 2K cycles
35.28% libyuv_unittest libyuv_unittest [.] ScaleARGBFilterCols_SSSE3
29.69% libyuv_unittest libyuv_unittest [.] ARGBInterpolateRow_SSSE3
21.92% libyuv_unittest libyuv_unittest [.] ScaleARGBCols_SSE2
4.80% libyuv_unittest libyuv_unittest [.] ScaleARGB
4.02% libyuv_unittest libyuv_unittest [.] ScaleARGBBilinearDown
1.31% libyuv_unittest libyuv_unittest [.] ARGBScaleClip
0.88% libyuv_unittest libc-2.15.so [.] __random_r
0.74% libyuv_unittest libyuv_unittest [.] libyuv::ARGBClipTestFilter(int, int, int, int, libyuv::FilterMode, int)
0.62% libyuv_unittest [kernel.kallsyms] [k] 0xffffffff8103b51a
0.22% libyuv_unittest libc-2.15.so [.] __random
0.13% libyuv_unittest libyuv_unittest [.] libyuv::ARGBTestFilter(int, int, int, int, libyuv::FilterMode, int)
0.09% libyuv_unittest libc-2.15.so [.] __memset_sse2
0.09% libyuv_unittest libyuv_unittest [.] random@plt
0.09% libyuv_unittest libyuv_unittest [.] ARGBInterpolateRow_C
0.04% libyuv_unittest ld-2.15.so [.] do_lookup_x
0.04% libyuv_unittest libyuv_unittest [.] testing::internal::UnitTestOptions::PatternMatchesString(char const*, char const*)
0.04% libyuv_unittest libyuv_unittest [.] ScaleARGBCols_C
Original comment by fbarch...@chromium.org
on 17 May 2013 at 11:56
Overall 8.8x faster clipping
Was ARGBScaleClipDownBy34_Bilinear (11842 ms)
Now ARGBScaleClipDownBy34_Bilinear (1341 ms)
Original comment by fbarch...@chromium.org
on 19 May 2013 at 7:14
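The speedup figures quoted in this thread can be rechecked from the reported ARGBScaleClipDownBy34_Bilinear timings. A small Python sketch follows; the `speedup` helper and the stage labels are illustrative only, and the millisecond values are copied from the comments above.

```python
def speedup(before_ms, after_ms):
    """Ratio of the baseline time to the new time (higher is faster)."""
    return before_ms / after_ms


BASELINE_MS = 11842  # original ARGBScaleClipDownBy34_Bilinear time

# Timings reported in the comments, in the order they were posted.
stages = [
    ("first clipping change (test FAILED)", 5293),
    ("r696, getenv removed", 8540),
    ("final", 1341),
]

for label, ms in stages:
    print(f"{label}: {ms} ms, {speedup(BASELINE_MS, ms):.1f}x faster than baseline")
```

The last line reproduces the "8.8x" figure: 11842 / 1341 is roughly 8.83.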
Original issue reported on code.google.com by
fbarch...@google.com
on 16 May 2013 at 8:00