watery01 / libyuv

Automatically exported from code.google.com/p/libyuv

ARGBScaleClip is slow #228

Closed by GoogleCodeExporter 9 years ago

GoogleCodeExporter commented 9 years ago
fbarchard@g36:/usr/local/google/libyuv/trunk$ runyuv10 ARGBScale*
LIBYUV_WIDTH=640 LIBYUV_HEIGHT=360 LIBYUV_REPEAT=4000 out/Release/libyuv_unittest --gtest_filter=*ARGBScale* | sed 's/\(.*(\)\([0-9]*\)\( ms)\)/\2 - \1\2\3/g' | sort -rn | grep ms
8059 - [       OK ] libyuvTest.ARGBScaleClipDownBy34_Bilinear (8059 ms)
5894 - [       OK ] libyuvTest.ARGBScaleClipTo1366x768_Bilinear (5894 ms)
5312 - [       OK ] libyuvTest.ARGBScaleClipTo1280x720_Bilinear (5312 ms)
3674 - [       OK ] libyuvTest.ARGBScaleClipTo1366x768_None (3674 ms)
3193 - [       OK ] libyuvTest.ARGBScaleClipTo1280x720_None (3193 ms)
2860 - [       OK ] libyuvTest.ARGBScaleClipTo853x480_Bilinear (2860 ms)
2853 - [       OK ] libyuvTest.ARGBScaleTo1366x768_Bilinear (2853 ms)
2559 - [       OK ] libyuvTest.ARGBScaleTo1280x720_Bilinear (2559 ms)
2099 - [       OK ] libyuvTest.ARGBScaleClipDownBy38_Bilinear (2099 ms)
1735 - [       OK ] libyuvTest.ARGBScaleTo1366x768_None (1735 ms)
1527 - [       OK ] libyuvTest.ARGBScaleTo1280x720_None (1527 ms)
1489 - [       OK ] libyuvTest.ARGBScaleTo853x480_Bilinear (1489 ms)
1459 - [       OK ] libyuvTest.ARGBScaleClipTo853x480_None (1459 ms)
726 - [       OK ] libyuvTest.ARGBScaleTo853x480_None (726 ms)
670 - [       OK ] libyuvTest.ARGBScaleDownBy34_Bilinear (670 ms)
636 - [       OK ] libyuvTest.ARGBScaleClipFrom640x360_Bilinear (636 ms)
636 - [       OK ] libyuvTest.ARGBScaleClipDownBy1_Bilinear (636 ms)
621 - [       OK ] libyuvTest.ARGBScaleClipDownBy1_None (621 ms)
619 - [       OK ] libyuvTest.ARGBScaleClipFrom640x360_None (619 ms)
462 - [       OK ] libyuvTest.ARGBScaleClipDownBy34_None (462 ms)
259 - [       OK ] libyuvTest.ARGBScaleFrom640x360_None (259 ms)
259 - [       OK ] libyuvTest.ARGBScaleDownBy1_None (259 ms)
259 - [       OK ] libyuvTest.ARGBScaleDownBy1_Bilinear (259 ms)
258 - [       OK ] libyuvTest.ARGBScaleFrom640x360_Bilinear (258 ms)
254 - [       OK ] libyuvTest.ARGBScaleDownBy34_None (254 ms)
253 - [       OK ] libyuvTest.ARGBScaleDownBy38_Bilinear (253 ms)
246 - [       OK ] libyuvTest.ARGBScaleClipDownBy2_Bilinear (246 ms)
192 - [       OK ] libyuvTest.ARGBScaleClipDownBy2_None (192 ms)
173 - [       OK ] libyuvTest.ARGBScaleDownBy2_Bilinear (173 ms)
133 - [       OK ] libyuvTest.ARGBScaleClipDownBy38_None (133 ms)
118 - [       OK ] libyuvTest.ARGBScaleDownBy2_None (118 ms)
89 - [       OK ] libyuvTest.ARGBScaleDownBy38_None (89 ms)
83 - [       OK ] libyuvTest.ARGBScaleClipDownBy4_Bilinear (83 ms)
74 - [       OK ] libyuvTest.ARGBScaleClipDownBy16_Bilinear (74 ms)
71 - [       OK ] libyuvTest.ARGBScaleDownBy4_Bilinear (71 ms)
58 - [       OK ] libyuvTest.ARGBScaleClipDownBy4_None (58 ms)
46 - [       OK ] libyuvTest.ARGBScaleClipDownBy5_None (46 ms)
46 - [       OK ] libyuvTest.ARGBScaleClipDownBy5_Bilinear (46 ms)
39 - [       OK ] libyuvTest.ARGBScaleDownBy4_None (39 ms)
33 - [       OK ] libyuvTest.ARGBScaleDownBy5_None (33 ms)
32 - [       OK ] libyuvTest.ARGBScaleDownBy5_Bilinear (32 ms)
32 - [       OK ] libyuvTest.ARGBScaleDownBy16_Bilinear (32 ms)
31 - [       OK ] libyuvTest.ARGBScaleClipDownBy8_Bilinear (31 ms)
30 - [       OK ] libyuvTest.ARGBScaleDownBy8_Bilinear (30 ms)
21 - [       OK ] libyuvTest.ARGBScaleClipDownBy8_None (21 ms)
15 - [       OK ] libyuvTest.ARGBScaleDownBy8_None (15 ms)
12 - [       OK ] libyuvTest.ARGBScaleClipDownBy16_None (12 ms)
10 - [       OK ] libyuvTest.ARGBScaleDownBy16_None (10 ms)
[----------] 48 tests from libyuvTest (50209 ms total)
[==========] 48 tests from 1 test case ran. (50209 ms total)
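
The sed expression in the command above copies each test's elapsed time to the front of its line so that sort -rn can order the results by time. Lines without a "(N ms)" suffix, such as the two summary lines, pass through unchanged and sort last. A minimal C++ sketch of the same substitution follows; the function name PrefixElapsedMs is hypothetical and is not part of libyuv or gtest.

```cpp
#include <regex>
#include <string>

// Mirrors the sed substitution
//   s/\(.*(\)\([0-9]*\)\( ms)\)/\2 - \1\2\3/g
// Group 1: everything up to and including the "(" before the time.
// Group 2: the digits of the elapsed time.
// Group 3: the trailing " ms)".
// PrefixElapsedMs is a hypothetical name used only for this illustration.
std::string PrefixElapsedMs(const std::string& line) {
  static const std::regex kTiming(R"((.*\()([0-9]*)( ms\)))");
  // Lines that do not match are returned unchanged.
  return std::regex_replace(line, kTiming, "$2 - $1$2$3");
}
```

Applied to a gtest result line, this yields the "726 - [       OK ] ..." form seen in the listing.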

There are 2 issues:
1. Calling overhead. The clipped variants take about 2x as long as the unclipped ones:

ARGBScaleClipTo1280x720_None (3193 ms)
ARGBScaleTo1280x720_None (1527 ms)

ARGBScaleClipTo1280x720_Bilinear (5312 ms)
ARGBScaleTo1280x720_Bilinear (2559 ms)

2. When scaling down with bilinear, the full rows are blended, which makes the clipped variant about 12x slower:
ARGBScaleClipDownBy34_Bilinear (8059 ms)
ARGBScaleDownBy34_Bilinear (670 ms)
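
To illustrate issue 2, here is a minimal sketch of blending only the source columns that a clip rectangle samples, rather than the whole row. It is plain C++ written for this thread, not the libyuv implementation and not the eventual fix; the function names, the 0..256 fraction convention, and the rounding are all assumptions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: vertically blend two ARGB rows (4 bytes per pixel).
// fraction256 is the weight of row1 in 1/256 units: 0 gives row0, 256 gives row1.
void InterpolateArgbRow(uint8_t* dst, const uint8_t* row0, const uint8_t* row1,
                        int width_px, int fraction256) {
  assert(fraction256 >= 0 && fraction256 <= 256);
  const int bytes = width_px * 4;
  for (int i = 0; i < bytes; ++i) {
    dst[i] = static_cast<uint8_t>(
        (row0[i] * (256 - fraction256) + row1[i] * fraction256 + 128) >> 8);
  }
}

// Hypothetical helper: blend only the span [src_x0, src_x0 + src_span_px)
// of the source row, leaving the rest of dst_row untouched. The cost scales
// with the clipped span instead of the full source width.
void InterpolateClippedSpan(uint8_t* dst_row, const uint8_t* row0,
                            const uint8_t* row1, int src_x0, int src_span_px,
                            int fraction256) {
  const int offset = src_x0 * 4;
  InterpolateArgbRow(dst_row + offset, row0 + offset, row1 + offset,
                     src_span_px, fraction256);
}
```

When the clip rectangle covers only a small part of the destination, the span passed to the second function is correspondingly small, so the row blend no longer dominates the per-call cost.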

Original issue reported on code.google.com by fbarch...@google.com on 16 May 2013 at 8:00

GoogleCodeExporter commented 9 years ago
Running perf shows it's spending too much time filtering the rows.

Events: 9K cycles
 79.39%  libyuv_unittest  libyuv_unittest    [.] ARGBInterpolateRow_SSSE3
  9.94%  libyuv_unittest  libyuv_unittest    [.] ScaleARGBFilterCols_SSSE3
  5.44%  libyuv_unittest  libyuv_unittest    [.] ScaleARGBCols_SSE2
  1.37%  libyuv_unittest  libyuv_unittest    [.] ScaleARGBBilinearDown
  1.19%  libyuv_unittest  libyuv_unittest    [.] ScaleARGB
  1.03%  libyuv_unittest  libc-2.15.so       [.] getenv
  0.79%  libyuv_unittest  libc-2.15.so       [.] __strncmp_sse2
  0.19%  libyuv_unittest  libyuv_unittest    [.] ARGBScaleClip
  0.18%  libyuv_unittest  libc-2.15.so       [.] __random_r
  0.17%  libyuv_unittest  libyuv_unittest    [.] libyuv::ARGBClipTestFilter(int, int, int, int, libyuv::FilterMode, int)
  0.07%  libyuv_unittest  libc-2.15.so       [.] __random
  0.06%  libyuv_unittest  [kernel.kallsyms]  [k] 0xffffffff8103b51a
  0.06%  libyuv_unittest  libc-2.15.so       [.] __memset_sse2
  0.04%  libyuv_unittest  libc-2.15.so       [.] __strlen_sse2
  0.03%  libyuv_unittest  libyuv_unittest    [.] ScaleARGBFilterCols_C
  0.02%  libyuv_unittest  libyuv_unittest    [.] libyuv::ARGBTestFilter(int, int, int, int, libyuv::FilterMode, int)
  0.02%  libyuv_unittest  ld-2.15.so         [.] _dl_relocate_object
  0.01%  libyuv_unittest  libyuv_unittest    [.] random@plt
  0.01%  libyuv_unittest  libyuv_unittest    [.] ARGBScale

The original intent was to clip the source region.

On Windows, before:
d:\src\libyuv2\trunk>out\release\libyuv_unittest --gtest_filter=*ARGBScale*DownBy34_Bilinear | sed "s/\(.*(\)\([0-9]*\)\( ms)\)/\2 - \1\2\3/g" | c:\cygwin\bin\sort -rn | grep ms
11842 - [       OK ] libyuvTest.ARGBScaleClipDownBy34_Bilinear (11842 ms)
726 - [       OK ] libyuvTest.ARGBScaleDownBy34_Bilinear (726 ms)
[==========] 2 tests from 1 test case ran. (12568 ms total)
[----------] 2 tests from libyuvTest (12568 ms total)

After
d:\src\libyuv\trunk>out\release\libyuv_unittest --gtest_filter=*ARGBScale*DownBy34_Bilinear | sed "s/\(.*(\)\([0-9]*\)\( ms)\)/\2 - \1\2\3/g" | c:\cygwin\bin\sort -rn | grep ms
5293 - [  FAILED  ] libyuvTest.ARGBScaleClipDownBy34_Bilinear (5293 ms)
794 - [       OK ] libyuvTest.ARGBScaleDownBy34_Bilinear (794 ms)
[==========] 2 tests from 1 test case ran. (6087 ms total)

The bug causing the test failure needs to be resolved. The clipped test is about 2x faster (11842 ms down to 5293 ms), but still slow.

Original comment by fbarch...@chromium.org on 17 May 2013 at 3:52

GoogleCodeExporter commented 9 years ago
r696 removes getenv

Before
11842 - [       OK ] libyuvTest.ARGBScaleClipDownBy34_Bilinear (11842 ms)
726 - [       OK ] libyuvTest.ARGBScaleDownBy34_Bilinear (726 ms)

After
8540 - [       OK ] libyuvTest.ARGBScaleClipDownBy34_Bilinear (8540 ms)
716 - [       OK ] libyuvTest.ARGBScaleDownBy34_Bilinear (716 ms)
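
The earlier profile showed getenv and __strncmp_sse2 together at under 2% of cycles. The thread does not say where the getenv call came from or how r696 removed it. As a generic illustration only, one common way to keep an environment lookup out of a hot path is to read it once and cache the result. The variable name EXAMPLE_FLAG and both function names below are hypothetical.

```cpp
#include <cstdlib>
#include <cstring>

// Hypothetical environment variable, used only for this illustration.
static const char kExampleVar[] = "EXAMPLE_FLAG";

// Queries the environment on every call. getenv typically scans the
// environment block with string compares, which is what shows up in a profile.
bool ExampleFlagUncached() {
  const char* value = std::getenv(kExampleVar);
  return value != nullptr && std::strcmp(value, "1") == 0;
}

// Reads the environment once. Initialization of a function-local static
// happens a single time (and is thread-safe since C++11).
bool ExampleFlagCached() {
  static const bool enabled = ExampleFlagUncached();
  return enabled;
}
```

The trade-off is that a cached value will not reflect changes made to the environment after the first call.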

Events: 9K cycles
 81.53%  libyuv_unittest  libyuv_unittest    [.] ARGBInterpolateRow_SSSE3
  9.85%  libyuv_unittest  libyuv_unittest    [.] ScaleARGBFilterCols_SSSE3
  5.39%  libyuv_unittest  libyuv_unittest    [.] ScaleARGBCols_SSE2
  1.16%  libyuv_unittest  libyuv_unittest    [.] ScaleARGB
  1.09%  libyuv_unittest  libyuv_unittest    [.] ScaleARGBBilinearDown
  0.29%  libyuv_unittest  libyuv_unittest    [.] ARGBScaleClip
  0.15%  libyuv_unittest  [kernel.kallsyms]  [k] 0xffffffff8103b51a
  0.14%  libyuv_unittest  libyuv_unittest    [.] libyuv::ARGBClipTestFilter(int, int, int, int, libyuv::FilterMode, int)
  0.14%  libyuv_unittest  libc-2.15.so       [.] __random_r
  0.12%  libyuv_unittest  libc-2.15.so       [.] __random
  0.05%  libyuv_unittest  libc-2.15.so       [.] __memset_sse2
  0.04%  libyuv_unittest  libyuv_unittest    [.] libyuv::ARGBTestFilter(int, int, int, int, libyuv::FilterMode, int)
  0.02%  libyuv_unittest  libyuv_unittest    [.] ScaleARGBFilterCols_C
  0.01%  libyuv_unittest  libyuv_unittest    [.] random@plt
  0.01%  libyuv_unittest  libc-2.15.so       [.] _int_malloc

Original comment by fbarch...@chromium.org on 17 May 2013 at 9:20

GoogleCodeExporter commented 9 years ago
Events: 2K cycles
 35.28%  libyuv_unittest  libyuv_unittest    [.] ScaleARGBFilterCols_SSSE3
 29.69%  libyuv_unittest  libyuv_unittest    [.] ARGBInterpolateRow_SSSE3
 21.92%  libyuv_unittest  libyuv_unittest    [.] ScaleARGBCols_SSE2
  4.80%  libyuv_unittest  libyuv_unittest    [.] ScaleARGB
  4.02%  libyuv_unittest  libyuv_unittest    [.] ScaleARGBBilinearDown
  1.31%  libyuv_unittest  libyuv_unittest    [.] ARGBScaleClip
  0.88%  libyuv_unittest  libc-2.15.so       [.] __random_r
  0.74%  libyuv_unittest  libyuv_unittest    [.] libyuv::ARGBClipTestFilter(int, int, int, int, libyuv::FilterMode, int)
  0.62%  libyuv_unittest  [kernel.kallsyms]  [k] 0xffffffff8103b51a
  0.22%  libyuv_unittest  libc-2.15.so       [.] __random
  0.13%  libyuv_unittest  libyuv_unittest    [.] libyuv::ARGBTestFilter(int, int, int, int, libyuv::FilterMode, int)
  0.09%  libyuv_unittest  libc-2.15.so       [.] __memset_sse2
  0.09%  libyuv_unittest  libyuv_unittest    [.] random@plt
  0.09%  libyuv_unittest  libyuv_unittest    [.] ARGBInterpolateRow_C
  0.04%  libyuv_unittest  ld-2.15.so         [.] do_lookup_x
  0.04%  libyuv_unittest  libyuv_unittest    [.] testing::internal::UnitTestOptions::PatternMatchesString(char const*, char const*)
  0.04%  libyuv_unittest  libyuv_unittest    [.] ScaleARGBCols_C

Original comment by fbarch...@chromium.org on 17 May 2013 at 11:56

GoogleCodeExporter commented 9 years ago
Overall 8.8x faster clipping
Was ARGBScaleClipDownBy34_Bilinear (11842 ms)
Now ARGBScaleClipDownBy34_Bilinear (1341 ms)
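
For reference, the speedup figures quoted in this thread are simple ratios of elapsed times from the Windows baseline of 11842 ms: about 2.2x at 5293 ms, about 1.4x at 8540 ms, and about 8.8x at 1341 ms. A small sketch of the arithmetic, with a hypothetical helper name:

```cpp
#include <cassert>

// Hypothetical helper: how many times faster the "after" run is.
double Speedup(double before_ms, double after_ms) {
  assert(after_ms > 0.0);
  return before_ms / after_ms;
}
```

Speedup(11842, 1341) evaluates to roughly 8.83, matching the 8.8x reported above.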

Original comment by fbarch...@chromium.org on 19 May 2013 at 7:14