rolandyue / libyuv

Automatically exported from code.google.com/p/libyuv
BSD 3-Clause "New" or "Revised" License

linux top bottlenecks #492

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
Investigate top bottlenecks

LIBYUV_DISABLE_AVX2=1 LIBYUV_WIDTH=1280 LIBYUV_HEIGHT=720 LIBYUV_REPEAT=1000 \
perf record out/Release/libyuv_unittest --gtest_filter=*
perf report

 13.81%  libyuv_unittest  libyuv_unittest      [.] libyuv::libyuvTest_ScaleTestRoundToByte_Test::T
 13.81%  libyuv_unittest  libyuv_unittest      [.] libyuv::libyuvTest_TestRoundToByte_Test::TestBo
  4.94%  libyuv_unittest  libyuv_unittest      [.] ScaleAddRow_C
  4.07%  libyuv_unittest  libyuv_unittest      [.] ScaleARGBRowDownEvenBox_SSE2
  3.63%  libyuv_unittest  libyuv_unittest      [.] InterpolateRow_SSSE3
  3.57%  libyuv_unittest  libyuv_unittest      [.] I422ToARGBMatrixRow_SSSE3
  3.06%  libyuv_unittest  libyuv_unittest      [.] ScaleARGBFilterCols_SSSE3
  3.02%  libyuv_unittest  libyuv_unittest      [.] ScaleFilterCols_SSSE3
  2.63%  libyuv_unittest  libyuv_unittest      [.] ScaleARGBRowDown2Box_SSE2
  2.58%  libyuv_unittest  libyuv_unittest      [.] libyuv::ScaleARGB(unsigned char const*, int, in
  2.57%  libyuv_unittest  libyuv_unittest      [.] ARGBToRGB565DitherRow_C
  2.45%  libyuv_unittest  libyuv_unittest      [.] ScaleARGBCols_SSE2
  2.44%  libyuv_unittest  libc-2.19.so         [.] __random_r
  2.23%  libyuv_unittest  libyuv_unittest      [.] CopyRow_ERMS
  1.64%  libyuv_unittest  libyuv_unittest      [.] I422ToABGRMatrixRow_SSSE3
  1.46%  libyuv_unittest  libyuv_unittest      [.] ScaleARGBRowDownEven_SSE2
  1.29%  libyuv_unittest  libyuv_unittest      [.] FixedDiv_X86
  1.26%  libyuv_unittest  libyuv_unittest      [.] libyuv::ScaleAddCols1_C(int, int, int, int, uns
  1.24%  libyuv_unittest  libyuv_unittest      [.] ARGBShuffleRow_SSSE3
  1.21%  libyuv_unittest  libyuv_unittest      [.] CumulativeSumToAverageRow_SSE2
  1.14%  libyuv_unittest  libc-2.19.so         [.] __random
  1.08%  libyuv_unittest  libyuv_unittest      [.] ScaleCols_C
  0.99%  libyuv_unittest  libyuv_unittest      [.] ARGBToYRow_SSSE3
  0.75%  libyuv_unittest  libyuv_unittest      [.] ComputeCumulativeSumRow_SSE2
  0.75%  libyuv_unittest  libc-2.19.so         [.] _int_malloc
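
These runs are configured entirely through environment variables (LIBYUV_WIDTH, LIBYUV_HEIGHT, LIBYUV_REPEAT, LIBYUV_DISABLE_AVX2). As a rough illustration of how a test binary can pick up integer knobs like these with a fallback default, here is a minimal sketch. GetEnvInt is a hypothetical helper, not the actual libyuv_unittest code.

```cpp
#include <cassert>
#include <cerrno>
#include <climits>
#include <cstdlib>

// Hypothetical helper (not libyuv's code): read an integer setting such as
// LIBYUV_WIDTH or LIBYUV_REPEAT from the environment. Falls back to
// default_value when the variable is unset, empty, not a base-10 integer,
// or outside the range of int.
int GetEnvInt(const char* name, int default_value) {
  assert(name != nullptr);
  const char* text = std::getenv(name);
  if (text == nullptr || *text == '\0') {
    return default_value;
  }
  char* end = nullptr;
  errno = 0;
  long value = std::strtol(text, &end, 10);
  // Reject trailing junk ("720p"), overflow, and values that do not fit in int.
  if (errno != 0 || *end != '\0' || value < INT_MIN || value > INT_MAX) {
    return default_value;
  }
  return static_cast<int>(value);
}
```

For example, GetEnvInt("LIBYUV_REPEAT", 1) would return 1000 under the command above and the fallback (an assumed default of 1) when the variable is not set.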

Original issue reported on code.google.com by fbarch...@google.com on 16 Sep 2015 at 11:36

GoogleCodeExporter commented 8 years ago
r1483 removes the redundant scale rounding test.

The rounding test is still the top bottleneck on Linux, though.

 16.52%  libyuv_unittest  libyuv_unittest      [.] libyuv::libyuvTest_TestRoundToByte_Test::TestBody()

Original comment by fbarch...@google.com on 17 Sep 2015 at 5:28

GoogleCodeExporter commented 8 years ago
The following is a complete list of C functions that show up in the profile (ideally there should be none):

LIBYUV_FLAGS=-1 LIBYUV_WIDTH=1280 LIBYUV_HEIGHT=720 LIBYUV_REPEAT=999 \
perf record out/Release/libyuv_unittest --gtest_filter=*
perf report >out.txt
grep _C out.txt

     5.88%  libyuv_unittest  libyuv_unittest      [.] ScaleAddRow_C
     3.08%  libyuv_unittest  libyuv_unittest      [.] ARGBToRGB565DitherRow_C
     1.38%  libyuv_unittest  libyuv_unittest      [.] libyuv::ScaleAddCols1_C(int, int, int, int, unsigned short const*, unsigned char*)
     1.28%  libyuv_unittest  libyuv_unittest      [.] ScaleCols_C
     0.52%  libyuv_unittest  libyuv_unittest      [.] ARGBToUV411Row_C
     0.25%  libyuv_unittest  libyuv_unittest      [.] ScaleARGBRowDownEven_C
     0.14%  libyuv_unittest  libyuv_unittest      [.] libyuv::ScaleAddCols2_C(int, int, int, int, unsigned short const*, unsigned char*)
     0.07%  libyuv_unittest  libyuv_unittest      [.] ScaleColsUp2_C
     0.03%  libyuv_unittest  libyuv_unittest      [.] MirrorUVRow_C
     0.01%  libyuv_unittest  libyuv_unittest      [.] TransposeWx8_C
     0.01%  libyuv_unittest  libyuv_unittest      [.] TransposeWxH_C
     0.01%  libyuv_unittest  libyuv_unittest      [.] ScaleRowDown34_0_Box_C
     0.00%  libyuv_unittest  libyuv_unittest      [.] ScaleRowDown34_1_Box_C
     0.00%  libyuv_unittest  libyuv_unittest      [.] TransposeUVWx8_C
     0.00%  libyuv_unittest  libyuv_unittest      [.] ScaleRowDown38_3_Box_C
     0.00%  libyuv_unittest  libyuv_unittest      [.] ScaleRowDown2Box_C
     0.00%  libyuv_unittest  libyuv_unittest      [.] ScaleRowDown34_C
     0.00%  libyuv_unittest  libyuv_unittest      [.] ScaleRowDown38_2_Box_C
     0.00%  libyuv_unittest  libyuv_unittest      [.] libyuv::libyuvTest_CropNV12_Test::TestBody()
     0.00%  libyuv_unittest  libyuv_unittest      [.] ScaleRowDown38_C
     0.00%  libyuv_unittest  libyuv_unittest      [.] ARGBToUVJ422Row_C
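
Note that a plain `grep _C` also matches libyuv::libyuvTest_CropNV12_Test::TestBody(), which is a test body rather than a C row function. A sketch of a stricter filter that keeps only symbols whose base name ends in _C; IsCFallbackSymbol is a hypothetical helper and assumes perf's `[.] symbol` column layout shown above.

```cpp
#include <cassert>
#include <string>

// Hypothetical filter: returns true if a perf-report line names a portable C
// function, i.e. the symbol's base name ends in "_C", optionally followed by
// a C++ parameter list such as "(int, int, ...)".
bool IsCFallbackSymbol(const std::string& line) {
  // perf prints the symbol after the "[.] " marker.
  const std::string marker = "[.] ";
  std::string::size_type start = line.find(marker);
  if (start == std::string::npos) {
    return false;
  }
  std::string symbol = line.substr(start + marker.size());
  // Drop a demangled C++ parameter list if present.
  std::string::size_type paren = symbol.find('(');
  if (paren != std::string::npos) {
    symbol.erase(paren);
  }
  // Trim trailing whitespace left by perf's column padding.
  while (!symbol.empty() && (symbol.back() == ' ' || symbol.back() == '\t')) {
    symbol.pop_back();
  }
  return symbol.size() >= 2 &&
         symbol.compare(symbol.size() - 2, 2, "_C") == 0;
}
```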

Original comment by fbarch...@google.com on 17 Sep 2015 at 6:35

GoogleCodeExporter commented 8 years ago
LIBYUV_FLAGS=-1 LIBYUV_WIDTH=1280 LIBYUV_HEIGHT=720 LIBYUV_REPEAT=999 \
perf record out/Release/libyuv_unittest --gtest_filter=*

    18.31%  libyuv_unittest  libyuv_unittest      [.] libyuv::libyuvTest_TestRoundToByte_Test::TestBody()
     6.47%  libyuv_unittest  libyuv_unittest      [.] ScaleAddRow_C
     5.05%  libyuv_unittest  libyuv_unittest      [.] InterpolateRow_AVX2
     4.81%  libyuv_unittest  libyuv_unittest      [.] ScaleARGBRowDownEvenBox_SSE2
     3.64%  libyuv_unittest  libyuv_unittest      [.] ScaleFilterCols_SSSE3
     3.43%  libyuv_unittest  libyuv_unittest      [.] ScaleARGBFilterCols_SSSE3
     3.08%  libyuv_unittest  libyuv_unittest      [.] ScaleARGBRowDown2Box_SSE2
     3.00%  libyuv_unittest  libyuv_unittest      [.] ScaleARGB
     2.86%  libyuv_unittest  libyuv_unittest      [.] ScaleARGBCols_SSE2
     2.83%  libyuv_unittest  libyuv_unittest      [.] ARGBToRGB565DitherRow_C
     2.69%  libyuv_unittest  libyuv_unittest      [.] CopyRow_ERMS
     2.59%  libyuv_unittest  libyuv_unittest      [.] I422ToARGBRow_AVX2
     1.72%  libyuv_unittest  libyuv_unittest      [.] ScaleARGBRowDownEven_SSE2
     1.60%  libyuv_unittest  libyuv_unittest      [.] FixedDiv_X86
     1.48%  libyuv_unittest  libyuv_unittest      [.] ARGBShuffleRow_AVX2
     1.47%  libyuv_unittest  libyuv_unittest      [.] CumulativeSumToAverageRow_SSE2
     1.45%  libyuv_unittest  libyuv_unittest      [.] I422ToABGRRow_AVX2
     1.40%  libyuv_unittest  libyuv_unittest      [.] ScaleCols_C
     1.30%  libyuv_unittest  libyuv_unittest      [.] ScaleAddCols1_C
     1.08%  libyuv_unittest  libyuv_unittest      [.] NV12ToARGBRow_SSSE3

Original comment by fbarch...@google.com on 23 Sep 2015 at 8:27

GoogleCodeExporter commented 8 years ago
NV12ToARGB optimized
    18.25%  libyuv_unittest  libyuv_unittest      [.] libyuv::libyuvTest_TestRoundToByte_Test::TestBody()
     6.50%  libyuv_unittest  libyuv_unittest      [.] ScaleAddRow_C
     5.16%  libyuv_unittest  libyuv_unittest      [.] InterpolateRow_AVX2
     4.83%  libyuv_unittest  libyuv_unittest      [.] ScaleARGBRowDownEvenBox_SSE2
     3.64%  libyuv_unittest  libyuv_unittest      [.] ScaleFilterCols_SSSE3
     3.42%  libyuv_unittest  libyuv_unittest      [.] ScaleARGBFilterCols_SSSE3
     3.15%  libyuv_unittest  libyuv_unittest      [.] ScaleARGBRowDown2Box_SSE2
     3.00%  libyuv_unittest  libyuv_unittest      [.] ScaleARGB
     2.92%  libyuv_unittest  libyuv_unittest      [.] ScaleARGBCols_SSE2
     2.83%  libyuv_unittest  libyuv_unittest      [.] ARGBToRGB565DitherRow_C
     2.69%  libyuv_unittest  libyuv_unittest      [.] CopyRow_ERMS
     2.59%  libyuv_unittest  libyuv_unittest      [.] I422ToARGBRow_AVX2
     1.75%  libyuv_unittest  libyuv_unittest      [.] ScaleARGBRowDownEven_SSE2
     1.61%  libyuv_unittest  libyuv_unittest      [.] FixedDiv_X86
     1.49%  libyuv_unittest  libyuv_unittest      [.] ARGBShuffleRow_AVX2
     1.48%  libyuv_unittest  libyuv_unittest      [.] CumulativeSumToAverageRow_SSE2
     1.45%  libyuv_unittest  libyuv_unittest      [.] I422ToABGRRow_AVX2
     1.40%  libyuv_unittest  libyuv_unittest      [.] ScaleCols_C
     1.26%  libyuv_unittest  libyuv_unittest      [.] ScaleAddCols1_C
     0.93%  libyuv_unittest  libyuv_unittest      [.] libyuv::ARGBClipTestFilter(int, int, int, int, libyuv::FilterMode, int)
     0.92%  libyuv_unittest  libc-2.19.so         [.] _int_malloc
     0.91%  libyuv_unittest  libyuv_unittest      [.] ComputeCumulativeSumRow_SSE2
     0.85%  libyuv_unittest  libyuv_unittest      [.] ARGBToRGB565Row_SSE2
     0.85%  libyuv_unittest  libyuv_unittest      [.] ARGBToYRow_AVX2
     0.83%  libyuv_unittest  libyuv_unittest      [.] I422ToARGBRow_SSSE3
     0.68%  libyuv_unittest  libyuv_unittest      [.] SobelXRow_SSE2
     0.67%  libyuv_unittest  libyuv_unittest      [.] SobelYRow_SSE2
     0.62%  libyuv_unittest  libyuv_unittest      [.] TransposeWx8_Fast_SSSE3
     0.62%  libyuv_unittest  libyuv_unittest      [.] FixedDiv1_X86
     0.61%  libyuv_unittest  libyuv_unittest      [.] ScaleSlope
     0.57%  libyuv_unittest  libyuv_unittest      [.] next_marker
     0.54%  libyuv_unittest  libyuv_unittest      [.] NV12ToARGBRow_SSSE3
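
NV12ToARGBRow converts one row of NV12 (a full-resolution Y plane plus an interleaved, half-resolution UV plane) to ARGB. For reference while reading the SSSE3/AVX2 versions, a scalar sketch follows. It assumes classic BT.601 limited-range integer coefficients (298, 409, 100, 208, 516); libyuv's own constants and rounding may differ, and NV12ToARGBRowSketch is a hypothetical name. Output bytes follow libyuv's ARGB memory order (B, G, R, A).

```cpp
#include <cassert>
#include <cstdint>

// Clamp an intermediate value to the 0..255 byte range.
static inline uint8_t Clamp255(int v) {
  return static_cast<uint8_t>(v < 0 ? 0 : (v > 255 ? 255 : v));
}

// One Y/U/V sample to B, G, R with assumed BT.601 limited-range coefficients
// in 8.8 fixed point; +128 rounds before the shift. Negative sums are
// clamped to 0 afterwards.
static void YuvToBgr(int y, int u, int v, uint8_t* b, uint8_t* g, uint8_t* r) {
  int c = 298 * (y - 16);
  int d = u - 128;
  int e = v - 128;
  *r = Clamp255((c + 409 * e + 128) >> 8);
  *g = Clamp255((c - 100 * d - 208 * e + 128) >> 8);
  *b = Clamp255((c + 516 * d + 128) >> 8);
}

// Hypothetical scalar NV12 -> ARGB row. Pixels 2k and 2k+1 share the pair
// src_uv[2k] (U) and src_uv[2k + 1] (V).
void NV12ToARGBRowSketch(const uint8_t* src_y, const uint8_t* src_uv,
                         uint8_t* dst_argb, int width) {
  assert(width >= 0);
  for (int x = 0; x < width; ++x) {
    const uint8_t* uv = src_uv + (x / 2) * 2;
    YuvToBgr(src_y[x], uv[0], uv[1], &dst_argb[4 * x + 0],
             &dst_argb[4 * x + 1], &dst_argb[4 * x + 2]);
    dst_argb[4 * x + 3] = 255;  // Opaque alpha.
  }
}
```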

Original comment by fbarch...@google.com on 25 Sep 2015 at 7:31

GoogleCodeExporter commented 8 years ago
NV12 AVX2
 18.25%  libyuv_unittest  libyuv_unittest      [.] libyuv::libyuvTest_TestRoundToByte_Test::TestBody()
  6.53%  libyuv_unittest  libyuv_unittest      [.] ScaleAddRow_C
  5.08%  libyuv_unittest  libyuv_unittest      [.] InterpolateRow_AVX2
  4.84%  libyuv_unittest  libyuv_unittest      [.] ScaleARGBRowDownEvenBox_SSE2
  3.64%  libyuv_unittest  libyuv_unittest      [.] ScaleFilterCols_SSSE3
  3.42%  libyuv_unittest  libyuv_unittest      [.] ScaleARGBFilterCols_SSSE3
  3.12%  libyuv_unittest  libyuv_unittest      [.] ScaleARGBRowDown2Box_SSE2
  3.00%  libyuv_unittest  libyuv_unittest      [.] ScaleARGB
  2.90%  libyuv_unittest  libyuv_unittest      [.] ScaleARGBCols_SSE2
  2.85%  libyuv_unittest  libyuv_unittest      [.] ARGBToRGB565DitherRow_C
  2.71%  libyuv_unittest  libyuv_unittest      [.] CopyRow_ERMS
  2.38%  libyuv_unittest  libyuv_unittest      [.] I422ToARGBRow_AVX2
  1.76%  libyuv_unittest  libyuv_unittest      [.] ScaleARGBRowDownEven_SSE2
  1.62%  libyuv_unittest  libyuv_unittest      [.] FixedDiv_X86
  1.49%  libyuv_unittest  libyuv_unittest      [.] CumulativeSumToAverageRow_SSE2
  1.49%  libyuv_unittest  libyuv_unittest      [.] ARGBShuffleRow_AVX2
  1.41%  libyuv_unittest  libyuv_unittest      [.] ScaleCols_C
  1.25%  libyuv_unittest  libyuv_unittest      [.] ScaleAddCols1_C
  1.25%  libyuv_unittest  libyuv_unittest      [.] I422ToABGRRow_AVX2
  0.99%  libyuv_unittest  libc-2.19.so         [.] _int_malloc
  0.92%  libyuv_unittest  libyuv_unittest      [.] libyuv::ARGBClipTestFilter(int, int, int, int, libyuv::FilterMode, int)
  0.91%  libyuv_unittest  libyuv_unittest      [.] ComputeCumulativeSumRow_SSE2
  0.87%  libyuv_unittest  libyuv_unittest      [.] ARGBToRGB565Row_SSE2
  0.85%  libyuv_unittest  libyuv_unittest      [.] ARGBToYRow_AVX2
  0.84%  libyuv_unittest  libyuv_unittest      [.] I422ToARGBRow_SSSE3
  0.68%  libyuv_unittest  libyuv_unittest      [.] SobelXRow_SSE2
  0.67%  libyuv_unittest  libyuv_unittest      [.] SobelYRow_SSE2
  0.62%  libyuv_unittest  libyuv_unittest      [.] TransposeWx8_Fast_SSSE3
  0.62%  libyuv_unittest  libyuv_unittest      [.] ScaleSlope
  0.62%  libyuv_unittest  libyuv_unittest      [.] FixedDiv1_X86
  0.55%  libyuv_unittest  libyuv_unittest      [.] next_marker
  0.54%  libyuv_unittest  libyuv_unittest      [.] NV12ToARGBRow_SSSE3
  0.54%  libyuv_unittest  libyuv_unittest      [.] ARGBToUV411Row_C
  0.50%  libyuv_unittest  libyuv_unittest      [.] ARGBToARGB1555Row_SSE2
  0.48%  libyuv_unittest  libyuv_unittest      [.] ARGBScaleClip
  0.47%  libyuv_unittest  libyuv_unittest      [.] ARGBToUVRow_AVX2
  0.46%  libyuv_unittest  libyuv_unittest      [.] ARGBToYJRow_AVX2
  0.45%  libyuv_unittest  libyuv_unittest      [.] InterpolateRow_Any_AVX2
  0.43%  libyuv_unittest  libyuv_unittest      [.] ARGBToUV422Row_SSSE3
  0.42%  libyuv_unittest  libyuv_unittest      [.] I422ToBGRARow_AVX2
  0.41%  libyuv_unittest  libyuv_unittest      [.] I422ToRGBARow_AVX2
  0.40%  libyuv_unittest  libc-2.19.so         [.] _int_free
  0.40%  libyuv_unittest  libyuv_unittest      [.] NV12ToARGBRow_AVX2

Original comment by fbarch...@google.com on 25 Sep 2015 at 11:57

GoogleCodeExporter commented 8 years ago
TestRoundToByte is too slow. Improve the rounding code and/or the test.

LIBYUV_REPEAT=100 out/Release/libyuv_unittest \
--gtest_filter=libyuvTest.TestRoundToByte
Note: Google Test filter = libyuvTest.TestRoundToByte
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from libyuvTest
[ RUN      ] libyuvTest.TestRoundToByte
[       OK ] libyuvTest.TestRoundToByte (10731 ms)
[----------] 1 test from libyuvTest (10731 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test case ran. (10731 ms total)
[  PASSED  ] 1 test.

Original comment by fbarch...@google.com on 2 Oct 2015 at 6:01

GoogleCodeExporter commented 8 years ago
LIBYUV_WIDTH=640 LIBYUV_HEIGHT=360 LIBYUV_REPEAT=4000 \
out/Release/libyuv_unittest --gtest_filter=*TestRoundToByte*
[       OK ] libyuvTest.TestRoundToByte (419442 ms)
[----------] 1 test from libyuvTest (419442 ms total)

Performance of 4 rounding methods on Linux GCC:

#define ROUND(f) static_cast<int>(f + 0.5)
TestRoundToByte (10731 ms)

#define ROUND(f) lrintf(f)
TestRoundToByte (7911 ms)

#define ROUND(f) static_cast<int>(round(f))
TestRoundToByte (12700 ms)

#define ROUND(f) _mm_cvt_ss2si(_mm_load_ss(&f))
TestRoundToByte (10428 ms)
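
Besides speed, the four macros are not interchangeable: they disagree on exact halves and on negative inputs. The sketch below writes each variant as a function so the differences can be checked directly; the function names are made up for illustration, and RoundSse is only compiled on x86 with SSE.

```cpp
#include <cassert>
#include <cmath>
#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>
#endif

// static_cast<int>(f + 0.5): add one half (in double), then truncate toward
// zero. Rounds halves up for positive f; negative f are not rounded to nearest.
int RoundAddHalf(float f) { return static_cast<int>(f + 0.5); }

// lrintf: the float overload of lrint uses the current floating-point
// rounding mode, which defaults to round-to-nearest, ties-to-even.
int RoundLrintf(float f) { return static_cast<int>(std::lrint(f)); }

// round: ties away from zero, independent of the rounding mode.
int RoundStd(float f) { return static_cast<int>(std::round(f)); }

#if defined(__SSE__) || defined(_M_X64)
// cvtss2si: uses the MXCSR rounding mode, which also defaults to
// round-to-nearest, ties-to-even.
int RoundSse(float f) { return _mm_cvt_ss2si(_mm_load_ss(&f)); }
#endif
```

With the default rounding mode, 2.5f gives 3 for RoundAddHalf and RoundStd but 2 for RoundLrintf and RoundSse, and -0.7f gives 0 for RoundAddHalf but -1 for the other three. For non-negative inputs the variants differ only at exact .5 ties.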

Original comment by fbarch...@google.com on 2 Oct 2015 at 6:19

GoogleCodeExporter commented 8 years ago
     7.94%  libyuv_unittest  libyuv_unittest      [.] ScaleAddRow_C
     6.08%  libyuv_unittest  libyuv_unittest      [.] InterpolateRow_AVX2
     6.04%  libyuv_unittest  libyuv_unittest      [.] ScaleARGBRowDownEvenBox_SSE2
     4.46%  libyuv_unittest  libyuv_unittest      [.] ScaleFilterCols_SSSE3
     4.15%  libyuv_unittest  libyuv_unittest      [.] ScaleARGBFilterCols_SSSE3
     3.87%  libyuv_unittest  libyuv_unittest      [.] ScaleARGBRowDown2Box_SSE2
     3.69%  libyuv_unittest  libyuv_unittest      [.] ScaleARGB
     3.63%  libyuv_unittest  libyuv_unittest      [.] ARGBToRGB565DitherRow_C
     3.53%  libyuv_unittest  libyuv_unittest      [.] ScaleARGBCols_SSE2
     3.31%  libyuv_unittest  libyuv_unittest      [.] CopyRow_ERMS
     2.91%  libyuv_unittest  libyuv_unittest      [.] I422ToARGBRow_AVX2
     2.15%  libyuv_unittest  libyuv_unittest      [.] ScaleARGBRowDownEven_SSE2
     1.95%  libyuv_unittest  libyuv_unittest      [.] FixedDiv_X86
     1.83%  libyuv_unittest  libyuv_unittest      [.] ARGBShuffleRow_AVX2
     1.80%  libyuv_unittest  libyuv_unittest      [.] CumulativeSumToAverageRow_SSE2
     1.71%  libyuv_unittest  libyuv_unittest      [.] ScaleCols_C
     1.57%  libyuv_unittest  libyuv_unittest      [.] ScaleAddCols1_C
     1.52%  libyuv_unittest  libyuv_unittest      [.] I422ToABGRRow_AVX2
     1.13%  libyuv_unittest  libc-2.19.so         [.] _int_malloc
     1.12%  libyuv_unittest  libyuv_unittest      [.] libyuv::ARGBClipTestFilter(int, int, int, int, libyuv::FilterMode, int)
     1.12%  libyuv_unittest  libyuv_unittest      [.] ComputeCumulativeSumRow_SSE2
     1.05%  libyuv_unittest  libyuv_unittest      [.] ARGBToRGB565Row_SSE2
     1.03%  libyuv_unittest  libyuv_unittest      [.] ARGBToYRow_AVX2
     1.02%  libyuv_unittest  libyuv_unittest      [.] I422ToARGBRow_SSSE3

Original comment by fbarch...@chromium.org on 2 Oct 2015 at 11:03

GoogleCodeExporter commented 8 years ago
r1502 performance
     6.66%  libyuv_unittest  libyuv_unittest      [.] InterpolateRow_AVX2
     6.48%  libyuv_unittest  libyuv_unittest      [.] ScaleARGBRowDownEvenBox_SSE2
     4.77%  libyuv_unittest  libyuv_unittest      [.] ScaleFilterCols_SSSE3
     4.46%  libyuv_unittest  libyuv_unittest      [.] ScaleARGBFilterCols_SSSE3
     4.14%  libyuv_unittest  libyuv_unittest      [.] ScaleARGBRowDown2Box_SSE2
     3.96%  libyuv_unittest  libyuv_unittest      [.] ScaleARGB
     3.76%  libyuv_unittest  libyuv_unittest      [.] ScaleARGBCols_SSE2
     3.71%  libyuv_unittest  libyuv_unittest      [.] ARGBToRGB565DitherRow_C
     3.57%  libyuv_unittest  libyuv_unittest      [.] CopyRow_ERMS
     3.12%  libyuv_unittest  libyuv_unittest      [.] I422ToARGBRow_AVX2
     2.29%  libyuv_unittest  libyuv_unittest      [.] ScaleARGBRowDownEven_SSE2
     2.13%  libyuv_unittest  libyuv_unittest      [.] FixedDiv_X86
     1.95%  libyuv_unittest  libyuv_unittest      [.] ARGBShuffleRow_AVX2
     1.92%  libyuv_unittest  libyuv_unittest      [.] CumulativeSumToAverageRow_SSE2
     1.80%  libyuv_unittest  libyuv_unittest      [.] ScaleAddRow_AVX2
     1.75%  libyuv_unittest  libyuv_unittest      [.] ScaleCols_C
     1.63%  libyuv_unittest  libyuv_unittest      [.] I422ToABGRRow_AVX2
     1.47%  libyuv_unittest  libyuv_unittest      [.] ScaleAddCols1_C
     1.22%  libyuv_unittest  libc-2.19.so         [.] _int_malloc
     1.21%  libyuv_unittest  libyuv_unittest      [.] libyuv::ARGBClipTestFilter(int, int, int, int, libyuv::FilterMode, int)
     1.21%  libyuv_unittest  libyuv_unittest      [.] ComputeCumulativeSumRow_SSE2
     1.11%  libyuv_unittest  libyuv_unittest      [.] ARGBToYRow_AVX2
     1.08%  libyuv_unittest  libyuv_unittest      [.] I422ToARGBRow_SSSE3
     0.90%  libyuv_unittest  libyuv_unittest      [.] SobelXRow_SSE2
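
In this profile ScaleAddRow_C (7.94% in the previous run) is gone and ScaleAddRow_AVX2 appears at 1.80%. For context, the C version is a simple accumulate loop. Below is a sketch of that kind of kernel; the signature is assumed rather than copied from libyuv, and ScaleAddRowSketch is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical row-accumulate kernel: add one row of 8-bit pixels into a row
// of 16-bit running sums. A box-filter downscale typically calls this once
// per source row in the box, then divides each sum by the box area.
void ScaleAddRowSketch(const uint8_t* src, uint16_t* dst, int width) {
  assert(width >= 0);
  for (int x = 0; x < width; ++x) {
    dst[x] = static_cast<uint16_t>(dst[x] + src[x]);
  }
}
```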

Original comment by fbarch...@chromium.org on 7 Oct 2015 at 5:47

GoogleCodeExporter commented 8 years ago
fbarchard@fbarchard-linux:~/src/libyuv/libyuv$ runyuv10 | more
LIBYUV_WIDTH=640 LIBYUV_HEIGHT=360 LIBYUV_REPEAT=4000 \
out/Release/libyuv_unittest --gtest_filter=* | grep ms | \
sed 's/\(.*(\)\([0-9]*\)\( ms)\)/\2 - \1\2\3/g' | sort -rn | \
sed 's/.*- \(.*\)/\1/g'
[       OK ] libyuvTest.ARGBScaleClipTo1280x720_Linear (11452 ms)
[  FAILED  ] libyuvTest.ScaleDownBy8_Box (10933 ms)
[       OK ] libyuvTest.ARGBScaleClipTo1280x720_Bilinear (9219 ms)
[       OK ] libyuvTest.ARGBScaleDownClipBy4_Box (6844 ms)
[       OK ] libyuvTest.ARGBScaleDownClipBy3by4_Box (5228 ms)
[       OK ] libyuvTest.ARGBScaleDownClipBy3by4_Bilinear (5218 ms)
[       OK ] libyuvTest.ARGBScaleClipTo1280x720_None (4465 ms)
[       OK ] libyuvTest.ARGBScaleDownClipBy3by8_Box (3887 ms)
[       OK ] libyuvTest.ARGBScaleClipTo569x480_Linear (3768 ms)
[       OK ] libyuvTest.ARGBScaleDownClipBy8_Box (3407 ms)
[       OK ] libyuvTest.ARGBScaleDownClipBy8_Bilinear (3346 ms)
[       OK ] libyuvTest.ARGBScaleClipFrom569x480_Bilinear (3327 ms)
[       OK ] libyuvTest.ARGBScaleClipFrom352x288_Linear (3257 ms)
[       OK ] libyuvTest.ARGBScaleClipTo569x480_Bilinear (3215 ms)
[       OK ] libyuvTest.ARGBScaleClipFrom320x240_Linear (3149 ms)
[       OK ] libyuvTest.ARGBScaleDownClipBy3by8_Bilinear (3067 ms)
[       OK ] libyuvTest.TestFixedDiv (2970 ms)
[       OK ] libyuvTest.TestFixedDiv1_Opt (2970 ms)
[       OK ] libyuvTest.TestFixedDiv_Opt (2966 ms)
[       OK ] libyuvTest.ARGBScaleDownBy4_Box (2903 ms)
[       OK ] libyuvTest.ScaleTo1280x720_Bilinear (2869 ms)
[       OK ] libyuvTest.ScaleTo1280x720_Box (2852 ms)
[       OK ] libyuvTest.ARGBScaleDownBy8_Bilinear (2837 ms)
[       OK ] libyuvTest.ScaleTo1280x720_Linear (2825 ms)
[  FAILED  ] libyuvTest.ScaleDownBy3_Box (2764 ms)
[       OK ] libyuvTest.ARGBScaleDownBy8_Box (2744 ms)
[       OK ] libyuvTest.ARGBScaleClipFrom352x288_Bilinear (2629 ms)
[       OK ] libyuvTest.ARGBScaleClipFrom320x240_Bilinear (2512 ms)
[       OK ] libyuvTest.I420ToRGB565Dither_Any (2412 ms)
[       OK ] libyuvTest.I420ToRGB565Dither_Unaligned (2390 ms)
[       OK ] libyuvTest.I420ToRGB565Dither_Opt (2379 ms)
[       OK ] libyuvTest.I420ToRGB565Dither_Invert (2379 ms)
[       OK ] libyuvTest.ARGBScaleDownClipBy3by8_Linear (2148 ms)
[       OK ] libyuvTest.ARGBScaleClipFrom569x480_Linear (2141 ms)
[       OK ] libyuvTest.ARGBScaleTo1280x720_Bilinear (2138 ms)
[       OK ] libyuvTest.ARGBScaleDownClipBy3by4_Linear (2123 ms)
[       OK ] libyuvTest.ARGBToRGB565Dither_Invert (2040 ms)
[       OK ] libyuvTest.ARGBScaleTo1280x720_Linear (2038 ms)
[       OK ] libyuvTest.ARGBToRGB565Dither_Opt (2019 ms)
[       OK ] libyuvTest.ARGBToRGB565Dither_Unaligned (2017 ms)
[       OK ] libyuvTest.ARGBToRGB565Dither_Any (2007 ms)
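
The sed | sort -rn | sed pipeline above prefixes each line with its (N ms) value, sorts numerically in descending order, and strips the prefix again. The same ranking as a small C++ sketch; ParseMillis and RankBySlowest are hypothetical helpers. Unlike the pipeline, it drops lines without a trailing (N ms), such as the "(... ms total)" summaries.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>
#include <string>
#include <utility>
#include <vector>

// Extract N from a gtest result line ending in "(N ms)"; -1 if absent.
long ParseMillis(const std::string& line) {
  std::string::size_type open = line.rfind('(');
  std::string::size_type close = line.rfind(" ms)");
  if (open == std::string::npos || close == std::string::npos || close < open) {
    return -1;
  }
  std::string digits = line.substr(open + 1, close - open - 1);
  if (digits.empty() ||
      digits.find_first_not_of("0123456789") != std::string::npos) {
    return -1;
  }
  return std::strtol(digits.c_str(), nullptr, 10);
}

// Keep timed lines and order them slowest first (stable for equal times).
std::vector<std::string> RankBySlowest(const std::vector<std::string>& lines) {
  std::vector<std::pair<long, std::string>> timed;
  for (const std::string& line : lines) {
    long ms = ParseMillis(line);
    if (ms >= 0) {
      timed.emplace_back(ms, line);
    }
  }
  std::stable_sort(timed.begin(), timed.end(),
                   [](const std::pair<long, std::string>& a,
                      const std::pair<long, std::string>& b) {
                     return a.first > b.first;
                   });
  std::vector<std::string> ranked;
  for (const std::pair<long, std::string>& entry : timed) {
    ranked.push_back(entry.second);
  }
  return ranked;
}
```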

Original comment by fbarch...@chromium.org on 7 Oct 2015 at 5:24

GoogleCodeExporter commented 8 years ago
     6.80%  libyuv_unittest  libyuv_unittest      [.] InterpolateRow_AVX2
     6.59%  libyuv_unittest  libyuv_unittest      [.] ScaleARGBRowDownEvenBox_SSE2
     4.93%  libyuv_unittest  libyuv_unittest      [.] ScaleFilterCols_SSSE3
     4.57%  libyuv_unittest  libyuv_unittest      [.] ScaleARGBFilterCols_SSSE3
     4.30%  libyuv_unittest  libyuv_unittest      [.] ScaleARGBRowDown2Box_SSE2
     4.08%  libyuv_unittest  libyuv_unittest      [.] ScaleARGB
     3.99%  libyuv_unittest  libyuv_unittest      [.] ScaleARGBCols_SSE2
     3.68%  libyuv_unittest  libyuv_unittest      [.] CopyRow_ERMS
     3.22%  libyuv_unittest  libyuv_unittest      [.] I422ToARGBRow_AVX2
     2.37%  libyuv_unittest  libyuv_unittest      [.] ScaleARGBRowDownEven_SSE2
     2.15%  libyuv_unittest  libyuv_unittest      [.] FixedDiv_X86
     2.02%  libyuv_unittest  libyuv_unittest      [.] ARGBShuffleRow_AVX2
     2.00%  libyuv_unittest  libyuv_unittest      [.] CumulativeSumToAverageRow_SSE2
     1.89%  libyuv_unittest  libyuv_unittest      [.] ScaleCols_C
     1.87%  libyuv_unittest  libyuv_unittest      [.] ScaleAddRow_AVX2
     1.83%  libyuv_unittest  libyuv_unittest      [.] ScaleAddCols1_C
     1.68%  libyuv_unittest  libyuv_unittest      [.] I422ToABGRRow_AVX2
     1.25%  libyuv_unittest  libc-2.19.so         [.] _int_malloc
     1.25%  libyuv_unittest  libyuv_unittest      [.] libyuv::ARGBClipTestFilter(int, int, int, int, libyuv::FilterMode, int)
     1.23%  libyuv_unittest  libyuv_unittest      [.] ComputeCumulativeSumRow_SSE2
     1.15%  libyuv_unittest  libyuv_unittest      [.] ARGBToYRow_AVX2
     1.12%  libyuv_unittest  libyuv_unittest      [.] I422ToARGBRow_SSSE3
     0.91%  libyuv_unittest  libyuv_unittest      [.] SobelXRow_SSE2
     0.90%  libyuv_unittest  libyuv_unittest      [.] SobelYRow_SSE2
     0.88%  libyuv_unittest  libyuv_unittest      [.] ARGBToRGB565Row_SSE2
     0.85%  libyuv_unittest  libyuv_unittest      [.] TransposeWx8_Fast_SSSE3
     0.84%  libyuv_unittest  libyuv_unittest      [.] FixedDiv1_X86
     0.81%  libyuv_unittest  libyuv_unittest      [.] ScaleSlope
     0.74%  libyuv_unittest  libyuv_unittest      [.] next_marker
     0.74%  libyuv_unittest  libyuv_unittest      [.] I444ToARGBRow_SSSE3
     0.72%  libyuv_unittest  libyuv_unittest      [.] ARGBToUV411Row_C
     0.67%  libyuv_unittest  libyuv_unittest      [.] ARGBToARGB1555Row_SSE2
     0.65%  libyuv_unittest  libyuv_unittest      [.] ARGBScaleClip
     0.64%  libyuv_unittest  libyuv_unittest      [.] InterpolateRow_Any_AVX2
     0.62%  libyuv_unittest  libyuv_unittest      [.] ARGBToUVRow_AVX2
     0.61%  libyuv_unittest  libyuv_unittest      [.] ARGBToYJRow_AVX2
     0.58%  libyuv_unittest  libyuv_unittest      [.] ARGBToUV422Row_SSSE3
     0.57%  libyuv_unittest  libyuv_unittest      [.] I422ToBGRARow_AVX2
     0.56%  libyuv_unittest  libyuv_unittest      [.] I422ToRGBARow_AVX2
     0.53%  libyuv_unittest  libc-2.19.so         [.] _int_free
     0.49%  libyuv_unittest  libyuv_unittest      [.] libyuv::ARGBTestFilter(int, int, int, int, libyuv::FilterMode, int, int)
     0.49%  libyuv_unittest  libyuv_unittest      [.] ScaleRowDown38_3_Box_SSSE3
     0.49%  libyuv_unittest  libyuv_unittest      [.] ARGB1555ToARGBRow_SSE2
     0.45%  libyuv_unittest  libyuv_unittest      [.] ARGBUnattenuateRow_AVX2
     0.43%  libyuv_unittest  libyuv_unittest      [.] ARGBBlendRow_SSSE3
     0.42%  libyuv_unittest  libyuv_unittest      [.] ARGBToARGB4444Row_SSE2
     0.39%  libyuv_unittest  libyuv_unittest      [.] I411ToARGBRow_SSSE3
     0.39%  libyuv_unittest  libyuv_unittest      [.] ARGBToRGB565DitherRow_AVX2
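
ARGBToRGB565DitherRow_AVX2 now shows up at the bottom of this list, while earlier profiles showed ARGBToRGB565DitherRow_C near the top. The technique is ordered dithering: add a small per-column bias before truncating each channel to 5 or 6 bits. A scalar sketch follows; the 4-entry dither table parameter and the function name are assumptions, not libyuv's signature.

```cpp
#include <cassert>
#include <cstdint>

static inline int ClampTo255(int v) { return v > 255 ? 255 : v; }

// Hypothetical ordered-dither ARGB -> RGB565 row. Input is libyuv-style ARGB
// (B, G, R, A in memory); output is little-endian RGB565. Column x uses
// dither[x & 3] as a bias so truncation error is spread across pixels.
void ARGBToRGB565DitherRowSketch(const uint8_t* src_argb, uint8_t* dst_rgb565,
                                 const uint8_t dither[4], int width) {
  assert(width >= 0);
  for (int x = 0; x < width; ++x) {
    int d = dither[x & 3];  // Non-negative, so only the upper clamp is needed.
    int b = ClampTo255(src_argb[4 * x + 0] + d) >> 3;  // 5 bits
    int g = ClampTo255(src_argb[4 * x + 1] + d) >> 2;  // 6 bits
    int r = ClampTo255(src_argb[4 * x + 2] + d) >> 3;  // 5 bits
    uint16_t pixel = static_cast<uint16_t>(b | (g << 5) | (r << 11));
    dst_rgb565[2 * x + 0] = static_cast<uint8_t>(pixel & 0xFF);
    dst_rgb565[2 * x + 1] = static_cast<uint8_t>(pixel >> 8);
  }
}
```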

Original comment by fbarch...@chromium.org on 8 Oct 2015 at 3:16

GoogleCodeExporter commented 8 years ago
r1513
  6.88%  libyuv_unittest  libyuv_unittest      [.] InterpolateRow_AVX2
  6.73%  libyuv_unittest  libyuv_unittest      [.] ScaleARGBRowDownEvenBox_SSE2
  4.98%  libyuv_unittest  libyuv_unittest      [.] ScaleFilterCols_SSSE3
  4.69%  libyuv_unittest  libyuv_unittest      [.] ScaleARGBFilterCols_SSSE3
  4.31%  libyuv_unittest  libyuv_unittest      [.] ScaleARGBRowDown2Box_SSE2
  4.12%  libyuv_unittest  libyuv_unittest      [.] ScaleARGB
  3.90%  libyuv_unittest  libyuv_unittest      [.] ScaleARGBCols_SSE2
  3.68%  libyuv_unittest  libyuv_unittest      [.] CopyRow_ERMS
  2.92%  libyuv_unittest  libyuv_unittest      [.] I422ToARGBRow_AVX2
  2.40%  libyuv_unittest  libyuv_unittest      [.] ScaleARGBRowDownEven_SSE2
  2.22%  libyuv_unittest  libyuv_unittest      [.] FixedDiv_X86
  2.04%  libyuv_unittest  libyuv_unittest      [.] ARGBShuffleRow_AVX2
  1.97%  libyuv_unittest  libyuv_unittest      [.] CumulativeSumToAverageRow_SSE2
  1.86%  libyuv_unittest  libyuv_unittest      [.] ScaleAddRow_AVX2
  1.83%  libyuv_unittest  libyuv_unittest      [.] ScaleCols_C
  1.70%  libyuv_unittest  libyuv_unittest      [.] I422ToABGRRow_AVX2
  1.55%  libyuv_unittest  libyuv_unittest      [.] ScaleAddCols1_C
  1.44%  libyuv_unittest  libc-2.19.so         [.] _int_malloc
  1.27%  libyuv_unittest  libyuv_unittest      [.] libyuv::ARGBClipTestFilter(int, int, int, int, libyuv::FilterMode, int)
  1.27%  libyuv_unittest  libyuv_unittest      [.] ComputeCumulativeSumRow_SSE2
  1.16%  libyuv_unittest  libyuv_unittest      [.] ARGBToYRow_AVX2
  1.14%  libyuv_unittest  libyuv_unittest      [.] I422ToARGBRow_SSSE3
  0.93%  libyuv_unittest  libyuv_unittest      [.] SobelXRow_SSE2
  0.92%  libyuv_unittest  libyuv_unittest      [.] SobelYRow_SSE2
  0.89%  libyuv_unittest  libyuv_unittest      [.] ARGBToRGB565Row_SSE2
  0.85%  libyuv_unittest  libyuv_unittest      [.] TransposeWx8_Fast_SSSE3
  0.84%  libyuv_unittest  libyuv_unittest      [.] ScaleSlope
  0.80%  libyuv_unittest  libyuv_unittest      [.] FixedDiv1_X86
  0.75%  libyuv_unittest  libyuv_unittest      [.] next_marker
  0.74%  libyuv_unittest  libyuv_unittest      [.] I444ToARGBRow_SSSE3
  0.72%  libyuv_unittest  libyuv_unittest      [.] ARGBToUV411Row_C

Original comment by fbarch...@chromium.org on 16 Oct 2015 at 6:09

GoogleCodeExporter commented 8 years ago
Some performance numbers on Arm:
I   31.227s run_tests_on_device(HT4A2JT03762)  [==========] Running 20 tests from 1 test case.
I   31.227s run_tests_on_device(HT4A2JT03762)  [----------] Global test environment set-up.
I   31.228s run_tests_on_device(HT4A2JT03762)  [----------] 20 tests from LibYUVConvertTest
I   31.228s run_tests_on_device(HT4A2JT03762)  [ RUN      ] LibYUVConvertTest.I420ToI420_Opt
I   31.228s run_tests_on_device(HT4A2JT03762)  [       OK ] LibYUVConvertTest.I420ToI420_Opt (353 ms)
I   31.228s run_tests_on_device(HT4A2JT03762)  [ RUN      ] LibYUVConvertTest.I420ToI422_Opt
I   31.228s run_tests_on_device(HT4A2JT03762)  [       OK ] LibYUVConvertTest.I420ToI422_Opt (407 ms)
I   31.228s run_tests_on_device(HT4A2JT03762)  [ RUN      ] LibYUVConvertTest.I420ToI444_Opt
I   31.228s run_tests_on_device(HT4A2JT03762)  [       OK ] LibYUVConvertTest.I420ToI444_Opt (2681 ms)
I   31.228s run_tests_on_device(HT4A2JT03762)  [ RUN      ] LibYUVConvertTest.I420ToI411_Opt
I   31.228s run_tests_on_device(HT4A2JT03762)  [       OK ] LibYUVConvertTest.I420ToI411_Opt (838 ms)
I   31.228s run_tests_on_device(HT4A2JT03762)  [ RUN      ] LibYUVConvertTest.I420ToI420Mirror_Opt
I   31.228s run_tests_on_device(HT4A2JT03762)  [       OK ] LibYUVConvertTest.I420ToI420Mirror_Opt (423 ms)
I   31.228s run_tests_on_device(HT4A2JT03762)  [ RUN      ] LibYUVConvertTest.I420ToNV12_Opt
I   31.228s run_tests_on_device(HT4A2JT03762)  [       OK ] LibYUVConvertTest.I420ToNV12_Opt (296 ms)
I   31.228s run_tests_on_device(HT4A2JT03762)  [ RUN      ] LibYUVConvertTest.I420ToNV21_Opt
I   31.228s run_tests_on_device(HT4A2JT03762)  [       OK ] LibYUVConvertTest.I420ToNV21_Opt (275 ms)
I   31.229s run_tests_on_device(HT4A2JT03762)  [ RUN      ] LibYUVConvertTest.I420ToARGB_Opt
I   31.229s run_tests_on_device(HT4A2JT03762)  [       OK ] LibYUVConvertTest.I420ToARGB_Opt (1480 ms)
I   31.229s run_tests_on_device(HT4A2JT03762)  [ RUN      ] LibYUVConvertTest.I420ToBGRA_Opt
I   31.229s run_tests_on_device(HT4A2JT03762)  [       OK ] LibYUVConvertTest.I420ToBGRA_Opt (1490 ms)
I   31.229s run_tests_on_device(HT4A2JT03762)  [ RUN      ] LibYUVConvertTest.I420ToABGR_Opt
I   31.229s run_tests_on_device(HT4A2JT03762)  [       OK ] LibYUVConvertTest.I420ToABGR_Opt (1465 ms)
I   31.229s run_tests_on_device(HT4A2JT03762)  [ RUN      ] LibYUVConvertTest.I420ToRGBA_Opt
I   31.229s run_tests_on_device(HT4A2JT03762)  [       OK ] LibYUVConvertTest.I420ToRGBA_Opt (1509 ms)
I   31.229s run_tests_on_device(HT4A2JT03762)  [ RUN      ] LibYUVConvertTest.I420ToRAW_Opt
I   31.229s run_tests_on_device(HT4A2JT03762)  [       OK ] LibYUVConvertTest.I420ToRAW_Opt (1576 ms)
I   31.229s run_tests_on_device(HT4A2JT03762)  [ RUN      ] LibYUVConvertTest.I420ToRGB24_Opt
I   31.229s run_tests_on_device(HT4A2JT03762)  [       OK ] LibYUVConvertTest.I420ToRGB24_Opt (1651 ms)
I   31.229s run_tests_on_device(HT4A2JT03762)  [ RUN      ] LibYUVConvertTest.I420ToRGB565_Opt
I   31.229s run_tests_on_device(HT4A2JT03762)  [       OK ] LibYUVConvertTest.I420ToRGB565_Opt (1563 ms)
I   31.229s run_tests_on_device(HT4A2JT03762)  [ RUN      ] LibYUVConvertTest.I420ToARGB1555_Opt
I   31.230s run_tests_on_device(HT4A2JT03762)  [       OK ] LibYUVConvertTest.I420ToARGB1555_Opt (1566 ms)
I   31.230s run_tests_on_device(HT4A2JT03762)  [ RUN      ] LibYUVConvertTest.I420ToARGB4444_Opt
I   31.230s run_tests_on_device(HT4A2JT03762)  [       OK ] LibYUVConvertTest.I420ToARGB4444_Opt (1533 ms)
I   31.230s run_tests_on_device(HT4A2JT03762)  [ RUN      ] LibYUVConvertTest.I420ToYUY2_Opt
I   31.230s run_tests_on_device(HT4A2JT03762)  [       OK ] LibYUVConvertTest.I420ToYUY2_Opt (348 ms)
I   31.230s run_tests_on_device(HT4A2JT03762)  [ RUN      ] LibYUVConvertTest.I420ToUYVY_Opt
I   31.230s run_tests_on_device(HT4A2JT03762)  [       OK ] LibYUVConvertTest.I420ToUYVY_Opt (350 ms)
I   31.230s run_tests_on_device(HT4A2JT03762)  [ RUN      ] LibYUVConvertTest.I420ToI400_Opt
I   31.230s run_tests_on_device(HT4A2JT03762)  [       OK ] LibYUVConvertTest.I420ToI400_Opt (149 ms)
I   31.230s run_tests_on_device(HT4A2JT03762)  [ RUN      ] LibYUVConvertTest.I420ToRGB565Dither_Opt
I   31.230s run_tests_on_device(HT4A2JT03762)  [       OK ] LibYUVConvertTest.I420ToRGB565Dither_Opt (1962 ms)
I   31.230s run_tests_on_device(HT4A2JT03762)  [----------] 20 tests from LibYUVConvertTest (21920 ms total)
I   31.230s run_tests_on_device(HT4A2JT03762)
I   31.230s run_tests_on_device(HT4A2JT03762)  [----------] Global test environment tear-down
I   31.230s run_tests_on_device(HT4A2JT03762)  [==========] 20 tests from 1 test case ran. (21924 ms total)
I   31.230s run_tests_on_device(HT4A2JT03762)  [  PASSED  ] 20 tests.

Original comment by fbarch...@google.com on 18 Oct 2015 at 7:30

GoogleCodeExporter commented 8 years ago
LibYUVScaleTest.ScaleDownBy8_Box (200681 ms)
LibYUVBaseTest.TestFixedDiv1_Opt (194044 ms)
LibYUVBaseTest.TestFixedDiv_Opt (191014 ms)
LibYUVBaseTest.TestFixedDiv (104787 ms)
LibYUVConvertTest.I420AlphaToARGB_Premult (77882 ms)
LibYUVConvertTest.I420AlphaToABGR_Premult (77382 ms)
LibYUVConvertTest.I444ToABGR_Unaligned (77309 ms)
LibYUVConvertTest.I444ToABGR_Invert (77230 ms)
LibYUVConvertTest.I444ToABGR_Any (77130 ms)
LibYUVConvertTest.I444ToABGR_Opt (77053 ms)
LibYUVConvertTest.I420AlphaToARGB_Invert (76718 ms)
LibYUVConvertTest.I420AlphaToARGB_Unaligned (76620 ms)
LibYUVConvertTest.I420AlphaToARGB_Opt (76488 ms)
LibYUVConvertTest.I420AlphaToARGB_Any (76200 ms)
LibYUVConvertTest.I420AlphaToABGR_Opt (74689 ms)
LibYUVConvertTest.I420AlphaToABGR_Any (74684 ms)
LibYUVConvertTest.I420AlphaToABGR_Invert (74532 ms)
LibYUVConvertTest.I420AlphaToABGR_Unaligned (74470 ms)
LibYUVPlanarTest.TestARGBPolynomial (67153 ms)
LibYUVScaleTest.ARGBScaleDownClipBy4_Box (48645 ms)
LibYUVScaleTest.ScaleDownBy3_Box (45322 ms)
LibYUVScaleTest.ARGBScaleDownClipBy3by8_Bilinear (43125 ms)
LibYUVScaleTest.ARGBScaleDownClipBy3by8_Box (42043 ms)
LibYUVRotateTest.ARGBRotate270 (39778 ms)
LibYUVScaleTest.ARGBScaleDownClipBy8_Bilinear (39699 ms)
LibYUVRotateTest.ARGBRotate90 (39674 ms)
LibYUVScaleTest.ARGBScaleDownClipBy8_Box (38420 ms)
LibYUVScaleTest.ARGBScaleDownClipBy3by4_Box (36128 ms)
LibYUVScaleTest.ARGBScaleDownClipBy3by4_Bilinear (35155 ms)
LibYUVScaleTest.ARGBScaleDownBy4_Box (34227 ms)
LibYUVPlanarTest.ARGBBlur_Invert (30982 ms)
LibYUVPlanarTest.ARGBBlur_Any (30886 ms)
LibYUVPlanarTest.ARGBBlur_Unaligned (30757 ms)
LibYUVPlanarTest.ARGBBlur_Opt (30696 ms)
LibYUVScaleTest.ARGBScaleClipFrom569x480_Bilinear (29419 ms)
LibYUVScaleTest.ARGBScaleDownClipBy3by8_Linear (29374 ms)
LibYUVScaleTest.ARGBScaleClipFrom640x360_Bilinear (28262 ms)
LibYUVScaleTest.ARGBScaleDownClipBy3by4_Linear (27859 ms)
LibYUVScaleTest.ARGBScaleDownClipBy8_Linear (27853 ms)
LibYUVPlanarTest.ARGBBlurSmall_Invert (27040 ms)
LibYUVPlanarTest.ARGBBlurSmall_Any (26982 ms)
LibYUVPlanarTest.ARGBBlurSmall_Opt (26738 ms)
LibYUVPlanarTest.ARGBBlurSmall_Unaligned (26735 ms)
LibYUVScaleTest.ARGBScaleClipFrom320x240_Bilinear (26602 ms)
LibYUVScaleTest.ARGBScaleClipFrom352x288_Bilinear (26565 ms)
LibYUVScaleTest.ARGBScaleClipFrom569x480_Linear (26202 ms)
LibYUVScaleTest.ARGBScaleDownClipBy4_Bilinear (25372 ms)
LibYUVScaleTest.ARGBScaleClipFrom640x360_Linear (24780 ms)
LibYUVScaleTest.ARGBScaleClipFrom352x288_Linear (24535 ms)
LibYUVScaleTest.ARGBScaleDownClipBy8_None (24087 ms)
LibYUVScaleTest.ARGBScaleClipFrom320x240_Linear (23947 ms)
LibYUVScaleTest.ARGBScaleDownClipBy3_Box (23095 ms)
LibYUVScaleTest.ARGBScaleDownClipBy3_Linear (23081 ms)
LibYUVScaleTest.ARGBScaleDownClipBy3_Bilinear (22962 ms)
LibYUVScaleTest.ARGBScaleDownClipBy3_None (22827 ms)
LibYUVScaleTest.ARGBScaleDownClipBy4_Linear (22667 ms)
LibYUVScaleTest.ARGBScaleDownClipBy2_Bilinear (22186 ms)
LibYUVScaleTest.ARGBScaleDownClipBy2_Box (22131 ms)
LibYUVScaleTest.ARGBScaleDownClipBy2_Linear (20082 ms)
LibYUVScaleTest.ARGBScaleDownClipBy4_None (19863 ms)
LibYUVScaleTest.ARGBScaleDownClipBy3by8_None (19275 ms)
LibYUVScaleTest.ARGBScaleDownClipBy3by4_None (17790 ms)
LibYUVScaleTest.ARGBScaleClipFrom569x480_None (17676 ms)
LibYUVScaleTest.ARGBScaleClipFrom1280x720_Bilinear (17553 ms)
LibYUVScaleTest.ARGBScaleClipTo1280x720_Bilinear (17533 ms)
LibYUVScaleTest.ARGBScaleClipTo1280x720_None (17471 ms)
LibYUVScaleTest.ARGBScaleClipFrom1280x720_Linear (17440 ms)
LibYUVScaleTest.ARGBScaleClipTo1280x720_Linear (17425 ms)
LibYUVScaleTest.ARGBScaleClipFrom1280x720_None (17398 ms)
LibYUVScaleTest.ARGBScaleClipFrom640x360_None (17378 ms)
LibYUVScaleTest.ARGBScaleDownClipBy2_None (16051 ms)
LibYUVScaleTest.ARGBScaleClipFrom352x288_None (15855 ms)
LibYUVScaleTest.ARGBScaleDownBy8_Box (15648 ms)
LibYUVScaleTest.ARGBScaleClipFrom320x240_None (15468 ms)
LibYUVScaleTest.ARGBScaleDownBy8_Bilinear (15367 ms)
LibYUVScaleTest.ARGBScaleClipFrom1x1_None (14451 ms)
LibYUVScaleTest.ARGBScaleClipFrom1x1_Bilinear (14392 ms)
LibYUVScaleTest.ARGBScaleClipFrom1x1_Linear (14313 ms)
LibYUVScaleTest.ARGBScaleClipTo569x480_Bilinear (11779 ms)
LibYUVScaleTest.ARGBScaleDownBy8_Linear (11699 ms)
LibYUVScaleTest.ScaleDownBy8_Bilinear (11674 ms)
LibYUVScaleTest.ARGBScaleDownBy3by8_Box (11625 ms)
LibYUVScaleTest.ARGBScaleDownBy3by8_Bilinear (11595 ms)
LibYUVPlanarTest.TestARGBLumaColorTable (11480 ms)
LibYUVScaleTest.ARGBScaleDownBy4_Bilinear (10768 ms)
LibYUVRotateTest.ARGBRotate270_Odd (10471 ms)
LibYUVRotateTest.ARGBRotate90_Odd (10401 ms)
LibYUVPlanarTest.TestARGBColorTable (9869 ms)
LibYUVConvertTest.ARGBToUYVY_Opt (9293 ms)
LibYUVScaleTest.ScaleDownBy4_Bilinear (9191 ms)
LibYUVConvertTest.ARGBToUYVY_Any (8985 ms)
LibYUVScaleTest.ARGBScaleClipTo569x480_Linear (8688 ms)
LibYUVConvertTest.ARGBToUYVY_Unaligned (8364 ms)
LibYUVConvertTest.ARGBToYUY2_Opt (8320 ms)
LibYUVPlanarTest.ARGBUnattenuate_Invert (8220 ms)
LibYUVScaleTest.ARGBScaleDownBy4_Linear (8118 ms)
LibYUVPlanarTest.ARGBUnattenuate_Opt (8066 ms)
LibYUVScaleTest.ARGBScaleDownBy8_None (7959 ms)
LibYUVScaleTest.ARGBScaleDownBy3by4_Bilinear (7814 ms)
LibYUVScaleTest.ARGBScaleDownBy3by4_Box (7807 ms)
LibYUVPlanarTest.ARGBUnattenuate_Any (7779 ms)
LibYUVScaleTest.ScaleDownBy8_Linear (7776 ms)
LibYUVPlanarTest.ARGBUnattenuate_Unaligned (7774 ms)
LibYUVConvertTest.ARGBToYUY2_Unaligned (7707 ms)
LibYUVConvertTest.ARGBToYUY2_Any (7673 ms)
LibYUVScaleTest.ARGBScaleDownBy2_Bilinear (7508 ms)
LibYUVScaleTest.ScaleDownBy4_Box (7434 ms)
LibYUVScaleTest.ARGBScaleDownBy2_Box (7395 ms)
LibYUVConvertTest.I420ToI444_Any (7304 ms)
LibYUVConvertTest.I420ToI444_Opt (7142 ms)
LibYUVScaleTest.ScaleDownBy4_Linear (7117 ms)
LibYUVScaleTest.ScaleDownBy3by8_Linear (7054 ms)
LibYUVScaleTest.ScaleDownBy3by8_Bilinear (7011 ms)
LibYUVScaleTest.ScaleDownBy3by8_Box (7003 ms)
LibYUVConvertTest.I420ToI444_Invert (6846 ms)
LibYUVPlanarTest.TestRGBColorTable (6813 ms)
LibYUVConvertTest.I420ToI444_Unaligned (6799 ms)
LibYUVScaleTest.ScaleTo352x288_Box (6198 ms)
LibYUVPlanarTest.ARGBSobelXY_Any (6115 ms)
LibYUVPlanarTest.ARGBSobel_Any (6105 ms)
LibYUVPlanarTest.ARGBSobel_Invert (5977 ms)
LibYUVColorTest.TestFullYUV (5914 ms)
LibYUVPlanarTest.ARGBSobelXY_Invert (5894 ms)
LibYUVPlanarTest.ARGBSobel_Opt (5847 ms)
LibYUVPlanarTest.ARGBSobelXY_Opt (5813 ms)
LibYUVPlanarTest.ARGBSobel_Unaligned (5799 ms)
LibYUVPlanarTest.ARGBSobelXY_Unaligned (5720 ms)
LibYUVScaleTest.ARGBScaleFrom569x480_Bilinear (5595 ms)
LibYUVColorTest.TestFullYUVJ (5560 ms)
LibYUVScaleTest.ARGBScaleClipTo352x288_Bilinear (5557 ms)
LibYUVPlanarTest.ARGBSobelToPlane_Any (5465 ms)
LibYUVScaleTest.ScaleFrom569x480_Bilinear (5335 ms)
LibYUVScaleTest.ScaleFrom569x480_Box (5328 ms)
LibYUVPlanarTest.ARGBSobelToPlane_Invert (5157 ms)
LibYUVPlanarTest.ARGBSobelToPlane_Opt (5074 ms)
LibYUVPlanarTest.ARGBAdd_Unaligned (5072 ms)
LibYUVScaleTest.ScaleDownBy8_None (5009 ms)
LibYUVScaleTest.ARGBScaleDownBy2_Linear (5008 ms)
LibYUVPlanarTest.ARGBSobelToPlane_Unaligned (4996 ms)
LibYUVPlanarTest.ARGBSubtract_Unaligned (4911 ms)
LibYUVScaleTest.ARGBScaleClipTo569x480_None (4887 ms)
LibYUVConvertTest.I420ToRGB565Dither_Any (4884 ms)
LibYUVScaleTest.ARGBScaleFrom640x360_Bilinear (4882 ms)
LibYUVScaleTest.ARGBScaleDownBy3by8_Linear (4882 ms)
LibYUVScaleTest.ScaleFrom569x480_Linear (4854 ms)
LibYUVConvertTest.ARGBToI444_Unaligned (4674 ms)
LibYUVScaleTest.ARGBScaleClipTo640x360_Bilinear (4604 ms)
LibYUVPlanarTest.ARGBSubtract_Invert (4598 ms)
LibYUVPlanarTest.ARGBAdd_Invert (4593 ms)
LibYUVPlanarTest.ARGBAdd_Opt (4588 ms)
LibYUVPlanarTest.ARGBSubtract_Opt (4570 ms)
LibYUVScaleTest.ARGBScaleDownBy4_None (4535 ms)
LibYUVScaleTest.ARGBScaleDownBy3by4_Linear (4532 ms)
LibYUVScaleTest.ScaleTo320x240_Box (4504 ms)
LibYUVConvertTest.I420ToRGB565Dither_Unaligned (4447 ms)
LibYUVPlanarTest.ARGBMultiply_Opt (4427 ms)
LibYUVScaleTest.ScaleDownBy3_Bilinear (4426 ms)
LibYUVConvertTest.RGB565ToI420_Any (4387 ms)
LibYUVPlanarTest.ARGBMultiply_Invert (4349 ms)
LibYUVConvertTest.ARGB1555ToI420_Any (4340 ms)
LibYUVScaleTest.ScaleDownBy3_None (4336 ms)
LibYUVConvertTest.I420ToRGB565Dither_Invert (4330 ms)
LibYUVScaleTest.ScaleDownBy3_Linear (4322 ms)
LibYUVConvertTest.I420ToRGB565Dither_Opt (4307 ms)
LibYUVScaleTest.ARGBScaleFrom352x288_Bilinear (4271 ms)
LibYUVScaleTest.ARGBScaleClipTo640x360_Linear (4241 ms)
LibYUVScaleTest.ScaleFrom640x360_Box (4232 ms)
LibYUVScaleTest.ScaleFrom640x360_Bilinear (4231 ms)
LibYUVConvertTest.I444ToARGB_Unaligned (4229 ms)
LibYUVConvertTest.I420ToARGB1555_Any (4214 ms)
LibYUVConvertTest.ARGBToI444_Opt (4122 ms)
LibYUVConvertTest.I411ToARGB_Unaligned (4090 ms)
LibYUVScaleTest.ARGBScaleFrom569x480_Linear (4083 ms)
LibYUVConvertTest.ARGB4444ToI420_Any (4028 ms)
LibYUVScaleTest.ARGBScaleFrom320x240_Bilinear (4026 ms)
LibYUVConvertTest.RGB565ToI420_Unaligned (4026 ms)
LibYUVConvertTest.I420ToRAW_Any (4023 ms)
LibYUVConvertTest.ARGB1555ToI420_Unaligned (4018 ms)
LibYUVPlanarTest.ARGBMultiply_Unaligned (3990 ms)
LibYUVConvertTest.UYVYToARGB_Invert (3979 ms)
LibYUVConvertTest.I444ToARGB_Opt (3974 ms)
LibYUVConvertTest.J444ToARGB_Opt (3967 ms)
LibYUVConvertTest.I420ToARGB_Unaligned (3966 ms)
LibYUVConvertTest.J420ToARGB_Any (3965 ms)
LibYUVConvertTest.J444ToARGB_Any (3959 ms)
LibYUVConvertTest.ARGBToI444_Any (3953 ms)
LibYUVConvertTest.UYVYToARGB_Unaligned (3940 ms)
LibYUVConvertTest.I420ToBGRA_Any (3925 ms)
LibYUVConvertTest.YUY2ToARGB_Invert (3923 ms)
LibYUVConvertTest.NV12ToRGB565_Any (3922 ms)
LibYUVConvertTest.I420ToRGBA_Any (3918 ms)
LibYUVConvertTest.H420ToARGB_Unaligned (3914 ms)
LibYUVRotateTest.ARGBRotate180_Odd (3889 ms)
LibYUVConvertTest.ARGB1555ToI420_Invert (3859 ms)
LibYUVConvertTest.J420ToABGR_Any (3850 ms)
LibYUVConvertTest.J444ToARGB_Unaligned (3849 ms)
LibYUVConvertTest.RGB565ToI420_Invert (3844 ms)
LibYUVConvertTest.H420ToABGR_Unaligned (3836 ms)
LibYUVConvertTest.ARGB1555ToI420_Opt (3833 ms)
LibYUVConvertTest.RGB565ToI420_Opt (3828 ms)
LibYUVConvertTest.I420ToRGB24_Any (3820 ms)
LibYUVConvertTest.NV12ToARGB_Any (3817 ms)
LibYUVPlanarTest.TestRGBColorMatrix (3815 ms)
LibYUVConvertTest.NV21ToARGB_Any (3799 ms)

Original comment by fbarch...@chromium.org on 24 Oct 2015 at 12:19
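
The TestFixedDiv tests near the top of this list exercise libyuv's 16.16 fixed-point divide helpers (FixedDiv_X86 and FixedDiv1_X86 also appear in this thread's perf reports). As a rough portable sketch of what such a helper computes, assuming the usual "(num << 16) / div" definition; FixedDivSketch and FixedToDouble are hypothetical names, and this is not libyuv's x86 code:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical portable sketch of a 16.16 fixed-point divide:
// returns num / div scaled by 65536, truncated toward zero.
// Widens to 64 bits first so large num does not overflow, and multiplies
// by 65536 instead of shifting so negative num is well defined.
int FixedDivSketch(int num, int div) {
  assert(div != 0);
  return static_cast<int>(static_cast<int64_t>(num) * 65536 / div);
}

// Converts a 16.16 value back to a double, for inspection only.
double FixedToDouble(int fixed) {
  return fixed / 65536.0;
}
```

A scaler can use such a value as a per-pixel source step: FixedDivSketch(1280, 640) is 0x20000, i.e. 2.0 source pixels per destination pixel.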

GoogleCodeExporter commented 8 years ago
LIBYUV_FLAGS=-1 LIBYUV_WIDTH=1280 LIBYUV_HEIGHT=720 LIBYUV_REPEAT=999 perf record out/Release/libyuv_unittest
perf report
  8.94%  libyuv_unittest  libyuv_unittest      [.] I422ToARGBRow_AVX2
  6.29%  libyuv_unittest  libyuv_unittest      [.] InterpolateRow_AVX2
  6.00%  libyuv_unittest  libyuv_unittest      [.] ScaleARGBRowDownEvenBox_SSE2
  4.51%  libyuv_unittest  libyuv_unittest      [.] ScaleFilterCols_SSSE3
  4.28%  libyuv_unittest  libyuv_unittest      [.] ScaleARGBFilterCols_SSSE3
  3.94%  libyuv_unittest  libyuv_unittest      [.] ScaleARGBRowDown2Box_SSE2
  3.71%  libyuv_unittest  libyuv_unittest      [.] ScaleARGB
  3.54%  libyuv_unittest  libyuv_unittest      [.] ScaleARGBCols_SSE2
  3.37%  libyuv_unittest  libyuv_unittest      [.] CopyRow_ERMS
  2.17%  libyuv_unittest  libyuv_unittest      [.] ScaleARGBRowDownEven_SSE2
  2.09%  libyuv_unittest  libyuv_unittest      [.] I444ToARGBRow_SSSE3
  2.03%  libyuv_unittest  libyuv_unittest      [.] FixedDiv_X86
  1.98%  libyuv_unittest  libyuv_unittest      [.] I422ToRGBARow_AVX2
  1.85%  libyuv_unittest  libyuv_unittest      [.] ARGBShuffleRow_AVX2
  1.79%  libyuv_unittest  libyuv_unittest      [.] CumulativeSumToAverageRow_SSE2
  1.74%  libyuv_unittest  libyuv_unittest      [.] ScaleCols_C
  1.72%  libyuv_unittest  libyuv_unittest      [.] ScaleAddRow_AVX2
  1.41%  libyuv_unittest  libyuv_unittest      [.] ScaleAddCols1_C
  1.36%  libyuv_unittest  libyuv_unittest      [.] I422AlphaToARGBRow_AVX2
  1.14%  libyuv_unittest  libyuv_unittest      [.] ComputeCumulativeSumRow_SSE2
  1.14%  libyuv_unittest  libyuv_unittest      [.] libyuv::ARGBClipTestFilter(int, int, int, int, libyuv::FilterMode, int)
  1.09%  libyuv_unittest  libc-2.19.so         [.] _int_malloc
  1.04%  libyuv_unittest  libyuv_unittest      [.] I422ToARGBRow_SSSE3
  1.04%  libyuv_unittest  libyuv_unittest      [.] ARGBToYRow_AVX2
  0.91%  libyuv_unittest  libyuv_unittest      [.] ARGBToRGB24Row_SSSE3
  0.84%  libyuv_unittest  libyuv_unittest      [.] SobelXRow_SSE2
  0.84%  libyuv_unittest  libyuv_unittest      [.] SobelYRow_SSE2
  0.80%  libyuv_unittest  libyuv_unittest      [.] ARGBToRGB565Row_SSE2
  0.78%  libyuv_unittest  libyuv_unittest      [.] TransposeWx8_Fast_SSSE3
  0.77%  libyuv_unittest  libyuv_unittest      [.] FixedDiv1_X86
  0.74%  libyuv_unittest  libyuv_unittest      [.] ScaleSlope
  0.71%  libyuv_unittest  libyuv_unittest      [.] I411ToARGBRow_SSSE3
  0.68%  libyuv_unittest  libyuv_unittest      [.] next_marker
  0.66%  libyuv_unittest  libyuv_unittest      [.] ARGBToUV411Row_C
  0.62%  libyuv_unittest  libyuv_unittest      [.] ARGBToARGB1555Row_SSE2
  0.59%  libyuv_unittest  libyuv_unittest      [.] ARGBToUVRow_AVX2
  0.58%  libyuv_unittest  libyuv_unittest      [.] ARGBScaleClip
  0.58%  libyuv_unittest  libyuv_unittest      [.] InterpolateRow_Any_AVX2
  0.55%  libyuv_unittest  libyuv_unittest      [.] ARGBToYJRow_AVX2
  0.54%  libyuv_unittest  libyuv_unittest      [.] ARGBToUV422Row_SSSE3
  0.48%  libyuv_unittest  libc-2.19.so         [.] _int_free
  0.45%  libyuv_unittest  libyuv_unittest      [.] ScaleRowDown38_3_Box_SSSE3
  0.45%  libyuv_unittest  libyuv_unittest      [.] libyuv::ARGBTestFilter(int, int, int, int, libyuv::FilterMode, int, int, int)

Original comment by fbarch...@chromium.org on 10 Nov 2015 at 7:42
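
InterpolateRow_AVX2 ranks second in this profile. Row interpolation blends two source rows by a vertical fraction, which is what bilinear scaling does between rows. A scalar sketch, assuming an 8-bit fraction in [0, 256) and round-to-nearest; InterpolateRowSketch is a hypothetical name, and the exact rounding of libyuv's own InterpolateRow_C may differ:

```cpp
#include <cassert>
#include <cstdint>

// Blends row0 and row1 per byte:
//   dst = (row0 * (256 - fraction) + row1 * fraction + 128) >> 8
// fraction == 0 reproduces row0 exactly; fraction == 128 is a rounded average.
void InterpolateRowSketch(uint8_t* dst, const uint8_t* row0,
                          const uint8_t* row1, int width, int fraction) {
  assert(fraction >= 0 && fraction < 256);
  const int f1 = fraction;
  const int f0 = 256 - fraction;
  for (int x = 0; x < width; ++x) {
    dst[x] = static_cast<uint8_t>((row0[x] * f0 + row1[x] * f1 + 128) >> 8);
  }
}
```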

GoogleCodeExporter commented 8 years ago
I444ToARGBRow_SSSE3 needs an AVX2 port.

SSSE3
Note: Google Test filter = *I444ToARGB*
[==========] Running 8 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 8 tests from LibYUVConvertTest
[ RUN      ] LibYUVConvertTest.I444ToARGB_Any
[       OK ] LibYUVConvertTest.I444ToARGB_Any (435 ms)
[ RUN      ] LibYUVConvertTest.I444ToARGB_Unaligned
[       OK ] LibYUVConvertTest.I444ToARGB_Unaligned (418 ms)
[ RUN      ] LibYUVConvertTest.I444ToARGB_Invert
[       OK ] LibYUVConvertTest.I444ToARGB_Invert (417 ms)
[ RUN      ] LibYUVConvertTest.I444ToARGB_Opt
[       OK ] LibYUVConvertTest.I444ToARGB_Opt (411 ms)
[ RUN      ] LibYUVConvertTest.I444ToARGB_ARGB_Any
[       OK ] LibYUVConvertTest.I444ToARGB_ARGB_Any (419 ms)
[ RUN      ] LibYUVConvertTest.I444ToARGB_ARGB_Unaligned
[       OK ] LibYUVConvertTest.I444ToARGB_ARGB_Unaligned (432 ms)
[ RUN      ] LibYUVConvertTest.I444ToARGB_ARGB_Invert
[       OK ] LibYUVConvertTest.I444ToARGB_ARGB_Invert (435 ms)
[ RUN      ] LibYUVConvertTest.I444ToARGB_ARGB_Opt
[       OK ] LibYUVConvertTest.I444ToARGB_ARGB_Opt (421 ms)
[----------] 8 tests from LibYUVConvertTest (3389 ms total)

AVX2
Note: Google Test filter = *I444ToARGB*
[==========] Running 8 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 8 tests from LibYUVConvertTest
[ RUN      ] LibYUVConvertTest.I444ToARGB_Any
[       OK ] LibYUVConvertTest.I444ToARGB_Any (340 ms)
[ RUN      ] LibYUVConvertTest.I444ToARGB_Unaligned
[       OK ] LibYUVConvertTest.I444ToARGB_Unaligned (325 ms)
[ RUN      ] LibYUVConvertTest.I444ToARGB_Invert
[       OK ] LibYUVConvertTest.I444ToARGB_Invert (316 ms)
[ RUN      ] LibYUVConvertTest.I444ToARGB_Opt
[       OK ] LibYUVConvertTest.I444ToARGB_Opt (316 ms)
[ RUN      ] LibYUVConvertTest.I444ToARGB_ARGB_Any
[       OK ] LibYUVConvertTest.I444ToARGB_ARGB_Any (315 ms)
[ RUN      ] LibYUVConvertTest.I444ToARGB_ARGB_Unaligned
[       OK ] LibYUVConvertTest.I444ToARGB_ARGB_Unaligned (341 ms)
[ RUN      ] LibYUVConvertTest.I444ToARGB_ARGB_Invert
[       OK ] LibYUVConvertTest.I444ToARGB_ARGB_Invert (331 ms)
[ RUN      ] LibYUVConvertTest.I444ToARGB_ARGB_Opt
[       OK ] LibYUVConvertTest.I444ToARGB_ARGB_Opt (329 ms)
[----------] 8 tests from LibYUVConvertTest (2615 ms total)

Original comment by fbarch...@chromium.org on 14 Nov 2015 at 2:29
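
For context on what the SSSE3 and AVX2 row functions benchmarked above compute: I444 has full-resolution U and V planes, so each output pixel uses its own Y, U and V sample. A scalar sketch, assuming the common BT.601 limited-range 8-bit integer approximation (textbook coefficients, not necessarily libyuv's exact constants) and libyuv's ARGB layout of B, G, R, A bytes in memory; I444ToARGBRowSketch is a hypothetical name:

```cpp
#include <cassert>
#include <cstdint>

// Clamps an int to the 0..255 range of an 8-bit channel.
static uint8_t Clamp255(int v) {
  return static_cast<uint8_t>(v < 0 ? 0 : (v > 255 ? 255 : v));
}

// Converts one row of I444 (one U and one V sample per pixel) to ARGB,
// written as B, G, R, A bytes. BT.601 limited range, 8.8 fixed point.
void I444ToARGBRowSketch(const uint8_t* src_y, const uint8_t* src_u,
                         const uint8_t* src_v, uint8_t* dst_argb, int width) {
  assert(width >= 0);
  for (int x = 0; x < width; ++x) {
    const int c = src_y[x] - 16;
    const int d = src_u[x] - 128;
    const int e = src_v[x] - 128;
    dst_argb[4 * x + 0] = Clamp255((298 * c + 516 * d + 128) >> 8);            // B
    dst_argb[4 * x + 1] = Clamp255((298 * c - 100 * d - 208 * e + 128) >> 8);  // G
    dst_argb[4 * x + 2] = Clamp255((298 * c + 409 * e + 128) >> 8);            // R
    dst_argb[4 * x + 3] = 255;                                                 // A
  }
}
```

The SIMD versions do the same per-pixel arithmetic on 8 (SSSE3) or 16 (AVX2) pixels at a time, which is where the speedup comes from.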

GoogleCodeExporter commented 8 years ago
The following revision refers to this bug:
  https://chromium.googlesource.com/libyuv/libyuv.git/+/1019e4537fc1bfc6ee505cd1c628b645c7e966b7

commit 1019e4537fc1bfc6ee505cd1c628b645c7e966b7
Author: Frank Barchard <fbarchard@google.com>
Date: Sat Nov 14 02:31:22 2015

port I444ToARGB avx2 code from Visual C to GCC.

SSSE3
Note: Google Test filter = *I444ToARGB*
[==========] Running 8 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 8 tests from LibYUVConvertTest
[ RUN      ] LibYUVConvertTest.I444ToARGB_Any
[       OK ] LibYUVConvertTest.I444ToARGB_Any (435 ms)
[ RUN      ] LibYUVConvertTest.I444ToARGB_Unaligned
[       OK ] LibYUVConvertTest.I444ToARGB_Unaligned (418 ms)
[ RUN      ] LibYUVConvertTest.I444ToARGB_Invert
[       OK ] LibYUVConvertTest.I444ToARGB_Invert (417 ms)
[ RUN      ] LibYUVConvertTest.I444ToARGB_Opt
[       OK ] LibYUVConvertTest.I444ToARGB_Opt (411 ms)
[ RUN      ] LibYUVConvertTest.I444ToARGB_ARGB_Any
[       OK ] LibYUVConvertTest.I444ToARGB_ARGB_Any (419 ms)
[ RUN      ] LibYUVConvertTest.I444ToARGB_ARGB_Unaligned
[       OK ] LibYUVConvertTest.I444ToARGB_ARGB_Unaligned (432 ms)
[ RUN      ] LibYUVConvertTest.I444ToARGB_ARGB_Invert
[       OK ] LibYUVConvertTest.I444ToARGB_ARGB_Invert (435 ms)
[ RUN      ] LibYUVConvertTest.I444ToARGB_ARGB_Opt
[       OK ] LibYUVConvertTest.I444ToARGB_ARGB_Opt (421 ms)
[----------] 8 tests from LibYUVConvertTest (3389 ms total)

AVX2
Note: Google Test filter = *I444ToARGB*
[==========] Running 8 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 8 tests from LibYUVConvertTest
[ RUN      ] LibYUVConvertTest.I444ToARGB_Any
[       OK ] LibYUVConvertTest.I444ToARGB_Any (340 ms)
[ RUN      ] LibYUVConvertTest.I444ToARGB_Unaligned
[       OK ] LibYUVConvertTest.I444ToARGB_Unaligned (325 ms)
[ RUN      ] LibYUVConvertTest.I444ToARGB_Invert
[       OK ] LibYUVConvertTest.I444ToARGB_Invert (316 ms)
[ RUN      ] LibYUVConvertTest.I444ToARGB_Opt
[       OK ] LibYUVConvertTest.I444ToARGB_Opt (316 ms)
[ RUN      ] LibYUVConvertTest.I444ToARGB_ARGB_Any
[       OK ] LibYUVConvertTest.I444ToARGB_ARGB_Any (315 ms)
[ RUN      ] LibYUVConvertTest.I444ToARGB_ARGB_Unaligned
[       OK ] LibYUVConvertTest.I444ToARGB_ARGB_Unaligned (341 ms)
[ RUN      ] LibYUVConvertTest.I444ToARGB_ARGB_Invert
[       OK ] LibYUVConvertTest.I444ToARGB_ARGB_Invert (331 ms)
[ RUN      ] LibYUVConvertTest.I444ToARGB_ARGB_Opt
[       OK ] LibYUVConvertTest.I444ToARGB_ARGB_Opt (329 ms)
[----------] 8 tests from LibYUVConvertTest (2615 ms total)

TBR=harryjin@google.com
BUG=libyuv:492

Review URL: https://codereview.chromium.org/1445893002 .

[modify] 
http://crrev.com/1019e4537fc1bfc6ee505cd1c628b645c7e966b7/include/libyuv/row.h
[modify] 
http://crrev.com/1019e4537fc1bfc6ee505cd1c628b645c7e966b7/source/row_gcc.cc

Original comment by bugdroid1@chromium.org on 14 Nov 2015 at 2:32

GoogleCodeExporter commented 8 years ago
util/android/test_runner.py gtest -s libyuv_unittest -t 7200 --verbose \
  --release --gtest_filter=* \
  -a "--libyuv_width=1280 --libyuv_height=720 --libyuv_repeat=999 --libyuv_flags=-1" \
  | grep ms | sed 's/\(.*(\)\([0-9]*\)\( ms)\)/\2 - \1\2\3/g' \
  | sort -rn | sed 's/.*- \(.*\)/\1/g'
I 3385.631s run_tests_on_device(HT4A2JT03762)  [       OK ] LibYUVScaleTest.ScaleDownBy8_Box (212336 ms)
I 1687.270s run_tests_on_device(HT4A2JT03762)  [       OK ] LibYUVPlanarTest.TestARGBPolynomial (62884 ms)
I 3385.632s run_tests_on_device(HT4A2JT03762)  [       OK ] LibYUVScaleTest.ScaleDownBy3_Box (45134 ms)
I 3385.620s run_tests_on_device(HT4A2JT03762)  [       OK ] LibYUVScaleTest.ARGBScaleDownClipBy3by8_Box (41680 ms)
I 3385.616s run_tests_on_device(HT4A2JT03762)  [       OK ] LibYUVScaleTest.ARGBScaleDownClipBy4_Box (40355 ms)
I 3385.620s run_tests_on_device(HT4A2JT03762)  [       OK ] LibYUVScaleTest.ARGBScaleDownClipBy3by8_Bilinear (39277 ms)
I 1687.272s run_tests_on_device(HT4A2JT03762)  [       OK ] LibYUVRotateTest.ARGBRotate270 (37779 ms)
I 1687.271s run_tests_on_device(HT4A2JT03762)  [       OK ] LibYUVRotateTest.ARGBRotate90 (37493 ms)
I 3385.617s run_tests_on_device(HT4A2JT03762)  [       OK ] LibYUVScaleTest.ARGBScaleDownClipBy8_Bilinear (35383 ms)
I 3385.618s run_tests_on_device(HT4A2JT03762)  [       OK ] LibYUVScaleTest.ARGBScaleDownClipBy8_Box (35314 ms)
I 3385.619s run_tests_on_device(HT4A2JT03762)  [       OK ] LibYUVScaleTest.ARGBScaleDownClipBy3by4_Box (32276 ms)
I 3385.619s run_tests_on_device(HT4A2JT03762)  [       OK ] LibYUVScaleTest.ARGBScaleDownClipBy3by4_Bilinear (32001 ms)
I 1687.270s run_tests_on_device(HT4A2JT03762)  [       OK ] LibYUVPlanarTest.ARGBBlur_Invert (31007 ms)
I 1687.270s run_tests_on_device(HT4A2JT03762)  [       OK ] LibYUVPlanarTest.ARGBBlur_Opt (30818 ms)
I 1687.270s run_tests_on_device(HT4A2JT03762)  [       OK ] LibYUVPlanarTest.ARGBBlur_Unaligned (30766 ms)
I 1687.270s run_tests_on_device(HT4A2JT03762)  [       OK ] LibYUVPlanarTest.ARGBBlur_Any (30736 ms)
I 3385.616s run_tests_on_device(HT4A2JT03762)  [       OK ] LibYUVScaleTest.ARGBScaleDownBy4_Box (29546 ms)
I 1687.270s run_tests_on_device(HT4A2JT03762)  [       OK ] LibYUVPlanarTest.ARGBBlurSmall_Invert (27381 ms)
I 1687.270s run_tests_on_device(HT4A2JT03762)  [       OK ] LibYUVPlanarTest.ARGBBlurSmall_Unaligned (27267 ms)
I 1687.270s run_tests_on_device(HT4A2JT03762)  [       OK ] LibYUVPlanarTest.ARGBBlurSmall_Any (27204 ms)
I 1687.270s run_tests_on_device(HT4A2JT03762)  [       OK ] LibYUVPlanarTest.ARGBBlurSmall_Opt (27136 ms)
I 3385.619s run_tests_on_device(HT4A2JT03762)  [       OK ] LibYUVScaleTest.ARGBScaleDownClipBy3by8_Linear (25732 ms)
I 3385.627s run_tests_on_device(HT4A2JT03762)  [       OK ] LibYUVScaleTest.ARGBScaleClipFrom569x480_Bilinear (25521 ms)
I 3385.620s run_tests_on_device(HT4A2JT03762)  [       OK ] LibYUVScaleTest.ARGBScaleDownClipBy3_Linear (25312 ms)
I 3385.618s run_tests_on_device(HT4A2JT03762)  [       OK ] LibYUVScaleTest.ARGBScaleDownClipBy3by4_Linear (24994 ms)
I 3385.621s run_tests_on_device(HT4A2JT03762)  [       OK ] LibYUVScaleTest.ARGBScaleDownClipBy3_Bilinear (24767 ms)
I 3385.621s run_tests_on_device(HT4A2JT03762)  [       OK ] LibYUVScaleTest.ARGBScaleDownClipBy3_Box (24655 ms)
I 3385.620s run_tests_on_device(HT4A2JT03762)  [       OK ] LibYUVScaleTest.ARGBScaleDownClipBy3_None (24416 ms)
I 3385.628s run_tests_on_device(HT4A2JT03762)  [       OK ] LibYUVScaleTest.ARGBScaleClipFrom640x360_Bilinear (24332 ms)
I 3385.625s run_tests_on_device(HT4A2JT03762)  [       OK ] LibYUVScaleTest.ARGBScaleClipFrom352x288_Bilinear (22861 ms)

Original comment by fbarch...@chromium.org on 14 Nov 2015 at 2:39
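
The grep/sed/sort pipeline in this comment ranks gtest result lines by their "(N ms)" duration. The same idea as a small C++ sketch, assuming each result sits on one line ending in "(N ms)"; SortByDurationDesc is a hypothetical helper, not part of libyuv or its Android test runner:

```cpp
#include <algorithm>
#include <cassert>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Keeps only lines containing "(N ms)" and returns them ordered from slowest
// to fastest, mirroring `grep ms | sed ... | sort -rn | sed ...`.
// Summary lines such as "(21920 ms total)" do not match and are dropped.
std::vector<std::string> SortByDurationDesc(const std::vector<std::string>& lines) {
  static const std::regex kDuration(R"re(\((\d+) ms\))re");
  std::vector<std::pair<long long, std::string>> timed;
  for (const std::string& line : lines) {
    std::smatch m;
    if (std::regex_search(line, m, kDuration)) {
      timed.emplace_back(std::stoll(m[1].str()), line);
    }
  }
  // Stable sort keeps equal-duration lines in their original order.
  std::stable_sort(timed.begin(), timed.end(),
                   [](const std::pair<long long, std::string>& a,
                      const std::pair<long long, std::string>& b) {
                     return a.first > b.first;
                   });
  std::vector<std::string> out;
  out.reserve(timed.size());
  for (auto& t : timed) out.push_back(std::move(t.second));
  return out;
}
```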

GoogleCodeExporter commented 8 years ago
LIBYUV_FLAGS=-1 LIBYUV_WIDTH=1280 LIBYUV_HEIGHT=720 LIBYUV_REPEAT=999 perf record out/Release/libyuv_unittest --gtest_filter=*I444ToARGB*

AVX2 I444ToARGB_Opt (315 ms)
SSSE3 I444ToARGB_Opt (408 ms)
C I444ToARGB_Opt (4329 ms)

Original comment by fbarch...@chromium.org on 14 Nov 2015 at 2:56
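
The ratios implied by the three I444ToARGB_Opt timings in this comment can be computed directly. A trivial helper, for illustration only (Speedup is a hypothetical name); note these are whole-test wall times, so they include any per-test setup as well as the conversion itself:

```cpp
#include <cassert>

// Speedup of an optimized run relative to a baseline, as a ratio of
// wall-clock times (higher is better).
double Speedup(double baseline_ms, double optimized_ms) {
  assert(optimized_ms > 0.0);
  return baseline_ms / optimized_ms;
}
```

With the numbers above: Speedup(4329, 408) is about 10.6 (SSSE3 vs C), Speedup(4329, 315) about 13.7 (AVX2 vs C), and Speedup(408, 315) about 1.30 (AVX2 vs SSSE3).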

GoogleCodeExporter commented 8 years ago
The following revision refers to this bug:
  https://chromium.googlesource.com/libyuv/libyuv.git/+/0815568a502c509970cd1177ed6f908305adcaa0

commit 0815568a502c509970cd1177ed6f908305adcaa0
Author: Frank Barchard <fbarchard@google.com>
Date: Tue Nov 17 08:04:03 2015

test for unaligned vs aligned for CopyRow_SSE2

Improves performance on older CPUs, where aligned movdqa is faster than unaligned movdqu.
TBR=harryjin@google.com
BUG=libyuv:492

Review URL: https://codereview.chromium.org/1455463002 .

[modify] 
http://crrev.com/0815568a502c509970cd1177ed6f908305adcaa0/README.chromium
[modify] 
http://crrev.com/0815568a502c509970cd1177ed6f908305adcaa0/include/libyuv/version.h
[modify] 
http://crrev.com/0815568a502c509970cd1177ed6f908305adcaa0/source/row_gcc.cc
[modify] 
http://crrev.com/0815568a502c509970cd1177ed6f908305adcaa0/source/row_win.cc

Original comment by bugdroid1@chromium.org on 17 Nov 2015 at 8:04
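
The commit above makes CopyRow_SSE2 test whether its buffers are aligned so that aligned moves (movdqa) can be used when possible. The dispatch idea, sketched with SSE2 intrinsics rather than libyuv's inline assembly; CopyRowSketch is a hypothetical name, and the sketch falls back to memcpy on non-SSE2 targets and for any tail that is not a multiple of 16 bytes:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>
#define SKETCH_HAS_SSE2 1
#endif

// Copies count bytes. With SSE2 available: if both pointers are 16-byte
// aligned, uses aligned loads/stores (movdqa); otherwise unaligned ones
// (movdqu). The remaining count % 16 bytes are copied with memcpy.
void CopyRowSketch(const uint8_t* src, uint8_t* dst, int count) {
  assert(count >= 0);
  int i = 0;
#ifdef SKETCH_HAS_SSE2
  const int vec_end = count & ~15;
  const bool aligned =
      ((reinterpret_cast<std::uintptr_t>(src) |
        reinterpret_cast<std::uintptr_t>(dst)) & 15) == 0;
  if (aligned) {
    for (; i < vec_end; i += 16) {
      __m128i v = _mm_load_si128(reinterpret_cast<const __m128i*>(src + i));
      _mm_store_si128(reinterpret_cast<__m128i*>(dst + i), v);
    }
  } else {
    for (; i < vec_end; i += 16) {
      __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(src + i));
      _mm_storeu_si128(reinterpret_cast<__m128i*>(dst + i), v);
    }
  }
#endif
  std::memcpy(dst + i, src + i, static_cast<std::size_t>(count - i));
}
```

The alignment check costs one branch per row, which is negligible next to copying a 1280-pixel row.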

GoogleCodeExporter commented 8 years ago
The following revision refers to this bug:
  https://chromium.googlesource.com/libyuv/libyuv.git/+/36615d62a0b4531a8bcd583c48e28547dbbbd554

commit 36615d62a0b4531a8bcd583c48e28547dbbbd554
Author: Frank Barchard <fbarchard@google.com>
Date: Tue Dec 22 20:29:54 2015

fix for InterpolateRow_AVX2
port scaledownby4_avx2 to gcc

TBR=harryjin@google.com
BUG=libyuv:492

Review URL: https://codereview.chromium.org/1546763002 .

[modify] 
http://crrev.com/36615d62a0b4531a8bcd583c48e28547dbbbd554/README.chromium
[modify] 
http://crrev.com/36615d62a0b4531a8bcd583c48e28547dbbbd554/include/libyuv/scale_row.h
[modify] 
http://crrev.com/36615d62a0b4531a8bcd583c48e28547dbbbd554/include/libyuv/version.h
[modify] 
http://crrev.com/36615d62a0b4531a8bcd583c48e28547dbbbd554/source/scale_gcc.cc

Original comment by bugdroid1@chromium.org on 22 Dec 2015 at 8:30
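
The commit above ports the 4x box downscale to GCC. At 1/4 scale a box filter averages each 4x4 block of source pixels into one destination pixel. A scalar sketch for one output row of a single 8-bit plane, assuming round-to-nearest; ScaleRowDown4BoxSketch is a hypothetical name, and libyuv's row functions may round differently:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Averages each 4x4 block of src (4 rows, stride bytes apart) into one byte
// of dst. dst_width output pixels consume 4 * dst_width source columns.
void ScaleRowDown4BoxSketch(const uint8_t* src, std::ptrdiff_t stride,
                            uint8_t* dst, int dst_width) {
  assert(dst_width >= 0);
  for (int x = 0; x < dst_width; ++x) {
    int sum = 0;
    for (int row = 0; row < 4; ++row) {
      const uint8_t* p = src + row * stride + 4 * x;
      sum += p[0] + p[1] + p[2] + p[3];
    }
    dst[x] = static_cast<uint8_t>((sum + 8) >> 4);  // divide by 16, rounded
  }
}
```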

GoogleCodeExporter commented 8 years ago
LIBYUV_FLAGS=-1 LIBYUV_WIDTH=1280 LIBYUV_HEIGHT=720 LIBYUV_REPEAT=999 perf record out/Release/libyuv_unittest
perf report
 9.13%  libyuv_unittest  libyuv_unittest      [.] I422ToARGBRow_AVX2                  
 6.32%  libyuv_unittest  libyuv_unittest      [.] InterpolateRow_AVX2                 
 5.82%  libyuv_unittest  libyuv_unittest      [.] ScaleARGBRowDownEvenBox_SSE2        
 4.93%  libyuv_unittest  libyuv_unittest      [.] ScaleARGBFilterCols_SSSE3           
 4.48%  libyuv_unittest  libyuv_unittest      [.] ScaleFilterCols_SSSE3               
 3.84%  libyuv_unittest  libyuv_unittest      [.] ScaleARGBRowDown2Box_SSE2           
 3.78%  libyuv_unittest  libyuv_unittest      [.] ScaleARGBCols_SSE2                  
 3.62%  libyuv_unittest  libyuv_unittest      [.] libyuv::ScaleARGB(unsigned char cons
 3.38%  libyuv_unittest  libyuv_unittest      [.] CopyRow_ERMS                        
 2.19%  libyuv_unittest  libyuv_unittest      [.] ScaleARGBRowDownEven_SSE2           
 2.02%  libyuv_unittest  libyuv_unittest      [.] FixedDiv_X86                        
 1.97%  libyuv_unittest  libyuv_unittest      [.] I422ToRGBARow_AVX2                  
 1.83%  libyuv_unittest  libyuv_unittest      [.] ARGBShuffleRow_AVX2                 
 1.81%  libyuv_unittest  libyuv_unittest      [.] CumulativeSumToAverageRow_SSE2      
 1.74%  libyuv_unittest  libyuv_unittest      [.] ScaleAddRow_AVX2                    
 1.61%  libyuv_unittest  libyuv_unittest      [.] libyuv::ScaleAddCols1_C(int, int, in
 1.59%  libyuv_unittest  libyuv_unittest      [.] ScaleCols_C                         
 1.52%  libyuv_unittest  libyuv_unittest      [.] I444ToARGBRow_AVX2                  
 1.34%  libyuv_unittest  libyuv_unittest      [.] I422AlphaToARGBRow_AVX2             
 1.20%  libyuv_unittest  libyuv_unittest      [.] libyuv::ARGBClipTestFilter(int, int,
 1.12%  libyuv_unittest  libyuv_unittest      [.] ComputeCumulativeSumRow_SSE2        
 1.05%  libyuv_unittest  libyuv_unittest      [.] ARGBToYRow_AVX2                     
 1.03%  libyuv_unittest  libyuv_unittest      [.] I422ToARGBRow_SSSE3                 
 0.93%  libyuv_unittest  libyuv_unittest      [.] ARGBToRGB24Row_SSSE3                
 0.93%  libyuv_unittest  libyuv_unittest      [.] ScaleSlope                          
 0.91%  libyuv_unittest  libc-2.19.so         [.] _int_malloc                         
 0.83%  libyuv_unittest  libyuv_unittest      [.] SobelYRow_SSE2                      
 0.81%  libyuv_unittest  libyuv_unittest      [.] SobelXRow_SSE2                      
 0.79%  libyuv_unittest  libyuv_unittest      [.] ARGBToRGB565Row_SSE2                
 0.77%  libyuv_unittest  libyuv_unittest      [.] I411ToARGBRow_SSSE3                 
 0.76%  libyuv_unittest  libyuv_unittest      [.] TransposeWx8_Fast_SSSE3             
 0.70%  libyuv_unittest  libyuv_unittest      [.] FixedDiv1_X86                       
 0.70%  libyuv_unittest  libyuv_unittest      [.] next_marker                         
 0.65%  libyuv_unittest  libyuv_unittest      [.] ARGBToUV411Row_C                    
 0.61%  libyuv_unittest  libyuv_unittest      [.] ARGBToARGB1555Row_SSE2              
 0.57%  libyuv_unittest  libyuv_unittest      [.] ARGBToUVRow_AVX2                    

Original comment by fbarch...@google.com on 22 Dec 2015 at 10:46
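
ScaleARGBRowDownEvenBox_SSE2 stays near the top of this profile and the 10 Nov one. Going by its name, it samples every Nth ARGB pixel and box-filters the 2x2 neighbourhood at each sample. A scalar sketch of that reading, assuming a 2x2 box per channel with round-to-nearest; ScaleARGBRowDownEvenBoxSketch is a hypothetical name and the rounding is an assumption, not taken from libyuv's source:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// For each output pixel x, averages the 2x2 block of ARGB pixels starting at
// source pixel x * src_stepx (current row and the row src_stride bytes below),
// channel by channel, rounding to nearest.
void ScaleARGBRowDownEvenBoxSketch(const uint8_t* src_argb,
                                   std::ptrdiff_t src_stride, int src_stepx,
                                   uint8_t* dst_argb, int dst_width) {
  assert(src_stepx >= 1 && dst_width >= 0);
  for (int x = 0; x < dst_width; ++x) {
    const uint8_t* top = src_argb + static_cast<std::ptrdiff_t>(x) * src_stepx * 4;
    const uint8_t* bot = top + src_stride;
    for (int c = 0; c < 4; ++c) {
      const int sum = top[c] + top[c + 4] + bot[c] + bot[c + 4];
      dst_argb[4 * x + c] = static_cast<uint8_t>((sum + 2) >> 2);
    }
  }
}
```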

GoogleCodeExporter commented 8 years ago
Overall amount of code for gcc/linux (file sizes in bytes):
   5,354 compare_gcc.cc
  17,572 rotate_gcc.cc
 232,857 row_gcc.cc
  53,306 scale_gcc.cc
 309,089 total

Compared to Visual C/Windows:
   6,338 compare_win.cc
   7,360 rotate_win.cc
 203,639 row_win.cc
  42,070 scale_win.cc
 259,407 total

Original comment by fbarch...@google.com on 4 Jan 2016 at 6:27

GoogleCodeExporter commented 8 years ago
It would be good to close this issue once there are no remaining bottlenecks.
To close this issue:

1. The SIMD functions on Windows and GCC should be the same, aside from those
that have compiler errors on GCC.  Historically, some Windows functions were
never ported to GCC.

2. There should be no "C" bottlenecks; those tend to be bugs.

Ideally, all functions would be AVX2, but that is not Linux-specific.  More
specific follow-up bugs can be filed for individual bottlenecks.

Original comment by fbarch...@google.com on 4 Jan 2016 at 9:16
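
Both criteria come down to how libyuv selects row functions at run time: the C row function is the default, and a SIMD row replaces it only when the build includes one for that compiler (the HAS_* macros in row.h) and the CPU supports the instruction set (TestCpuFlag). The _C entries in the profiles in this thread, such as ScaleCols_C and ARGBToUV411Row_C, are where that fallback becomes visible. A portable sketch of the pattern; the flag constants, Row_* stubs, RowChoice, and SelectRow are hypothetical stand-ins, not libyuv's declarations:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical CPU feature bits; real code would query CPUID.
constexpr int kCpuHasSSSE3 = 1 << 0;
constexpr int kCpuHasAVX2 = 1 << 1;

using RowFn = void (*)(const uint8_t* src, uint8_t* dst, int width);

// Stand-in row functions; all three just copy so the sketch stays portable.
void Row_C(const uint8_t* src, uint8_t* dst, int width) {
  for (int i = 0; i < width; ++i) dst[i] = src[i];
}
void Row_SSSE3(const uint8_t* src, uint8_t* dst, int width) {
  std::memcpy(dst, src, static_cast<std::size_t>(width));
}
void Row_AVX2(const uint8_t* src, uint8_t* dst, int width) {
  std::memcpy(dst, src, static_cast<std::size_t>(width));
}

struct RowChoice {
  const char* name;
  RowFn fn;
};

// Starts from the C version and upgrades only if this build contains the
// SIMD version (built_*) AND the CPU reports support for it (cpu_flags).
RowChoice SelectRow(int cpu_flags, bool built_ssse3, bool built_avx2) {
  RowChoice choice{"C", Row_C};
  if (built_ssse3 && (cpu_flags & kCpuHasSSSE3)) choice = {"SSSE3", Row_SSSE3};
  if (built_avx2 && (cpu_flags & kCpuHasAVX2)) choice = {"AVX2", Row_AVX2};
  return choice;
}
```

SelectRow(kCpuHasSSSE3 | kCpuHasAVX2, false, false) picks "C": a SIMD row that exists only in the Windows sources leaves a GCC build on the C path even on an AVX2 CPU, which is the case item 1 guards against.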

GoogleCodeExporter commented 8 years ago
fbarchard@fbarchard-linux:~/src/build/libyuv/libyuv$ LIBYUV_FLAGS=-1 LIBYUV_WIDTH=1280 LIBYUV_HEIGHT=720 LIBYUV_REPEAT=1000 perf record out/Release/libyuv_unittest --gtest_filter=*ConvertTest*

 23.90%  libyuv_unittest  libyuv_unittest      [.] I422ToARGBRow_AVX2
  5.48%  libyuv_unittest  libyuv_unittest      [.] I422ToRGBARow_AVX2
  5.15%  libyuv_unittest  libyuv_unittest      [.] ARGBShuffleRow_AVX2
  4.24%  libyuv_unittest  libyuv_unittest      [.] I444ToARGBRow_AVX2
  3.84%  libyuv_unittest  libyuv_unittest      [.] CopyRow_ERMS
  3.72%  libyuv_unittest  libyuv_unittest      [.] I422AlphaToARGBRow_AVX2
  3.66%  libyuv_unittest  libyuv_unittest      [.] ScaleFilterCols_SSSE3
  2.93%  libyuv_unittest  libyuv_unittest      [.] ARGBToYRow_AVX2
  2.86%  libyuv_unittest  libyuv_unittest      [.] I422ToARGBRow_SSSE3
  2.83%  libyuv_unittest  libyuv_unittest      [.] ARGBToUVRow_AVX2
  2.52%  libyuv_unittest  libyuv_unittest      [.] ARGBToRGB24Row_SSSE3
  2.21%  libyuv_unittest  libyuv_unittest      [.] ARGBToRGB565Row_SSE2
  2.15%  libyuv_unittest  libyuv_unittest      [.] I411ToARGBRow_SSSE3
  1.94%  libyuv_unittest  libyuv_unittest      [.] next_marker
  1.79%  libyuv_unittest  libyuv_unittest      [.] ARGBToUV411Row_C
  1.70%  libyuv_unittest  libyuv_unittest      [.] ARGBToARGB1555Row_SSE2
  1.35%  libyuv_unittest  libyuv_unittest      [.] I422ToUYVYRow_SSE2
  1.34%  libyuv_unittest  libyuv_unittest      [.] I422ToYUY2Row_SSE2
  1.23%  libyuv_unittest  libyuv_unittest      [.] ARGB1555ToARGBRow_SSE2
  1.05%  libyuv_unittest  libyuv_unittest      [.] ARGBToARGB4444Row_SSE2
  0.98%  libyuv_unittest  libyuv_unittest      [.] ARGBToRGB565DitherRow_AVX2
  0.98%  libyuv_unittest  libyuv_unittest      [.] RGB565ToARGBRow_SSE2
  0.97%  libyuv_unittest  libyuv_unittest      [.] ARGBToUVJRow_SSSE3
  0.95%  libyuv_unittest  libyuv_unittest      [.] NV12ToARGBRow_SSSE3
  0.87%  libyuv_unittest  libyuv_unittest      [.] ARGB4444ToARGBRow_SSE2
  0.85%  libyuv_unittest  libyuv_unittest      [.] RGB24ToARGBRow_SSSE3
  0.85%  libyuv_unittest  libyuv_unittest      [.] RAWToARGBRow_SSSE3
  0.77%  libyuv_unittest  libyuv_unittest      [.] ARGBToYJRow_AVX2
  0.70%  libyuv_unittest  libyuv_unittest      [.] SplitUVRow_AVX2
  0.68%  libyuv_unittest  libyuv_unittest      [.] NV21ToARGBRow_AVX2
  0.67%  libyuv_unittest  libyuv_unittest      [.] NV12ToARGBRow_AVX2
  0.67%  libyuv_unittest  [kernel.kallsyms]    [k] 0xffffffff8104f45a
  0.65%  libyuv_unittest  libyuv_unittest      [.] ARGBToUV444Row_SSSE3
  0.64%  libyuv_unittest  libyuv_unittest      [.] UYVYToARGBRow_AVX2
  0.64%  libyuv_unittest  libyuv_unittest      [.] YUY2ToARGBRow_AVX2
  0.62%  libyuv_unittest  libyuv_unittest      [.] InterpolateRow_AVX2
  0.57%  libyuv_unittest  libyuv_unittest      [.] ARGBMirrorRow_AVX2
  0.52%  libyuv_unittest  libyuv_unittest      [.] RAWToRGB24Row_SSSE3
  0.52%  libyuv_unittest  libyuv_unittest      [.] ARGBToRAWRow_SSSE3

Original comment by fbarch...@google.com on 21 Jan 2016 at 7:05

GoogleCodeExporter commented 8 years ago
LIBYUV_FLAGS=-1 LIBYUV_WIDTH=1280 LIBYUV_HEIGHT=720 LIBYUV_REPEAT=1000 perf record out/Release/libyuv_unittest --gtest_filter=*ConvertTest*
perf report

 24.12%  libyuv_unittest  libyuv_unittest      [.] I422ToARGBRow_AVX2         
  5.51%  libyuv_unittest  libyuv_unittest      [.] I422ToRGBARow_AVX2         
  5.10%  libyuv_unittest  libyuv_unittest      [.] ARGBShuffleRow_AVX2        
  4.24%  libyuv_unittest  libyuv_unittest      [.] I444ToARGBRow_AVX2         
  3.82%  libyuv_unittest  libyuv_unittest      [.] CopyRow_ERMS               
  3.75%  libyuv_unittest  libyuv_unittest      [.] I422AlphaToARGBRow_AVX2    
  3.70%  libyuv_unittest  libyuv_unittest      [.] ScaleFilterCols_SSSE3      
  2.90%  libyuv_unittest  libyuv_unittest      [.] ARGBToYRow_AVX2            
  2.84%  libyuv_unittest  libyuv_unittest      [.] I422ToARGBRow_SSSE3        
  2.80%  libyuv_unittest  libyuv_unittest      [.] ARGBToUVRow_AVX2           
  2.53%  libyuv_unittest  libyuv_unittest      [.] ARGBToRGB24Row_SSSE3       
  2.23%  libyuv_unittest  libyuv_unittest      [.] ARGBToRGB565Row_SSE2       
  2.16%  libyuv_unittest  libyuv_unittest      [.] I411ToARGBRow_SSSE3        
  1.94%  libyuv_unittest  libyuv_unittest      [.] next_marker                
  1.80%  libyuv_unittest  libyuv_unittest      [.] ARGBToUV411Row_C           
  1.73%  libyuv_unittest  libyuv_unittest      [.] ARGBToARGB1555Row_SSE2     
  1.33%  libyuv_unittest  libyuv_unittest      [.] I422ToYUY2Row_SSE2         
  1.32%  libyuv_unittest  libyuv_unittest      [.] I422ToUYVYRow_SSE2         
  1.23%  libyuv_unittest  libyuv_unittest      [.] ARGB1555ToARGBRow_SSE2     
  1.07%  libyuv_unittest  libyuv_unittest      [.] ARGBToARGB4444Row_SSE2     
  0.99%  libyuv_unittest  libyuv_unittest      [.] ARGBToRGB565DitherRow_AVX2 
  0.98%  libyuv_unittest  libyuv_unittest      [.] RGB565ToARGBRow_SSE2       
  0.97%  libyuv_unittest  libyuv_unittest      [.] NV12ToARGBRow_SSSE3        
  0.95%  libyuv_unittest  libyuv_unittest      [.] ARGBToUVJRow_SSSE3         
  0.87%  libyuv_unittest  libyuv_unittest      [.] ARGB4444ToARGBRow_SSE2     
  0.85%  libyuv_unittest  libyuv_unittest      [.] RAWToARGBRow_SSSE3         
  0.84%  libyuv_unittest  libyuv_unittest      [.] RGB24ToARGBRow_SSSE3       
  0.77%  libyuv_unittest  libyuv_unittest      [.] ARGBToYJRow_AVX2           
  0.69%  libyuv_unittest  libyuv_unittest      [.] SplitUVRow_AVX2            
  0.69%  libyuv_unittest  libyuv_unittest      [.] NV21ToARGBRow_AVX2         
  0.68%  libyuv_unittest  libyuv_unittest      [.] NV12ToARGBRow_AVX2         
  0.65%  libyuv_unittest  libyuv_unittest      [.] ARGBToUV444Row_SSSE3       
  0.65%  libyuv_unittest  libyuv_unittest      [.] YUY2ToARGBRow_AVX2         
  0.65%  libyuv_unittest  libyuv_unittest      [.] UYVYToARGBRow_AVX2         
  0.61%  libyuv_unittest  libyuv_unittest      [.] InterpolateRow_AVX2        
  0.57%  libyuv_unittest  libyuv_unittest      [.] ARGBMirrorRow_AVX2         
  0.52%  libyuv_unittest  libyuv_unittest      [.] RAWToRGB24Row_SSSE3        
  0.51%  libyuv_unittest  libyuv_unittest      [.] ARGBToRAWRow_SSSE3         

Unexpected top bottlenecks:
I422ToARGBRow_SSSE3 due to I422ToARGBRow_AVX2
ARGBToUV411Row_C (C path; no SIMD version appears in the profile)
ARGBToUVJRow_SSSE3 due to ARGBToUVRow_AVX2
NV12ToARGBRow_SSSE3 due to I422ToARGBRow_AVX2
I411ToARGBRow_SSSE3 due to I422ToARGBRow_AVX2
I422ToYUY2Row_SSE2: easy AVX2 port.
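I422ToYUY2Row and I422ToUYVYRow (both in the profile above) only interleave bytes and do no arithmetic. That is presumably why an AVX2 port looks easy. Below is a minimal scalar sketch of that packing. It is not libyuv's code and rests on these assumptions:

* Byte order is the usual one: YUY2 is Y0 U0 Y1 V0 and UYVY is U0 Y0 V0 Y1.
* There is one U/V pair per two Y samples.
* The function names are hypothetical.
* Odd widths are handled by repeating the last Y sample. This is a choice of the sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch: pack one I422 row (full-width Y, half-width U/V)
 * into packed YUY2, byte order Y0 U0 Y1 V0. */
void I422ToYUY2Row_Sketch(const uint8_t* src_y, const uint8_t* src_u,
                          const uint8_t* src_v, uint8_t* dst_yuy2,
                          int width) {
  assert(width >= 0);
  for (int x = 0; x < width - 1; x += 2) {
    dst_yuy2[0] = src_y[0];
    dst_yuy2[1] = src_u[0];
    dst_yuy2[2] = src_y[1];
    dst_yuy2[3] = src_v[0];
    src_y += 2;  /* two luma samples consumed */
    src_u += 1;  /* one chroma pair shared by those two samples */
    src_v += 1;
    dst_yuy2 += 4;
  }
  if (width & 1) {
    /* Odd width: this sketch repeats the last Y sample (an assumption). */
    dst_yuy2[0] = src_y[0];
    dst_yuy2[1] = src_u[0];
    dst_yuy2[2] = src_y[0];
    dst_yuy2[3] = src_v[0];
  }
}

/* Hypothetical sketch: same input, packed UYVY byte order U0 Y0 V0 Y1. */
void I422ToUYVYRow_Sketch(const uint8_t* src_y, const uint8_t* src_u,
                          const uint8_t* src_v, uint8_t* dst_uyvy,
                          int width) {
  assert(width >= 0);
  for (int x = 0; x < width - 1; x += 2) {
    dst_uyvy[0] = src_u[0];
    dst_uyvy[1] = src_y[0];
    dst_uyvy[2] = src_v[0];
    dst_uyvy[3] = src_y[1];
    src_y += 2;
    src_u += 1;
    src_v += 1;
    dst_uyvy += 4;
  }
  if (width & 1) {
    /* Odd width: repeat the last Y sample, as in the YUY2 sketch. */
    dst_uyvy[0] = src_u[0];
    dst_uyvy[1] = src_y[0];
    dst_uyvy[2] = src_v[0];
    dst_uyvy[3] = src_y[0];
  }
}
```

For example, Y = {10, 11, 12, 13}, U = {20, 21}, V = {30, 31} with width 4 packs to YUY2 bytes 10 20 11 30 12 21 13 31 and UYVY bytes 20 10 30 11 21 12 31 13. A SIMD version would do the same interleave on many pixels per iteration using byte-unpack instructions. Because no arithmetic is involved, widening from SSE2 to AVX2 mainly changes the register width.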

Original comment by fbarch...@google.com on 26 Jan 2016 at 1:20

GoogleCodeExporter commented 8 years ago

Original comment by fbarch...@google.com on 26 Jan 2016 at 1:49