myrao / libyuv

Automatically exported from code.google.com/p/libyuv
BSD 3-Clause "New" or "Revised" License

linux top bottlenecks #492

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
Investigate top bottlenecks

LIBYUV_DISABLE_AVX2=1 LIBYUV_WIDTH=1280 LIBYUV_HEIGHT=720 LIBYUV_REPEAT=1000 perf record out/Release/libyuv_unittest --gtest_filter=*
perf report

 13.81%  libyuv_unittest  libyuv_unittest      [.] libyuv::libyuvTest_ScaleTestRoundToByte_Test::T
 13.81%  libyuv_unittest  libyuv_unittest      [.] libyuv::libyuvTest_TestRoundToByte_Test::TestBo
  4.94%  libyuv_unittest  libyuv_unittest      [.] ScaleAddRow_C
  4.07%  libyuv_unittest  libyuv_unittest      [.] ScaleARGBRowDownEvenBox_SSE2
  3.63%  libyuv_unittest  libyuv_unittest      [.] InterpolateRow_SSSE3
  3.57%  libyuv_unittest  libyuv_unittest      [.] I422ToARGBMatrixRow_SSSE3
  3.06%  libyuv_unittest  libyuv_unittest      [.] ScaleARGBFilterCols_SSSE3
  3.02%  libyuv_unittest  libyuv_unittest      [.] ScaleFilterCols_SSSE3
  2.63%  libyuv_unittest  libyuv_unittest      [.] ScaleARGBRowDown2Box_SSE2
  2.58%  libyuv_unittest  libyuv_unittest      [.] libyuv::ScaleARGB(unsigned char const*, int, in
  2.57%  libyuv_unittest  libyuv_unittest      [.] ARGBToRGB565DitherRow_C
  2.45%  libyuv_unittest  libyuv_unittest      [.] ScaleARGBCols_SSE2
  2.44%  libyuv_unittest  libc-2.19.so         [.] __random_r
  2.23%  libyuv_unittest  libyuv_unittest      [.] CopyRow_ERMS
  1.64%  libyuv_unittest  libyuv_unittest      [.] I422ToABGRMatrixRow_SSSE3
  1.46%  libyuv_unittest  libyuv_unittest      [.] ScaleARGBRowDownEven_SSE2
  1.29%  libyuv_unittest  libyuv_unittest      [.] FixedDiv_X86
  1.26%  libyuv_unittest  libyuv_unittest      [.] libyuv::ScaleAddCols1_C(int, int, int, int, uns
  1.24%  libyuv_unittest  libyuv_unittest      [.] ARGBShuffleRow_SSSE3
  1.21%  libyuv_unittest  libyuv_unittest      [.] CumulativeSumToAverageRow_SSE2
  1.14%  libyuv_unittest  libc-2.19.so         [.] __random
  1.08%  libyuv_unittest  libyuv_unittest      [.] ScaleCols_C
  0.99%  libyuv_unittest  libyuv_unittest      [.] ARGBToYRow_SSSE3
  0.75%  libyuv_unittest  libyuv_unittest      [.] ComputeCumulativeSumRow_SSE2
  0.75%  libyuv_unittest  libc-2.19.so         [.] _int_malloc

Original issue reported on code.google.com by fbarch...@google.com on 16 Sep 2015 at 11:36

GoogleCodeExporter commented 8 years ago
r1483 removes the redundant scale rounding test.

The rounding test is still the top bottleneck on Linux, though.

 16.52%  libyuv_unittest  libyuv_unittest      [.] libyuv::libyuvTest_TestRoundToByte_Test::TestBody()

Original comment by fbarch...@google.com on 17 Sep 2015 at 5:28

GoogleCodeExporter commented 8 years ago
The following is a complete list of the C functions that show up in the profile (there should be none):

LIBYUV_FLAGS=-1 LIBYUV_WIDTH=1280 LIBYUV_HEIGHT=720 LIBYUV_REPEAT=999 perf record out/Release/libyuv_unittest --gtest_filter=*
perf report >out.txt
grep _C out.txt

     5.88%  libyuv_unittest  libyuv_unittest      [.] ScaleAddRow_C
     3.08%  libyuv_unittest  libyuv_unittest      [.] ARGBToRGB565DitherRow_C
     1.38%  libyuv_unittest  libyuv_unittest      [.] libyuv::ScaleAddCols1_C(int, int, int, int, unsigned short const*, unsigned char*)
     1.28%  libyuv_unittest  libyuv_unittest      [.] ScaleCols_C
     0.52%  libyuv_unittest  libyuv_unittest      [.] ARGBToUV411Row_C
     0.25%  libyuv_unittest  libyuv_unittest      [.] ScaleARGBRowDownEven_C
     0.14%  libyuv_unittest  libyuv_unittest      [.] libyuv::ScaleAddCols2_C(int, int, int, int, unsigned short const*, unsigned char*)
     0.07%  libyuv_unittest  libyuv_unittest      [.] ScaleColsUp2_C
     0.03%  libyuv_unittest  libyuv_unittest      [.] MirrorUVRow_C
     0.01%  libyuv_unittest  libyuv_unittest      [.] TransposeWx8_C
     0.01%  libyuv_unittest  libyuv_unittest      [.] TransposeWxH_C
     0.01%  libyuv_unittest  libyuv_unittest      [.] ScaleRowDown34_0_Box_C
     0.00%  libyuv_unittest  libyuv_unittest      [.] ScaleRowDown34_1_Box_C
     0.00%  libyuv_unittest  libyuv_unittest      [.] TransposeUVWx8_C
     0.00%  libyuv_unittest  libyuv_unittest      [.] ScaleRowDown38_3_Box_C
     0.00%  libyuv_unittest  libyuv_unittest      [.] ScaleRowDown2Box_C
     0.00%  libyuv_unittest  libyuv_unittest      [.] ScaleRowDown34_C
     0.00%  libyuv_unittest  libyuv_unittest      [.] ScaleRowDown38_2_Box_C
     0.00%  libyuv_unittest  libyuv_unittest      [.] libyuv::libyuvTest_CropNV12_Test::TestBody()
     0.00%  libyuv_unittest  libyuv_unittest      [.] ScaleRowDown38_C
     0.00%  libyuv_unittest  libyuv_unittest      [.] ARGBToUVJ422Row_C

Original comment by fbarch...@google.com on 17 Sep 2015 at 6:35
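
Note that a plain `grep _C` also matches the `libyuvTest_CropNV12_Test::TestBody()` row in the list above, because `_C` appears inside the test name. A stricter filter is sketched below in Python. It is only an illustration: the row pattern is an assumption based on the report lines quoted in this issue, and `c_fallbacks` is a hypothetical helper, not part of libyuv or perf.

```python
import re

# One text-mode `perf report` row: percent, command, shared object, "[.]", symbol.
# The layout is assumed from the rows quoted in this issue.
ROW = re.compile(r"^\s*(\d+\.\d+)%\s+\S+\s+\S+\s+\[.\]\s+(.*\S)\s*$")


def c_fallbacks(report_text):
    """Return (percent, symbol) pairs whose function name ends in _C."""
    hits = []
    for line in report_text.splitlines():
        m = ROW.match(line)
        if not m:
            continue  # header, comment, or blank line
        percent, symbol = float(m.group(1)), m.group(2)
        # Drop the argument list, then any namespace or class qualifier.
        name = symbol.split("(", 1)[0].split("::")[-1]
        if name.endswith("_C"):
            hits.append((percent, symbol))
    return hits


# Three rows copied from the list above.
sample = """\
     5.88%  libyuv_unittest  libyuv_unittest      [.] ScaleAddRow_C
     1.38%  libyuv_unittest  libyuv_unittest      [.] libyuv::ScaleAddCols1_C(int, int, int, int, unsigned short const*, unsigned char*)
     0.00%  libyuv_unittest  libyuv_unittest      [.] libyuv::libyuvTest_CropNV12_Test::TestBody()
"""

for percent, symbol in c_fallbacks(sample):
    print(f"{percent:5.2f}%  {symbol}")
```

With the sample rows, the two `_C` functions are kept and the CropNV12 test row is dropped.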

GoogleCodeExporter commented 8 years ago
LIBYUV_FLAGS=-1 LIBYUV_WIDTH=1280 LIBYUV_HEIGHT=720 LIBYUV_REPEAT=999 perf record out/Release/libyuv_unittest --gtest_filter=*

    18.31%  libyuv_unittest  libyuv_unittest      [.] libyuv::libyuvTest_TestRoundToByte_Test::TestBody()
     6.47%  libyuv_unittest  libyuv_unittest      [.] ScaleAddRow_C
     5.05%  libyuv_unittest  libyuv_unittest      [.] InterpolateRow_AVX2
     4.81%  libyuv_unittest  libyuv_unittest      [.] ScaleARGBRowDownEvenBox_SSE2
     3.64%  libyuv_unittest  libyuv_unittest      [.] ScaleFilterCols_SSSE3
     3.43%  libyuv_unittest  libyuv_unittest      [.] ScaleARGBFilterCols_SSSE3
     3.08%  libyuv_unittest  libyuv_unittest      [.] ScaleARGBRowDown2Box_SSE2
     3.00%  libyuv_unittest  libyuv_unittest      [.] ScaleARGB
     2.86%  libyuv_unittest  libyuv_unittest      [.] ScaleARGBCols_SSE2
     2.83%  libyuv_unittest  libyuv_unittest      [.] ARGBToRGB565DitherRow_C
     2.69%  libyuv_unittest  libyuv_unittest      [.] CopyRow_ERMS
     2.59%  libyuv_unittest  libyuv_unittest      [.] I422ToARGBRow_AVX2
     1.72%  libyuv_unittest  libyuv_unittest      [.] ScaleARGBRowDownEven_SSE2
     1.60%  libyuv_unittest  libyuv_unittest      [.] FixedDiv_X86
     1.48%  libyuv_unittest  libyuv_unittest      [.] ARGBShuffleRow_AVX2
     1.47%  libyuv_unittest  libyuv_unittest      [.] CumulativeSumToAverageRow_SSE2
     1.45%  libyuv_unittest  libyuv_unittest      [.] I422ToABGRRow_AVX2
     1.40%  libyuv_unittest  libyuv_unittest      [.] ScaleCols_C
     1.30%  libyuv_unittest  libyuv_unittest      [.] ScaleAddCols1_C
     1.08%  libyuv_unittest  libyuv_unittest      [.] NV12ToARGBRow_SSSE3

Original comment by fbarch...@google.com on 23 Sep 2015 at 8:27

GoogleCodeExporter commented 8 years ago
NV12ToARGB optimized
    18.25%  libyuv_unittest  libyuv_unittest      [.] libyuv::libyuvTest_TestRoundToByte_Test::TestBody()
     6.50%  libyuv_unittest  libyuv_unittest      [.] ScaleAddRow_C
     5.16%  libyuv_unittest  libyuv_unittest      [.] InterpolateRow_AVX2
     4.83%  libyuv_unittest  libyuv_unittest      [.] ScaleARGBRowDownEvenBox_SSE2
     3.64%  libyuv_unittest  libyuv_unittest      [.] ScaleFilterCols_SSSE3
     3.42%  libyuv_unittest  libyuv_unittest      [.] ScaleARGBFilterCols_SSSE3
     3.15%  libyuv_unittest  libyuv_unittest      [.] ScaleARGBRowDown2Box_SSE2
     3.00%  libyuv_unittest  libyuv_unittest      [.] ScaleARGB
     2.92%  libyuv_unittest  libyuv_unittest      [.] ScaleARGBCols_SSE2
     2.83%  libyuv_unittest  libyuv_unittest      [.] ARGBToRGB565DitherRow_C
     2.69%  libyuv_unittest  libyuv_unittest      [.] CopyRow_ERMS
     2.59%  libyuv_unittest  libyuv_unittest      [.] I422ToARGBRow_AVX2
     1.75%  libyuv_unittest  libyuv_unittest      [.] ScaleARGBRowDownEven_SSE2
     1.61%  libyuv_unittest  libyuv_unittest      [.] FixedDiv_X86
     1.49%  libyuv_unittest  libyuv_unittest      [.] ARGBShuffleRow_AVX2
     1.48%  libyuv_unittest  libyuv_unittest      [.] CumulativeSumToAverageRow_SSE2
     1.45%  libyuv_unittest  libyuv_unittest      [.] I422ToABGRRow_AVX2
     1.40%  libyuv_unittest  libyuv_unittest      [.] ScaleCols_C
     1.26%  libyuv_unittest  libyuv_unittest      [.] ScaleAddCols1_C
     0.93%  libyuv_unittest  libyuv_unittest      [.] libyuv::ARGBClipTestFilter(int, int, int, int, libyuv::FilterMode, int)
     0.92%  libyuv_unittest  libc-2.19.so         [.] _int_malloc
     0.91%  libyuv_unittest  libyuv_unittest      [.] ComputeCumulativeSumRow_SSE2
     0.85%  libyuv_unittest  libyuv_unittest      [.] ARGBToRGB565Row_SSE2
     0.85%  libyuv_unittest  libyuv_unittest      [.] ARGBToYRow_AVX2
     0.83%  libyuv_unittest  libyuv_unittest      [.] I422ToARGBRow_SSSE3
     0.68%  libyuv_unittest  libyuv_unittest      [.] SobelXRow_SSE2
     0.67%  libyuv_unittest  libyuv_unittest      [.] SobelYRow_SSE2
     0.62%  libyuv_unittest  libyuv_unittest      [.] TransposeWx8_Fast_SSSE3
     0.62%  libyuv_unittest  libyuv_unittest      [.] FixedDiv1_X86
     0.61%  libyuv_unittest  libyuv_unittest      [.] ScaleSlope
     0.57%  libyuv_unittest  libyuv_unittest      [.] next_marker
     0.54%  libyuv_unittest  libyuv_unittest      [.] NV12ToARGBRow_SSSE3

Original comment by fbarch...@google.com on 25 Sep 2015 at 7:31
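
To see what changed between two runs, such as the 23 Sep and 25 Sep reports, the saved report text can be compared row by row. The Python sketch below is one possible way to do that. The row pattern is an assumption based on the lines quoted in this issue, and `parse` and `deltas` are hypothetical helper names. perf also has a `perf diff` subcommand that compares recorded data files directly.

```python
import re

# One text-mode `perf report` row: percent, command, shared object, "[.]", symbol.
ROW = re.compile(r"^\s*(\d+\.\d+)%\s+\S+\s+\S+\s+\[.\]\s+(.*\S)\s*$")


def parse(report_text):
    """Map symbol -> percent for every row of a text-mode report."""
    shares = {}
    for line in report_text.splitlines():
        m = ROW.match(line)
        if m:
            shares[m.group(2)] = float(m.group(1))
    return shares


def deltas(before, after):
    """Return (symbol, before, after, change), largest absolute change first."""
    rows = []
    for symbol in set(before) | set(after):
        b = before.get(symbol, 0.0)  # a missing row is treated as 0%
        a = after.get(symbol, 0.0)
        rows.append((symbol, b, a, round(a - b, 2)))
    return sorted(rows, key=lambda r: (-abs(r[3]), r[0]))


# Rows copied from the 23 Sep and 25 Sep reports in this issue.
before_text = """\
     6.47%  libyuv_unittest  libyuv_unittest      [.] ScaleAddRow_C
     5.05%  libyuv_unittest  libyuv_unittest      [.] InterpolateRow_AVX2
     1.08%  libyuv_unittest  libyuv_unittest      [.] NV12ToARGBRow_SSSE3
"""
after_text = """\
     6.50%  libyuv_unittest  libyuv_unittest      [.] ScaleAddRow_C
     5.16%  libyuv_unittest  libyuv_unittest      [.] InterpolateRow_AVX2
     0.54%  libyuv_unittest  libyuv_unittest      [.] NV12ToARGBRow_SSSE3
"""

for symbol, b, a, change in deltas(parse(before_text), parse(after_text)):
    print(f"{change:+6.2f}  {b:5.2f}% -> {a:5.2f}%  {symbol}")
```

The percentages are shares of all samples, so when one function gets faster the shares of the others rise slightly even if their absolute cost is unchanged.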

GoogleCodeExporter commented 8 years ago
NV12 AVX2
 18.25%  libyuv_unittest  libyuv_unittest      [.] libyuv::libyuvTest_TestRoundToByte_Test::TestBody()
  6.53%  libyuv_unittest  libyuv_unittest      [.] ScaleAddRow_C
  5.08%  libyuv_unittest  libyuv_unittest      [.] InterpolateRow_AVX2
  4.84%  libyuv_unittest  libyuv_unittest      [.] ScaleARGBRowDownEvenBox_SSE2
  3.64%  libyuv_unittest  libyuv_unittest      [.] ScaleFilterCols_SSSE3
  3.42%  libyuv_unittest  libyuv_unittest      [.] ScaleARGBFilterCols_SSSE3
  3.12%  libyuv_unittest  libyuv_unittest      [.] ScaleARGBRowDown2Box_SSE2
  3.00%  libyuv_unittest  libyuv_unittest      [.] ScaleARGB
  2.90%  libyuv_unittest  libyuv_unittest      [.] ScaleARGBCols_SSE2
  2.85%  libyuv_unittest  libyuv_unittest      [.] ARGBToRGB565DitherRow_C
  2.71%  libyuv_unittest  libyuv_unittest      [.] CopyRow_ERMS
  2.38%  libyuv_unittest  libyuv_unittest      [.] I422ToARGBRow_AVX2
  1.76%  libyuv_unittest  libyuv_unittest      [.] ScaleARGBRowDownEven_SSE2
  1.62%  libyuv_unittest  libyuv_unittest      [.] FixedDiv_X86
  1.49%  libyuv_unittest  libyuv_unittest      [.] CumulativeSumToAverageRow_SSE2
  1.49%  libyuv_unittest  libyuv_unittest      [.] ARGBShuffleRow_AVX2
  1.41%  libyuv_unittest  libyuv_unittest      [.] ScaleCols_C
  1.25%  libyuv_unittest  libyuv_unittest      [.] ScaleAddCols1_C
  1.25%  libyuv_unittest  libyuv_unittest      [.] I422ToABGRRow_AVX2
  0.99%  libyuv_unittest  libc-2.19.so         [.] _int_malloc
  0.92%  libyuv_unittest  libyuv_unittest      [.] libyuv::ARGBClipTestFilter(int, int, int, int, libyuv::FilterMode, int)
  0.91%  libyuv_unittest  libyuv_unittest      [.] ComputeCumulativeSumRow_SSE2
  0.87%  libyuv_unittest  libyuv_unittest      [.] ARGBToRGB565Row_SSE2
  0.85%  libyuv_unittest  libyuv_unittest      [.] ARGBToYRow_AVX2
  0.84%  libyuv_unittest  libyuv_unittest      [.] I422ToARGBRow_SSSE3
  0.68%  libyuv_unittest  libyuv_unittest      [.] SobelXRow_SSE2
  0.67%  libyuv_unittest  libyuv_unittest      [.] SobelYRow_SSE2
  0.62%  libyuv_unittest  libyuv_unittest      [.] TransposeWx8_Fast_SSSE3
  0.62%  libyuv_unittest  libyuv_unittest      [.] ScaleSlope
  0.62%  libyuv_unittest  libyuv_unittest      [.] FixedDiv1_X86
  0.55%  libyuv_unittest  libyuv_unittest      [.] next_marker
  0.54%  libyuv_unittest  libyuv_unittest      [.] NV12ToARGBRow_SSSE3
  0.54%  libyuv_unittest  libyuv_unittest      [.] ARGBToUV411Row_C
  0.50%  libyuv_unittest  libyuv_unittest      [.] ARGBToARGB1555Row_SSE2
  0.48%  libyuv_unittest  libyuv_unittest      [.] ARGBScaleClip
  0.47%  libyuv_unittest  libyuv_unittest      [.] ARGBToUVRow_AVX2
  0.46%  libyuv_unittest  libyuv_unittest      [.] ARGBToYJRow_AVX2
  0.45%  libyuv_unittest  libyuv_unittest      [.] InterpolateRow_Any_AVX2
  0.43%  libyuv_unittest  libyuv_unittest      [.] ARGBToUV422Row_SSSE3
  0.42%  libyuv_unittest  libyuv_unittest      [.] I422ToBGRARow_AVX2
  0.41%  libyuv_unittest  libyuv_unittest      [.] I422ToRGBARow_AVX2
  0.40%  libyuv_unittest  libc-2.19.so         [.] _int_free
  0.40%  libyuv_unittest  libyuv_unittest      [.] NV12ToARGBRow_AVX2

Original comment by fbarch...@google.com on 25 Sep 2015 at 11:57
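
A report like the one above can also be summarized by implementation suffix, to see how much of the sampled time still lands in `_C` rows compared with the SIMD rows. This is a rough Python sketch. The suffix list is an assumption taken from the names that appear in this issue, and `share_by_suffix` is a hypothetical helper.

```python
import re
from collections import defaultdict

# One text-mode `perf report` row: percent, command, shared object, "[.]", symbol.
ROW = re.compile(r"^\s*(\d+\.\d+)%\s+\S+\s+\S+\s+\[.\]\s+(.*\S)\s*$")
# Suffixes seen in this issue's reports (assumed list, extend as needed).
SUFFIX = re.compile(r"_(C|SSE2|SSSE3|AVX2|ERMS|X86)$")


def share_by_suffix(report_text):
    """Sum the percent column per implementation suffix."""
    totals = defaultdict(float)
    for line in report_text.splitlines():
        m = ROW.match(line)
        if not m:
            continue
        # Drop the argument list, then any namespace or class qualifier.
        name = m.group(2).split("(", 1)[0].split("::")[-1]
        s = SUFFIX.search(name)
        totals[s.group(1) if s else "other"] += float(m.group(1))
    return {key: round(value, 2) for key, value in totals.items()}


# Rows copied from the top of the report above.
sample = """\
 18.25%  libyuv_unittest  libyuv_unittest      [.] libyuv::libyuvTest_TestRoundToByte_Test::TestBody()
  6.53%  libyuv_unittest  libyuv_unittest      [.] ScaleAddRow_C
  5.08%  libyuv_unittest  libyuv_unittest      [.] InterpolateRow_AVX2
  4.84%  libyuv_unittest  libyuv_unittest      [.] ScaleARGBRowDownEvenBox_SSE2
  3.64%  libyuv_unittest  libyuv_unittest      [.] ScaleFilterCols_SSSE3
  2.85%  libyuv_unittest  libyuv_unittest      [.] ARGBToRGB565DitherRow_C
"""

for suffix, percent in sorted(share_by_suffix(sample).items(), key=lambda kv: -kv[1]):
    print(f"{percent:6.2f}%  {suffix}")
```

The totals only cover the rows passed in, so feeding it a truncated report gives a partial picture.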