Open GoogleCodeExporter opened 9 years ago
r1483 removes the redundant scale rounding test.
The rounding test is still the top bottleneck on Linux, though:
16.52% libyuv_unittest libyuv_unittest [.] libyuv::libyuvTest_TestRoundToByte_Test::TestBody()
Original comment by fbarch...@google.com
on 17 Sep 2015 at 5:28
The following is a complete list of the C functions that show up in the profile (there should be none):
LIBYUV_FLAGS=-1 LIBYUV_WIDTH=1280 LIBYUV_HEIGHT=720 LIBYUV_REPEAT=999 perf record out/Release/libyuv_unittest --gtest_filter=*
perf report >out.txt
grep _C out.txt
5.88% libyuv_unittest libyuv_unittest [.] ScaleAddRow_C
3.08% libyuv_unittest libyuv_unittest [.] ARGBToRGB565DitherRow_C
1.38% libyuv_unittest libyuv_unittest [.] libyuv::ScaleAddCols1_C(int, int, int, int, unsigned short const*, unsigned char*)
1.28% libyuv_unittest libyuv_unittest [.] ScaleCols_C
0.52% libyuv_unittest libyuv_unittest [.] ARGBToUV411Row_C
0.25% libyuv_unittest libyuv_unittest [.] ScaleARGBRowDownEven_C
0.14% libyuv_unittest libyuv_unittest [.] libyuv::ScaleAddCols2_C(int, int, int, int, unsigned short const*, unsigned char*)
0.07% libyuv_unittest libyuv_unittest [.] ScaleColsUp2_C
0.03% libyuv_unittest libyuv_unittest [.] MirrorUVRow_C
0.01% libyuv_unittest libyuv_unittest [.] TransposeWx8_C
0.01% libyuv_unittest libyuv_unittest [.] TransposeWxH_C
0.01% libyuv_unittest libyuv_unittest [.] ScaleRowDown34_0_Box_C
0.00% libyuv_unittest libyuv_unittest [.] ScaleRowDown34_1_Box_C
0.00% libyuv_unittest libyuv_unittest [.] TransposeUVWx8_C
0.00% libyuv_unittest libyuv_unittest [.] ScaleRowDown38_3_Box_C
0.00% libyuv_unittest libyuv_unittest [.] ScaleRowDown2Box_C
0.00% libyuv_unittest libyuv_unittest [.] ScaleRowDown34_C
0.00% libyuv_unittest libyuv_unittest [.] ScaleRowDown38_2_Box_C
0.00% libyuv_unittest libyuv_unittest [.] libyuv::libyuvTest_CropNV12_Test::TestBody()
0.00% libyuv_unittest libyuv_unittest [.] ScaleRowDown38_C
0.00% libyuv_unittest libyuv_unittest [.] ARGBToUVJ422Row_C
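Note that a plain `grep _C` also matches `libyuvTest_CropNV12_Test::TestBody()`, which is a test body rather than a C fallback. A minimal sketch of a stricter filter is shown here, assuming the report rows have the `overhead command object [.] symbol` layout seen above. The `c_fallbacks` helper and the `ROW` pattern are hypothetical names used for illustration, not part of libyuv or perf.

```python
import re

# One row of `perf report` text output:
#   overhead%  command  shared-object  [.]  symbol
ROW = re.compile(r"^\s*(\d+(?:\.\d+)?)%\s+\S+\s+\S+\s+\[\.\]\s+(.+?)\s*$")


def c_fallbacks(report_text):
    """Return (overhead, symbol) pairs whose function name ends in _C."""
    hits = []
    for line in report_text.splitlines():
        m = ROW.match(line)
        if not m:
            continue  # skips perf header/comment lines and blanks
        overhead, symbol = float(m.group(1)), m.group(2)
        # Drop any C++ argument list, then any namespace/class qualifier.
        name = symbol.split("(", 1)[0].split("::")[-1]
        if name.endswith("_C"):
            hits.append((overhead, symbol))
    return hits


# Sample rows copied from the listings in this issue.
sample = """\
5.88% libyuv_unittest libyuv_unittest [.] ScaleAddRow_C
1.38% libyuv_unittest libyuv_unittest [.] libyuv::ScaleAddCols1_C(int, int, int, int, unsigned short const*, unsigned char*)
0.00% libyuv_unittest libyuv_unittest [.] libyuv::libyuvTest_CropNV12_Test::TestBody()
5.05% libyuv_unittest libyuv_unittest [.] InterpolateRow_AVX2
"""

for overhead, symbol in c_fallbacks(sample):
    print(f"{overhead:6.2f}%  {symbol}")
```

The filter keys on the last component of the symbol name, so namespaced entries such as `libyuv::ScaleAddCols1_C(...)` are kept while the `CropNV12` test body is dropped.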
Original comment by fbarch...@google.com
on 17 Sep 2015 at 6:35
LIBYUV_FLAGS=-1 LIBYUV_WIDTH=1280 LIBYUV_HEIGHT=720 LIBYUV_REPEAT=999 perf record out/Release/libyuv_unittest --gtest_filter=*
18.31% libyuv_unittest libyuv_unittest [.] libyuv::libyuvTest_TestRoundToByte_Test::TestBody()
6.47% libyuv_unittest libyuv_unittest [.] ScaleAddRow_C
5.05% libyuv_unittest libyuv_unittest [.] InterpolateRow_AVX2
4.81% libyuv_unittest libyuv_unittest [.] ScaleARGBRowDownEvenBox_SSE2
3.64% libyuv_unittest libyuv_unittest [.] ScaleFilterCols_SSSE3
3.43% libyuv_unittest libyuv_unittest [.] ScaleARGBFilterCols_SSSE3
3.08% libyuv_unittest libyuv_unittest [.] ScaleARGBRowDown2Box_SSE2
3.00% libyuv_unittest libyuv_unittest [.] ScaleARGB
2.86% libyuv_unittest libyuv_unittest [.] ScaleARGBCols_SSE2
2.83% libyuv_unittest libyuv_unittest [.] ARGBToRGB565DitherRow_C
2.69% libyuv_unittest libyuv_unittest [.] CopyRow_ERMS
2.59% libyuv_unittest libyuv_unittest [.] I422ToARGBRow_AVX2
1.72% libyuv_unittest libyuv_unittest [.] ScaleARGBRowDownEven_SSE2
1.60% libyuv_unittest libyuv_unittest [.] FixedDiv_X86
1.48% libyuv_unittest libyuv_unittest [.] ARGBShuffleRow_AVX2
1.47% libyuv_unittest libyuv_unittest [.] CumulativeSumToAverageRow_SSE2
1.45% libyuv_unittest libyuv_unittest [.] I422ToABGRRow_AVX2
1.40% libyuv_unittest libyuv_unittest [.] ScaleCols_C
1.30% libyuv_unittest libyuv_unittest [.] ScaleAddCols1_C
1.08% libyuv_unittest libyuv_unittest [.] NV12ToARGBRow_SSSE3
Original comment by fbarch...@google.com
on 23 Sep 2015 at 8:27
NV12ToARGB optimized
18.25% libyuv_unittest libyuv_unittest [.] libyuv::libyuvTest_TestRoundToByte_Test::TestBody()
6.50% libyuv_unittest libyuv_unittest [.] ScaleAddRow_C
5.16% libyuv_unittest libyuv_unittest [.] InterpolateRow_AVX2
4.83% libyuv_unittest libyuv_unittest [.] ScaleARGBRowDownEvenBox_SSE2
3.64% libyuv_unittest libyuv_unittest [.] ScaleFilterCols_SSSE3
3.42% libyuv_unittest libyuv_unittest [.] ScaleARGBFilterCols_SSSE3
3.15% libyuv_unittest libyuv_unittest [.] ScaleARGBRowDown2Box_SSE2
3.00% libyuv_unittest libyuv_unittest [.] ScaleARGB
2.92% libyuv_unittest libyuv_unittest [.] ScaleARGBCols_SSE2
2.83% libyuv_unittest libyuv_unittest [.] ARGBToRGB565DitherRow_C
2.69% libyuv_unittest libyuv_unittest [.] CopyRow_ERMS
2.59% libyuv_unittest libyuv_unittest [.] I422ToARGBRow_AVX2
1.75% libyuv_unittest libyuv_unittest [.] ScaleARGBRowDownEven_SSE2
1.61% libyuv_unittest libyuv_unittest [.] FixedDiv_X86
1.49% libyuv_unittest libyuv_unittest [.] ARGBShuffleRow_AVX2
1.48% libyuv_unittest libyuv_unittest [.] CumulativeSumToAverageRow_SSE2
1.45% libyuv_unittest libyuv_unittest [.] I422ToABGRRow_AVX2
1.40% libyuv_unittest libyuv_unittest [.] ScaleCols_C
1.26% libyuv_unittest libyuv_unittest [.] ScaleAddCols1_C
0.93% libyuv_unittest libyuv_unittest [.] libyuv::ARGBClipTestFilter(int, int, int, int, libyuv::FilterMode, int)
0.92% libyuv_unittest libc-2.19.so [.] _int_malloc
0.91% libyuv_unittest libyuv_unittest [.] ComputeCumulativeSumRow_SSE2
0.85% libyuv_unittest libyuv_unittest [.] ARGBToRGB565Row_SSE2
0.85% libyuv_unittest libyuv_unittest [.] ARGBToYRow_AVX2
0.83% libyuv_unittest libyuv_unittest [.] I422ToARGBRow_SSSE3
0.68% libyuv_unittest libyuv_unittest [.] SobelXRow_SSE2
0.67% libyuv_unittest libyuv_unittest [.] SobelYRow_SSE2
0.62% libyuv_unittest libyuv_unittest [.] TransposeWx8_Fast_SSSE3
0.62% libyuv_unittest libyuv_unittest [.] FixedDiv1_X86
0.61% libyuv_unittest libyuv_unittest [.] ScaleSlope
0.57% libyuv_unittest libyuv_unittest [.] next_marker
0.54% libyuv_unittest libyuv_unittest [.] NV12ToARGBRow_SSSE3
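Spotting what moved between two of these listings by eye is tedious. The following is a rough sketch of a comparison, assuming each snapshot was saved as text in the same `[.]` row format. The `parse` and `deltas` helpers and the 0.1-point threshold are illustrative assumptions, not an existing tool.

```python
def parse(report_text):
    """Map symbol -> overhead percent for each '[.]' row."""
    table = {}
    for line in report_text.splitlines():
        left, sep, symbol = line.partition("[.]")
        if not sep:
            continue  # not a symbol row
        fields = left.split()
        if not fields or not fields[0].endswith("%"):
            continue
        table[symbol.strip()] = float(fields[0].rstrip("%"))
    return table


def deltas(before, after, threshold=0.1):
    """Return (symbol, before, after) rows that moved by >= threshold points."""
    rows = []
    for symbol in sorted(set(before) | set(after)):
        # A symbol missing from one snapshot is treated as 0.0 (assumption).
        b, a = before.get(symbol, 0.0), after.get(symbol, 0.0)
        if abs(a - b) >= threshold:
            rows.append((symbol, b, a))
    return rows


# Rows copied from the 23 Sep and 25 Sep listings in this issue.
before = """\
18.31% libyuv_unittest libyuv_unittest [.] libyuv::libyuvTest_TestRoundToByte_Test::TestBody()
6.47% libyuv_unittest libyuv_unittest [.] ScaleAddRow_C
1.08% libyuv_unittest libyuv_unittest [.] NV12ToARGBRow_SSSE3
"""
after = """\
18.25% libyuv_unittest libyuv_unittest [.] libyuv::libyuvTest_TestRoundToByte_Test::TestBody()
6.50% libyuv_unittest libyuv_unittest [.] ScaleAddRow_C
0.54% libyuv_unittest libyuv_unittest [.] NV12ToARGBRow_SSSE3
"""

for symbol, b, a in deltas(parse(before), parse(after)):
    print(f"{symbol}: {b:.2f}% -> {a:.2f}%")
```

Two caveats apply. The percentages are shares of total samples, so a function's share can shift when other functions change. Also, a symbol that is absent only because a listing was truncated will be read as 0.0, so the comparison is most meaningful on full `perf report` output.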
Original comment by fbarch...@google.com
on 25 Sep 2015 at 7:31
NV12 AVX2
18.25% libyuv_unittest libyuv_unittest [.] libyuv::libyuvTest_TestRoundToByte_Test::TestBody()
6.53% libyuv_unittest libyuv_unittest [.] ScaleAddRow_C
5.08% libyuv_unittest libyuv_unittest [.] InterpolateRow_AVX2
4.84% libyuv_unittest libyuv_unittest [.] ScaleARGBRowDownEvenBox_SSE2
3.64% libyuv_unittest libyuv_unittest [.] ScaleFilterCols_SSSE3
3.42% libyuv_unittest libyuv_unittest [.] ScaleARGBFilterCols_SSSE3
3.12% libyuv_unittest libyuv_unittest [.] ScaleARGBRowDown2Box_SSE2
3.00% libyuv_unittest libyuv_unittest [.] ScaleARGB
2.90% libyuv_unittest libyuv_unittest [.] ScaleARGBCols_SSE2
2.85% libyuv_unittest libyuv_unittest [.] ARGBToRGB565DitherRow_C
2.71% libyuv_unittest libyuv_unittest [.] CopyRow_ERMS
2.38% libyuv_unittest libyuv_unittest [.] I422ToARGBRow_AVX2
1.76% libyuv_unittest libyuv_unittest [.] ScaleARGBRowDownEven_SSE2
1.62% libyuv_unittest libyuv_unittest [.] FixedDiv_X86
1.49% libyuv_unittest libyuv_unittest [.] CumulativeSumToAverageRow_SSE2
1.49% libyuv_unittest libyuv_unittest [.] ARGBShuffleRow_AVX2
1.41% libyuv_unittest libyuv_unittest [.] ScaleCols_C
1.25% libyuv_unittest libyuv_unittest [.] ScaleAddCols1_C
1.25% libyuv_unittest libyuv_unittest [.] I422ToABGRRow_AVX2
0.99% libyuv_unittest libc-2.19.so [.] _int_malloc
0.92% libyuv_unittest libyuv_unittest [.] libyuv::ARGBClipTestFilter(int, int, int, int, libyuv::FilterMode, int)
0.91% libyuv_unittest libyuv_unittest [.] ComputeCumulativeSumRow_SSE2
0.87% libyuv_unittest libyuv_unittest [.] ARGBToRGB565Row_SSE2
0.85% libyuv_unittest libyuv_unittest [.] ARGBToYRow_AVX2
0.84% libyuv_unittest libyuv_unittest [.] I422ToARGBRow_SSSE3
0.68% libyuv_unittest libyuv_unittest [.] SobelXRow_SSE2
0.67% libyuv_unittest libyuv_unittest [.] SobelYRow_SSE2
0.62% libyuv_unittest libyuv_unittest [.] TransposeWx8_Fast_SSSE3
0.62% libyuv_unittest libyuv_unittest [.] ScaleSlope
0.62% libyuv_unittest libyuv_unittest [.] FixedDiv1_X86
0.55% libyuv_unittest libyuv_unittest [.] next_marker
0.54% libyuv_unittest libyuv_unittest [.] NV12ToARGBRow_SSSE3
0.54% libyuv_unittest libyuv_unittest [.] ARGBToUV411Row_C
0.50% libyuv_unittest libyuv_unittest [.] ARGBToARGB1555Row_SSE2
0.48% libyuv_unittest libyuv_unittest [.] ARGBScaleClip
0.47% libyuv_unittest libyuv_unittest [.] ARGBToUVRow_AVX2
0.46% libyuv_unittest libyuv_unittest [.] ARGBToYJRow_AVX2
0.45% libyuv_unittest libyuv_unittest [.] InterpolateRow_Any_AVX2
0.43% libyuv_unittest libyuv_unittest [.] ARGBToUV422Row_SSSE3
0.42% libyuv_unittest libyuv_unittest [.] I422ToBGRARow_AVX2
0.41% libyuv_unittest libyuv_unittest [.] I422ToRGBARow_AVX2
0.40% libyuv_unittest libc-2.19.so [.] _int_free
0.40% libyuv_unittest libyuv_unittest [.] NV12ToARGBRow_AVX2
Original comment by fbarch...@google.com
on 25 Sep 2015 at 11:57
Original issue reported on code.google.com by
fbarch...@google.com
on 16 Sep 2015 at 11:36