Open GoogleCodeExporter opened 8 years ago
r1366 changes the SSE2 code to allow height = 1.
set LIBYUV_WIDTH=1920
set LIBYUV_HEIGHT=1080
set LIBYUV_REPEAT=1000
out\release\libyuv_unittest.exe --gtest_filter=*.ScaleTo640* | findstr ms
Was
ScaleTo640x360_None (245 ms)
ScaleTo640x360_Linear (225 ms)
ScaleTo640x360_Bilinear (201 ms)
ScaleTo640x360_Box (1476 ms)
Now
ScaleTo640x360_None (255 ms)
ScaleTo640x360_Linear (244 ms)
ScaleTo640x360_Bilinear (202 ms)
ScaleTo640x360_Box (1460 ms)
Original comment by fbarch...@chromium.org
on 13 Apr 2015 at 6:57
r1367 adds AVX2 box filter
For 640x3600 to 640x360:
Was SSE2
[ RUN ] libyuvTest.ScaleTo640x360_Box
filter 3 - 5101 us C - 1003 us OPT
[ OK ] libyuvTest.ScaleTo640x360_Box (1063 ms)
Now AVX2
[ RUN ] libyuvTest.ScaleTo640x360_Box
filter 3 - 4224 us C - 823 us OPT
[ OK ] libyuvTest.ScaleTo640x360_Box (875 ms)
Original comment by fbarch...@chromium.org
on 14 Apr 2015 at 12:49
set LIBYUV_WIDTH=1900
out\release\libyuv_unittest.exe
[ PASSED ] 785 tests.
[ FAILED ] 14 tests, listed below:
[ FAILED ] libyuvTest.ARGBScaleClipTo320x240_Box
[ FAILED ] libyuvTest.ARGBScaleClipFrom320x240_Box
[ FAILED ] libyuvTest.ARGBScaleTo352x288_Box
[ FAILED ] libyuvTest.ARGBScaleClipFrom352x288_Box
[ FAILED ] libyuvTest.ARGBScaleClipTo569x480_Box
[ FAILED ] libyuvTest.ARGBScaleClipFrom569x480_Box
[ FAILED ] libyuvTest.ARGBScaleClipTo640x360_Box
[ FAILED ] libyuvTest.ARGBScaleClipFrom640x360_Box
[ FAILED ] libyuvTest.ARGBScaleClipFrom1280x720_Box
[ FAILED ] libyuvTest.ScaleFrom320x240_Box
[ FAILED ] libyuvTest.ScaleFrom352x288_Box
[ FAILED ] libyuvTest.ScaleFrom569x480_Box
[ FAILED ] libyuvTest.ScaleFrom640x360_Box
[ FAILED ] libyuvTest.ScaleFrom1280x720_Box
14 FAILED TESTS
Original comment by fbarch...@google.com
on 14 Apr 2015 at 10:41
The box filter code does not support a source box width/height of less than 1.
Previously the box filter was avoided for upsampling.
That check was recently removed because downsampling height, while keeping the
width the same, was switching to bilinear.
Consider reintroducing the switch to bilinear, but only if the width and/or
height goes up, not if it stays the same.
It's unknown why the clip tests fail, but I would guess the destination is small
and the source box for upsampling is less than 1 pixel.
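A minimal sketch of the suggested rule, assuming a helper that is consulted before scaling. The `Filter` enum and `ReduceFilter` name are hypothetical and are not libyuv's actual API; the sketch only shows the condition "fall back to bilinear when a dimension grows, keep box when it stays the same or shrinks".

```cpp
#include <cassert>

// Hypothetical filter modes for this sketch; not libyuv's actual enum.
enum class Filter { kNone, kLinear, kBilinear, kBox };

// Returns the filter to actually use for a requested filter.
// A box filter averages a source box of roughly (src / dst) pixels per output
// pixel, so the box is smaller than 1 pixel whenever a dimension is enlarged.
Filter ReduceFilter(int src_width, int src_height,
                    int dst_width, int dst_height, Filter requested) {
  assert(src_width > 0 && src_height > 0 && dst_width > 0 && dst_height > 0);
  if (requested == Filter::kBox &&
      (dst_width > src_width || dst_height > src_height)) {
    return Filter::kBilinear;  // upsampling in at least one dimension
  }
  return requested;  // same size or smaller in both dimensions: unchanged
}
```

Under this rule a 640x3600 to 640x360 request keeps the box filter, because the width is unchanged rather than enlarged, while a 320x240 to 1280x720 request switches to bilinear.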
Original comment by fbarch...@chromium.org
on 16 Apr 2015 at 7:51
Box filter is slow for odd widths. This is due to reading memory by columns.
set LIBYUV_WIDTH=1918
set LIBYUV_HEIGHT=1080
set LIBYUV_REPEAT=999
set LIBYUV_FLAGS=-1
out\debug\libyuv_unittest.exe --gtest_filter=*ScaleTo1x1_Box | findstr /r
"^[^_]*_[^_]*ms"
ScaleTo1x1_Box (805 ms)
set LIBYUV_WIDTH=1920
set LIBYUV_HEIGHT=1080
set LIBYUV_REPEAT=999
set LIBYUV_FLAGS=-1
out\debug\libyuv_unittest.exe --gtest_filter=*ScaleTo1x1_Box | findstr /r
"^[^_]*_[^_]*ms"
ScaleTo1x1_Box (356 ms)
Suggest a row-oriented function.
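The comment does not say what the row-oriented function would look like. As one possible illustration, the sketch below sums a box row by row so that every inner loop reads contiguous bytes. `SumBoxByRows` and `BoxAverageByRows` are hypothetical names, not libyuv functions.

```cpp
#include <cstdint>

// Sums a box_width x box_height region of 8-bit pixels.
// The outer loop walks rows and the inner loop walks contiguous bytes, so
// memory is read sequentially instead of striding down columns.
uint64_t SumBoxByRows(const uint8_t* src, int stride,
                      int box_width, int box_height) {
  uint64_t sum = 0;
  for (int y = 0; y < box_height; ++y) {
    const uint8_t* row = src + static_cast<int64_t>(y) * stride;
    for (int x = 0; x < box_width; ++x) {
      sum += row[x];
    }
  }
  return sum;
}

// Rounded average of the box; with the box covering the whole plane this is
// the single output pixel of a scale to 1x1.
uint8_t BoxAverageByRows(const uint8_t* src, int stride,
                         int box_width, int box_height) {
  const uint64_t count = static_cast<uint64_t>(box_width) * box_height;
  return static_cast<uint8_t>(
      (SumBoxByRows(src, stride, box_width, box_height) + count / 2) / count);
}
```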
Original comment by fbarch...@chromium.org
on 2 Jun 2015 at 1:31
LIBYUV_WIDTH=1920 LIBYUV_HEIGHT=1080 LIBYUV_REPEAT=999 perf record
out/Release/libyuv_unittest --gtest_filter=*ScaleTo640x360_Box*
64.98% libyuv_unittest libyuv_unittest [.] ScaleAddRow_C
31.81% libyuv_unittest libyuv_unittest [.] ScaleAddCols1_C
2.19% libyuv_unittest libc-2.19.so [.] memset
0.64% libyuv_unittest libyuv_unittest [.] ScalePlane
0.19% libyuv_unittest [kernel.kallsyms] [k] 0xffffffff8104f45a
0.09% libyuv_unittest libyuv_unittest [.] libyuv::TestFilter(int, int, int, int, libyuv::FilterMode, int, int)
Note: memset is called once per row to clear the accumulation buffer of ScaleAddRow_C.
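To make the profile easier to read, here is a rough sketch of the pattern described in the note: clear an accumulation buffer, then add source rows into it. `AddRow` is a hypothetical stand-in for a routine like ScaleAddRow_C and `AccumulateBoxRows` is an invented driver; neither is copied from libyuv.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for a row accumulator: adds one row of 8-bit source
// pixels into 16-bit per-column sums.
void AddRow(const uint8_t* src, uint16_t* sums, int width) {
  for (int x = 0; x < width; ++x) {
    sums[x] = static_cast<uint16_t>(sums[x] + src[x]);
  }
}

// Builds the column sums for one output row from box_height source rows.
// Assumes box_height <= 257 so that 255 * box_height fits in 16 bits.
void AccumulateBoxRows(const uint8_t* src, int stride, int width,
                       int box_height, uint16_t* sums) {
  // This is the memset seen in the profile: one clear per output row.
  std::memset(sums, 0, static_cast<size_t>(width) * sizeof(uint16_t));
  for (int y = 0; y < box_height; ++y) {
    AddRow(src + y * stride, sums, width);
  }
}
```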
Original comment by fbarch...@google.com
on 22 Sep 2015 at 10:58
Intel profile:
Samples: 2K of event 'cycles', Event count (approx.): 2669930815
72.96% libyuv_unittest libyuv_unittest [.] ScaleAddCols1_C
20.23% libyuv_unittest libyuv_unittest [.] ScaleAddRow_AVX2
4.95% libyuv_unittest libc-2.19.so [.] memset
0.76% libyuv_unittest libyuv_unittest [.] ScalePlane
0.50% libyuv_unittest [kernel.kallsyms] [k] 0xffffffff8104f45a
0.21% libyuv_unittest libyuv_unittest [.] libyuv::TestFilter(int, int, int, int, libyuv::FilterMode, int, int, int)
0.13% libyuv_unittest libyuv_unittest [.] ScaleAddRow_C
0.07% libyuv_unittest libc-2.19.so [.] _int_malloc
0.04% libyuv_unittest libyuv_unittest [.] memset@plt
0.03% libyuv_unittest libyuv_unittest [.] I420Scale
0.02% libyuv_unittest libc-2.19.so [.] __memcpy_sse2_unaligned
This shows memset is a bit high in the profile. It could be avoided: memset is
called in 2 places, once per row for the accumulation buffer and, for odd
widths, once per row to clear the SIMD buffers.
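One way the accumulation-buffer memset might be avoided, offered as an assumption rather than the fix that was adopted, is to let the first source row store into the buffer instead of adding to it. `SetRow`, `AddRowTo` and `AccumulateBoxRowsNoClear` are hypothetical names. This sketch does not address the second memset, the one that clears the SIMD buffers for odd widths.

```cpp
#include <cstdint>

// Stores the first row into the sums, overwriting whatever was there.
void SetRow(const uint8_t* src, uint16_t* sums, int width) {
  for (int x = 0; x < width; ++x) {
    sums[x] = src[x];
  }
}

// Adds a further row into the existing sums.
void AddRowTo(const uint8_t* src, uint16_t* sums, int width) {
  for (int x = 0; x < width; ++x) {
    sums[x] = static_cast<uint16_t>(sums[x] + src[x]);
  }
}

// Same result as clearing and then adding every row, but without the clear.
// Assumes 1 <= box_height <= 257 so the sums fit in 16 bits.
void AccumulateBoxRowsNoClear(const uint8_t* src, int stride, int width,
                              int box_height, uint16_t* sums) {
  SetRow(src, sums, width);  // replaces memset + first add
  for (int y = 1; y < box_height; ++y) {
    AddRowTo(src + y * stride, sums, width);
  }
}
```

The trade-off is an extra row function to maintain (and to vectorize) in exchange for one fewer pass over the accumulation buffer per output row.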
Benchmark on Arm, where AddRow is C:
util/android/test_runner.py gtest -s libyuv_unittest -t 7200 --verbose
--release --gtest_filter=*ScaleDownBy?_* -a "--libyuv_width=1280
--libyuv_height=720 --libyuv_repeat=999
--libyuv_flags=-1" | grep ms | sed 's/\(.*(\)\([0-9]*\)\( ms)\)/\2 -
\1\2\3/g' | sort -rn | sed 's/.*- \(.*\)/\1/g'
I 521.236s run_tests_on_device(HT4A2JT03762) [ OK ]
LibYUVScaleTest.ScaleDownBy8_Box (219165 ms)
I 521.237s run_tests_on_device(HT4A2JT03762) [ OK ]
LibYUVScaleTest.ScaleDownBy3_Box (49810 ms)
I 521.232s run_tests_on_device(HT4A2JT03762) [ OK ]
LibYUVScaleTest.ARGBScaleDownBy4_Box (30018 ms)
I 521.233s run_tests_on_device(HT4A2JT03762) [ OK ]
LibYUVScaleTest.ARGBScaleDownBy8_Bilinear (18233 ms)
I 521.233s run_tests_on_device(HT4A2JT03762) [ OK ]
LibYUVScaleTest.ARGBScaleDownBy8_Box (18164 ms)
I 521.232s run_tests_on_device(HT4A2JT03762) [ OK ]
LibYUVScaleTest.ARGBScaleDownBy8_Linear (15275 ms)
I 521.232s run_tests_on_device(HT4A2JT03762) [ OK ]
LibYUVScaleTest.ARGBScaleDownBy4_Bilinear (11854 ms)
I 521.236s run_tests_on_device(HT4A2JT03762) [ OK ]
LibYUVScaleTest.ScaleDownBy8_Bilinear (11296 ms)
I 521.232s run_tests_on_device(HT4A2JT03762) [ OK ]
LibYUVScaleTest.ARGBScaleDownBy4_Linear (9126 ms)
I 521.235s run_tests_on_device(HT4A2JT03762) [ OK ]
LibYUVScaleTest.ScaleDownBy4_Bilinear (8679 ms)
I 521.232s run_tests_on_device(HT4A2JT03762) [ OK ]
LibYUVScaleTest.ARGBScaleDownBy8_None (7800 ms)
I 521.235s run_tests_on_device(HT4A2JT03762) [ OK ]
LibYUVScaleTest.ScaleDownBy8_Linear (7134 ms)
I 521.235s run_tests_on_device(HT4A2JT03762) [ OK ]
LibYUVScaleTest.ScaleDownBy4_Linear (6575 ms)
I 521.231s run_tests_on_device(HT4A2JT03762) [ OK ]
LibYUVScaleTest.ARGBScaleDownBy2_Box (6391 ms)
I 521.231s run_tests_on_device(HT4A2JT03762) [ OK ]
LibYUVScaleTest.ARGBScaleDownBy2_Bilinear (6250 ms)
I 521.235s run_tests_on_device(HT4A2JT03762) [ OK ]
LibYUVScaleTest.ScaleDownBy4_Box (5943 ms)
I 521.235s run_tests_on_device(HT4A2JT03762) [ OK ]
LibYUVScaleTest.ScaleDownBy8_None (5066 ms)
I 521.231s run_tests_on_device(HT4A2JT03762) [ OK ]
LibYUVScaleTest.ARGBScaleDownBy4_None (4888 ms)
I 521.236s run_tests_on_device(HT4A2JT03762) [ OK ]
LibYUVScaleTest.ScaleDownBy3_None (4863 ms)
I 521.236s run_tests_on_device(HT4A2JT03762) [ OK ]
LibYUVScaleTest.ScaleDownBy3_Linear (4677 ms)
I 521.231s run_tests_on_device(HT4A2JT03762) [ OK ]
LibYUVScaleTest.ARGBScaleDownBy2_Linear (4228 ms)
I 521.236s run_tests_on_device(HT4A2JT03762) [ OK ]
LibYUVScaleTest.ScaleDownBy3_Bilinear (4017 ms)
I 521.233s run_tests_on_device(HT4A2JT03762) [ OK ]
LibYUVScaleTest.ARGBScaleDownBy3_None (3690 ms)
I 521.233s run_tests_on_device(HT4A2JT03762) [ OK ]
LibYUVScaleTest.ARGBScaleDownBy3_Bilinear (3674 ms)
I 521.233s run_tests_on_device(HT4A2JT03762) [ OK ]
LibYUVScaleTest.ARGBScaleDownBy3_Linear (3669 ms)
I 521.233s run_tests_on_device(HT4A2JT03762) [ OK ]
LibYUVScaleTest.ARGBScaleDownBy3_Box (3654 ms)
I 521.231s run_tests_on_device(HT4A2JT03762) [ OK ]
LibYUVScaleTest.ARGBScaleDownBy2_None (2618 ms)
I 521.234s run_tests_on_device(HT4A2JT03762) [ OK ]
LibYUVScaleTest.ScaleDownBy2_Bilinear (2576 ms)
I 521.234s run_tests_on_device(HT4A2JT03762) [ OK ]
LibYUVScaleTest.ScaleDownBy2_Box (2562 ms)
I 521.234s run_tests_on_device(HT4A2JT03762) [ OK ]
LibYUVScaleTest.ScaleDownBy4_None (2123 ms)
I 521.234s run_tests_on_device(HT4A2JT03762) [ OK ]
LibYUVScaleTest.ScaleDownBy2_Linear (1843 ms)
I 521.234s run_tests_on_device(HT4A2JT03762) [ OK ]
LibYUVScaleTest.ScaleDownBy2_None (1111 ms)
I 521.237s run_tests_on_device(HT4A2JT03762) [----------] 32 tests from
LibYUVScaleTest (486988 ms total)
I 521.237s run_tests_on_device(HT4A2JT03762) [==========] 32 tests from 1
test case ran. (486989 ms total)
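The grep/sed/sort pipeline in the command above orders the results by elapsed time, slowest first. For anyone post-processing the log without sed, the same idea can be sketched as below. `ParseMs` and `SortSlowestFirst` are hypothetical helpers written for this illustration, and they assume each result line ends with a gtest-style "(N ms)" suffix.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

// Extracts N from the last "(N ms)" in a line, or returns -1 if the line has
// no such suffix.
long ParseMs(const std::string& line) {
  const std::size_t end = line.rfind(" ms)");
  if (end == std::string::npos) return -1;
  const std::size_t open = line.rfind('(', end);
  if (open == std::string::npos || open + 1 >= end) return -1;
  const std::string digits = line.substr(open + 1, end - open - 1);
  const bool all_digits = std::all_of(
      digits.begin(), digits.end(), [](char c) { return c >= '0' && c <= '9'; });
  if (!all_digits) return -1;
  return std::stol(digits);
}

// Keeps only the lines that carry a timing and orders them slowest first.
std::vector<std::string> SortSlowestFirst(const std::vector<std::string>& lines) {
  std::vector<std::pair<long, std::string>> timed;
  for (const std::string& line : lines) {
    const long ms = ParseMs(line);
    if (ms >= 0) timed.emplace_back(ms, line);
  }
  std::stable_sort(timed.begin(), timed.end(),
                   [](const std::pair<long, std::string>& a,
                      const std::pair<long, std::string>& b) {
                     return a.first > b.first;
                   });
  std::vector<std::string> out;
  for (const auto& entry : timed) out.push_back(entry.second);
  return out;
}
```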
Original comment by fbarch...@chromium.org
on 16 Nov 2015 at 11:55
Original issue reported on code.google.com by
fbarch...@chromium.org
on 13 Apr 2015 at 6:25