myrao / libyuv

Automatically exported from code.google.com/p/libyuv
BSD 3-Clause "New" or "Revised" License

BoxFilter performance #425

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
Scale functions with the box filter should:
1. be optimized for AVX2
2. support odd widths
3. support heights of 1 without falling back to C

Also consider processing a row at a time instead of columns.
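
One way to read the odd-width and height-of-1 requirements is that box edges have to stay valid when the source size does not divide evenly, or when there is only one row. The following is a minimal sketch under that assumption. `BoxSpan` is a hypothetical helper written for illustration and is not part of libyuv.

```c
#include <assert.h>

/* Hypothetical helper (not a libyuv function): computes the half-open source
   span [*start, *end) covered by destination index d when src_size samples
   are reduced to dst_size samples along one axis. */
void BoxSpan(int d, int src_size, int dst_size, int* start, int* end) {
  assert(src_size > 0 && dst_size > 0 && d >= 0 && d < dst_size);
  /* 64-bit intermediates avoid overflow for large planes. */
  *start = (int)((long long)d * src_size / dst_size);
  *end = (int)((long long)(d + 1) * src_size / dst_size);
  if (*end <= *start) {
    /* Never let a box collapse to zero samples (can happen when upsampling). */
    *end = *start + 1;
  }
}
```

For example, reducing a width of 5 to 2 gives the spans [0, 2) and [2, 5), and a height of 1 reduced to 1 gives [0, 1), so neither case needs a special path in this sketch.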

Original issue reported on code.google.com by fbarch...@chromium.org on 13 Apr 2015 at 6:25

GoogleCodeExporter commented 9 years ago
r1366 changes the SSE2 code to allow height = 1.
set LIBYUV_WIDTH=1920
set LIBYUV_HEIGHT=1080
set LIBYUV_REPEAT=1000
out\release\libyuv_unittest.exe --gtest_filter=*.ScaleTo640* | findstr ms
Was
ScaleTo640x360_None (245 ms)
ScaleTo640x360_Linear (225 ms)
ScaleTo640x360_Bilinear (201 ms)
ScaleTo640x360_Box (1476 ms)

Now
ScaleTo640x360_None (255 ms)
ScaleTo640x360_Linear (244 ms)
ScaleTo640x360_Bilinear (202 ms)
ScaleTo640x360_Box (1460 ms)

Original comment by fbarch...@chromium.org on 13 Apr 2015 at 6:57

GoogleCodeExporter commented 9 years ago
r1367 adds AVX2 box filter
For 640x3600 to 640x360:

Was SSE2
[ RUN      ] libyuvTest.ScaleTo640x360_Box
filter 3 -     5101 us C -     1003 us OPT
[       OK ] libyuvTest.ScaleTo640x360_Box (1063 ms)

Now AVX2
[ RUN      ] libyuvTest.ScaleTo640x360_Box
filter 3 -     4224 us C -      823 us OPT
[       OK ] libyuvTest.ScaleTo640x360_Box (875 ms)

Original comment by fbarch...@chromium.org on 14 Apr 2015 at 12:49

GoogleCodeExporter commented 9 years ago
set LIBYUV_WIDTH=1900

out\release\libyuv_unittest.exe

[  PASSED  ] 785 tests.
[  FAILED  ] 14 tests, listed below:
[  FAILED  ] libyuvTest.ARGBScaleClipTo320x240_Box
[  FAILED  ] libyuvTest.ARGBScaleClipFrom320x240_Box
[  FAILED  ] libyuvTest.ARGBScaleTo352x288_Box
[  FAILED  ] libyuvTest.ARGBScaleClipFrom352x288_Box
[  FAILED  ] libyuvTest.ARGBScaleClipTo569x480_Box
[  FAILED  ] libyuvTest.ARGBScaleClipFrom569x480_Box
[  FAILED  ] libyuvTest.ARGBScaleClipTo640x360_Box
[  FAILED  ] libyuvTest.ARGBScaleClipFrom640x360_Box
[  FAILED  ] libyuvTest.ARGBScaleClipFrom1280x720_Box
[  FAILED  ] libyuvTest.ScaleFrom320x240_Box
[  FAILED  ] libyuvTest.ScaleFrom352x288_Box
[  FAILED  ] libyuvTest.ScaleFrom569x480_Box
[  FAILED  ] libyuvTest.ScaleFrom640x360_Box
[  FAILED  ] libyuvTest.ScaleFrom1280x720_Box

14 FAILED TESTS

Original comment by fbarch...@google.com on 14 Apr 2015 at 10:41

GoogleCodeExporter commented 9 years ago
The box filter code does not support a source box width/height of less than 1.
Previously the box filter was avoided for upsampling.
That check was recently removed because downsampling the height, while keeping the width the same, was switching to bilinear.
Consider reintroducing the switch to bilinear, but only if the width and/or height goes up, not if it stays the same.

It's unknown why the clip tests fail, but I would guess the destination is small and the source for upsampling is less than 1 pixel.
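
The proposed fallback could be sketched roughly as follows. This is an illustration only: `SketchFilter` and `ChooseFilter` are hypothetical names, not libyuv's enum or API, and the rule is simply "switch to bilinear when either dimension strictly grows".

```c
#include <assert.h>

/* Hypothetical filter labels for this sketch only; not libyuv's FilterMode. */
typedef enum { SKETCH_BILINEAR, SKETCH_BOX } SketchFilter;

/* Keep the box filter only when every source box spans at least one pixel,
   i.e. when neither dimension is being upsampled. An unchanged dimension
   (dst == src) still counts as safe for the box filter. */
SketchFilter ChooseFilter(int src_width, int src_height,
                          int dst_width, int dst_height,
                          SketchFilter requested) {
  assert(src_width > 0 && src_height > 0);
  assert(dst_width > 0 && dst_height > 0);
  if (requested == SKETCH_BOX &&
      (dst_width > src_width || dst_height > src_height)) {
    return SKETCH_BILINEAR; /* source box would be smaller than 1 pixel */
  }
  return requested;
}
```

Under this rule, 640x3600 to 640x360 keeps the box filter because the width stays the same, while 320x240 to 1900x1080 switches to bilinear.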

Original comment by fbarch...@chromium.org on 16 Apr 2015 at 7:51

GoogleCodeExporter commented 9 years ago
Box filter is slow for odd widths. This is due to reading memory in columns.

set LIBYUV_WIDTH=1918
set LIBYUV_HEIGHT=1080
set LIBYUV_REPEAT=999
set LIBYUV_FLAGS=-1

out\debug\libyuv_unittest.exe --gtest_filter=*ScaleTo1x1_Box | findstr /r "^[^_]*_[^_]*ms"
ScaleTo1x1_Box (805 ms)

set LIBYUV_WIDTH=1920
set LIBYUV_HEIGHT=1080
set LIBYUV_REPEAT=999
set LIBYUV_FLAGS=-1

out\debug\libyuv_unittest.exe --gtest_filter=*ScaleTo1x1_Box | findstr /r "^[^_]*_[^_]*ms"
ScaleTo1x1_Box (356 ms)

Suggest a row-oriented function.
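
To illustrate the difference in traversal order, here is a minimal sketch of both loops for the 1x1 case. All function names are hypothetical, and the rounding in `BoxTo1x1` is an assumption rather than libyuv's documented behavior. Both loops produce the same sum; only the memory access pattern differs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Column-oriented: the inner loop steps by stride, so consecutive reads are
   far apart in memory and typically land on different cache lines. */
uint64_t SumPlaneByColumns(const uint8_t* src, ptrdiff_t stride,
                           int width, int height) {
  uint64_t sum = 0;
  assert(width > 0 && height > 0);
  for (int x = 0; x < width; ++x) {
    for (int y = 0; y < height; ++y) {
      sum += src[(ptrdiff_t)y * stride + x];
    }
  }
  return sum;
}

/* Row-oriented: the inner loop reads consecutive bytes of a single row. */
uint64_t SumPlaneByRows(const uint8_t* src, ptrdiff_t stride,
                        int width, int height) {
  uint64_t sum = 0;
  assert(width > 0 && height > 0);
  for (int y = 0; y < height; ++y) {
    const uint8_t* row = src + (ptrdiff_t)y * stride;
    for (int x = 0; x < width; ++x) {
      sum += row[x];
    }
  }
  return sum;
}

/* Rounded average of the whole plane, i.e. one pixel of a 1x1 result. */
uint8_t BoxTo1x1(const uint8_t* src, ptrdiff_t stride, int width, int height) {
  uint64_t count = (uint64_t)width * (uint64_t)height;
  uint64_t sum = SumPlaneByRows(src, stride, width, height);
  return (uint8_t)((sum + count / 2) / count);
}
```

The row-oriented loop is also the easier one to vectorize, since each iteration of the inner loop touches contiguous bytes.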

Original comment by fbarch...@chromium.org on 2 Jun 2015 at 1:31

GoogleCodeExporter commented 9 years ago
LIBYUV_WIDTH=1920 LIBYUV_HEIGHT=1080 LIBYUV_REPEAT=999 perf record out/Release/libyuv_unittest --gtest_filter=*ScaleTo640x360_Box*

64.98%  libyuv_unittest  libyuv_unittest      [.] ScaleAddRow_C
31.81%  libyuv_unittest  libyuv_unittest      [.] ScaleAddCols1_C
 2.19%  libyuv_unittest  libc-2.19.so         [.] memset
 0.64%  libyuv_unittest  libyuv_unittest      [.] ScalePlane
 0.19%  libyuv_unittest  [kernel.kallsyms]    [k] 0xffffffff8104f45a
 0.09%  libyuv_unittest  libyuv_unittest      [.] libyuv::TestFilter(int, int, int, int, libyuv::FilterMode, int, int)

Note that memset is called once per row to clear the accumulation buffer used by ScaleAddRow_C.
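
The three costs in the profile map onto a clear, accumulate, reduce pattern. The sketch below shows that pattern for one destination row, assuming integer box sizes. `AddRowSketch`, `AddColsSketch` and `BoxDownscaleRow` are hypothetical stand-ins written for illustration; they are not the actual ScaleAddRow_C or ScaleAddCols1_C code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for a row accumulator: adds one source row into 16-bit sums. */
void AddRowSketch(const uint8_t* src_row, uint16_t* sums, int width) {
  for (int x = 0; x < width; ++x) {
    sums[x] = (uint16_t)(sums[x] + src_row[x]);
  }
}

/* Stand-in for the column reduction: averages box_w adjacent sums, each of
   which already holds box_h accumulated rows. Rounding is an assumption. */
void AddColsSketch(const uint16_t* sums, uint8_t* dst_row, int dst_width,
                   int box_w, int box_h) {
  uint32_t area = (uint32_t)box_w * (uint32_t)box_h;
  for (int dx = 0; dx < dst_width; ++dx) {
    uint32_t total = 0;
    for (int i = 0; i < box_w; ++i) {
      total += sums[dx * box_w + i];
    }
    dst_row[dx] = (uint8_t)((total + area / 2) / area);
  }
}

/* Produces one destination row: clear, accumulate box_h rows, reduce. */
void BoxDownscaleRow(const uint8_t* src, ptrdiff_t stride, int src_width,
                     int box_w, int box_h, uint16_t* sums, uint8_t* dst_row) {
  /* box_h <= 257 keeps 255 * box_h within a 16-bit accumulator. */
  assert(box_w > 0 && box_h > 0 && box_h <= 257);
  assert(src_width > 0 && src_width % box_w == 0);
  memset(sums, 0, (size_t)src_width * sizeof(uint16_t)); /* once per row */
  for (int y = 0; y < box_h; ++y) {
    AddRowSketch(src + (ptrdiff_t)y * stride, sums, src_width);
  }
  AddColsSketch(sums, dst_row, src_width / box_w, box_w, box_h);
}
```

In a layout like this, the memset could in principle be avoided by having the first row overwrite the sums instead of adding to them, although whether that helps in libyuv has not been measured here.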

Original comment by fbarch...@google.com on 22 Sep 2015 at 10:58