skufog / libyuv

Automatically exported from code.google.com/p/libyuv
BSD 3-Clause "New" or "Revised" License

Fast bilinear filtering for arbitrary ARGB scale #208

Closed. GoogleCodeExporter closed this issue 9 years ago.

GoogleCodeExporter commented 9 years ago
Nice to have (for upsampling to arbitrary output sizes). 

Original issue reported on code.google.com by noah...@google.com on 23 Mar 2013 at 12:39

GoogleCodeExporter commented 9 years ago
Three notes:

1. Arbitrary scaling is about 2x slower than specialized scaling. If the source and
destination sizes are known, you can always get faster results, and with
less effort.

2. The current scaler design is geared to downsampling. It's two-pass: it
scales rows, then columns. For upsampling, it would be better to do columns
then rows.

3. The column sampler is in C. It should be ported to SIMD.
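Note 2 can be sketched in C for a single 8-bit plane (for example a Y plane). This is a hypothetical illustration, not libyuv code: the names FilterCols and UpscalePlaneBilinear are made up, positions are assumed to be 16.16 fixed point with the first and last pixels aligned, and blend fractions are truncated to 8 bits.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical horizontal ("column") pass: bilinear-resample one source row
   of src_width pixels to dst_width pixels. Positions are 16.16 fixed point,
   aligned so the first and last pixels map to each other. */
static void FilterCols(uint8_t* dst, const uint8_t* src, int src_width,
                       int dst_width) {
  int64_t dx = dst_width > 1
                   ? ((int64_t)(src_width - 1) << 16) / (dst_width - 1)
                   : 0;
  for (int i = 0; i < dst_width; ++i) {
    int64_t x = i * dx;
    int xi = (int)(x >> 16);
    int xi1 = xi + 1 < src_width ? xi + 1 : xi; /* clamp at right edge */
    int f = (int)((x >> 8) & 0xff);             /* 8-bit fraction */
    dst[i] = (uint8_t)((src[xi] * (256 - f) + src[xi1] * f) >> 8);
  }
}

/* Hypothetical two-pass upscaler for one 8-bit plane: columns first, then
   rows. Two horizontally filtered source rows are cached at dst_width, and
   each output row is a vertical blend of them. Returns 0, or -1 if
   allocation fails. */
static int UpscalePlaneBilinear(const uint8_t* src, int src_stride,
                                int src_width, int src_height, uint8_t* dst,
                                int dst_stride, int dst_width,
                                int dst_height) {
  uint8_t* row0 = malloc((size_t)dst_width);
  uint8_t* row1 = malloc((size_t)dst_width);
  if (!row0 || !row1) {
    free(row0);
    free(row1);
    return -1;
  }
  int64_t dy = dst_height > 1
                   ? ((int64_t)(src_height - 1) << 16) / (dst_height - 1)
                   : 0;
  int cached = -1; /* source row currently held (filtered) in row0 */
  for (int y = 0; y < dst_height; ++y) {
    int64_t sy = y * dy;
    int yi = (int)(sy >> 16);
    int yi1 = yi + 1 < src_height ? yi + 1 : yi;
    int fy = (int)((sy >> 8) & 0xff);
    if (yi != cached) {
      if (cached >= 0 && yi == cached + 1) {
        /* Common upsampling case: the old lower row is the new upper row. */
        uint8_t* tmp = row0;
        row0 = row1;
        row1 = tmp;
      } else {
        FilterCols(row0, src + (size_t)yi * src_stride, src_width, dst_width);
      }
      FilterCols(row1, src + (size_t)yi1 * src_stride, src_width, dst_width);
      cached = yi;
    }
    uint8_t* d = dst + (size_t)y * dst_stride;
    for (int x = 0; x < dst_width; ++x)
      d[x] = (uint8_t)((row0[x] * (256 - fy) + row1[x] * fy) >> 8);
  }
  free(row0);
  free(row1);
  return 0;
}
```

When upsampling, the vertical step is less than one source row, so consecutive output rows share a source row pair and most iterations skip FilterCols; with the swap, each source row goes through the horizontal pass about once. Doing rows first would instead run the horizontal pass on every output row.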

Original comment by fbarch...@chromium.org on 25 Mar 2013 at 9:53

GoogleCodeExporter commented 9 years ago
Adding a test for scaling from 640x360, and running it like this:
set LIBYUV_WIDTH=1536
set LIBYUV_HEIGHT=929
set LIBYUV_REPEAT=1000
out\release\libyuv_unittest --gtest_filter=*ScaleFrom*

Performance on Z620 (Sandy Bridge Xeon):
[----------] 4 tests from libyuvTest
[ RUN      ] libyuvTest.ARGBScaleFrom640x360_None
filter 0 -      750 us C -      748 us OPT
[       OK ] libyuvTest.ARGBScaleFrom640x360_None (780 ms)
[ RUN      ] libyuvTest.ARGBScaleFrom640x360_Bilinear
filter 1 -    11396 us C -     9356 us OPT
[       OK ] libyuvTest.ARGBScaleFrom640x360_Bilinear (9418 ms)
[ RUN      ] libyuvTest.ScaleFrom640x360_None
filter 0 -     1971 us C -     1859 us OPT
[       OK ] libyuvTest.ScaleFrom640x360_None (1875 ms)
[ RUN      ] libyuvTest.ScaleFrom640x360_Bilinear
filter 1 -     3895 us C -     2852 us OPT
[       OK ] libyuvTest.ScaleFrom640x360_Bilinear (2873 ms)
[----------] 4 tests from libyuvTest (14951 ms total)

[----------] Global test environment tear-down
[==========] 4 tests from 1 test case ran. (14955 ms total)
[  PASSED  ] 4 tests.

Original comment by fbarch...@chromium.org on 25 Mar 2013 at 10:13

GoogleCodeExporter commented 9 years ago
d:\src\libyuv2\trunk>out\release\libyuv_unittest --gtest_filter=*Scale*640*
Note: Google Test filter = *Scale*640*
[==========] Running 4 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 4 tests from libyuvTest
[ RUN      ] libyuvTest.ARGBScaleFrom640x360_None
filter 0 -     2785 us C -     2743 us OPT
[       OK ] libyuvTest.ARGBScaleFrom640x360_None (2810 ms)
[ RUN      ] libyuvTest.ARGBScaleFrom640x360_Bilinear
filter 1 -    35813 us C -    33224 us OPT
[       OK ] libyuvTest.ARGBScaleFrom640x360_Bilinear (33401 ms)
[ RUN      ] libyuvTest.ScaleFrom640x360_None
filter 0 -     7151 us C -     6680 us OPT
[       OK ] libyuvTest.ScaleFrom640x360_None (6723 ms)
[ RUN      ] libyuvTest.ScaleFrom640x360_Bilinear
filter 1 -    11743 us C -     9991 us OPT
[       OK ] libyuvTest.ScaleFrom640x360_Bilinear (10050 ms)
[----------] 4 tests from libyuvTest (52990 ms total)

Original comment by fbarch...@chromium.org on 5 Apr 2013 at 8:52

GoogleCodeExporter commented 9 years ago
r645 uses kMaxStride, which is sized for Retina (2880 * 4 bytes).
The current C column code is 109 instructions per pixel, using imul.
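For reference, a plain C bilinear column filter for ARGB might look like the following. It is a sketch, not the r645 code: ARGBFilterColsSketch is a hypothetical name, and x and dx are assumed to be a 16.16 fixed-point start position and per-pixel step. Each channel costs two multiplies, which is presumably where the imul comes from.

```c
#include <stdint.h>

/* Hypothetical C bilinear column filter for 4-byte ARGB pixels (not the
   libyuv source). x is the 16.16 fixed-point source position of the first
   output pixel and dx the per-pixel step. The four bytes of each pixel are
   blended independently, so the in-memory channel order does not matter. */
static void ARGBFilterColsSketch(uint8_t* dst_argb, const uint8_t* src_argb,
                                 int src_width, int dst_width, int x, int dx) {
  for (int i = 0; i < dst_width; ++i) {
    int xi = x >> 16;
    int xi1 = xi + 1 < src_width ? xi + 1 : xi; /* clamp at the right edge */
    int f = (x >> 8) & 0xff;                    /* 8-bit blend fraction */
    const uint8_t* a = src_argb + xi * 4;
    const uint8_t* b = src_argb + xi1 * 4;
    for (int c = 0; c < 4; ++c) {
      /* Two multiplies per channel. */
      dst_argb[i * 4 + c] = (uint8_t)((a[c] * (256 - f) + b[c] * f) >> 8);
    }
    x += dx;
  }
}
```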

Original comment by fbarch...@chromium.org on 5 Apr 2013 at 6:24

GoogleCodeExporter commented 9 years ago
r647 is first pass at SSSE3 bilinear columns function.

set LIBYUV_WIDTH=2880
set LIBYUV_HEIGHT=1800
set LIBYUV_REPEAT=1000
out\release\libyuv_unittest --gtest_filter=*Scale*640*

[ RUN      ] libyuvTest.ARGBScaleFrom640x360_Bilinear
filter 1 -    36655 us C -     9901 us OPT

Original comment by fbarch...@chromium.org on 6 Apr 2013 at 8:08

GoogleCodeExporter commented 9 years ago
Linux version will need a port:

LIBYUV_WIDTH=2880 LIBYUV_HEIGHT=1800 LIBYUV_REPEAT=1000 \
out/Release/libyuv_unittest --gtest_filter=*ARGBScale*640*
Note: Google Test filter = *ARGBScale*640*
[==========] Running 2 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 2 tests from libyuvTest
[ RUN      ] libyuvTest.ARGBScaleFrom640x360_None
filter 0 -     3426 us C -     2923 us OPT
[       OK ] libyuvTest.ARGBScaleFrom640x360_None (2999 ms)
[ RUN      ] libyuvTest.ARGBScaleFrom640x360_Bilinear
filter 1 -    27884 us C -    25615 us OPT
[       OK ] libyuvTest.ARGBScaleFrom640x360_Bilinear (25748 ms)
[----------] 2 tests from libyuvTest (28747 ms total)

[----------] Global test environment tear-down
[==========] 2 tests from 1 test case ran. (28748 ms total)
[  PASSED  ] 2 tests.

Original comment by fbarch...@chromium.org on 7 Apr 2013 at 6:58

GoogleCodeExporter commented 9 years ago
r650 optimizes bilinear upsampling ARGB scale.

Windows
out\release\libyuv_unittest --gtest_filter=*ARGBScale*640*
Note: Google Test filter = *ARGBScale*640*
[==========] Running 2 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 2 tests from libyuvTest
[ RUN      ] libyuvTest.ARGBScaleFrom640x360_None
filter 0 -     2646 us C -     2286 us OPT
[       OK ] libyuvTest.ARGBScaleFrom640x360_None (2352 ms)
[ RUN      ] libyuvTest.ARGBScaleFrom640x360_Bilinear
filter 1 -    36551 us C -     4548 us OPT
[       OK ] libyuvTest.ARGBScaleFrom640x360_Bilinear (4687 ms)
[----------] 2 tests from libyuvTest (7041 ms total)

[----------] Global test environment tear-down
[==========] 2 tests from 1 test case ran. (7043 ms total)
[  PASSED  ] 2 tests.

Linux
LIBYUV_WIDTH=2880 LIBYUV_HEIGHT=1800 LIBYUV_REPEAT=1000 \
out/Release/libyuv_unittest --gtest_filter=*ARGBScale*640*
Note: Google Test filter = *ARGBScale*640*
[==========] Running 2 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 2 tests from libyuvTest
[ RUN      ] libyuvTest.ARGBScaleFrom640x360_None
filter 0 -     3384 us C -     2282 us OPT
[       OK ] libyuvTest.ARGBScaleFrom640x360_None (2353 ms)
[ RUN      ] libyuvTest.ARGBScaleFrom640x360_Bilinear
filter 1 -    28949 us C -     4420 us OPT
[       OK ] libyuvTest.ARGBScaleFrom640x360_Bilinear (4536 ms)
[----------] 2 tests from libyuvTest (6890 ms total)

[----------] Global test environment tear-down
[==========] 2 tests from 1 test case ran. (6890 ms total)
[  PASSED  ] 2 tests.

Original comment by fbarch...@chromium.org on 8 Apr 2013 at 11:40

GoogleCodeExporter commented 9 years ago
The valgrind/ASan build doesn't like this:

source/scale_argb.cc:1017:6: error: invalid operand for instruction
    "pextrw    $0x1,%%xmm2,%3                  \n"
     ^
<inline asm>:5:1: note: instantiated into assembly here
pextrw    $0x1,%xmm2,%rcx     

Original comment by fbarch...@chromium.org on 9 Apr 2013 at 10:59

GoogleCodeExporter commented 9 years ago
Using %k3 (the 32-bit register name) should work around it. But pextrw is 9 cycles??
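A minimal sketch of the %k workaround, assuming GCC/clang extended inline asm on x86-64; ExtractWord1 is a hypothetical helper with a plain C fallback for other compilers and targets. When the output operand is a pointer-sized variable, plain %0 prints a 64-bit name like %rcx, which is what clang rejected above; the k modifier prints the 32-bit name instead.

```c
#include <stdint.h>
#if defined(__GNUC__) && defined(__x86_64__)
#include <emmintrin.h>
#endif

/* Hypothetical helper: return 16-bit lane 1 of eight packed uint16 values. */
static intptr_t ExtractWord1(const uint16_t lanes[8]) {
#if defined(__GNUC__) && defined(__x86_64__)
  __m128i v = _mm_loadu_si128((const __m128i*)lanes);
  intptr_t r;
  /* r is pointer-sized (64-bit here), so plain %0 would print a name like
     %rcx, the form clang rejected above. The k modifier prints the 32-bit
     name (e.g. %ecx); writing a 32-bit register zero-extends into the full
     64-bit one. */
  __asm__("pextrw $0x1, %1, %k0" : "=r"(r) : "x"(v));
  return r;
#else
  return lanes[1]; /* portable fallback for other compilers/targets */
#endif
}
```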

Original comment by fbarch...@chromium.org on 10 Apr 2013 at 12:46

GoogleCodeExporter commented 9 years ago
TODO - ARM.
TODO - 10x. A 7x improvement is mediocre.

For ARM there's no pair multiply-accumulate; maybe vmla.
For 10x, scale columns first, then rows.
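One way vmla could stand in for a pair multiply-accumulate is to rewrite the blend so it needs only one multiply per channel: a*(256-f) + b*f equals a*256 + (b-a)*f. The scalar C below (hypothetical helper names) checks that the two forms agree for 8-bit inputs and an 8-bit fraction. On NEON the second form would be a shift plus a single multiply-accumulate per lane, provided the lanes are wide enough and signed to hold b - a; that mapping is an assumption, not something tried in this thread.

```c
/* Hypothetical helpers comparing two equivalent 8-bit bilinear blends.
   f is the weight of b, in [0, 256). */

/* Weighted pair sum: the shape a pair multiply-add such as SSSE3
   pmaddubsw produces, two multiplies per output. */
static int BlendTwoMul(int a, int b, int f) {
  return (a * (256 - f) + b * f) >> 8;
}

/* Same value with one multiply-accumulate: acc = a * 256, then
   acc += (b - a) * f. acc equals a * (256 - f) + b * f, so it is never
   negative for a, b >= 0, and the shift matches BlendTwoMul exactly. */
static int BlendOneMla(int a, int b, int f) {
  int acc = a << 8;
  acc += (b - a) * f;
  return acc >> 8;
}
```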

Original comment by fbarch...@google.com on 10 Apr 2013 at 9:01

GoogleCodeExporter commented 9 years ago
r665 is 10.1x faster, using a bilinear upsampler:
Was 9356 us OPT
Now 926 us OPT

set LIBYUV_WIDTH=1536
set LIBYUV_HEIGHT=929
set LIBYUV_REPEAT=1000
out\release\libyuv_unittest --gtest_filter=*ScaleFrom*
[==========] Running 4 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 4 tests from libyuvTest
[ RUN      ] libyuvTest.ARGBScaleFrom640x360_None
filter 0 -      730 us C -      633 us OPT
[       OK ] libyuvTest.ARGBScaleFrom640x360_None (669 ms)
[ RUN      ] libyuvTest.ARGBScaleFrom640x360_Bilinear
filter 1 -     7489 us C -      926 us OPT
[       OK ] libyuvTest.ARGBScaleFrom640x360_Bilinear (977 ms)
[ RUN      ] libyuvTest.ScaleFrom640x360_None
filter 0 -     1964 us C -     1837 us OPT
[       OK ] libyuvTest.ScaleFrom640x360_None (1854 ms)
[ RUN      ] libyuvTest.ScaleFrom640x360_Bilinear
filter 1 -     3910 us C -     2833 us OPT
[       OK ] libyuvTest.ScaleFrom640x360_Bilinear (2854 ms)
[----------] 4 tests from libyuvTest (6358 ms total)
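For reference, the 10.1x figure is the ratio of the OPT timings before and after r665 at the same 1536x929 output size; against the C path in this r665 run (7489 us), the OPT path is about 8.1x faster:

```latex
\frac{9356\ \mu\text{s}}{926\ \mu\text{s}} \approx 10.10,
\qquad
\frac{7489\ \mu\text{s}}{926\ \mu\text{s}} \approx 8.09
```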

TODO - ARM.

Original comment by fbarch...@google.com on 15 Apr 2013 at 9:07

GoogleCodeExporter commented 9 years ago
ARM performance at r643:

chronos@localhost $ sudo LIBYUV_WIDTH=1536 LIBYUV_HEIGHT=929 LIBYUV_REPEAT=1000 \
nice --5 ./libyuv_unittest --gtest_filter=*ARGBScaleFrom640* | grep us
filter 0 -     3046 us C -     2827 us OPT
filter 1 -    31028 us C -    27533 us OPT

Original comment by fbarch...@google.com on 15 Apr 2013 at 9:44

GoogleCodeExporter commented 9 years ago
r665 has the upsampler and a NEON row filter:

chronos@localhost $ sudo LIBYUV_WIDTH=1536 LIBYUV_HEIGHT=929 LIBYUV_REPEAT=1000 \
nice --5 ./libyuv_unittest --gtest_filter=*ARGBScaleFrom640* | grep us
filter 0 -     2862 us C -     2817 us OPT
filter 1 -    17349 us C -     8713 us OPT

TODO - NEON-optimized column filter.

Original comment by fbarch...@google.com on 15 Apr 2013 at 9:50

GoogleCodeExporter commented 9 years ago
Deferring NEON. Will need an x86 column filter for YUV as well.

Original comment by fbarch...@google.com on 19 Jun 2013 at 8:41