Three notes:
1. Arbitrary scaling is about 2x slower than a specialized scaler. If the source and
destination sizes are known, a specialized path always gives faster results, and with
less effort.
2. The current scaler is geared to downsampling. It is 2-pass: it scales rows, then
columns. For upsampling, it would be better to scale columns first, then
rows.
3. The column sampler is written in C. It should be ported to SIMD.
Original comment by fbarch...@chromium.org
on 25 Mar 2013 at 9:53
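A rough operation count helps show why the pass order in note 2 matters. The sketch below is a hypothetical cost model, not libyuv code; it assumes each pass performs one filter operation per pixel it writes, and that source rows are cached when the column pass runs first, so each one is column-filtered only once.

```c
/* Hypothetical cost model for a 2-pass bilinear scaler (not libyuv code).
   Assumption: each pass performs one filter operation per pixel it writes. */
typedef struct {
  long column_ops; /* horizontal (column) filter operations */
  long row_ops;    /* vertical (row) blend operations */
} PassCost;

/* Rows first: blend source rows into dst_h intermediate rows of src_w
   pixels, then column-filter each of those rows to dst_w pixels. */
static PassCost rows_then_columns(long src_w, long src_h, long dst_w,
                                  long dst_h) {
  (void)src_h; /* row count does not affect this ordering's cost */
  PassCost c = {dst_h * dst_w, dst_h * src_w};
  return c;
}

/* Columns first: column-filter each source row once (src_h rows of dst_w
   pixels, assuming they are cached), then blend pairs into dst_h rows. */
static PassCost columns_then_rows(long src_w, long src_h, long dst_w,
                                  long dst_h) {
  (void)src_w; /* source width does not affect this ordering's cost */
  PassCost c = {src_h * dst_w, dst_h * dst_w};
  return c;
}
```

For the 640x360 test run with LIBYUV_WIDTH=1536 and LIBYUV_HEIGHT=929, rows-first does 1,426,944 column operations and 594,560 row operations, while columns-first does 552,960 column operations and 1,426,944 row operations. The totals are similar, but the column pass (the one note 3 says is still in C) runs about 2.6x less often. When downsampling (src_h > dst_h) the comparison flips, which is consistent with the current design being geared to downsampling.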
Added a test for scaling from 640x360. Run it like this:
set LIBYUV_WIDTH=1536
set LIBYUV_HEIGHT=929
set LIBYUV_REPEAT=1000
out\release\libyuv_unittest --gtest_filter=*ScaleFrom*
Performance on Z620 (Sandy Bridge Xeon):
[----------] 4 tests from libyuvTest
[ RUN ] libyuvTest.ARGBScaleFrom640x360_None
filter 0 - 750 us C - 748 us OPT
[ OK ] libyuvTest.ARGBScaleFrom640x360_None (780 ms)
[ RUN ] libyuvTest.ARGBScaleFrom640x360_Bilinear
filter 1 - 11396 us C - 9356 us OPT
[ OK ] libyuvTest.ARGBScaleFrom640x360_Bilinear (9418 ms)
[ RUN ] libyuvTest.ScaleFrom640x360_None
filter 0 - 1971 us C - 1859 us OPT
[ OK ] libyuvTest.ScaleFrom640x360_None (1875 ms)
[ RUN ] libyuvTest.ScaleFrom640x360_Bilinear
filter 1 - 3895 us C - 2852 us OPT
[ OK ] libyuvTest.ScaleFrom640x360_Bilinear (2873 ms)
[----------] 4 tests from libyuvTest (14951 ms total)
[----------] Global test environment tear-down
[==========] 4 tests from 1 test case ran. (14955 ms total)
[ PASSED ] 4 tests.
Original comment by fbarch...@chromium.org
on 25 Mar 2013 at 10:13
d:\src\libyuv2\trunk>out\release\libyuv_unittest --gtest_filter=*Scale*640*
Note: Google Test filter = *Scale*640*
[==========] Running 4 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 4 tests from libyuvTest
[ RUN ] libyuvTest.ARGBScaleFrom640x360_None
filter 0 - 2785 us C - 2743 us OPT
[ OK ] libyuvTest.ARGBScaleFrom640x360_None (2810 ms)
[ RUN ] libyuvTest.ARGBScaleFrom640x360_Bilinear
filter 1 - 35813 us C - 33224 us OPT
[ OK ] libyuvTest.ARGBScaleFrom640x360_Bilinear (33401 ms)
[ RUN ] libyuvTest.ScaleFrom640x360_None
filter 0 - 7151 us C - 6680 us OPT
[ OK ] libyuvTest.ScaleFrom640x360_None (6723 ms)
[ RUN ] libyuvTest.ScaleFrom640x360_Bilinear
filter 1 - 11743 us C - 9991 us OPT
[ OK ] libyuvTest.ScaleFrom640x360_Bilinear (10050 ms)
[----------] 4 tests from libyuvTest (52990 ms total)
Original comment by fbarch...@chromium.org
on 5 Apr 2013 at 8:52
r645 uses kMaxStride, which is sized for Retina width (2880 * 4, i.e. a 2880-pixel ARGB row).
The current C column code takes 109 instructions per pixel, using imul.
Original comment by fbarch...@chromium.org
on 5 Apr 2013 at 6:24
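For reference, a scalar bilinear column (horizontal) filter looks roughly like the sketch below. It is a hypothetical illustration, not the libyuv function: the names are invented, source positions are assumed to be 16.16 fixed point, the blend fraction is assumed to be 8 bits, and ARGB pixels are assumed to be stored as uint32_t. The per-channel multiplies are where a compiler would emit imul, and handling each of the four channels separately is one reason the scalar version costs so many instructions per pixel.

```c
#include <stdint.h>

/* Blend two ARGB pixels channel by channel with an 8-bit fraction f:
   weights (256 - f) and f sum to 256, so >> 8 renormalizes. */
static uint32_t blend_argb(uint32_t a, uint32_t b, uint32_t f) {
  uint32_t out = 0;
  for (int shift = 0; shift < 32; shift += 8) {
    uint32_t ca = (a >> shift) & 0xff;
    uint32_t cb = (b >> shift) & 0xff;
    uint32_t c = (ca * (256 - f) + cb * f) >> 8;
    out |= c << shift;
  }
  return out;
}

/* Hypothetical column filter for one ARGB row. x and dx are 16.16 fixed
   point: the integer part picks the left source pixel, and the top 8
   fraction bits pick the blend weight. */
static void filter_cols_argb_c(uint32_t* dst, const uint32_t* src,
                               int src_width, int dst_width, int x, int dx) {
  for (int j = 0; j < dst_width; ++j) {
    int xi = x >> 16;
    uint32_t f = (uint32_t)(x >> 8) & 0xff;
    /* Clamp the right neighbour so upsampling never reads past the row. */
    int xr = (xi + 1 < src_width) ? xi + 1 : xi;
    dst[j] = blend_argb(src[xi], src[xr], f);
    x += dx;
  }
}
```

For example, upsampling the two-pixel row {0x00000000, 0xFFFFFFFF} to four pixels with x = 0 and dx = 0x8000 gives 0x00000000, 0x7F7F7F7F, 0xFFFFFFFF, 0xFFFFFFFF. The clamp adds a compare per pixel; a SIMD version would more likely pad the source row instead.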
r647 is a first pass at an SSSE3 bilinear column function.
set LIBYUV_WIDTH=2880
set LIBYUV_HEIGHT=1800
set LIBYUV_REPEAT=1000
out\release\libyuv_unittest --gtest_filter=*Scale*640*
[ RUN ] libyuvTest.ARGBScaleFrom640x360_Bilinear
filter 1 - 36655 us C - 9901 us OPT
Original comment by fbarch...@chromium.org
on 6 Apr 2013 at 8:08
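SSSE3's pair multiply-add, pmaddubsw (intrinsic _mm_maddubs_epi16), multiplies unsigned bytes by signed bytes, adds adjacent products, and saturates to signed 16 bits, which maps naturally onto a two-tap bilinear blend. The sketch below is a scalar model of that instruction plus a hypothetical blend built on it; it is not the r647 code, and the 7-bit weights (127 - f, f) are an assumption chosen only because both weights must fit in a signed byte.

```c
#include <stdint.h>

/* Scalar model of pmaddubsw on 16 bytes: unsigned bytes of u times signed
   bytes of s, adjacent products summed, saturated to int16_t. */
static void maddubs_model(int16_t out[8], const uint8_t u[16],
                          const int8_t s[16]) {
  for (int i = 0; i < 8; ++i) {
    int32_t sum = (int32_t)u[2 * i] * s[2 * i] +
                  (int32_t)u[2 * i + 1] * s[2 * i + 1];
    if (sum > INT16_MAX) sum = INT16_MAX;
    if (sum < INT16_MIN) sum = INT16_MIN;
    out[i] = (int16_t)sum;
  }
}

/* Hypothetical blend of ARGB pixels a and b with a 7-bit fraction f in
   [0, 127]. Bytes are interleaved as a0 b0 a1 b1 ... so each multiply-add
   pair produces one blended channel. Weights (127 - f, f) fit in a signed
   byte; the trade-off is that >> 7 slightly underestimates (255 -> 253). */
static uint32_t blend_pair_7bit(uint32_t a, uint32_t b, int f) {
  uint8_t u[16] = {0};
  int8_t s[16] = {0};
  for (int c = 0; c < 4; ++c) {
    u[2 * c] = (uint8_t)(a >> (8 * c));
    u[2 * c + 1] = (uint8_t)(b >> (8 * c));
    s[2 * c] = (int8_t)(127 - f);
    s[2 * c + 1] = (int8_t)f;
  }
  int16_t sums[8];
  maddubs_model(sums, u, s);
  uint32_t out = 0;
  for (int c = 0; c < 4; ++c) {
    out |= (uint32_t)((uint16_t)sums[c] >> 7) << (8 * c);
  }
  return out;
}
```

In real SSSE3 code one pmaddubsw would blend several pixels at once; the scalar loop here only spells out the arithmetic.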
The Linux version will need a port:
LIBYUV_WIDTH=2880 LIBYUV_HEIGHT=1800 LIBYUV_REPEAT=1000 \
out/Release/libyuv_unittest --gtest_filter=*ARGBScale*640*
Note: Google Test filter = *ARGBScale*640*
[==========] Running 2 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 2 tests from libyuvTest
[ RUN ] libyuvTest.ARGBScaleFrom640x360_None
filter 0 - 3426 us C - 2923 us OPT
[ OK ] libyuvTest.ARGBScaleFrom640x360_None (2999 ms)
[ RUN ] libyuvTest.ARGBScaleFrom640x360_Bilinear
filter 1 - 27884 us C - 25615 us OPT
[ OK ] libyuvTest.ARGBScaleFrom640x360_Bilinear (25748 ms)
[----------] 2 tests from libyuvTest (28747 ms total)
[----------] Global test environment tear-down
[==========] 2 tests from 1 test case ran. (28748 ms total)
[ PASSED ] 2 tests.
Original comment by fbarch...@chromium.org
on 7 Apr 2013 at 6:58
r650 optimizes bilinear upsampling ARGB scale.
Windows
out\release\libyuv_unittest --gtest_filter=*ARGBScale*640*
Note: Google Test filter = *ARGBScale*640*
[==========] Running 2 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 2 tests from libyuvTest
[ RUN ] libyuvTest.ARGBScaleFrom640x360_None
filter 0 - 2646 us C - 2286 us OPT
[ OK ] libyuvTest.ARGBScaleFrom640x360_None (2352 ms)
[ RUN ] libyuvTest.ARGBScaleFrom640x360_Bilinear
filter 1 - 36551 us C - 4548 us OPT
[ OK ] libyuvTest.ARGBScaleFrom640x360_Bilinear (4687 ms)
[----------] 2 tests from libyuvTest (7041 ms total)
[----------] Global test environment tear-down
[==========] 2 tests from 1 test case ran. (7043 ms total)
[ PASSED ] 2 tests.
Linux
LIBYUV_WIDTH=2880 LIBYUV_HEIGHT=1800 LIBYUV_REPEAT=1000 \
out/Release/libyuv_unittest --gtest_filter=*ARGBScale*640*
Note: Google Test filter = *ARGBScale*640*
[==========] Running 2 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 2 tests from libyuvTest
[ RUN ] libyuvTest.ARGBScaleFrom640x360_None
filter 0 - 3384 us C - 2282 us OPT
[ OK ] libyuvTest.ARGBScaleFrom640x360_None (2353 ms)
[ RUN ] libyuvTest.ARGBScaleFrom640x360_Bilinear
filter 1 - 28949 us C - 4420 us OPT
[ OK ] libyuvTest.ARGBScaleFrom640x360_Bilinear (4536 ms)
[----------] 2 tests from libyuvTest (6890 ms total)
[----------] Global test environment tear-down
[==========] 2 tests from 1 test case ran. (6890 ms total)
[ PASSED ] 2 tests.
Original comment by fbarch...@chromium.org
on 8 Apr 2013 at 11:40
The valgrind ASAN build rejects this:
source/scale_argb.cc:1017:6: error: invalid operand for instruction
"pextrw $0x1,%%xmm2,%3 \n"
^
<inline asm>:5:1: note: instantiated into assembly here
pextrw $0x1,%xmm2,%rcx
Original comment by fbarch...@chromium.org
on 9 Apr 2013 at 10:59
Using %k3 (the 32-bit register name for operand 3) should work around it. But pextrw takes 9 cycles??
Original comment by fbarch...@chromium.org
on 10 Apr 2013 at 12:46
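The %k workaround can be shown in a small, self-contained sketch. The helper extract_lane1 is hypothetical and is not the scale_argb.cc code; it assumes GCC or clang extended inline asm on x86-64, where the k operand modifier prints the 32-bit name of a register operand (for example %ecx instead of %rcx).

```c
#include <stdint.h>

/* Hypothetical helper: returns 16-bit lane 1 of eight packed uint16_t
   values, zero-extended into a pointer-sized integer. */
static intptr_t extract_lane1(const uint16_t lanes[8]) {
#if defined(__x86_64__)
  intptr_t result;
  /* With "=r" on an intptr_t, %0 expands to a 64-bit register such as
     %rcx, the form the assembler rejected above. %k0 prints the 32-bit
     name (%ecx) instead; pextrw zero-extends the word, and writing a
     32-bit register clears the upper half, so the 64-bit value is right. */
  __asm__(
      "movdqu  (%1), %%xmm2        \n"
      "pextrw  $0x1, %%xmm2, %k0   \n"
      : "=r"(result)
      : "r"(lanes)
      : "memory", "xmm2");
  return result;
#else
  /* Portable fallback for builds that are not x86-64. */
  return (intptr_t)lanes[1];
#endif
}
```

Only the operand's printed register name changes; the generated instruction sequence is otherwise the same, so the workaround does nothing about the pextrw latency concern.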
TODO - ARM.
TODO - 10x. A 7x improvement is mediocre.
For ARM there's no pair multiply-accumulate. Maybe use vmla.
For 10x, scale columns first, then rows.
Original comment by fbarch...@google.com
on 10 Apr 2013 at 9:01
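The "columns first, then rows" idea can be sketched for a single 8-bit plane as follows. This is a minimal, hypothetical sketch, not the r665 upsampler: the function names are invented, positions use 16.16 fixed point aligned to the top-left corner with no pixel-center offset, and the destination is assumed to be at least as large as the source in both dimensions. Each source row is column-filtered roughly once and cached, so the expensive horizontal pass runs about src_height times instead of dst_height times, and each output row only needs a cheap blend of two cached rows.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical horizontal (column) bilinear filter for one 8-bit row. */
static void filter_cols_u8(uint8_t* dst, const uint8_t* src, int src_w,
                           int dst_w, int dx) {
  int x = 0;
  for (int j = 0; j < dst_w; ++j) {
    int xi = x >> 16;
    int f = (x >> 8) & 0xff;
    int xr = xi + 1 < src_w ? xi + 1 : xi; /* clamp at the right edge */
    dst[j] = (uint8_t)((src[xi] * (256 - f) + src[xr] * f) >> 8);
    x += dx;
  }
}

/* Hypothetical columns-then-rows upsampler (dst >= src in both dimensions).
   Returns 0 on success, -1 if the two-row cache cannot be allocated. */
static int upsample_cols_then_rows(const uint8_t* src, int src_stride,
                                   int src_w, int src_h, uint8_t* dst,
                                   int dst_stride, int dst_w, int dst_h) {
  int dx = (src_w << 16) / dst_w;
  int dy = (src_h << 16) / dst_h;
  uint8_t* rows = malloc(2 * (size_t)dst_w);
  if (!rows) return -1;
  uint8_t* row0 = rows;         /* filtered source row `cached` */
  uint8_t* row1 = rows + dst_w; /* filtered source row `cached + 1` */
  int cached = -1;              /* source row held in row0; -1 = none */
  int y = 0;
  for (int i = 0; i < dst_h; ++i) {
    int yi = y >> 16;
    int f = (y >> 8) & 0xff;
    if (yi != cached) {
      if (cached >= 0 && yi == cached + 1) {
        /* Moved down one source row: reuse row1 instead of refiltering. */
        uint8_t* t = row0;
        row0 = row1;
        row1 = t;
      } else {
        filter_cols_u8(row0, src + (size_t)yi * src_stride, src_w, dst_w, dx);
      }
      int yn = yi + 1 < src_h ? yi + 1 : yi; /* clamp at the bottom edge */
      filter_cols_u8(row1, src + (size_t)yn * src_stride, src_w, dst_w, dx);
      cached = yi;
    }
    /* Row pass: cheap vertical blend of the two cached rows. */
    uint8_t* d = dst + (size_t)i * dst_stride;
    for (int j = 0; j < dst_w; ++j) {
      d[j] = (uint8_t)((row0[j] * (256 - f) + row1[j] * f) >> 8);
    }
    y += dy;
  }
  free(rows);
  return 0;
}
```

Upsampling the 2x2 plane {0, 200, 100, 100} to 4x4 with this sketch gives the rows {0, 100, 200, 200}, {50, 100, 150, 150}, and then two rows of 100. The vertical blend is a straight loop over contiguous bytes, which is the part that vectorizes easily.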
r665 is 10.1x faster, using a bilinear upsampler:
Was 9356 us OPT
Now 926 us OPT
set LIBYUV_WIDTH=1536
set LIBYUV_HEIGHT=929
set LIBYUV_REPEAT=1000
out\release\libyuv_unittest --gtest_filter=*ScaleFrom*
[==========] Running 4 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 4 tests from libyuvTest
[ RUN ] libyuvTest.ARGBScaleFrom640x360_None
filter 0 - 730 us C - 633 us OPT
[ OK ] libyuvTest.ARGBScaleFrom640x360_None (669 ms)
[ RUN ] libyuvTest.ARGBScaleFrom640x360_Bilinear
filter 1 - 7489 us C - 926 us OPT
[ OK ] libyuvTest.ARGBScaleFrom640x360_Bilinear (977 ms)
[ RUN ] libyuvTest.ScaleFrom640x360_None
filter 0 - 1964 us C - 1837 us OPT
[ OK ] libyuvTest.ScaleFrom640x360_None (1854 ms)
[ RUN ] libyuvTest.ScaleFrom640x360_Bilinear
filter 1 - 3910 us C - 2833 us OPT
[ OK ] libyuvTest.ScaleFrom640x360_Bilinear (2854 ms)
[----------] 4 tests from libyuvTest (6358 ms total)
TODO - arm.
Original comment by fbarch...@google.com
on 15 Apr 2013 at 9:07
Arm performance r643:
chronos@localhost $ sudo LIBYUV_WIDTH=1536 LIBYUV_HEIGHT=929 LIBYUV_REPEAT=1000 \
nice --5 ./libyuv_unittest --gtest_filter=*ARGBScaleFrom640* | grep us
filter 0 - 3046 us C - 2827 us OPT
filter 1 - 31028 us C - 27533 us OPT
Original comment by fbarch...@google.com
on 15 Apr 2013 at 9:44
r665 has the upsampler and a Neon row filter:
chronos@localhost $ sudo LIBYUV_WIDTH=1536 LIBYUV_HEIGHT=929 LIBYUV_REPEAT=1000 \
nice --5 ./libyuv_unittest --gtest_filter=*ARGBScaleFrom640* | grep us
filter 0 - 2862 us C - 2817 us OPT
filter 1 - 17349 us C - 8713 us OPT
TODO - Neon optimized column filter
Original comment by fbarch...@google.com
on 15 Apr 2013 at 9:50
Deferring Neon. Will need an x86 column filter for YUV as well.
Original comment by fbarch...@google.com
on 19 Jun 2013 at 8:41
Original issue reported on code.google.com by
noah...@google.com
on 23 Mar 2013 at 12:39