Closed GoogleCodeExporter closed 9 years ago
One wish: if my patch is good enough to be included in source tree, please do
not write my full and complete email address in clear text anywhere (for
example in commit log message), as I've just seen for example that:
http://code.google.com/p/libyuv/source/detail?r=539
I don't want to see my e-mail address spammed.
Thank you.
--
Best regards,
Krzysztof Kaspruk
Original comment by Krzysztof.Kaspruk
on 16 Jan 2013 at 10:48
Thanks for the submission!
I'll need to review this closely and/or improve the unittest.
Is it possible to do without adding the shifts in the inner loop?
+ psllw xmm2, 0x1
+ psllw xmm3, 0x1
Original comment by fbarch...@google.com
on 17 Jan 2013 at 11:51
Unfortunately, I've been unable to make it working without these shifts in the
inner loop.
--
Best regards,
Krzysztof Kaspruk
Original comment by Krzysztof.Kaspruk
on 17 Jan 2013 at 1:19
Okay, took a look and it should work, but I have some thoughts on changes
Instead of zeros in low bits of eax, its more accurate to repeat the bits. So
make it 16 bits and then shift right 1.
Prefer use SSE2 for that and leave eax alone.
Instead of psllw #1, use paddw xmm,xmm
Instead of:
punpcklwd xmm5, xmm5
punpckldq xmm5, xmm5
punpcklqdq xmm5, xmm5
use:
punpcklwd xmm5, xmm5
pshufd xmm5, xmm5, 0
Code review is
https://webrtc-codereview.appspot.com/1061004/
Original comment by fbarch...@google.com
on 22 Jan 2013 at 5:20
Same code applied to ARGBInterpolate gives better performance estimate:
SSSE3
d:\src\libyuv\trunk>runyuv *Inter*Opt | grep ms
[ OK ] libyuvTest.ARGBInterpolate0_Opt (427 ms)
[ OK ] libyuvTest.ARGBInterpolate64_Opt (569 ms)
[ OK ] libyuvTest.ARGBInterpolate128_Opt (558 ms)
[ OK ] libyuvTest.ARGBInterpolate192_Opt (551 ms)
[ OK ] libyuvTest.ARGBInterpolate255_Opt (616 ms)
[ OK ] libyuvTest.ARGBInterpolate85_Opt (618 ms)
[----------] 6 tests from libyuvTest (3339 ms total)
SSE2
d:\src\libyuv\trunk>set LIBYUV_DISABLE_SSSE3=1
d:\src\libyuv\trunk>runyuv *Inter*Opt | grep ms
[ OK ] libyuvTest.ARGBInterpolate0_Opt (421 ms)
[ OK ] libyuvTest.ARGBInterpolate64_Opt (542 ms)
[ OK ] libyuvTest.ARGBInterpolate128_Opt (528 ms)
[ OK ] libyuvTest.ARGBInterpolate192_Opt (545 ms)
[ OK ] libyuvTest.ARGBInterpolate255_Opt (690 ms)
[ OK ] libyuvTest.ARGBInterpolate85_Opt (689 ms)
[----------] 6 tests from libyuvTest (3415 ms total)
Original comment by fbarch...@google.com
on 22 Jan 2013 at 7:41
r548 fixes bilinear on SSE2.
Original comment by fbarch...@google.com
on 22 Jan 2013 at 7:52
This macro in scale.cc
#define HAS_SCALEROWDOWN34_SSE2_DISABLED
causes a link error. Need to re-enable it and/or do ifdefs differently.
to be fixed in r549.
Next steps are
1. Port to ARGBInterpolate - same code but no last pixel duplication.
2. Do 1/4 and 3/4 specializations.
3. Port to ARGBScale
Consider Any version for odd widths. Consider making API same as alpha blend.
Original comment by fbarch...@google.com
on 22 Jan 2013 at 6:48
r550 ports ARGBInterpolate to SSE2, and specializes 25% and 75% blends the same
as SSSE3 did.
Original comment by fbarch...@google.com
on 23 Jan 2013 at 2:15
r559 ports ARGBInterpolate SSE2 to gcc with specializations.
Remaining work
1. ARGB scale
2. specialization for 1/4 on gcc YUV scale
3. investigate pmulhuw to avoid shift.
Original comment by fbarch...@chromium.org
on 5 Feb 2013 at 11:08
Fix for ARGB and specialization is up for review
https://webrtc-codereview.appspot.com/1115008/
Original comment by fbarch...@google.com
on 27 Feb 2013 at 5:17
r585 fixes SSE2 scaler.
Investigating pmulhuw opened as new bug.
https://code.google.com/p/libyuv/issues/detail?id=182
Original comment by fbarch...@chromium.org
on 27 Feb 2013 at 6:41
../../source/scale.cc: In function `void libyuv::ScaleFilterRows_SSE2(uint8*,
const uint8*, ptrdiff_t, int, int)':
../../source/scale.cc:1955:3: error: expected string-literal before `asm'
Original comment by fbarch...@google.com
on 28 Feb 2013 at 4:46
fixed in r587
[----------] Global test environment tear-down
[==========] 595 tests from 1 test case ran. (378 ms total)
[ PASSED ] 595 tests.
$ export LIBYUV_DISABLE_SSSE3=1
$ runyuv
[==========] 595 tests from 1 test case ran. (403 ms total)
[ PASSED ] 595 tests.
Original comment by fbarch...@google.com
on 28 Feb 2013 at 4:52
Original issue reported on code.google.com by
Krzysztof.Kaspruk
on 16 Jan 2013 at 2:56Attachments: