watery01 / libyuv

Automatically exported from code.google.com/p/libyuv
0 stars 0 forks source link

Broken ScaleFilterRows_SSE2 (includes a patch fixing this issue) #177

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Implementation of ScaleFilterRows_SSE2 is broken - produces garbled image if 
enabled.

I've attached a patch that fixes it.

Note: MSVC++ wasn't tested, only posix version was.

--
Best regards,
Krzysztof Kaspruk

Original issue reported on code.google.com by Krzysztof.Kaspruk on 16 Jan 2013 at 2:56

Attachments:

GoogleCodeExporter commented 9 years ago
One wish: if my patch is good enough to be included in source tree, please do 
not write my full and complete email address in clear text anywhere (for 
example in commit log message), as I've just seen for example that: 
http://code.google.com/p/libyuv/source/detail?r=539

I don't want to see my e-mail address spammed.

Thank you.
--
Best regards,
Krzysztof Kaspruk

Original comment by Krzysztof.Kaspruk on 16 Jan 2013 at 10:48

GoogleCodeExporter commented 9 years ago
Thanks for the submission!
I'll need to review this closely and/or improve the unittest.
Is it possible to do without adding the shifts in the inner loop?
+    psllw      xmm2, 0x1
+    psllw      xmm3, 0x1

Original comment by fbarch...@google.com on 17 Jan 2013 at 11:51

GoogleCodeExporter commented 9 years ago
Unfortunately, I've been unable to make it working without these shifts in the 
inner loop.
--
Best regards,
Krzysztof Kaspruk

Original comment by Krzysztof.Kaspruk on 17 Jan 2013 at 1:19

GoogleCodeExporter commented 9 years ago
Okay, took a look and it should work, but I have some thoughts on changes

Instead of zeros in low bits of eax, its more accurate to repeat the bits.  So 
make it 16 bits and then shift right 1.
Prefer use SSE2 for that and leave eax alone.

Instead of psllw #1, use paddw xmm,xmm

Instead of:
punpcklwd  xmm5, xmm5 
punpckldq  xmm5, xmm5 
punpcklqdq xmm5, xmm5 
use:
punpcklwd  xmm5, xmm5        
pshufd     xmm5, xmm5, 0     

Code review is 
https://webrtc-codereview.appspot.com/1061004/

Original comment by fbarch...@google.com on 22 Jan 2013 at 5:20

GoogleCodeExporter commented 9 years ago
Same code applied to ARGBInterpolate gives better performance estimate:

SSSE3
d:\src\libyuv\trunk>runyuv *Inter*Opt | grep ms
[       OK ] libyuvTest.ARGBInterpolate0_Opt (427 ms)
[       OK ] libyuvTest.ARGBInterpolate64_Opt (569 ms)
[       OK ] libyuvTest.ARGBInterpolate128_Opt (558 ms)
[       OK ] libyuvTest.ARGBInterpolate192_Opt (551 ms)
[       OK ] libyuvTest.ARGBInterpolate255_Opt (616 ms)
[       OK ] libyuvTest.ARGBInterpolate85_Opt (618 ms)
[----------] 6 tests from libyuvTest (3339 ms total)

SSE2
d:\src\libyuv\trunk>set LIBYUV_DISABLE_SSSE3=1
d:\src\libyuv\trunk>runyuv *Inter*Opt | grep ms
[       OK ] libyuvTest.ARGBInterpolate0_Opt (421 ms)
[       OK ] libyuvTest.ARGBInterpolate64_Opt (542 ms)
[       OK ] libyuvTest.ARGBInterpolate128_Opt (528 ms)
[       OK ] libyuvTest.ARGBInterpolate192_Opt (545 ms)
[       OK ] libyuvTest.ARGBInterpolate255_Opt (690 ms)
[       OK ] libyuvTest.ARGBInterpolate85_Opt (689 ms)
[----------] 6 tests from libyuvTest (3415 ms total)

Original comment by fbarch...@google.com on 22 Jan 2013 at 7:41

GoogleCodeExporter commented 9 years ago
r548 fixes bilinear on SSE2.

Original comment by fbarch...@google.com on 22 Jan 2013 at 7:52

GoogleCodeExporter commented 9 years ago
This macro in scale.cc
#define HAS_SCALEROWDOWN34_SSE2_DISABLED
causes a link error.  Need to re-enable it and/or do ifdefs differently.
to be fixed in r549.

Next steps are
1. Port to ARGBInterpolate - same code but no last pixel duplication.
2. Do 1/4 and 3/4 specializations.
3. Port to ARGBScale
Consider Any version for odd widths.  Consider making API same as alpha blend.

Original comment by fbarch...@google.com on 22 Jan 2013 at 6:48

GoogleCodeExporter commented 9 years ago
r550 ports ARGBInterpolate to SSE2, and specializes 25% and 75% blends the same 
as SSSE3 did.

Original comment by fbarch...@google.com on 23 Jan 2013 at 2:15

GoogleCodeExporter commented 9 years ago
r559 ports ARGBInterpolate SSE2 to gcc with specializations.
Remaining work
1. ARGB scale
2. specialization for 1/4 on gcc YUV scale
3. investigate pmulhuw to avoid shift.

Original comment by fbarch...@chromium.org on 5 Feb 2013 at 11:08

GoogleCodeExporter commented 9 years ago
Fix for ARGB and specialization is up for review 
https://webrtc-codereview.appspot.com/1115008/

Original comment by fbarch...@google.com on 27 Feb 2013 at 5:17

GoogleCodeExporter commented 9 years ago
r585 fixes SSE2 scaler.
Investigating pmulhuw opened as new bug.  
https://code.google.com/p/libyuv/issues/detail?id=182

Original comment by fbarch...@chromium.org on 27 Feb 2013 at 6:41

GoogleCodeExporter commented 9 years ago
../../source/scale.cc: In function `void libyuv::ScaleFilterRows_SSE2(uint8*, 
const uint8*, ptrdiff_t, int, int)':
../../source/scale.cc:1955:3: error: expected string-literal before `asm'

Original comment by fbarch...@google.com on 28 Feb 2013 at 4:46

GoogleCodeExporter commented 9 years ago
fixed in r587

[----------] Global test environment tear-down
[==========] 595 tests from 1 test case ran. (378 ms total)
[  PASSED  ] 595 tests.
$ export LIBYUV_DISABLE_SSSE3=1
$ runyuv
[==========] 595 tests from 1 test case ran. (403 ms total)
[  PASSED  ] 595 tests.

Original comment by fbarch...@google.com on 28 Feb 2013 at 4:52