watery01 / libyuv

Automatically exported from code.google.com/p/libyuv

AVX version for I420ToUYVY #211

Closed: GoogleCodeExporter closed this issue 9 years ago

GoogleCodeExporter commented 9 years ago
Here is a proposed AVX patch for the I420ToUYVY case to improve performance.

set LIBYUV_REPEAT=1000
Was:
[ RUN      ] libyuvTest.I420ToUYVY_Any
[       OK ] libyuvTest.I420ToUYVY_Any (1152 ms)
[ RUN      ] libyuvTest.I420ToUYVY_Unaligned
[       OK ] libyuvTest.I420ToUYVY_Unaligned (1173 ms)
[ RUN      ] libyuvTest.I420ToUYVY_Invert
[       OK ] libyuvTest.I420ToUYVY_Invert (230 ms)
[ RUN      ] libyuvTest.I420ToUYVY_Opt
[       OK ] libyuvTest.I420ToUYVY_Opt (249 ms)

Now:
[ RUN      ] libyuvTest.I420ToUYVY_Any
[       OK ] libyuvTest.I420ToUYVY_Any (245 ms)
[ RUN      ] libyuvTest.I420ToUYVY_Unaligned
[       OK ] libyuvTest.I420ToUYVY_Unaligned (233 ms)
[ RUN      ] libyuvTest.I420ToUYVY_Invert
[       OK ] libyuvTest.I420ToUYVY_Invert (228 ms)
[ RUN      ] libyuvTest.I420ToUYVY_Opt
[       OK ] libyuvTest.I420ToUYVY_Opt (232 ms)

Original issue reported on code.google.com by changjun...@intel.com on 29 Mar 2013 at 3:42

GoogleCodeExporter commented 9 years ago
Patch updated.
Fixed minor style issues and added the YUY2 case.

Original comment by changjun...@intel.com on 1 Apr 2013 at 6:52

GoogleCodeExporter commented 9 years ago
I counter-propose switching the SSE2 version from aligned to unaligned memory access:
https://webrtc-codereview.appspot.com/1274005

Pros:
Worst case is better for apps that don't align memory.
Less code than maintaining both an aligned (SSE2) and an unaligned (AVX) version.
Cons:
Atom and Core2 performance is worse.
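
For context, an aligned/unaligned split like the one discussed here usually means a per-call pointer check that selects between two row kernels. Below is a minimal C sketch of such a check. `IsAligned16`, `RowKind`, and `ChooseRowKind` are hypothetical names used for illustration only and are not libyuv's API.

```c
#include <stdint.h>

/* Hypothetical helper: nonzero when p sits on a 16-byte boundary. */
static int IsAligned16(const void* p) {
  return ((uintptr_t)p & 15) == 0;
}

/* Hypothetical tag for which row kernel a caller would dispatch to. */
typedef enum { kRowUnaligned = 0, kRowAligned = 1 } RowKind;

/* The aligned kernel is only safe when the first row AND every later row
   are aligned, so both the base pointers and the strides must qualify. */
static RowKind ChooseRowKind(const uint8_t* src, int src_stride,
                             const uint8_t* dst, int dst_stride) {
  if (IsAligned16(src) && (src_stride % 16) == 0 &&
      IsAligned16(dst) && (dst_stride % 16) == 0) {
    return kRowAligned;
  }
  return kRowUnaligned;
}
```

With a single unaligned kernel this check and the second code path go away, which is the "less code" point above.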

Original comment by fbarch...@google.com on 2 Apr 2013 at 10:24

GoogleCodeExporter commented 9 years ago
r634 changes I420ToUYVY_SSE2 to use unaligned movdqu
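
To illustrate what an unaligned SSE2 row can look like, here is a hedged C sketch that packs planar Y, U, and V rows into UYVY using `_mm_loadu_si128` and `_mm_storeu_si128`, which compile to movdqu. `PackRowToUYVY` is a hypothetical name and this is not the r634 code. It assumes an even width and falls back to scalar C where SSE2 is unavailable.

```c
#include <stdint.h>
#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h> /* SSE2 intrinsics */
#define SKETCH_HAS_SSE2 1
#endif

/* Hypothetical sketch: pack one row of planar 4:2:2 (Y, U, V) into UYVY.
   Output byte order per pixel pair is U0 Y0 V0 Y1. width must be even. */
static void PackRowToUYVY(const uint8_t* src_y, const uint8_t* src_u,
                          const uint8_t* src_v, uint8_t* dst_uyvy,
                          int width) {
  int x = 0;
#ifdef SKETCH_HAS_SSE2
  /* 16 pixels per iteration: 16 Y + 8 U + 8 V in, 32 bytes out. */
  for (; x + 16 <= width; x += 16) {
    __m128i u = _mm_loadl_epi64((const __m128i*)(src_u + x / 2));
    __m128i v = _mm_loadl_epi64((const __m128i*)(src_v + x / 2));
    __m128i uv = _mm_unpacklo_epi8(u, v);                     /* U0 V0 U1 V1 ... */
    __m128i y = _mm_loadu_si128((const __m128i*)(src_y + x)); /* movdqu load */
    __m128i lo = _mm_unpacklo_epi8(uv, y);                    /* U0 Y0 V0 Y1 ... */
    __m128i hi = _mm_unpackhi_epi8(uv, y);                    /* U4 Y8 V4 Y9 ... */
    _mm_storeu_si128((__m128i*)(dst_uyvy + 2 * x), lo);       /* movdqu store */
    _mm_storeu_si128((__m128i*)(dst_uyvy + 2 * x + 16), hi);
  }
#endif
  /* Scalar tail (or the whole row when SSE2 is not available). */
  for (; x + 2 <= width; x += 2) {
    dst_uyvy[2 * x + 0] = src_u[x / 2];
    dst_uyvy[2 * x + 1] = src_y[x];
    dst_uyvy[2 * x + 2] = src_v[x / 2];
    dst_uyvy[2 * x + 3] = src_y[x + 1];
  }
}
```

Because every load and store is unaligned, the same routine accepts any pointer, so no separate aligned variant is needed. The cost on older cores such as Core2 is what the timings below measure.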

Linux Core2 Before
[       OK ] libyuvTest.ARGBToUYVY_Any (2163 ms)
[       OK ] libyuvTest.ARGBToUYVY_Unaligned (2078 ms)
[       OK ] libyuvTest.I420ToUYVY_Unaligned (1104 ms)
[       OK ] libyuvTest.I422ToUYVY_Unaligned (1101 ms)
[       OK ] libyuvTest.I420ToUYVY_Any (1101 ms)
[       OK ] libyuvTest.ARGBToUYVY_Invert (1061 ms)
[       OK ] libyuvTest.ARGBToUYVY_Opt (1048 ms)
[       OK ] libyuvTest.I422ToUYVY_Invert (235 ms)
[       OK ] libyuvTest.I422ToUYVY_Any (225 ms)
[       OK ] libyuvTest.I422ToUYVY_Opt (224 ms)
[       OK ] libyuvTest.I420ToUYVY_Invert (212 ms)
[       OK ] libyuvTest.I420ToUYVY_Opt (211 ms)
[       OK ] libyuvTest.ARGBToUYVY_Random (30 ms)
[----------] 13 tests from libyuvTest (10794 ms total)

Linux Core2 After
[       OK ] libyuvTest.ARGBToUYVY_Unaligned (2252 ms)
[       OK ] libyuvTest.ARGBToUYVY_Any (1567 ms)
[       OK ] libyuvTest.ARGBToUYVY_Invert (1301 ms)
[       OK ] libyuvTest.ARGBToUYVY_Opt (1246 ms)
[       OK ] libyuvTest.I420ToUYVY_Unaligned (478 ms)
[       OK ] libyuvTest.I422ToUYVY_Unaligned (444 ms)
[       OK ] libyuvTest.I420ToUYVY_Any (413 ms)
[       OK ] libyuvTest.I420ToUYVY_Invert (386 ms)
[       OK ] libyuvTest.I422ToUYVY_Invert (384 ms)
[       OK ] libyuvTest.I420ToUYVY_Opt (326 ms)
[       OK ] libyuvTest.I422ToUYVY_Opt (325 ms)
[       OK ] libyuvTest.I422ToUYVY_Any (324 ms)
[       OK ] libyuvTest.ARGBToUYVY_Random (31 ms)
[----------] 13 tests from libyuvTest (9478 ms total)

I422ToUYVY_Opt is 45% slower (224 ms before, 325 ms after).
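
The percentage follows from the Core2 I422ToUYVY_Opt timings listed above: (325 - 224) / 224. A small C sketch of that arithmetic, where `SlowdownPercent` is a hypothetical helper name:

```c
/* Hypothetical helper: relative change in run time, in percent.
   Positive means the "after" run took longer (slower);
   negative means it took less time (faster). */
static double SlowdownPercent(double before_ms, double after_ms) {
  return (after_ms - before_ms) / before_ms * 100.0;
}
```

Calling `SlowdownPercent(224.0, 325.0)` gives roughly 45. The same function returns a negative value for the tests that got faster, such as I420ToUYVY_Unaligned (1104 ms to 478 ms).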

Original comment by fbarch...@google.com on 2 Apr 2013 at 10:12

GoogleCodeExporter commented 9 years ago
Optimized the unaligned SSE2 version instead.

Original comment by fbarch...@chromium.org on 4 Apr 2013 at 6:37