katepanping / libyuv

Automatically exported from code.google.com/p/libyuv
BSD 3-Clause "New" or "Revised" License

RGB565 avx2 #421

Closed: GoogleCodeExporter closed this issue 9 years ago

GoogleCodeExporter commented 9 years ago
RGB565ToARGB is a common conversion on Android; optimize it for AVX2.
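
For reference, a minimal scalar sketch of the conversion being optimized. This is not libyuv's actual row function: the name `rgb565_to_argb_row_sketch` is hypothetical, and it assumes little-endian RGB565 input (2 bytes per pixel) and ARGB output stored in memory as B, G, R, A (4 bytes per pixel). Each 5- or 6-bit channel is widened to 8 bits by replicating its top bits into the low bits, so full-scale input maps to 255.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar reference, not libyuv's implementation.
 * Assumes little-endian RGB565 in, and B, G, R, A byte order out. */
void rgb565_to_argb_row_sketch(const uint8_t* src, uint8_t* dst, size_t width) {
  for (size_t x = 0; x < width; ++x) {
    /* Reassemble the 16-bit pixel: RRRRRGGG GGGBBBBB. */
    uint16_t p = (uint16_t)(src[0] | (src[1] << 8));
    uint8_t b = (uint8_t)(p & 0x1f);
    uint8_t g = (uint8_t)((p >> 5) & 0x3f);
    uint8_t r = (uint8_t)((p >> 11) & 0x1f);
    /* Widen to 8 bits by replicating the high bits into the low bits. */
    dst[0] = (uint8_t)((b << 3) | (b >> 2));
    dst[1] = (uint8_t)((g << 2) | (g >> 4));
    dst[2] = (uint8_t)((r << 3) | (r >> 2));
    dst[3] = 255; /* RGB565 has no alpha, so output is opaque. */
    src += 2;
    dst += 4;
  }
}
```

A SIMD version would apply the same shifts and masks to many pixels per instruction; the details of how the AVX2 path does that are not shown here.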

Current speed for SSE2 on Sandy Bridge is about 0.3 ms/frame at 720p (RGB565ToARGB_Opt takes 300 ms for 999 repeats):

set LIBYUV_WIDTH=1280
set LIBYUV_HEIGHT=720
set LIBYUV_REPEAT=999
set LIBYUV_FLAGS=-1
out\release\libyuv_unittest --gtest_filter=*RGB565ToARGB*

RGB565ToARGB_Unaligned (382 ms)
RGB565ToARGB_Any (361 ms)
RGB565ToARGB_Invert (343 ms)
RGB565ToARGB_Opt (300 ms)
RGB565ToARGB_Random (60 ms)

Original issue reported on code.google.com by fbarch...@chromium.org on 6 Apr 2015 at 6:10

GoogleCodeExporter commented 9 years ago
On Haswell:

Was SSE2
RGB565ToARGB_Any (4781 ms)
RGB565ToARGB_Unaligned (5281 ms)
RGB565ToARGB_Invert (5219 ms)
RGB565ToARGB_Opt (4813 ms)
RGB565ToARGB_Random (2515 ms)

Now AVX2
RGB565ToARGB_Any (4484 ms)
RGB565ToARGB_Unaligned (4750 ms)
RGB565ToARGB_Invert (5501 ms)
RGB565ToARGB_Opt (4515 ms)
RGB565ToARGB_Random (2282 ms)

Original comment by fbarch...@chromium.org on 6 Apr 2015 at 7:32

GoogleCodeExporter commented 9 years ago
r1362 adds missing vzeroupper.

Was SSE2
RGB565ToARGB_Any (4862 ms)
RGB565ToARGB_Unaligned (5297 ms)
RGB565ToARGB_Invert (5235 ms)
RGB565ToARGB_Opt (4812 ms)
RGB565ToARGB_Random (2454 ms)
RGB565ToI420_Any (11781 ms)
RGB565ToI420_Unaligned (10063 ms)
RGB565ToI420_Invert (10062 ms)
RGB565ToI420_Opt (9719 ms)

Now AVX2
RGB565ToARGB_Any (4484 ms)
RGB565ToARGB_Unaligned (4719 ms)
RGB565ToARGB_Invert (5500 ms)
RGB565ToARGB_Opt (4516 ms)
RGB565ToARGB_Random (2333 ms)
RGB565ToI420_Any (11672 ms)
RGB565ToI420_Unaligned (7875 ms)
RGB565ToI420_Invert (7563 ms)
RGB565ToI420_Opt (7422 ms)

Original comment by fbarch...@chromium.org on 7 Apr 2015 at 10:55

GoogleCodeExporter commented 9 years ago
Was SSE2
RGB565ToARGB_Opt (500 ms)
ARGB1555ToARGB_Opt (640 ms)
ARGB4444ToARGB_Opt (468 ms)
RGB565ToI420_Opt (984 ms)
ARGB1555ToI420_Opt (1171 ms)
ARGB4444ToI420_Opt (922 ms)

Now AVX2
RGB565ToARGB_Opt (469 ms)
ARGB1555ToARGB_Opt (469 ms)
ARGB4444ToARGB_Opt (438 ms)
RGB565ToI420_Opt (750 ms)
ARGB1555ToI420_Opt (797 ms)
ARGB4444ToI420_Opt (531 ms)
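
To read the two lists above as relative gains, divide the SSE2 time by the AVX2 time for each test. A small sketch follows; the struct, array, and function names are hypothetical, and only the millisecond figures come from this comment.

```c
#include <stdio.h>

/* Hypothetical record pairing one test's SSE2 and AVX2 timings. */
struct timing {
  const char* name;
  double sse2_ms;
  double avx2_ms;
};

/* Figures copied from the "Was SSE2" / "Now AVX2" lists in this comment. */
static const struct timing kOptTimings[] = {
    {"RGB565ToARGB_Opt", 500, 469},   {"ARGB1555ToARGB_Opt", 640, 469},
    {"ARGB4444ToARGB_Opt", 468, 438}, {"RGB565ToI420_Opt", 984, 750},
    {"ARGB1555ToI420_Opt", 1171, 797}, {"ARGB4444ToI420_Opt", 922, 531},
};

/* Ratio of old to new elapsed time; values above 1.0 mean AVX2 is faster. */
double speedup(double before_ms, double after_ms) {
  return before_ms / after_ms;
}

/* Print one line per test with its speedup ratio. */
void print_speedups(void) {
  size_t n = sizeof(kOptTimings) / sizeof(kOptTimings[0]);
  for (size_t i = 0; i < n; ++i) {
    printf("%-20s %.2fx\n", kOptTimings[i].name,
           speedup(kOptTimings[i].sse2_ms, kOptTimings[i].avx2_ms));
  }
}
```

Calling `print_speedups()` from a `main` prints the ratios. These are single-run timings, so small differences may be noise.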

Original comment by fbarch...@chromium.org on 7 Apr 2015 at 11:53

GoogleCodeExporter commented 9 years ago
Fixed in r1363.
All RGB565 functions are now ported to AVX2.

findstr RGB565.*AVX convert* planar* | wc -l
     24      80    1539

findstr RGB565.*SSE convert* planar* | wc -l
     24      80    1548

Original comment by fbarch...@chromium.org on 7 Apr 2015 at 11:57
