tametika / libyuv

Automatically exported from code.google.com/p/libyuv
BSD 3-Clause "New" or "Revised" License

ARGBToJ422_SSSE3 #546

GoogleCodeExporter closed this issue 8 years ago

GoogleCodeExporter commented 8 years ago
ARGBToUVJ422Row_C has not been ported to SSSE3 or the other SIMD variants.

options:
adapt ARGBToUVI422Row_SSSE3
write a wrapper that calls ARGBToUVJ420Row_SSSE3 with stride 0
remove function
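
To illustrate why the stride-0 wrapper option works, here is a minimal scalar sketch. All names (`SketchUVRow420`, `SketchUVRow422`, `SketchUVRow422ViaWrapper`, `SketchWrapperMatches`) are hypothetical, the chroma coefficients are illustrative full-range values, and the rounding is this sketch's own choice, so it should not be read as the library's code. It assumes ARGB pixels stored as B, G, R, A bytes and an even width. A 4:2:0-style row function averages a 2x2 block from two rows; with stride 0 both rows are the same row, so the result reduces to the horizontal pair average that 4:2:2 needs.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Illustrative full-range (JPEG-style) chroma formulas. The coefficients are
   an assumption for this sketch, not copied from the library. */
static uint8_t SketchU(int r, int g, int b) {
  return (uint8_t)((127 * b - 84 * g - 43 * r + 0x8080) >> 8);
}
static uint8_t SketchV(int r, int g, int b) {
  return (uint8_t)((127 * r - 107 * g - 20 * b + 0x8080) >> 8);
}

/* 4:2:0-style row: average a 2x2 block taken from two rows that are
   src_stride bytes apart, then emit one U and one V per block.
   Assumes B, G, R, A byte order and an even width. */
static void SketchUVRow420(const uint8_t* src_argb, int src_stride,
                           uint8_t* dst_u, uint8_t* dst_v, int width) {
  const uint8_t* row1 = src_argb + src_stride;
  for (int x = 0; x < width; x += 2) {
    const uint8_t* p0 = src_argb + x * 4;
    const uint8_t* p1 = row1 + x * 4;
    int b = (p0[0] + p0[4] + p1[0] + p1[4] + 2) >> 2;
    int g = (p0[1] + p0[5] + p1[1] + p1[5] + 2) >> 2;
    int r = (p0[2] + p0[6] + p1[2] + p1[6] + 2) >> 2;
    dst_u[x / 2] = SketchU(r, g, b);
    dst_v[x / 2] = SketchV(r, g, b);
  }
}

/* 4:2:2-style row: average horizontal pairs within a single row. */
static void SketchUVRow422(const uint8_t* src_argb,
                           uint8_t* dst_u, uint8_t* dst_v, int width) {
  for (int x = 0; x < width; x += 2) {
    const uint8_t* p = src_argb + x * 4;
    int b = (p[0] + p[4] + 1) >> 1;
    int g = (p[1] + p[5] + 1) >> 1;
    int r = (p[2] + p[6] + 1) >> 1;
    dst_u[x / 2] = SketchU(r, g, b);
    dst_v[x / 2] = SketchV(r, g, b);
  }
}

/* The wrapper option: reuse the two-row function with stride 0, so the
   "second row" is the first row again. */
static void SketchUVRow422ViaWrapper(const uint8_t* src_argb,
                                     uint8_t* dst_u, uint8_t* dst_v,
                                     int width) {
  SketchUVRow420(src_argb, 0, dst_u, dst_v, width);
}

/* Returns 1 if the wrapper and the direct 4:2:2 row agree on a
   pseudo-random row of `width` pixels (width must be even and positive). */
static int SketchWrapperMatches(int width) {
  int half = width / 2;
  uint8_t* argb = malloc((size_t)width * 4);
  uint8_t* out = malloc((size_t)half * 4);
  int ok = (argb != NULL && out != NULL);
  if (ok) {
    uint8_t* u_direct = out;
    uint8_t* v_direct = out + half;
    uint8_t* u_wrap = out + 2 * half;
    uint8_t* v_wrap = out + 3 * half;
    srand(546);
    for (int i = 0; i < width * 4; ++i) argb[i] = (uint8_t)(rand() & 255);
    SketchUVRow422(argb, u_direct, v_direct, width);
    SketchUVRow422ViaWrapper(argb, u_wrap, v_wrap, width);
    for (int i = 0; i < half; ++i) {
      if (u_direct[i] != u_wrap[i] || v_direct[i] != v_wrap[i]) ok = 0;
    }
  }
  free(argb);
  free(out);
  return ok;
}
```

Calling `SketchWrapperMatches(64)` from a test compares the two paths on one row. In this scalar model the outputs match exactly because `(2a + 2b + 2) >> 2` equals `(a + b + 1) >> 1`; a SIMD version that averages in stages may round differently, which is one reason a unit test is worth having.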

needs a unit test

ARGBToI422_Unaligned (686 ms)
ARGBToI422_Opt (620 ms)
ARGBToI422_Any (617 ms)
ARGBToI422_Invert (521 ms)
ARGBToI420_Any (463 ms)
ARGBToI420_Unaligned (459 ms)
ARGBToI420_Invert (408 ms)
ARGBToI420_Opt (407 ms)

ARGBToJ422_Unaligned (2439 ms)
ARGBToJ422_Any (2409 ms)
ARGBToJ422_Opt (2407 ms)
ARGBToJ422_Invert (2333 ms)
ARGBToJ420_Any (480 ms)
ARGBToJ420_Unaligned (479 ms)
ARGBToJ420_Opt (358 ms)
ARGBToJ420_Invert (356 ms)

Original issue reported on code.google.com by fbarch...@google.com on 12 Jan 2016 at 2:49

GoogleCodeExporter commented 8 years ago
Switching ARGBToI422 and ARGBToJ422 to use ARGBToUVRow_AVX2 improves
performance, thanks to AVX2, and needs less code.

Original comment by fbarch...@google.com on 13 Jan 2016 at 12:32

GoogleCodeExporter commented 8 years ago
The following revision refers to this bug:
  https://chromium.googlesource.com/libyuv/libyuv.git/+/081475b3c86a049c318cb8182e0b12712ff2b40a

commit 081475b3c86a049c318cb8182e0b12712ff2b40a
Author: Frank Barchard <fbarchard@google.com>
Date: Wed Jan 13 01:05:49 2016

refactor ARGBToI422 using ARGBToI420 internally

R=harryjin@google.com
BUG=libyuv:546

Review URL: https://codereview.chromium.org/1574253004 .

[modify] http://crrev.com/081475b3c86a049c318cb8182e0b12712ff2b40a/README.chromium
[modify] http://crrev.com/081475b3c86a049c318cb8182e0b12712ff2b40a/include/libyuv/row.h
[modify] http://crrev.com/081475b3c86a049c318cb8182e0b12712ff2b40a/include/libyuv/version.h
[modify] http://crrev.com/081475b3c86a049c318cb8182e0b12712ff2b40a/source/convert_from_argb.cc
[modify] http://crrev.com/081475b3c86a049c318cb8182e0b12712ff2b40a/source/row_any.cc
[modify] http://crrev.com/081475b3c86a049c318cb8182e0b12712ff2b40a/source/row_common.cc
[modify] http://crrev.com/081475b3c86a049c318cb8182e0b12712ff2b40a/source/row_gcc.cc
[modify] http://crrev.com/081475b3c86a049c318cb8182e0b12712ff2b40a/source/row_neon.cc
[modify] http://crrev.com/081475b3c86a049c318cb8182e0b12712ff2b40a/source/row_neon64.cc
[modify] http://crrev.com/081475b3c86a049c318cb8182e0b12712ff2b40a/source/row_win.cc
[modify] http://crrev.com/081475b3c86a049c318cb8182e0b12712ff2b40a/unit_test/convert_test.cc

Original comment by bugdroid1@chromium.org on 13 Jan 2016 at 1:06

GoogleCodeExporter commented 8 years ago
Linux x64

Was
ARGBToI422_Opt (345 ms)
ARGBToI420_Opt (255 ms)
ARGBToJ420_Opt (234 ms)
ARGBToYUY2_Opt (501 ms)
ARGBToUYVY_Opt (467 ms)

Now
ARGBToJ422_Opt (385 ms)
ARGBToI422_Opt (320 ms)
ARGBToI420_Opt (236 ms)
ARGBToJ420_Opt (234 ms)
ARGBToYUY2_Opt (468 ms)
ARGBToUYVY_Opt (423 ms)

Consider doing AVX2
Samples: 3K of event 'cycles', Event count (approx.): 3285698000                

 37.76%  libyuv_unittest  libyuv_unittest      [.] ARGBToUVRow_AVX2
 31.70%  libyuv_unittest  libyuv_unittest      [.] ARGBToYRow_AVX2
 13.69%  libyuv_unittest  libyuv_unittest      [.] I422ToUYVYRow_SSE2
 13.46%  libyuv_unittest  libyuv_unittest      [.] I422ToYUY2Row_SSE2

Original comment by fbarch...@google.com on 13 Jan 2016 at 1:24

GoogleCodeExporter commented 8 years ago
fixed in r1565

Original comment by fbarch...@google.com on 13 Jan 2016 at 1:24