katepanping / libyuv

Automatically exported from code.google.com/p/libyuv
BSD 3-Clause "New" or "Revised" License

unaligned memory #365

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Consider making the low-level functions use unaligned memory access.

Original issue reported on code.google.com by fbarch...@google.com on 30 Sep 2014 at 5:43

GoogleCodeExporter commented 9 years ago
r1002 adds unaligned memory support for Intel to scale, compare and rotate.
MIPS still requires alignment.

Original comment by fbarch...@google.com on 2 Oct 2014 at 5:56

GoogleCodeExporter commented 9 years ago
The following appear to be alignment checks that should be reviewed:
d:\src\libyuv\trunk\source>findstr -i aligned.*stride.*16 *
format_conversion.cc:      IS_ALIGNED(src_argb, 16) && IS_ALIGNED(src_stride_argb, 16)) {
rotate_argb.cc:      IS_ALIGNED(dst, 16) && IS_ALIGNED(dst_stride, 16)) {
rotate_argb.cc:      IS_ALIGNED(src, 16) && IS_ALIGNED(src_stride, 16) &&
rotate_argb.cc:      IS_ALIGNED(dst, 16) && IS_ALIGNED(dst_stride, 16)) {
rotate_argb.cc:      IS_ALIGNED(src, 16) && IS_ALIGNED(src_stride, 16) &&
rotate_argb.cc:      IS_ALIGNED(dst, 16) && IS_ALIGNED(dst_stride, 16)) {
scale.cc:      IS_ALIGNED(dst_width, 8) && IS_ALIGNED(row_stride, 16) &&
scale.cc:      IS_ALIGNED(src_ptr, 16) && IS_ALIGNED(src_stride, 16)) {
scale.cc:      IS_ALIGNED(dst_width, 8) && IS_ALIGNED(row_stride, 16) &&
scale.cc:      IS_ALIGNED(src_ptr, 16) && IS_ALIGNED(src_stride, 16)) {
scale.cc:      IS_ALIGNED(src_ptr, 16) && IS_ALIGNED(src_stride, 16)) {
scale.cc:      IS_ALIGNED(src_ptr, 16) && IS_ALIGNED(src_stride, 16)) {
scale.cc:      IS_ALIGNED(src_ptr, 16) && IS_ALIGNED(src_stride, 16)) {
scale.cc:      IS_ALIGNED(src_ptr, 16) && IS_ALIGNED(src_stride, 16)) {
scale.cc:        IS_ALIGNED(src_ptr, 16) && IS_ALIGNED(src_stride, 16)) {
scale.cc:        IS_ALIGNED(src_ptr, 16) && IS_ALIGNED(src_stride, 16)) {
scale.cc:        IS_ALIGNED(src_ptr, 16) && IS_ALIGNED(src_stride, 16) &&
scale.cc:        IS_ALIGNED(dst_ptr, 16) && IS_ALIGNED(dst_stride, 16)) {
scale.cc:        IS_ALIGNED(src_ptr, 16) && IS_ALIGNED(src_stride, 16) &&
scale.cc:        IS_ALIGNED(dst_ptr, 16) && IS_ALIGNED(dst_stride, 16)) {
scale.cc:        IS_ALIGNED(src_ptr, 16) && IS_ALIGNED(src_stride, 16) &&
scale.cc:        IS_ALIGNED(dst_ptr, 16) && IS_ALIGNED(dst_stride, 16)) {
scale.cc:        IS_ALIGNED(src_ptr, 16) && IS_ALIGNED(src_stride, 16) &&
scale.cc:        IS_ALIGNED(dst_ptr, 16) && IS_ALIGNED(dst_stride, 16)) {
scale_argb.cc:      IS_ALIGNED(src_argb, 16) && IS_ALIGNED(row_stride, 16) &&
scale_argb.cc:      IS_ALIGNED(dst_argb, 16) && IS_ALIGNED(dst_stride, 16)) {
scale_argb.cc:      IS_ALIGNED(src_argb, 16) && IS_ALIGNED(row_stride, 16) &&
scale_argb.cc:      IS_ALIGNED(dst_argb, 16) && IS_ALIGNED(dst_stride, 16)) {
scale_argb.cc:      IS_ALIGNED(dst_argb, 16) && IS_ALIGNED(dst_stride, 16)) {
scale_argb.cc:        IS_ALIGNED(src_argb, 16) && IS_ALIGNED(src_stride, 16) &&
scale_argb.cc:        IS_ALIGNED(dst_argb, 16) && IS_ALIGNED(dst_stride, 16)) {
scale_argb.cc:        IS_ALIGNED(src_argb, 16) && IS_ALIGNED(src_stride, 16) &&
scale_argb.cc:        IS_ALIGNED(dst_argb, 16) && IS_ALIGNED(dst_stride, 16)) {
scale_argb.cc:        IS_ALIGNED(src_argb, 16) && IS_ALIGNED(src_stride, 16) &&
scale_argb.cc:        IS_ALIGNED(dst_argb, 16) && IS_ALIGNED(dst_stride, 16)) {

Original comment by fbarch...@google.com on 6 Oct 2014 at 11:13
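
For context, checks like these gate the SIMD path on alignment: a row function built on aligned loads is only chosen when every row start is 16-byte aligned, which holds when both the first pointer and the stride are multiples of 16. A minimal sketch of that pattern, assuming an IS_ALIGNED macro of the same shape as libyuv's (a power-of-two alignment tested with a bit mask). CopyRow_C, CopyRow_SSE2_Aligned and ChooseCopyRow are hypothetical names for illustration, not libyuv code:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* True when p is a multiple of a; a must be a power of two. */
#define IS_ALIGNED(p, a) (!((uintptr_t)(p) & ((a) - 1)))

typedef void (*RowFunc)(const uint8_t* src, uint8_t* dst, int width);

/* Hypothetical portable fallback: works for any alignment. */
static void CopyRow_C(const uint8_t* src, uint8_t* dst, int width) {
  memcpy(dst, src, (size_t)width);
}

/* Hypothetical stand-in for a SIMD row built on aligned loads/stores
   (movdqa). A real version would fault on unaligned addresses; here
   memcpy keeps the sketch portable and the assert documents the rule. */
static void CopyRow_SSE2_Aligned(const uint8_t* src, uint8_t* dst,
                                 int width) {
  assert(IS_ALIGNED(src, 16) && IS_ALIGNED(dst, 16));
  memcpy(dst, src, (size_t)width);
}

/* Old-style dispatch: row y starts at ptr + y * stride, so every row is
   aligned only if the first pointer and the stride are both multiples
   of 16. The width must also be a whole number of 16-byte blocks. */
static RowFunc ChooseCopyRow(const uint8_t* src, int src_stride,
                             uint8_t* dst, int dst_stride, int width) {
  if (IS_ALIGNED(width, 16) &&
      IS_ALIGNED(src, 16) && IS_ALIGNED(src_stride, 16) &&
      IS_ALIGNED(dst, 16) && IS_ALIGNED(dst_stride, 16)) {
    return CopyRow_SSE2_Aligned;
  }
  return CopyRow_C;  /* any misalignment drops to C */
}
```

Under this scheme a buffer that is off by a single byte, or a stride such as 20, silently takes the C path, which is what making the low levels accept unaligned memory is meant to avoid.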

GoogleCodeExporter commented 9 years ago
The following tests fail:

[  PASSED  ] 862 tests.
[  FAILED  ] 7 tests, listed below:
[  FAILED  ] libyuvTest.ARGBToI420_Unaligned
[  FAILED  ] libyuvTest.ARGBToJ420_Unaligned
[  FAILED  ] libyuvTest.BGRAToI420_Unaligned
[  FAILED  ] libyuvTest.ABGRToI420_Unaligned
[  FAILED  ] libyuvTest.RGBAToI420_Unaligned
[  FAILED  ] libyuvTest.ARGBToNV12_Unaligned
[  FAILED  ] libyuvTest.ARGBToNV21_Unaligned

 7 FAILED TESTS
This is due to pavgb with a memory operand in the SSSE3 code, which faults when that operand is not 16-byte aligned.

Original comment by fbarch...@google.com on 7 Oct 2014 at 12:11

GoogleCodeExporter commented 9 years ago
r1115 works, but uses some C code for ARGBToI420.
r1116 loads with movdqu instead of passing memory operands to pavgb.

Original comment by fbarch...@google.com on 7 Oct 2014 at 8:07
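
The r1116 approach can be illustrated with SSE2 intrinsics: load each source with movdqu (_mm_loadu_si128), which accepts any address, and let pavgb (_mm_avg_epu8) work register to register. A minimal sketch, assuming a hypothetical AverageRows_SSE2 helper that averages two rows byte by byte; it is not the libyuv row code, only an instance of the same load-then-average idea, and it requires an x86 target:

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2: _mm_loadu_si128, _mm_avg_epu8, _mm_storeu_si128 */
#include <stdint.h>

/* Averages two rows with rounding up, (a + b + 1) >> 1, as pavgb does.
   Both inputs go through unaligned loads (movdqu), so row0, row1 and dst
   may sit at any address. A legacy-SSE pavgb xmm, [mem] would instead
   fault unless [mem] were 16-byte aligned. */
static void AverageRows_SSE2(const uint8_t* row0, const uint8_t* row1,
                             uint8_t* dst, int width) {
  int x = 0;
  for (; x + 16 <= width; x += 16) {
    __m128i a = _mm_loadu_si128((const __m128i*)(row0 + x));
    __m128i b = _mm_loadu_si128((const __m128i*)(row1 + x));
    _mm_storeu_si128((__m128i*)(dst + x), _mm_avg_epu8(a, b));
  }
  for (; x < width; ++x) { /* scalar tail with the same rounding */
    dst[x] = (uint8_t)((row0[x] + row1[x] + 1) >> 1);
  }
}
```

When compiled for plain SSE2 the compiler keeps each unaligned load as a separate instruction rather than folding it into pavgb, which mirrors the hand-written change.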

GoogleCodeExporter commented 9 years ago
Benchmarks on Sandy Bridge (total time for libyuvTest):

platform   r1096 (868 tests)   r1116 (885 tests)
linux64    553265 ms           543007 ms
osx64      594923 ms           588508 ms
win32      628448 ms           613399 ms

r1116 runs 17 more tests in less total time on every platform.

Original comment by fbarch...@google.com on 8 Oct 2014 at 12:45
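
Because the two runs contain different numbers of tests, both the total and the per-test average are worth comparing. A small sketch of that arithmetic over the Sandy Bridge totals quoted above; PercentFaster and MsPerTest are hypothetical helper names:

```c
#include <assert.h>

/* Percent reduction in total suite time, e.g. r1096 -> r1116. */
static double PercentFaster(double before_ms, double after_ms) {
  return 100.0 * (before_ms - after_ms) / before_ms;
}

/* Average milliseconds per test, since r1096 ran 868 tests and
   r1116 ran 885. */
static double MsPerTest(double total_ms, int tests) {
  return total_ms / tests;
}
```

With these figures the totals drop by roughly 1.9% (linux64), 1.1% (osx64) and 2.4% (win32), while linux64 goes from about 637 ms to about 614 ms per test.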

GoogleCodeExporter commented 9 years ago
An example where the old code fell back to C for unaligned buffers:

Before:
ARGBToARGB4444_Unaligned (1400 ms)
ARGBToARGB4444_Any (386 ms)
ARGBToARGB4444_Invert (317 ms)
ARGBToARGB4444_Opt (310 ms)
ARGBToARGB4444_Random (261 ms)
1 test case ran. (2674 ms total)

After:
ARGBToARGB4444_Unaligned (426 ms)
ARGBToARGB4444_Any (380 ms)
ARGBToARGB4444_Invert (332 ms)
ARGBToARGB4444_Opt (318 ms)
ARGBToARGB4444_Random (268 ms)
1 test case ran. (1724 ms total)

Original comment by fbarch...@google.com on 9 Oct 2014 at 2:23
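
The drop in ARGBToARGB4444_Unaligned from 1400 ms to 426 ms reflects the SIMD path no longer being rejected for unaligned buffers. A minimal sketch of unaligned-tolerant dispatch, assuming hypothetical CopyRow_C, CopyRow_SSE2 and ChooseCopyRow functions (not libyuv code): pointers and strides are no longer checked, and only the width decides whether whole 16-byte blocks can be processed. It requires an x86 target:

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2: _mm_loadu_si128, _mm_storeu_si128 */
#include <stdint.h>
#include <string.h>

typedef void (*RowFunc)(const uint8_t* src, uint8_t* dst, int width);

/* Hypothetical portable fallback for widths that are not a multiple of 16. */
static void CopyRow_C(const uint8_t* src, uint8_t* dst, int width) {
  memcpy(dst, src, (size_t)width);
}

/* Hypothetical SIMD row built on movdqu loads and stores, so src and dst
   may be at any address. Requires width to be a multiple of 16. */
static void CopyRow_SSE2(const uint8_t* src, uint8_t* dst, int width) {
  for (int x = 0; x < width; x += 16) {
    __m128i v = _mm_loadu_si128((const __m128i*)(src + x));
    _mm_storeu_si128((__m128i*)(dst + x), v);
  }
}

/* Unaligned-tolerant dispatch: no pointer or stride checks. */
static RowFunc ChooseCopyRow(int width) {
  return (width % 16 == 0) ? CopyRow_SSE2 : CopyRow_C;
}
```

Dropping the pointer and stride checks trades the small cost of movdqu on aligned data for never falling back to C on misaligned buffers.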

GoogleCodeExporter commented 9 years ago

Original comment by fbarch...@google.com on 14 Oct 2014 at 12:26