myrao / libyuv

Automatically exported from code.google.com/p/libyuv
BSD 3-Clause "New" or "Revised" License

64 bit ARMv8 support for libyuv #319

Closed. GoogleCodeExporter closed this issue 9 years ago

GoogleCodeExporter commented 9 years ago
We need to evaluate whether libyuv works correctly and performs well on 64-bit devices.

Original issue reported on code.google.com by fbarch...@chromium.org on 21 Mar 2014 at 5:39

GoogleCodeExporter commented 9 years ago
This appears to be how to compile a 64-bit build for iOS:

GYP_DEFINES="OS=ios target_arch=armv7 target_subarch=64" GYP_CROSSCOMPILE=1 \
GYP_GENERATOR_FLAGS="output_dir=out_ios" ./build/gyp_chromium -f ninja \
--depth=. libyuv_test.gyp
ninja -j7 -C out_ios/Debug-iphoneos

The first two issues: q0 is not accepted as a register name in the asm clobber list

../../source/row_neon.cc:136:23: error: unknown register name 'q0' in asm
    : "cc", "memory", "q0", "q1", "q2", "q3",

and the general-purpose registers are renamed, so r9 is no longer recognized

../../source/rotate_neon.cc:183:23: error: unknown register name 'r9' in asm
    : "memory", "cc", "r9", "q0", "q1", "q2", "q3"
                      ^
../../source/rotate_neon.cc:396:23: error: unknown register name 'r9' in asm
    : "memory", "cc", "r9",

Original comment by fbarch...@chromium.org on 28 Mar 2014 at 10:52
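
The two clobber-list errors above can be illustrated with a minimal sketch that selects the instruction syntax and clobber names per architecture. Copy16 is a hypothetical helper, not a libyuv function; the assumption is that the 64-bit toolchain accepts the v0..v31 names in clobber lists, as GCC and clang do, and that the non-ARM fallback exists only so the sketch builds elsewhere.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of libyuv): copies exactly 16 bytes.
void Copy16(const uint8_t* src, uint8_t* dst) {
  assert(src != nullptr && dst != nullptr);
#if defined(__aarch64__)
  // AArch64: vector registers are named v0..v31 in the clobber list.
  asm volatile(
      "ld1 {v0.16b}, [%0]   \n"  // load 16 bytes
      "st1 {v0.16b}, [%1]   \n"  // store 16 bytes
      :
      : "r"(src), "r"(dst)
      : "memory", "v0");
#elif defined(__arm__) && (defined(__ARM_NEON__) || defined(__ARM_NEON))
  // ARMv7: the same copy, with q0 as the clobbered register.
  asm volatile(
      "vld1.8 {q0}, [%0]    \n"
      "vst1.8 {q0}, [%1]    \n"
      :
      : "r"(src), "r"(dst)
      : "memory", "q0");
#else
  // Portable fallback so the sketch compiles on non-ARM hosts.
  for (int i = 0; i < 16; ++i) dst[i] = src[i];
#endif
}
```

Keeping both variants behind preprocessor checks lets one source file build for 32-bit and 64-bit targets while each clobber list uses only names the target assembler knows.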

GoogleCodeExporter commented 9 years ago
Fixed in r994
Followup work needed for Neon version, but users are able to do the build now.

Original comment by fbarch...@google.com on 1 Apr 2014 at 5:48

GoogleCodeExporter commented 9 years ago
Updating to the new toolchains has introduced some build bot issues:
http://build.chromium.org/p/tryserver.libyuv/builders/linux_asan/builds/441/steps/compile/logs/stdio
http://build.chromium.org/p/tryserver.libyuv/builders/ios_rel/builds/179/steps/compile/logs/stdio
http://build.chromium.org/p/tryserver.libyuv/builders/mac/builds/440/steps/compile/logs/stdio

Original comment by fbarch...@chromium.org on 2 Apr 2014 at 7:41

GoogleCodeExporter commented 9 years ago
r998 replaces the hard-coded r9 register with a %0 parameter, which will map to x9 for arm64

Original comment by fbarch...@chromium.org on 3 Apr 2014 at 6:48
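
As a rough sketch of the approach described in that comment, a scratch register can be declared as an asm operand so the compiler picks a valid register on either architecture, instead of naming r9 in the clobber list. LoadByteAt is a hypothetical function for illustration only and is not taken from r998; the non-ARM branch is a plain C++ fallback.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical example (not libyuv code): returns base[byte_offset].
uint32_t LoadByteAt(const uint8_t* base, ptrdiff_t byte_offset) {
  assert(base != nullptr);
  uint32_t value;
#if defined(__aarch64__)
  const uint8_t* tmp;  // scratch pointer; the compiler chooses the register
  asm volatile(
      "add  %1, %2, %3   \n"  // tmp = base + byte_offset (64-bit x registers)
      "ldrb %w0, [%1]    \n"  // value = *tmp (32-bit w register view)
      : "=&r"(value), "=&r"(tmp)
      : "r"(base), "r"(byte_offset)
      : "memory");
#elif defined(__arm__)
  const uint8_t* tmp;  // scratch pointer; no hard-coded r9
  asm volatile(
      "add  %1, %2, %3   \n"
      "ldrb %0, [%1]     \n"
      : "=&r"(value), "=&r"(tmp)
      : "r"(base), "r"(byte_offset)
      : "memory");
#else
  value = base[byte_offset];  // portable fallback for non-ARM hosts
#endif
  return value;
}
```

Because the scratch register is an operand, the same template text works whether the allocator picks an r register on ARMv7 or an x register on arm64.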

GoogleCodeExporter commented 9 years ago
r1000 fixes 64 bit clang builds

Original comment by fbarch...@chromium.org on 14 Apr 2014 at 4:40

GoogleCodeExporter commented 9 years ago

Original comment by fbarch...@chromium.org on 23 May 2014 at 10:06

GoogleCodeExporter commented 9 years ago

Original comment by fbarch...@chromium.org on 11 Jun 2014 at 9:06

GoogleCodeExporter commented 9 years ago
Partially fixed: the GPR pointers/registers are fixed.
NEON registers do not overlap the way they did on ARMv7, where each q register 
aliased a pair of d registers. This will affect some functions, but most will 
be affected only in the registers declared, not in the code.

Original comment by fbarch...@chromium.org on 27 Jun 2014 at 1:01
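
The register overlap mentioned above can be written down as a small mapping. This is only an illustrative sketch with hypothetical function names: on ARMv7, qN is the pair d(2N), d(2N+1), while on AArch64 dN is the low 64 bits of vN.

```cpp
#include <cassert>
#include <utility>

// ARMv7 NEON: 16 quad registers q0..q15, each aliasing two d registers.
std::pair<int, int> DRegsAliasedByQ_Armv7(int q) {
  assert(q >= 0 && q < 16);
  return {2 * q, 2 * q + 1};
}

// AArch64: 32 vector registers v0..v31; dN is the low half of vN,
// so vN overlaps only dN.
int DRegAliasedByV_Arm64(int v) {
  assert(v >= 0 && v < 32);
  return v;
}

// True if code that uses dreg `d` must also list quad/vector register
// `reg` as affected, on the given architecture.
bool DRegOverlaps(int d, int reg, bool is_arm64) {
  if (is_arm64) return DRegAliasedByV_Arm64(reg) == d;
  const std::pair<int, int> pair = DRegsAliasedByQ_Armv7(reg);
  return pair.first == d || pair.second == d;
}
```

For example, code that writes d2 and d3 touches only q1 on ARMv7, but touches both v2 and v3 on arm64, which is why declared register lists need revisiting even when the instructions look the same.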

GoogleCodeExporter commented 9 years ago
Hi there,
I'm looking at enabling libyuv's ARMv8 NEON optimizations. Is there 
something I can do to help? For example, I could convert the ARMv7 
NEON-optimized code to ARMv8 NEON for the functions in the following files:
compare_neon.cc, rotate_neon.cc, row_neon.cc, scale_neon.cc

Original comment by zhongwei...@arm.com on 28 Jul 2014 at 3:51

GoogleCodeExporter commented 9 years ago
The first-pass ARMv8 conversion is about 40% complete.
Source files are *_neon64.cc
An overall second pass should be done to bump all registers to 16 bytes instead 
of 8 bytes, which was an ARMv7 restriction.

Original comment by fbarch...@google.com on 23 Aug 2014 at 1:26
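
To make the "16 bytes instead of 8" idea concrete, here is a hedged sketch using the standard arm_neon.h intrinsics vld2q_u8 and vst1q_u8, which operate on 16-byte registers. DownsampleBy2 is a hypothetical helper, not libyuv's implementation, and the scalar branch is only there so the sketch also builds on non-ARM hosts.

```cpp
#include <cassert>
#include <cstdint>
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

// Hypothetical helper (not libyuv's API): keeps every second byte of src,
// i.e. dst[x] = src[2 * x + 1]. dst_width must be a multiple of 16 here.
void DownsampleBy2(const uint8_t* src, uint8_t* dst, int dst_width) {
  assert(dst_width % 16 == 0);
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
  for (int x = 0; x < dst_width; x += 16) {
    // Load 32 bytes, de-interleaved into even (val[0]) and odd (val[1]).
    uint8x16x2_t pixels = vld2q_u8(src + 2 * x);
    // Store the 16 odd bytes: 16 output pixels per iteration.
    vst1q_u8(dst + x, pixels.val[1]);
  }
#else
  // Scalar fallback with the same result.
  for (int x = 0; x < dst_width; ++x) dst[x] = src[2 * x + 1];
#endif
}
```

The 8-byte form of the same loop would use vld2_u8 and vst1_u8 and advance by 8 output pixels per iteration.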

GoogleCodeExporter commented 9 years ago
One metric for completeness is the number of _NEON functions in the 64-bit build versus the 32-bit build.

For 32 bit:
otool -tV libyuv_neon.*_neon.o  | grep NEON: | wc -l
105

For 64 bit:
otool -tV libyuv_neon.*_neon64.arm64.o | grep NEON: | wc -l
105

Looks like the initial port is complete.
Follow-up work is needed to:
1. Test that it actually works.
2. Confirm that performance is on par.
3. Optimize for 64 bit: it can do 16 pixels at a time instead of 8.
4. Port more functions to NEON. All functions that are optimized for Intel 
should have a NEON equivalent.
On Intel, scale has 22 optimized functions; NEON has 15.

Original comment by fbarch...@google.com on 14 Oct 2014 at 1:11

GoogleCodeExporter commented 9 years ago
Scale for Intel: 22 functions
objdump -D libyuv.scale_posix.o | grep text.*SSE.*:
ScaleRowDown2_SSE2:
ScaleRowDown2Linear_SSE2:
ScaleRowDown2Box_SSE2:
ScaleRowDown4_SSE2:
ScaleRowDown4Box_SSE2:
ScaleRowDown34_SSSE3:
ScaleRowDown34_1_Box_SSSE3:
ScaleRowDown34_0_Box_SSSE3:
ScaleRowDown38_SSSE3:
ScaleRowDown38_2_Box_SSSE3:
ScaleRowDown38_3_Box_SSSE3:
ScaleAddRows_SSE2:
ScaleFilterCols_SSSE3:
ScaleColsUp2_SSE2:
ScaleARGBRowDown2_SSE2:
ScaleARGBRowDown2Linear_SSE2:
ScaleARGBRowDown2Box_SSE2:
ScaleARGBRowDownEven_SSE2:
ScaleARGBRowDownEvenBox_SSE2:
ScaleARGBCols_SSE2:
ScaleARGBColsUp2_SSE2:
ScaleARGBFilterCols_SSSE3:

On Arm: 15 functions
otool -tV libyuv_neon.scale_neon64.arm64.o | grep NEON: 
_ScaleRowDown2_NEON:
_ScaleRowDown2Box_NEON:
_ScaleRowDown4_NEON:
_ScaleRowDown4Box_NEON:
_ScaleRowDown34_NEON:
_ScaleRowDown34_0_Box_NEON:
_ScaleRowDown34_1_Box_NEON:
_ScaleRowDown38_NEON:
_ScaleRowDown38_3_Box_NEON:
_ScaleRowDown38_2_Box_NEON:
_ScaleFilterRows_NEON:
_ScaleARGBRowDown2_NEON:
_ScaleARGBRowDown2Box_NEON:
_ScaleARGBRowDownEven_NEON:
_ScaleARGBRowDownEvenBox_NEON:

Original comment by fbarch...@google.com on 14 Oct 2014 at 1:15
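
The gap between the two lists above can be computed mechanically. The following is a small sketch, not part of libyuv: BaseName and MissingFrom are hypothetical helpers, and the symbol lists are copied from the objdump and otool output in the previous comment.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Symbols as printed by objdump (Intel) and otool (arm64) above.
const std::vector<std::string> kSseSymbols = {
    "ScaleRowDown2_SSE2:", "ScaleRowDown2Linear_SSE2:",
    "ScaleRowDown2Box_SSE2:", "ScaleRowDown4_SSE2:",
    "ScaleRowDown4Box_SSE2:", "ScaleRowDown34_SSSE3:",
    "ScaleRowDown34_1_Box_SSSE3:", "ScaleRowDown34_0_Box_SSSE3:",
    "ScaleRowDown38_SSSE3:", "ScaleRowDown38_2_Box_SSSE3:",
    "ScaleRowDown38_3_Box_SSSE3:", "ScaleAddRows_SSE2:",
    "ScaleFilterCols_SSSE3:", "ScaleColsUp2_SSE2:",
    "ScaleARGBRowDown2_SSE2:", "ScaleARGBRowDown2Linear_SSE2:",
    "ScaleARGBRowDown2Box_SSE2:", "ScaleARGBRowDownEven_SSE2:",
    "ScaleARGBRowDownEvenBox_SSE2:", "ScaleARGBCols_SSE2:",
    "ScaleARGBColsUp2_SSE2:", "ScaleARGBFilterCols_SSSE3:"};

const std::vector<std::string> kNeonSymbols = {
    "_ScaleRowDown2_NEON:", "_ScaleRowDown2Box_NEON:",
    "_ScaleRowDown4_NEON:", "_ScaleRowDown4Box_NEON:",
    "_ScaleRowDown34_NEON:", "_ScaleRowDown34_0_Box_NEON:",
    "_ScaleRowDown34_1_Box_NEON:", "_ScaleRowDown38_NEON:",
    "_ScaleRowDown38_3_Box_NEON:", "_ScaleRowDown38_2_Box_NEON:",
    "_ScaleFilterRows_NEON:", "_ScaleARGBRowDown2_NEON:",
    "_ScaleARGBRowDown2Box_NEON:", "_ScaleARGBRowDownEven_NEON:",
    "_ScaleARGBRowDownEvenBox_NEON:"};

// Strips a leading '_' (Mach-O prefix), a trailing ':' and the final
// "_SSE2" / "_SSSE3" / "_NEON" suffix, leaving the base function name.
std::string BaseName(std::string symbol) {
  assert(!symbol.empty());
  if (symbol.front() == '_') symbol.erase(0, 1);
  if (!symbol.empty() && symbol.back() == ':') symbol.pop_back();
  const std::string::size_type cut = symbol.rfind('_');
  return cut == std::string::npos ? symbol : symbol.substr(0, cut);
}

// Returns the base names present in `have` but absent from `other`.
std::set<std::string> MissingFrom(const std::vector<std::string>& have,
                                  const std::vector<std::string>& other) {
  std::set<std::string> other_names;
  for (const std::string& s : other) other_names.insert(BaseName(s));
  std::set<std::string> missing;
  for (const std::string& s : have) {
    const std::string name = BaseName(s);
    if (other_names.count(name) == 0) missing.insert(name);
  }
  return missing;
}
```

Calling MissingFrom(kSseSymbols, kNeonSymbols) yields the eight scale functions that still lack a NEON counterpart (ScaleRowDown2Linear, ScaleAddRows, ScaleFilterCols, ScaleColsUp2, ScaleARGBRowDown2Linear, ScaleARGBCols, ScaleARGBColsUp2, ScaleARGBFilterCols); the reverse call returns only ScaleFilterRows.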