koxiong / libyuv

Automatically exported from code.google.com/p/libyuv
BSD 3-Clause "New" or "Revised" License
0 stars 0 forks source link

Many NEON functions aren't enabled #390

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
This issue is specific for ARM v7 only.

Take the function "ScaleRowDown2_NEON" as an example.

NEON code is in file scale_neon.cc and is controlled by the following macros 
#if !defined(LIBYUV_DISABLE_NEON) && defined(__ARM_NEON__) && \
    !defined(__aarch64__)

#endif

Compile flag "-mfpu=neon" is passed to this file, so __ARM_NEON__ is defined 
and NEON code is compiled to libyuv_neon.a.

But the initialization of ScaleRowDown2 for NEON is controlled by macro 
HAS_SCALEROWDOWN2_NEON and placed in file scale.cc
#if defined(HAS_SCALEROWDOWN2_NEON)
  if (TestCpuFlag(kCpuHasNEON) && IS_ALIGNED(dst_width, 16)) {
    ScaleRowDown2 = filtering ? ScaleRowDown2Box_NEON : ScaleRowDown2_NEON;
  }
#endif

HAS_SCALEROWDOWN2_NEON is defined in include/libyuv/scale_row.h as follows:
#if !defined(LIBYUV_DISABLE_NEON) && !defined(__native_client__) && \
    (defined(__ARM_NEON__) || defined(LIBYUV_NEON) || defined(__aarch64__))
#define HAS_SCALEROWDOWN2_NEON
#endif

When compiling the file scale.cc, flag "-mfpu=neon" isn't passed to this file. 
So __ARM_NEON__ is undefined which leads to HAS_SCALEROWDOWN2_NEON undefined. 
That means the initialization code of ScaleRowDown2 for NEON isn't compiled. 
ScaleRowDown2 is still pointed to ScaleRowDown2_C.

This issue exists in many functions, such as all the functions in scale_neon.cc 
and rotate_neon.cc. Part of functions in row_neon.cc also apply this issue, but 
I don't check one by one.

Original issue reported on code.google.com by yang.zh...@arm.com on 9 Jan 2015 at 7:41

GoogleCodeExporter commented 8 years ago
To fix this bug, I have created one CL with minimum change: 
https://webrtc-codereview.appspot.com/32959004/. This patch use LIBYUV_NEON to 
work around undefined __ARM_NEON__. But doing this conceal the fact that 
__ARM_NEON__ is undefined without compile flag "-mfpu=neon". Ignoring it might 
bring other potential issues.

I'm considering some ways as follows:
1. Identify files which has NEON initialization code and pass compile flag 
"-mfpu=neon" to them.
Doing this might be complex, because there many files including NEON 
initialization code. In addition, it destroys current structure of files.

2. Pass compile flag "-mfpu=neon" to all the C file.
It's a little dangerous becasue the toolchain might generate NEON instructions 
automitically. If NEON floating-point instructions which isn't fullly 
compatible with IEEE 754 are generated, the C version is no longer reference.

3. Use LIBYUV_NEON for ARM32 and use LIBYUV_NEON64 for ARM64 
Using them instead of __ARM_NEON__/__aarch64__ could minimize the impact of 
toolchain and compile flag. Based on this way, the macro definition need to be 
added in gyp according to ARM architecture and all the __ARM_NEON__/__aarch64__ 
need to be removed. But the structure of NEON files needn’t to be modified. 

4. Using unified macro such as LIBYUV_NEON for ARM32/ARM64 
Using LIBYUV_NEON only makes the NEON enable clear, but the ARM32 NEON files 
and ARM64 NEON files need to be separated. All the __ARM_NEON__/__aarch64__ 
need to be removed too.

Original comment by yang.zh...@arm.com on 9 Jan 2015 at 9:26

GoogleCodeExporter commented 8 years ago
1 and 2 would potentially crash on cpus without neon
3 seems unnecessary
4 seems right.  it says neon code will be present (e.g. row_neon.cc) but the 
code being compiled (e.g. convert.cc) does not have -ffpu=neon and needs to 
know if the code is there.  row.h will detect the ifdef and enable HAVE_*_NEON 
function macros, which most of the code uses.

For 64 bit I think we can move toward a single library again, requiring neon 
for all code.

For all platforms, the libyuv_neon library should be buildable, but will be 
empty if its not a 32 bit arm build.

Original comment by fbarch...@google.com on 12 Feb 2015 at 12:00

GoogleCodeExporter commented 8 years ago
fixed yes?

Original comment by fbarch...@chromium.org on 16 Mar 2015 at 2:32