google / XNNPACK

High-efficiency floating-point neural network inference operators for mobile, server, and Web
Other
1.86k stars 367 forks source link

armeabi-v7a assembler error #4348

Open RobertFlatt opened 1 year ago

RobertFlatt commented 1 year ago

While building tensorflow lite (v2.11.0) for Android armeabi-v7a I get the following invalid assembly code error:

tflite-runtime/tensorflow/lite/tools/pip_package/gen/tflite_pip/python3/cmake_build/xnnpack/src/xnnpack/math.h:311:13: error: invalid output constraint \'=t\' in asm\n      : [i] "=t" (i)

The XNNPACK master version shows the issue is here https://github.com/google/XNNPACK/blob/master/src/xnnpack/math.h#L316

Android arm64-v8a builds and runs without error.. With an earlier tensorflow lite version (v2.8.0) both architectures built and ran without error.

As I read it =t is documented as a valid constraint for "ARM family", but the assembler thinks this is not the case. https://gcc.gnu.org/onlinedocs/gcc/Simple-Constraints.html#Simple-Constraints https://gcc.gnu.org/onlinedocs/gcc/Machine-Constraints.html#Machine-Constraints

Maratyszcza commented 1 year ago

Which Android NDK version do you use?

RobertFlatt commented 1 year ago

Thanks for the response, NDK is 25b

Maratyszcza commented 1 year ago

This looks like a Clang bug. Note that it only happens when building source files for ARMv6. It looks like Clang assembly validator assumes that VFP registers are not supported (even though XNNPack pass -mfpu=vfp) and thinks that "t" constraint can't be used.

RobertFlatt commented 1 year ago

Thanks, for investigating. And also occurs with v7.

Any suggestions for a workaround? (I can patch, but ARM asm is outside my experience base)

For example, brute force might be to add && !defined(__clang__) to that first case test, is there are better way? Or does it make sense to replace with an =r constraint? (I have no idea of the implications of that)

I notice the code has other tests for Clang issues, is this case something that might be changed in XNNPack?

RobertFlatt commented 1 year ago

On further investigation I think this is an XNNPACK issue, not a Clang issue as suggested.

This post refers to an arm7-a issue. The response describes the build's required use of the -mfpu=vfp flag for arm6.

But this flag is only set for arm6 devices: https://github.com/google/XNNPACK/blob/master/CMakeLists.txt#L548-L553

It looks like Clang assembly validator assumes that VFP registers are not supported (even though XNNPack pass -mfpu=vfp)

XNNPack does not pass -mfpu=vfp to Clang for arm7-a devices.

Maratyszcza commented 1 year ago

XNNPack passes -march=armv6 -mfpu=vfp for the file that fails to compile. So it is building for ARMv6 with VFPv2.

RobertFlatt commented 1 year ago

OK, but that is not the issue.

This issue is about armv7-a not armv6

Maratyszcza commented 1 year ago

ARMv7 build for XNNPack really means 32-bit ARM architecture. The minimal requirements for 32-bit ARM architecture in XNNPack is ARMv6 with VFPv2. Thus, XNNPack includes ARMv6+VFPv2-optimized microkernels in ARMv7 build, even though they won't be used in production.

RobertFlatt commented 1 year ago

OK, but this is still not the issue.

The issue is that armv7 does not build. Please focus on, and address, the assembly code error when building for armv7.

I suggest the issue may be the -mfpu=vfp flag is not set when -march=armv7-a As shown here in XNNPack's Cmake script: https://github.com/google/XNNPACK/blob/master/CMakeLists.txt#L548-L553

leleliu008 commented 1 year ago

I just run following code, this error has gone. but new issue comes https://github.com/google/XNNPACK/issues/4775

sed -i 's|-march=armv6 -mfpu=vfp|-march=armv7-a -mfpu=neon|' CMakeLists.txt
zihaomu commented 7 months ago

Hi @leleliu008, I got a workaround. You can directly modifiy the xnnpack source code which is inside the tflite build folder. image

The main idea is to bypass the error code and use the default branch. It works for me.