google / XNNPACK

High-efficiency floating-point neural network inference operators for mobile, server, and Web

Xnnpack still builds with `+dotprod` and `+fp16` with `-DXNNPACK_ENABLE_ARM_DOTPROD=OFF -DXNNPACK_ENABLE_ARM_FP16_SCALAR=OFF -DXNNPACK_ENABLE_ARM_FP16_VECTOR=OFF` #6165

Open misterBart opened 5 months ago

misterBart commented 5 months ago

I'm building for an Arm64 target with a fairly old toolchain (gcc 7.5, binutils 2.29.1) in order to support old Linux platforms. I use:

-DXNNPACK_ENABLE_ARM_BF16=OFF -DXNNPACK_ENABLE_ARM_I8MM=OFF -DXNNPACK_ENABLE_ARM_DOTPROD=OFF -DXNNPACK_ENABLE_ARM_FP16_SCALAR=OFF -DXNNPACK_ENABLE_ARM_FP16_VECTOR=OFF

Yet Xnnpack still seems to build with `+dotprod` and `+fp16`:

In file included from /home/personau/LinuxToolchainsTest/tflite_aarch64_release/xnnpack/src/f16-dwconv2d-chw/gen/5x5s2p2-minmax-neonfp16arith-1x4.c:12:0:
/home/personau/x-tools/aarch64-unknown-linux-gnu-glibc2.25-gcc7.5/lib/gcc/aarch64-unknown-linux-gnu/7.5.0/include/arm_neon.h:17259:1: note: expected 'const float16_t * {aka const __fp16 *}' but argument is of type 'const uint16_t * {aka const short unsigned int *}'
 vld1_dup_f16 (const float16_t* __a)
 ^~~~~~~~~~~~
cc1: error: invalid feature modifier in '-march=armv8.2-a+fp16+dotprod'
gmake[2]: *** [_deps/xnnpack-build/CMakeFiles/XNNPACK.dir/build.make:4093: _deps/xnnpack-build/CMakeFiles/XNNPACK.dir/src/f16-gemm/gen-inc/1x8inc-minmax-aarch64-neonfp16arith-ld64.S.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:6137: _deps/xnnpack-build/CMakeFiles/XNNPACK.dir/all] Error 2
gmake[1]: *** Waiting for unfinished jobs....
gmake: *** [Makefile:136: all] Error 2
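
The `invalid feature modifier` message is printed by the compiler itself when it rejects the `-march` string, so the failure can be reproduced outside the XNNPACK build. A minimal sketch, assuming a POSIX shell; the helper name `accepts_march` is hypothetical, and the compiler name in the example is only inferred from the paths in the log above:

```shell
# Return 0 if compiler $1 accepts flag $2 when compiling a trivial C file.
accepts_march() {
  cc="$1"
  flag="$2"
  # Feed a minimal translation unit on stdin and discard the object file.
  echo 'int main(void) { return 0; }' |
    "$cc" "$flag" -x c -c - -o /dev/null 2>/dev/null
}

# Example (compiler name is an assumption, adjust to your toolchain):
# accepts_march aarch64-unknown-linux-gnu-gcc -march=armv8.2-a+fp16+dotprod \
#   && echo "supported" || echo "not supported"
```

If the probe fails for the combined string, trying `+fp16` and `+dotprod` separately may show which modifier the toolchain does not know.
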
fbarchard commented 5 months ago

The build system determines which kernels to build. The macros reflect what was enabled, and disabled kernels won't be tested or used. With Bazel there are flags to control each instruction set:

--define=xnn_enable_arm_fp16_vector=false
--define=xnn_enable_arm_dotprod=false

cmake has options, but I'm not familiar with the usage

XNNPACK_ENABLE_ARM_FP16_VECTOR
XNNPACK_ENABLE_ARM_DOTPROD

On Intel I added some gcc version checking to force the flags off, and the same could be done for Arm gcc with a change to CMakeLists.txt. It would be something like:

```cmake
IF(CMAKE_C_COMPILER_ID STREQUAL "GNU")
  IF(CMAKE_C_COMPILER_VERSION VERSION_LESS "11")
    SET(XNNPACK_ENABLE_ARM_FP16_VECTOR OFF)
    SET(XNNPACK_ENABLE_ARM_DOTPROD OFF)
  ENDIF()
ENDIF()
```

misterBart commented 5 months ago

> cmake has options, but I'm not familiar with the usage
>
> XNNPACK_ENABLE_ARM_FP16_VECTOR
> XNNPACK_ENABLE_ARM_DOTPROD

Yes, I already turned these off, see my opening post. The problem is that, even though I set these CMake options to OFF, Xnnpack still builds with +dotprod and +fp16.

alankelly commented 5 months ago

What version of XNNPACK are you building? The failing file was removed on Sep 26, 2022.

misterBart commented 5 months ago

The version that is part of TfLite 2.10. (Can I check the specific Xnnpack version in the TfLite source code?) TfLite 2.10.1 was released Nov 16, 2022. Perhaps that TfLite still includes the failing file.
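
On the question of finding the pinned version: a CMake build that fetches a dependency usually records the commit next to a `GIT_TAG` keyword. A minimal sketch that prints that value from a given file; the helper name `pinned_git_tag` is hypothetical, and the file path in the example is an assumption about the TensorFlow source layout that should be checked against the actual checkout:

```shell
# Print the token that follows the first GIT_TAG keyword in CMake file $1.
pinned_git_tag() {
  awk '{
    for (i = 1; i < NF; i++)
      if ($i == "GIT_TAG") { print $(i + 1); exit }
  }' "$1"
}

# Example (path is an assumption, not confirmed in this thread):
# pinned_git_tag tensorflow_src/tensorflow/lite/tools/cmake/modules/xnnpack.cmake
```

The printed commit can then be compared against the XNNPACK history to see whether it predates the removal of the failing file.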

alankelly commented 5 months ago

Can you update to the latest release? We can't fix old releases.

misterBart commented 5 months ago

Still getting the errors with the latest TfLite release (2.16):

cc1: error: invalid feature modifier in '-march=armv8.2-a+fp16+dotprod'
gmake[2]: *** [_deps/xnnpack-build/CMakeFiles/microkernels-prod.dir/build.make:173: _deps/xnnpack-build/CMakeFiles/microkernels-prod.dir/src/f16-gemm/gen/f16-gemm-1x8-minmax-asm-aarch64-neonfp16arith-ld64.S.o] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:6832: _deps/xnnpack-build/CMakeFiles/microkernels-prod.dir/all] Error 2
gmake[1]: *** Waiting for unfinished jobs....
cc1: error: invalid feature modifier in '-march=armv8.2-a+fp16+dotprod'
gmake[2]: *** [_deps/xnnpack-build/CMakeFiles/microkernels-all.dir/build.make:40157: _deps/xnnpack-build/CMakeFiles/microkernels-all.dir/src/f16-gemm/gen/f16-gemm-1x8-minmax-asm-aarch64-neonfp16arith-ld64.S.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:6806: _deps/xnnpack-build/CMakeFiles/microkernels-all.dir/all] Error 2
gmake: *** [Makefile:136: all] Error 2

Steps I execute:

git clone --single-branch --branch r2.16 https://github.com/tensorflow/tensorflow tensorflow_src
cmake -DCMAKE_TOOLCHAIN_FILE=../toolchain_aarch64.cmake -DCMAKE_BUILD_TYPE=release -DXNNPACK_ENABLE_ARM_BF16=OFF -DXNNPACK_ENABLE_ARM_I8MM=OFF -DXNNPACK_ENABLE_ARM_DOTPROD=OFF -DXNNPACK_ENABLE_ARM_FP16_SCALAR=OFF -DXNNPACK_ENABLE_ARM_FP16_VECTOR=OFF ../tensorflow_src/tensorflow/lite
cmake --build . -j 8 --config release
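
To see which `-march` values the configured build will actually pass to the compiler, one can search the generated build tree after the configure step. A minimal sketch, assuming the Makefile generator (which writes per-target compile flags into text files such as `flags.make`); `list_march_flags` is a hypothetical helper name:

```shell
# Count each distinct -march=... value found in text files under directory $1.
list_march_flags() {
  # -r recurse, -h no file names, -o matched part only, -I skip binary files
  grep -rhoIE -e '-march=[^[:space:]"]+' "$1" | sort | uniq -c
}

# Example, run from the build directory used in the steps above:
# list_march_flags _deps/xnnpack-build
```

If `-march=armv8.2-a+fp16+dotprod` still shows up with all the `XNNPACK_ENABLE_ARM_*` options set to OFF, the flag is being attached to sources that those options do not cover.
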
alankelly commented 5 months ago

Can you try adding -DXNNPACK_ENABLE_ASSEMBLY=OFF?

misterBart commented 5 months ago

After adding that option TfLite 2.16 builds without errors, and I can run a test program on an Arm64 board using TfLite 2.16. But I shouldn't cheer too early: the test program runs slower now, which naturally comes from disabling the use of assembly code. -DXNNPACK_ENABLE_ASSEMBLY=OFF is too drastic. The Arm64 board does not support float16 etc., but I would still like to use the other assembly micro-kernels in Xnnpack.

alankelly commented 5 months ago

Ok, we know what the problem is now. The solution is to get the update-microkernels script to split the assembly files into ones with and without arm V8 and to create new targets with the appropriate compilation options. Would you like to send a PR to do this?

misterBart commented 5 months ago

A PR suggests I know what to fix in the codebase, which I don't.