google / XNNPACK

High-efficiency floating-point neural network inference operators for mobile, server, and Web

Failed to compile XNNPACK on WoA(Windows on ARM) device. #6558

Open zhanweiw opened 5 months ago

zhanweiw commented 5 months ago

It seems part of the code hasn't been compiled. Any idea how to fix it? Thanks in advance!

FAILED: subgraph-size-test.exe
C:\windows\system32\cmd.exe /C "cd . && C:\Programs\Python\Python311-arm64\Lib\site-packages\cmake\data\bin\cmake.exe -E vs_link_exe --intdir=CMakeFiles\subgraph-size-test.dir --rc=C:\PROGRA~2\WI3CF2~1\10\bin\100226~1.0\arm64\rc.exe --mt=C:\PROGRA~2\WI3CF2~1\10\bin\100226~1.0\arm64\mt.exe --manifests  -- C:\Programs\LLVM\bin\lld-link.exe /nologo CMakeFiles\subgraph-size-test.dir\test\subgraph-size.c.obj  /out:subgraph-size-test.exe /implib:subgraph-size-test.lib /pdb:subgraph-size-test.pdb /version:0.0 /machine:ARM64 /debug /INCREMENTAL /subsystem:console  XNNPACK.lib  cpuinfo\cpuinfo.lib  pthreadpool\pthreadpool.lib  kernel32.lib user32.lib gdi32.lib winspool.lib shell32.lib ole32.lib oleaut32.lib uuid.lib comdlg32.lib advapi32.lib && cd ."
LINK Pass 1: command "C:\Programs\LLVM\bin\lld-link.exe /nologo CMakeFiles\subgraph-size-test.dir\test\subgraph-size.c.obj /out:subgraph-size-test.exe /implib:subgraph-size-test.lib /pdb:subgraph-size-test.pdb /version:0.0 /machine:ARM64 /debug /INCREMENTAL /subsystem:console XNNPACK.lib cpuinfo\cpuinfo.lib pthreadpool\pthreadpool.lib kernel32.lib user32.lib gdi32.lib winspool.lib shell32.lib ole32.lib oleaut32.lib uuid.lib comdlg32.lib advapi32.lib /MANIFEST /MANIFESTFILE:CMakeFiles\subgraph-size-test.dir/intermediate.manifest CMakeFiles\subgraph-size-test.dir/manifest.res" failed (exit code 1) with the following output:
lld-link: error: undefined symbol: xnn_f16_vabs_ukernel__neonfp16arith_u16
>>> referenced by C:\zhanweiw\tf_lite\XNNPACK\src\configs\unary-elementwise-config.c:181
>>>               XNNPACK.lib(unary-elementwise-config.c.obj):(init_f16_abs_config)
>>> referenced by C:\zhanweiw\tf_lite\XNNPACK\src\configs\unary-elementwise-config.c:181
>>>               XNNPACK.lib(unary-elementwise-config.c.obj):(init_f16_abs_config)
fbarchard commented 4 months ago

Hi thanks for the report.

When I give it a quick try with blaze, which is like bazel, I'm able to build the abs bench:

blaze build --config=lexan_x86_64 -c opt //third_party/XNNPACK/bench:abs_bench

The microkernel is checked in, declared, and used:

grep xnn_f16_vabs_ukernel__neonfp16arith_u16 . -r
./src/amalgam/gen/neonfp16arith.c:void xnn_f16_vabs_ukernel__neonfp16arith_u16(
./src/configs/unary-elementwise-config.c: f16_abs_config.ukernel = (xnn_vunary_ukernel_fn) xnn_f16_vabs_ukernel__neonfp16arith_u16;
./src/configs/unary-elementwise-config.c: f16_abs_config.ukernel = (xnn_vunary_ukernel_fn) xnn_f16_vabs_ukernel__neonfp16arith_u16;
./src/xnnpack/vunary.h:DECLARE_F16_VABS_UKERNEL_FUNCTION(xnn_f16_vabs_ukernel__neonfp16arith_u16)
./src/f16-vunary/gen/f16-vabs-neonfp16arith-u16.c:void xnn_f16_vabs_ukernel__neonfp16arith_u16(
./bench/f16-vabs.cc: xnn_f16_vabs_ukernel__neonfp16arith_u16,
./test/f16-vabs.cc: .TestAbs(xnn_f16_vabs_ukernel__neonfp16arith_u16);
./test/f16-vabs.cc: .TestAbs(xnn_f16_vabs_ukernel__neonfp16arith_u16);
./test/f16-vabs.cc: .TestAbs(xnn_f16_vabs_ukernel__neonfp16arith_u16);
./test/f16-vabs.cc: .TestAbs(xnn_f16_vabs_ukernel__neonfp16arith_u16);
./test/f16-vabs.cc: .TestAbs(xnn_f16_vabs_ukernel__neonfp16arith_u16);
./test/f16-vabs.yaml:- name: xnn_f16_vabs_ukernel__neonfp16arith_u16

The important one for linking is that the kernel is in ./src/amalgam/gen/neonfp16arith.c, which gets built and linked on Arm systems unless fp16 is disabled.
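
If you want to double-check that on your side, one hypothetical way is to list the symbols in the static library you linked against; the llvm-nm path below just mirrors the LLVM install in your log, and the working directory is assumed to be the CMake build directory.

rem Hypothetical check: list defined symbols in the built static XNNPACK.lib.
rem No match means the neonfp16arith amalgam object was never compiled into the library.
C:\Programs\LLVM\bin\llvm-nm.exe XNNPACK.lib | findstr xnn_f16_vabs_ukernel__neonfp16arith_u16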

In CMakeLists.txt around line 509, this block appends the fp16 microkernels:

IF(XNNPACK_ENABLE_ARM_FP16_VECTOR)
  LIST(APPEND PROD_MICROKERNEL_SRCS ${PROD_NEONFP16ARITH_MICROKERNEL_SRCS})
  LIST(APPEND PROD_MICROKERNEL_SRCS ${PROD_NEONFP16ARITH_AARCH64_MICROKERNEL_SRCS})
ENDIF()

It's possible our CMake is missing something for Windows. In scripts/build-windows-arm64.cmd there are CMake parameters for a Visual Studio 2022 build:

mkdir build\windows
mkdir build\windows\arm64

set CMAKE_ARGS=-DXNNPACK_LIBRARY_TYPE=static -DXNNPACK_ENABLE_ASSEMBLY=OFF -DXNNPACK_ENABLE_ARM_FP16_SCALAR=OFF -DXNNPACK_ENABLE_ARM_BF16=OFF
set CMAKE_ARGS=%CMAKE_ARGS% -G="Visual Studio 17 2022" -A=ARM64

rem User-specified CMake arguments go last to allow overriding defaults
set CMAKE_ARGS=%CMAKE_ARGS% %*

echo %CMAKE_ARGS%

cd build\windows\arm64 && cmake ..\..\.. %CMAKE_ARGS%
cmake --build . -j %NUMBER_OF_PROCESSORS% --config Release
zhanweiw commented 4 months ago

Thanks for your support! I've tried disabling 'XNNPACK_ENABLE_ASSEMBLY' and it works. But if we disable this feature, it will impact performance, right? Is it possible to enable 'XNNPACK_ENABLE_ASSEMBLY' on ARM64 Windows?
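
For reference, this is roughly what I ran (a sketch; only -DXNNPACK_ENABLE_ASSEMBLY=OFF is the change that mattered, the generator and other flags are taken from the build script above):

rem Configure and build from an empty build directory, with assembly disabled.
cmake ..\..\.. -G "Visual Studio 17 2022" -A ARM64 ^
  -DXNNPACK_LIBRARY_TYPE=static -DXNNPACK_ENABLE_ASSEMBLY=OFF
cmake --build . -j %NUMBER_OF_PROCESSORS% --config Release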

fbarchard commented 4 months ago

The Arm assembly is in .S files meant to be assembled with gcc or clang. As far as I know, there's no way to assemble them with Visual Studio.

The best solution is compiling with clang or clang-cl.
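
One possible way to do that (a sketch, not a tested recipe: the Ninja generator and the LLVM install path are assumptions) is to point CMake at clang-cl directly so the LLVM toolchain, rather than MSVC, compiles the sources:

rem Hypothetical configure with LLVM clang-cl and Ninja; adjust the LLVM path for your machine.
set PATH=C:\Programs\LLVM\bin;%PATH%
cmake ..\..\.. -G Ninja -DCMAKE_C_COMPILER=clang-cl -DCMAKE_CXX_COMPILER=clang-cl ^
  -DXNNPACK_LIBRARY_TYPE=static
cmake --build . -j %NUMBER_OF_PROCESSORS%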

MatthewARM commented 4 months ago

I tried with ClangCL from Visual Studio 2022 using: cmake -T"ClangCL"

But I got the same error as https://github.com/llvm/llvm-project/issues/52964

The version of clang is:

clang --version
clang version 17.0.3
Target: aarch64-pc-windows-msvc
Thread model: posix
InstalledDir: C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\Llvm\ARM64\bin

So either we need to wait for the intrinsic support to get into a released toolchain, or we need to find a workaround.

fbarchard commented 3 months ago

Looking at this function in particular, xnn_f16_vabs_ukernel__neonfp16arith_u16, it's not actually using fp16 arithmetic.

The type is f16, but the implementation is actually plain NEON. The file name says neon, which is inconsistent with the kernel name. It's not clear if that explains your link error, but the ISA should be consistent, because that determines which library/amalgam it goes in.
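
To illustrate the point: f16 absolute value only needs the sign bit of each half-precision value cleared, so it can be written with base NEON integer intrinsics and no fp16 arithmetic at all. The sketch below is not the XNNPACK microkernel, just a minimal standalone illustration:

#include <arm_neon.h>
#include <stddef.h>
#include <stdint.h>

// Minimal illustration (not the XNNPACK kernel): IEEE 754 half-precision |x|
// is x with the sign bit cleared, so only integer NEON operations are needed.
static void f16_vabs_neon_sketch(size_t n, const uint16_t* input, uint16_t* output) {
  const uint16x8_t sign_mask = vdupq_n_u16(0x7FFF);  // keep exponent + mantissa bits
  for (; n >= 8; n -= 8) {
    uint16x8_t x = vld1q_u16(input); input += 8;
    vst1q_u16(output, vandq_u16(x, sign_mask)); output += 8;
  }
  for (; n != 0; n--) {
    *output++ = (uint16_t) (*input++ & 0x7FFF);  // scalar tail
  }
}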