Hi, thanks for the report.
When I gave it a quick try with blaze (which is like bazel), I was able to build the abs bench:

```
blaze build --config=lexan_x86_64 -c opt //third_party/XNNPACK/bench:abs_bench
```
The microkernel is checked in, declared and used:

```
grep xnn_f16_vabs_ukernel__neonfp16arith_u16 . -r
./src/amalgam/gen/neonfp16arith.c:void xnn_f16_vabs_ukernel__neonfp16arith_u16(
./src/configs/unary-elementwise-config.c:      f16_abs_config.ukernel = (xnn_vunary_ukernel_fn) xnn_f16_vabs_ukernel__neonfp16arith_u16;
./src/configs/unary-elementwise-config.c:      f16_abs_config.ukernel = (xnn_vunary_ukernel_fn) xnn_f16_vabs_ukernel__neonfp16arith_u16;
./src/xnnpack/vunary.h:DECLARE_F16_VABS_UKERNEL_FUNCTION(xnn_f16_vabs_ukernel__neonfp16arith_u16)
./src/f16-vunary/gen/f16-vabs-neonfp16arith-u16.c:void xnn_f16_vabs_ukernel__neonfp16arith_u16(
./bench/f16-vabs.cc:    xnn_f16_vabs_ukernel__neonfp16arith_u16,
./test/f16-vabs.cc:      .TestAbs(xnn_f16_vabs_ukernel__neonfp16arith_u16);
./test/f16-vabs.cc:      .TestAbs(xnn_f16_vabs_ukernel__neonfp16arith_u16);
./test/f16-vabs.cc:      .TestAbs(xnn_f16_vabs_ukernel__neonfp16arith_u16);
./test/f16-vabs.cc:      .TestAbs(xnn_f16_vabs_ukernel__neonfp16arith_u16);
./test/f16-vabs.cc:      .TestAbs(xnn_f16_vabs_ukernel__neonfp16arith_u16);
./test/f16-vabs.yaml:- name: xnn_f16_vabs_ukernel__neonfp16arith_u16
```
The important one for linking is that the kernel is in ./src/amalgam/gen/neonfp16arith.c, which gets built and linked on ARM systems unless fp16 is disabled.
In CMakeLists.txt around line 509, this block appends the fp16 microkernels:

```cmake
IF(XNNPACK_ENABLE_ARM_FP16_VECTOR)
  LIST(APPEND PROD_MICROKERNEL_SRCS ${PROD_NEONFP16ARITH_MICROKERNEL_SRCS})
  LIST(APPEND PROD_MICROKERNEL_SRCS ${PROD_NEONFP16ARITH_AARCH64_MICROKERNEL_SRCS})
ENDIF()
```
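So one thing to check is whether that option actually ends up ON in your configure step. As a sketch (the source path is a placeholder), it can be set explicitly on the cmake command line, using the flag name from the IF() block above:

```bat
rem Sketch only: explicitly enable the option that guards the neonfp16arith sources
cmake path\to\XNNPACK -DXNNPACK_ENABLE_ARM_FP16_VECTOR=ON
```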
It's possible our CMake is missing something for Windows. In scripts/build-windows-arm64.cmd there are CMake parameters for a VS2022 build:
```bat
mkdir build\windows
mkdir build\windows\arm64

set CMAKE_ARGS=-DXNNPACK_LIBRARY_TYPE=static -DXNNPACK_ENABLE_ASSEMBLY=OFF -DXNNPACK_ENABLE_ARM_FP16_SCALAR=OFF -DXNNPACK_ENABLE_ARM_BF16=OFF
set CMAKE_ARGS=%CMAKE_ARGS% -G="Visual Studio 17 2022" -A=ARM64

rem User-specified CMake arguments go last to allow overriding defaults
set CMAKE_ARGS=%CMAKE_ARGS% %*

echo %CMAKE_ARGS%

cd build\windows\arm64 && cmake ..\..\.. %CMAKE_ARGS%
cmake --build . -j %NUMBER_OF_PROCESSORS% --config Release
```
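To narrow down the link error, one way to confirm whether the fp16 amalgam actually made it into the static library is to search its symbol table. This is a sketch only; the .lib path is an assumption based on the script's build directory and the Release config:

```bat
rem Sketch: the XNNPACK.lib location is an assumption, adjust it to your build tree
dumpbin /symbols build\windows\arm64\Release\XNNPACK.lib | findstr xnn_f16_vabs_ukernel__neonfp16arith_u16
```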
Thanks for your support! I've tried disabling 'XNNPACK_ENABLE_ASSEMBLY' and it works. But if we disable this feature, it will impact performance, right? Is it possible to enable 'XNNPACK_ENABLE_ASSEMBLY' on Windows ARM64?
The ARM assembly is in .S files meant to be compiled with gcc or clang. As far as I know there's no way to assemble them with Visual Studio.
The best solution is compiling with clang or clang-cl.
I tried with ClangCL from Visual Studio 2022 using:

```
cmake -T"ClangCL"
```
But I got the same error as https://github.com/llvm/llvm-project/issues/52964
The version of clang is:
```
clang --version
clang version 17.0.3
Target: aarch64-pc-windows-msvc
Thread model: posix
InstalledDir: C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\Llvm\ARM64\bin
```
So either we need to wait for the intrinsic support to get into a released toolchain, or we need to find a workaround.
Looking at this function in particular, `xnn_f16_vabs_ukernel__neonfp16arith_u16` is not actually using fp16 arithmetic. The type is f16, but the implementation is plain NEON; the file name says neon, which is inconsistent with the kernel name. It's not clear whether that explains your link error, but the ISA should be consistent, because it determines which library/amalgam the kernel goes in.
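For illustration only (this is not the XNNPACK source): abs() on an IEEE half is just clearing the sign bit, so this kind of kernel can be written with plain integer NEON and no fp16 arithmetic instructions at all. A minimal sketch, assuming `batch` is a count of fp16 elements:

```c
#include <arm_neon.h>
#include <stddef.h>
#include <stdint.h>

// Illustrative sketch, not the XNNPACK kernel: abs() of an IEEE fp16 value is
// just clearing the sign bit (bit 15), so plain integer NEON is sufficient.
void f16_vabs_sketch(size_t batch, const uint16_t* input, uint16_t* output) {
  const uint16x8_t mask = vdupq_n_u16(0x7FFF);  // all bits except the sign bit
  for (; batch >= 8; batch -= 8) {
    const uint16x8_t x = vld1q_u16(input); input += 8;
    vst1q_u16(output, vandq_u16(x, mask)); output += 8;
  }
  // Scalar tail for the remaining elements.
  for (; batch != 0; batch -= 1) {
    *output++ = (uint16_t) (*input++ & 0x7FFF);
  }
}
```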
It seems part of the code hasn't been compiled. Any idea how to fix it? Thanks in advance!