Closed clopez closed 1 year ago
Cross-building for ARMv7 (32-bit) works fine; I only hit this issue on AArch64.
Hi, thanks for reporting the issue. This looks similar to #1460 which had previously been reported. There, the compiler was configured with a specific -march, but I don't see that in your config. Are you perhaps specifying a -march or -mcpu via CXXFLAGS? Does the workaround in #1460 help?
Yes, I have this defined in the build environment:
# env | grep march
CPP=aarch64-poky-linux-gcc -E --sysroot=/home/clopez/webkit/webkit/WebKitBuild/CrossToolChains/rpi4-64bits-mesa/build/tmp/work/cortexa72-poky-linux/highway/1.0.4.99.git20230717-r0/recipe-sysroot -mcpu=cortex-a72 -march=armv8-a+crc -fstack-protector-strong -O2 -D_FORTIFY_SOURCE=2 -Wformat -Wformat-security -Werror=format-security
CXX=aarch64-poky-linux-g++ -mcpu=cortex-a72 -march=armv8-a+crc -fstack-protector-strong -O2 -D_FORTIFY_SOURCE=2 -Wformat -Wformat-security -Werror=format-security --sysroot=/home/clopez/webkit/webkit/WebKitBuild/CrossToolChains/rpi4-64bits-mesa/build/tmp/work/cortexa72-poky-linux/highway/1.0.4.99.git20230717-r0/recipe-sysroot
CCLD=aarch64-poky-linux-gcc -mcpu=cortex-a72 -march=armv8-a+crc -fstack-protector-strong -O2 -D_FORTIFY_SOURCE=2 -Wformat -Wformat-security -Werror=format-security --sysroot=/home/clopez/webkit/webkit/WebKitBuild/CrossToolChains/rpi4-64bits-mesa/build/tmp/work/cortexa72-poky-linux/highway/1.0.4.99.git20230717-r0/recipe-sysroot
FC=aarch64-poky-linux-gfortran -mcpu=cortex-a72 -march=armv8-a+crc -fstack-protector-strong -O2 -D_FORTIFY_SOURCE=2 -Wformat -Wformat-security -Werror=format-security --sysroot=/home/clopez/webkit/webkit/WebKitBuild/CrossToolChains/rpi4-64bits-mesa/build/tmp/work/cortexa72-poky-linux/highway/1.0.4.99.git20230717-r0/recipe-sysroot
CC=aarch64-poky-linux-gcc -mcpu=cortex-a72 -march=armv8-a+crc -fstack-protector-strong -O2 -D_FORTIFY_SOURCE=2 -Wformat -Wformat-security -Werror=format-security --sysroot=/home/clopez/webkit/webkit/WebKitBuild/CrossToolChains/rpi4-64bits-mesa/build/tmp/work/cortexa72-poky-linux/highway/1.0.4.99.git20230717-r0/recipe-sysroot
So it is setting -march=armv8-a+crc
I have tested it and the following patch fixes the build error for my use case:
diff --git a/hwy/ops/set_macros-inl.h b/hwy/ops/set_macros-inl.h
index f64a6a5..e750509 100644
--- a/hwy/ops/set_macros-inl.h
+++ b/hwy/ops/set_macros-inl.h
@@ -361,7 +361,7 @@
// Do not define HWY_TARGET_STR (no pragma).
#else
#if HWY_COMPILER_GCC_ACTUAL
-#define HWY_TARGET_STR "arch=armv8-a+crypto"
+#define HWY_TARGET_STR "arch=armv8-a+crc"
#else // clang
#define HWY_TARGET_STR "+crypto"
#endif // HWY_COMPILER_*
But I don't understand why it ends up in that ifdef.
With arch=armv8-a+crc it doesn't define AES (the RPi4 doesn't have the cryptographic extensions).
The only feature macro that the compiler sets with arch=armv8-a+crc is __ARM_FEATURE_CRC32; it doesn't set the ones related to crypto.
Check:
# echo | aarch64-poky-linux-g++ -march=armv8-a -E - -dM > armv8-a_baseline
# echo | aarch64-poky-linux-g++ -march=armv8-a+crc -E - -dM > armv8-a_crc
# echo | aarch64-poky-linux-g++ -march=armv8-a+crypto -E - -dM > armv8-a_crypto
# diff -u armv8-a_baseline armv8-a_crc
--- armv8-a_baseline 2023-07-17 20:10:28.421690965 +0000
+++ armv8-a_crc 2023-07-17 20:10:35.737481401 +0000
@@ -299,6 +299,7 @@
#define __INT_LEAST32_TYPE__ int
#define __SIZEOF_WCHAR_T__ 4
#define __UINT64_TYPE__ long unsigned int
+#define __ARM_FEATURE_CRC32 1
#define __ARM_NEON 1
#define __FLT128_HAS_QUIET_NAN__ 1
#define __INTMAX_MAX__ 0x7fffffffffffffffL
# diff -u armv8-a_baseline armv8-a_crypto
--- armv8-a_baseline 2023-07-17 20:10:28.421690965 +0000
+++ armv8-a_crypto 2023-07-17 20:10:42.641283591 +0000
@@ -34,6 +34,7 @@
#define __UINT_FAST8_MAX__ 0xff
#define __FLT32_MAX_10_EXP__ 38
#define __INT8_C(c) c
+#define __ARM_FEATURE_AES 1
#define __INT_LEAST8_WIDTH__ 8
#define __UINT_LEAST64_MAX__ 0xffffffffffffffffUL
#define __SHRT_MAX__ 0x7fff
@@ -108,6 +109,7 @@
#define __SIZEOF_LONG_DOUBLE__ 16
#define __FLT64_MAX_10_EXP__ 308
#define __FLT16_MAX_10_EXP__ 4
+#define __ARM_FEATURE_CRYPTO 1
#define __INT_FAST32_MAX__ 0x7fffffffffffffffL
#define __DBL_HAS_INFINITY__ 1
#define __INT64_MAX__ 0x7fffffffffffffffL
@@ -198,6 +200,7 @@
#define __ELF__ 1
#define __GCC_ASM_FLAG_OUTPUTS__ 1
#define __GCC_ATOMIC_TEST_AND_SET_TRUEVAL 1
+#define __ARM_FEATURE_SHA2 1
#define __FLT_RADIX__ 2
#define __INT_LEAST16_TYPE__ short int
#define __ARM_ARCH_PROFILE 65
And it also has NEON, of course:
# grep -i neon armv8-a_crc
#define __ARM_NEON 1
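The same check can also be made from inside a translation unit rather than by diffing macro dumps. The following is only a minimal sketch: the function name ArmFeatureSummary is hypothetical, and it tests just the ACLE feature macros that appear in the dumps above.

```cpp
#include <string>

// Hypothetical helper: reports which of the ARM feature macros seen in the
// dumps above were predefined by the compiler for the current -march/-mcpu.
// On a non-ARM compiler every entry is simply reported as 0.
inline std::string ArmFeatureSummary() {
  std::string s;
#if defined(__ARM_NEON)
  s += "NEON=1";
#else
  s += "NEON=0";
#endif
#if defined(__ARM_FEATURE_CRC32)
  s += " CRC32=1";
#else
  s += " CRC32=0";
#endif
#if defined(__ARM_FEATURE_AES)
  s += " AES=1";
#else
  s += " AES=0";
#endif
#if defined(__ARM_FEATURE_SHA2)
  s += " SHA2=1";
#else
  s += " SHA2=0";
#endif
  return s;
}
```

Built with the flags from this report (-march=armv8-a+crc), the diffs above suggest this would report NEON and CRC32 as set and AES and SHA2 as unset.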
So this CPU is an ARMv8-A without the crypto extensions (only the CRC ones). But it ends up entering here:
#if HWY_TARGET == HWY_NEON_WITHOUT_AES
// Do not define HWY_TARGET_STR (no pragma).
here ----> #else
#if HWY_COMPILER_GCC_ACTUAL
#define HWY_TARGET_STR "arch=armv8-a+crypto"
#else // clang
#define HWY_TARGET_STR "+crypto"
#endif // HWY_COMPILER_*
#endif // HWY_TARGET == HWY_NEON_WITHOUT_AES
So it is evaluating the #if HWY_TARGET == HWY_NEON_WITHOUT_AES condition as false. Why is that?
As far as I can see, HWY_TARGET gets the value HWY_STATIC_TARGET, and it doesn't matter whether I pass -march=armv8-a+crypto or -march=armv8-a+crc: the same happens in both cases.
Running the compile command with -E -dD, so that GCC keeps the #define directives in the preprocessed output, I see this:
# aarch64-poky-linux-g++ -mcpu=cortex-a72 -march=armv8-a+crc -fstack-protector-strong -O2 -D_FORTIFY_SOURCE=2 -Wformat -Wformat-security -Werror=format-security --sysroot=/home/clopez/webkit/webkit/WebKitBuild/CrossToolChains/rpi4-64bits-mesa/build/tmp/work/cortexa72-poky-linux/highway/1.0.4.99.git20230717-r0/recipe-sysroot -DHWY_STATIC_DEFINE -I/home/clopez/webkit/webkit/WebKitBuild/CrossToolChains/rpi4-64bits-mesa/build/tmp/work/cortexa72-poky-linux/highway/1.0.4.99.git20230717-r0/git -O2 -pipe -g -feliminate-unused-debug-types -fmacro-prefix-map=/home/clopez/webkit/webkit/WebKitBuild/CrossToolChains/rpi4-64bits-mesa/build/tmp/work/cortexa72-poky-linux/highway/1.0.4.99.git20230717-r0/git=/usr/src/debug/highway/1.0.4.99.git20230717-r0 -fdebug-prefix-map=/home/clopez/webkit/webkit/WebKitBuild/CrossToolChains/rpi4-64bits-mesa/build/tmp/work/cortexa72-poky-linux/highway/1.0.4.99.git20230717-r0/git=/usr/src/debug/highway/1.0.4.99.git20230717-r0 -fmacro-prefix-map=/home/clopez/webkit/webkit/WebKitBuild/CrossToolChains/rpi4-64bits-mesa/build/tmp/work/cortexa72-poky-linux/highway/1.0.4.99.git20230717-r0/build=/usr/src/debug/highway/1.0.4.99.git20230717-r0 -fdebug-prefix-map=/home/clopez/webkit/webkit/WebKitBuild/CrossToolChains/rpi4-64bits-mesa/build/tmp/work/cortexa72-poky-linux/highway/1.0.4.99.git20230717-r0/build=/usr/src/debug/highway/1.0.4.99.git20230717-r0 -fdebug-prefix-map=/home/clopez/webkit/webkit/WebKitBuild/CrossToolChains/rpi4-64bits-mesa/build/tmp/work/cortexa72-poky-linux/highway/1.0.4.99.git20230717-r0/recipe-sysroot= -fmacro-prefix-map=/home/clopez/webkit/webkit/WebKitBuild/CrossToolChains/rpi4-64bits-mesa/build/tmp/work/cortexa72-poky-linux/highway/1.0.4.99.git20230717-r0/recipe-sysroot= -fdebug-prefix-map=/home/clopez/webkit/webkit/WebKitBuild/CrossToolChains/rpi4-64bits-mesa/build/tmp/work/cortexa72-poky-linux/highway/1.0.4.99.git20230717-r0/recipe-sysroot-native= -fvisibility-inlines-hidden -O2 -g -DNDEBUG -fPIC -fvisibility=hidden 
-fvisibility-inlines-hidden -Wno-builtin-macro-redefined -D__DATE__=\"redacted\" -D__TIMESTAMP__=\"redacted\" -D__TIME__=\"redacted\" -fmerge-all-constants -Wall -Wextra -Wconversion -Wsign-conversion -Wvla -Wnon-virtual-dtor -fmath-errno -fno-exceptions -MD -MT CMakeFiles/hwy.dir/hwy/nanobenchmark.cc.o -MF CMakeFiles/hwy.dir/hwy/nanobenchmark.cc.o.d -E -dD /home/clopez/webkit/webkit/WebKitBuild/CrossToolChains/rpi4-64bits-mesa/build/tmp/work/cortexa72-poky-linux/highway/1.0.4.99.git20230717-r0/git/hwy/ops/set_macros-inl.h|grep HWY_TARGET
#define HWY_TARGET HWY_STATIC_TARGET
#define HWY_TARGETS (HWY_ATTAINABLE_TARGETS & ((HWY_STATIC_TARGET - 1LL) | HWY_STATIC_TARGET))
#undef HWY_TARGET_STR
#define HWY_TARGET_STR_PCLMUL_AES ",pclmul,aes"
#define HWY_TARGET_STR_BMI2_FMA ",bmi,bmi2,fma"
#define HWY_TARGET_STR_F16C ",f16c"
#define HWY_TARGET_STR_SSE2 "sse2"
#define HWY_TARGET_STR_SSSE3 "sse2,ssse3"
#define HWY_TARGET_STR_SSE4 HWY_TARGET_STR_SSSE3 ",sse4.1,sse4.2" HWY_TARGET_STR_PCLMUL_AES
#define HWY_TARGET_STR_AVX2 HWY_TARGET_STR_SSE4 ",avx,avx2" HWY_TARGET_STR_BMI2_FMA HWY_TARGET_STR_F16C
#define HWY_TARGET_STR_AVX3 HWY_TARGET_STR_AVX2 ",avx512f,avx512cd,avx512vl,avx512dq,avx512bw"
#define HWY_TARGET_STR_AVX3_DL HWY_TARGET_STR_AVX3 ",vpclmulqdq,avx512vbmi,avx512vbmi2,vaes,avx512vnni,avx512bitalg," "avx512vpopcntdq,gfni"
#define HWY_TARGET_STR_AVX3_SPR HWY_TARGET_STR_AVX3_DL ",avx512fp16"
#define HWY_TARGET_STR_PPC8_CRYPTO ",crypto"
#define HWY_TARGET_STR_PPC8 "altivec,vsx,power8-vector" HWY_TARGET_STR_PPC8_CRYPTO
#define HWY_TARGET_STR_PPC9 HWY_TARGET_STR_PPC8 ",power9-vector"
#define HWY_TARGET_STR_PPC10 HWY_TARGET_STR_PPC9 ",cpu=power10"
Hmm, it is more complex than that. Evaluating a .cc file that includes hwy/foreach_target.h, like hwy/contrib/sort/vqsort_128a.cc, I can see how it assigns different values to HWY_TARGET and ends with a value of HWY_STATIC_TARGET:
# aarch64-poky-linux-g++ -mcpu=cortex-a72 -march=armv8-a+crc -fstack-protector-strong -O2 -D_FORTIFY_SOURCE=2 -Wformat -Wformat-security -Werror=format-security --sysroot=/home/clopez/webkit/webkit/WebKitBuild/CrossToolChains/rpi4-64bits-mesa/build/tmp/work/cortexa72-poky-linux/highway/1.0.4.99.git20230717-r0/recipe-sysroot -DHWY_STATIC_DEFINE -I/home/clopez/webkit/webkit/WebKitBuild/CrossToolChains/rpi4-64bits-mesa/build/tmp/work/cortexa72-poky-linux/highway/1.0.4.99.git20230717-r0/git -O2 -pipe -g -feliminate-unused-debug-types -fmacro-prefix-map=/home/clopez/webkit/webkit/WebKitBuild/CrossToolChains/rpi4-64bits-mesa/build/tmp/work/cortexa72-poky-linux/highway/1.0.4.99.git20230717-r0/git=/usr/src/debug/highway/1.0.4.99.git20230717-r0 -fdebug-prefix-map=/home/clopez/webkit/webkit/WebKitBuild/CrossToolChains/rpi4-64bits-mesa/build/tmp/work/cortexa72-poky-linux/highway/1.0.4.99.git20230717-r0/git=/usr/src/debug/highway/1.0.4.99.git20230717-r0 -fmacro-prefix-map=/home/clopez/webkit/webkit/WebKitBuild/CrossToolChains/rpi4-64bits-mesa/build/tmp/work/cortexa72-poky-linux/highway/1.0.4.99.git20230717-r0/build=/usr/src/debug/highway/1.0.4.99.git20230717-r0 -fdebug-prefix-map=/home/clopez/webkit/webkit/WebKitBuild/CrossToolChains/rpi4-64bits-mesa/build/tmp/work/cortexa72-poky-linux/highway/1.0.4.99.git20230717-r0/build=/usr/src/debug/highway/1.0.4.99.git20230717-r0 -fdebug-prefix-map=/home/clopez/webkit/webkit/WebKitBuild/CrossToolChains/rpi4-64bits-mesa/build/tmp/work/cortexa72-poky-linux/highway/1.0.4.99.git20230717-r0/recipe-sysroot= -fmacro-prefix-map=/home/clopez/webkit/webkit/WebKitBuild/CrossToolChains/rpi4-64bits-mesa/build/tmp/work/cortexa72-poky-linux/highway/1.0.4.99.git20230717-r0/recipe-sysroot= -fdebug-prefix-map=/home/clopez/webkit/webkit/WebKitBuild/CrossToolChains/rpi4-64bits-mesa/build/tmp/work/cortexa72-poky-linux/highway/1.0.4.99.git20230717-r0/recipe-sysroot-native= -fvisibility-inlines-hidden -O2 -g -DNDEBUG -fPIC -fvisibility=hidden 
-fvisibility-inlines-hidden -Wno-builtin-macro-redefined -D__DATE__=\"redacted\" -D__TIMESTAMP__=\"redacted\" -D__TIME__=\"redacted\" -fmerge-all-constants -Wall -Wextra -Wconversion -Wsign-conversion -Wvla -Wnon-virtual-dtor -fmath-errno -fno-exceptions -MD -MT CMakeFiles/hwy.dir/hwy/nanobenchmark.cc.o -MF CMakeFiles/hwy.dir/hwy/nanobenchmark.cc.o.d -E -dD /home/clopez/webkit/webkit/WebKitBuild/CrossToolChains/rpi4-64bits-mesa/build/tmp/work/cortexa72-poky-linux/highway/1.0.4.99.git20230717-r0/git/hwy/contrib/sort/vqsort_128a.cc|grep "define HWY_TARGET "
#define HWY_TARGET HWY_STATIC_TARGET
#define HWY_TARGET HWY_NEON
#define HWY_TARGET HWY_SVE
#define HWY_TARGET HWY_SVE2
#define HWY_TARGET HWY_SVE_256
#define HWY_TARGET HWY_SVE2_128
#define HWY_TARGET HWY_STATIC_TARGET
You have a really complex system for setting these compiler directives, and it is not easy to debug. I don't understand what is going on.
Ok.. forget what I said above.
> Does the workaround in #1460 help?
I have read the comments there again, more carefully, and the workaround is not to change the value of HWY_TARGET_STR but to enable static dispatch.
That works in theory, and it looks like the right solution when using Yocto, because with Yocto you don't build for a set of CPUs but only for one very specific machine.
But in practice I then run into issues later when building libjxl (which is the only reason I'm trying to use highway).
It seems libjxl calls HWY_DYNAMIC_DISPATCH directly in several parts of the code.
So if I enable a build with static dispatch by adding -DHWY_COMPILE_ONLY_STATIC to CXXFLAGS/CFLAGS, I later get this build error in libjxl:
| FAILED: lib/CMakeFiles/jxl_dec-obj.dir/jxl/modular/transform/squeeze.cc.o
| /home/clopez/webkit/webkit/WebKitBuild/CrossToolChains/rpi4-64bits-mesa/build/tmp/work/cortexa72-poky-linux/libjxl/0.8.1-r0/recipe-sysroot-native/usr/bin/aarch64-poky-linux/aarch64-poky-linux-g++ -DHWY_DISABLED_TARGETS="(HWY_SVE|HWY_SVE2|HWY_SVE_256|HWY_SVE2_128|HWY_RVV)" -DJPEGXL_MAJOR_VERSION=0 -DJPEGXL_MINOR_VERSION=8 -DJPEGXL_PATCH_VERSION=1 -DJXL_INTERNAL_LIBRARY_BUILD -D__DATE__=\"redacted\" -D__TIMESTAMP__=\"redacted\" -D__TIME__=\"redacted\" -I/home/clopez/webkit/webkit/WebKitBuild/CrossToolChains/rpi4-64bits-mesa/build/tmp/work/cortexa72-poky-linux/libjxl/0.8.1-r0/git -I/home/clopez/webkit/webkit/WebKitBuild/CrossToolChains/rpi4-64bits-mesa/build/tmp/work/cortexa72-poky-linux/libjxl/0.8.1-r0/git/lib/include -I/home/clopez/webkit/webkit/WebKitBuild/CrossToolChains/rpi4-64bits-mesa/build/tmp/work/cortexa72-poky-linux/libjxl/0.8.1-r0/build/lib/include -mcpu=cortex-a72 -march=armv8-a+crc -fstack-protector-strong -O2 -D_FORTIFY_SOURCE=2 -Wformat -Wformat-security -Werror=format-security --sysroot=/home/clopez/webkit/webkit/WebKitBuild/CrossToolChains/rpi4-64bits-mesa/build/tmp/work/cortexa72-poky-linux/libjxl/0.8.1-r0/recipe-sysroot -O2 -pipe -g -feliminate-unused-debug-types -fmacro-prefix-map=/home/clopez/webkit/webkit/WebKitBuild/CrossToolChains/rpi4-64bits-mesa/build/tmp/work/cortexa72-poky-linux/libjxl/0.8.1-r0/git=/usr/src/debug/libjxl/0.8.1-r0 -fdebug-prefix-map=/home/clopez/webkit/webkit/WebKitBuild/CrossToolChains/rpi4-64bits-mesa/build/tmp/work/cortexa72-poky-linux/libjxl/0.8.1-r0/git=/usr/src/debug/libjxl/0.8.1-r0 -fmacro-prefix-map=/home/clopez/webkit/webkit/WebKitBuild/CrossToolChains/rpi4-64bits-mesa/build/tmp/work/cortexa72-poky-linux/libjxl/0.8.1-r0/build=/usr/src/debug/libjxl/0.8.1-r0 -fdebug-prefix-map=/home/clopez/webkit/webkit/WebKitBuild/CrossToolChains/rpi4-64bits-mesa/build/tmp/work/cortexa72-poky-linux/libjxl/0.8.1-r0/build=/usr/src/debug/libjxl/0.8.1-r0 
-fdebug-prefix-map=/home/clopez/webkit/webkit/WebKitBuild/CrossToolChains/rpi4-64bits-mesa/build/tmp/work/cortexa72-poky-linux/libjxl/0.8.1-r0/recipe-sysroot= -fmacro-prefix-map=/home/clopez/webkit/webkit/WebKitBuild/CrossToolChains/rpi4-64bits-mesa/build/tmp/work/cortexa72-poky-linux/libjxl/0.8.1-r0/recipe-sysroot= -fdebug-prefix-map=/home/clopez/webkit/webkit/WebKitBuild/CrossToolChains/rpi4-64bits-mesa/build/tmp/work/cortexa72-poky-linux/libjxl/0.8.1-r0/recipe-sysroot-native= -fvisibility-inlines-hidden -fno-rtti -funwind-tables -fno-omit-frame-pointer -fPIC -fvisibility=hidden -fvisibility-inlines-hidden -fmacro-prefix-map=/home/clopez/webkit/webkit/WebKitBuild/CrossToolChains/rpi4-64bits-mesa/build/tmp/work/cortexa72-poky-linux/libjxl/0.8.1-r0/git=. -Wno-builtin-macro-redefined -Wall -fmerge-all-constants -fno-builtin-fwrite -fno-builtin-fread -Wextra -Wc++11-compat -Warray-bounds -Wformat-security -Wimplicit-fallthrough -Wno-register -Wno-unused-function -Wno-unused-parameter -Wnon-virtual-dtor -Woverloaded-virtual -Wvla -fsized-deallocation -fno-exceptions -fmath-errno -DJPEGXL_ENABLE_TRANSCODE_JPEG=0 -DJPEGXL_ENABLE_BOXES=1 -std=c++11 -MD -MT lib/CMakeFiles/jxl_dec-obj.dir/jxl/modular/transform/squeeze.cc.o -MF lib/CMakeFiles/jxl_dec-obj.dir/jxl/modular/transform/squeeze.cc.o.d -o lib/CMakeFiles/jxl_dec-obj.dir/jxl/modular/transform/squeeze.cc.o -c /home/clopez/webkit/webkit/WebKitBuild/CrossToolChains/rpi4-64bits-mesa/build/tmp/work/cortexa72-poky-linux/libjxl/0.8.1-r0/git/lib/jxl/modular/transform/squeeze.cc
| In file included from /home/clopez/webkit/webkit/WebKitBuild/CrossToolChains/rpi4-64bits-mesa/build/tmp/work/cortexa72-poky-linux/libjxl/0.8.1-r0/git/lib/jxl/modular/transform/squeeze.h:30,
| from /home/clopez/webkit/webkit/WebKitBuild/CrossToolChains/rpi4-64bits-mesa/build/tmp/work/cortexa72-poky-linux/libjxl/0.8.1-r0/git/lib/jxl/modular/transform/squeeze.cc:6:
| /home/clopez/webkit/webkit/WebKitBuild/CrossToolChains/rpi4-64bits-mesa/build/tmp/work/cortexa72-poky-linux/libjxl/0.8.1-r0/git/lib/jxl/modular/modular_image.h: In lambda function:
| /home/clopez/webkit/webkit/WebKitBuild/CrossToolChains/rpi4-64bits-mesa/build/tmp/work/cortexa72-poky-linux/libjxl/0.8.1-r0/git/lib/jxl/modular/modular_image.h:70:26: error: inlining failed in call to 'always_inline' 'jxl::pixel_type* jxl::Channel::Row(size_t)': target specific option mismatch
| 70 | JXL_INLINE pixel_type* Row(const size_t y) { return plane.Row(y); }
| | ^~~
Hi @clopez, thanks for looking into this further. Yes, HWY_TARGET is set multiple times: JPEG XL compiles the code once per target instruction set, and the binary then contains code for all of them. This 'dynamic dispatch' model is in contrast to your "I always want to build with just +crc" (static dispatch).
It is actually OK to still use HWY_DYNAMIC_DISPATCH; this will just involve an extra function call (no problem). Static dispatch really just means limiting the set of HWY_TARGET values to a single option.
To get that, it is important to specify -DHWY_COMPILE_ONLY_STATIC both when compiling Highway and when compiling JPEG XL. Is it possible that it is only being set for Highway?
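The per-target recompilation described above can be mimicked in a single file. This is only a sketch of the mechanism, not Highway's actual code: every DEMO_* macro and the demo namespace are hypothetical stand-ins, and the assumption that a +crc-only build has HWY_NEON_WITHOUT_AES as its static target is exactly that, an assumption.

```cpp
#include <string>
#include <vector>

// Hypothetical target IDs; stand-ins for Highway's HWY_* constants.
#define DEMO_NEON_WITHOUT_AES 1
#define DEMO_NEON 2
// Assumption: a -march=armv8-a+crc build has no AES, so this is its baseline.
#define DEMO_STATIC_TARGET DEMO_NEON_WITHOUT_AES

namespace demo {

#ifndef DEMO_COMPILE_ONLY_STATIC
// Extra pass: a target compiled only so it can be selected at runtime.
#define DEMO_TARGET DEMO_NEON
#if DEMO_TARGET == DEMO_NEON_WITHOUT_AES
#define DEMO_TARGET_STR ""  // empty string stands for "no pragma"
#else
#define DEMO_TARGET_STR "arch=armv8-a+crypto"
#endif
inline const char* NeonPassTargetStr() { return DEMO_TARGET_STR; }
#undef DEMO_TARGET_STR
#undef DEMO_TARGET
#endif  // !DEMO_COMPILE_ONLY_STATIC

// Final pass: the static (baseline) target; the same #if is evaluated again.
#define DEMO_TARGET DEMO_STATIC_TARGET
#if DEMO_TARGET == DEMO_NEON_WITHOUT_AES
#define DEMO_TARGET_STR ""  // empty string stands for "no pragma"
#else
#define DEMO_TARGET_STR "arch=armv8-a+crypto"
#endif
inline const char* StaticPassTargetStr() { return DEMO_TARGET_STR; }
#undef DEMO_TARGET_STR
#undef DEMO_TARGET

// Lists the target string each compiled pass ended up with, in pass order.
inline std::vector<std::string> CompiledTargetStrs() {
  std::vector<std::string> strs;
#ifndef DEMO_COMPILE_ONLY_STATIC
  strs.push_back(NeonPassTargetStr());
#endif
  strs.push_back(StaticPassTargetStr());
  return strs;
}

}  // namespace demo
```

Without DEMO_COMPILE_ONLY_STATIC, the #else branch is taken in the extra pass even though the baseline has no crypto extensions. Defining it leaves only the baseline pass, which is the effect static dispatch is meant to have.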
> To get that, it is important to specify the -DHWY_COMPILE_ONLY_STATIC both when compiling Highway as well as JPEG XL. Is it possible that it is only being set for Highway?
I'm building highway as a shared library on one hand, and then building libjxl with -DJPEGXL_FORCE_SYSTEM_HWY=ON on the other, so the idea is that it links against the shared highway library that I built previously.
So yes, I'm not passing -DHWY_COMPILE_ONLY_STATIC to the libjxl build. However, I have now grepped the whole libjxl source code for HWY_COMPILE_ONLY_STATIC and I don't see it defined anywhere. I don't see how defining it for the libjxl build is going to make any difference, given that I'm not linking statically with the bundled highway that libjxl uses, but dynamically linking against a previously built highway shared library.
Got it. FYI the Highway shared library has very little in it. Most happens in headers, and JPEG XL includes those Highway headers. Adding -DHWY_COMPILE_ONLY_STATIC to the JPEG XL build changes their behavior in the desired way :)
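To illustrate why a flag passed to the consumer's build matters when the logic lives in headers, here is a small header-style sketch. It is not Highway's implementation: the demo namespace, DEMO_COMPILE_ONLY_STATIC, and the two Sum variants are hypothetical, and the runtime CPU check is replaced by a fixed index.

```cpp
#include <cstddef>

namespace demo {

// Baseline implementation; always compiled.
inline int SumScalar(const int* v, std::size_t n) {
  int sum = 0;
  for (std::size_t i = 0; i < n; ++i) sum += v[i];
  return sum;
}

// Stand-in for a second, "better" target that a real library would compile
// with different instruction-set options.
inline int SumUnrolled(const int* v, std::size_t n) {
  int sum = 0;
  std::size_t i = 0;
  for (; i + 2 <= n; i += 2) sum += v[i] + v[i + 1];
  if (i < n) sum += v[i];  // odd-length tail
  return sum;
}

using SumFunc = int (*)(const int*, std::size_t);

// The table is built from whatever the including project's flags allow,
// so the consumer's -D options decide what ends up in the consumer's binary.
#ifdef DEMO_COMPILE_ONLY_STATIC
constexpr SumFunc kTable[] = {&SumScalar};
#else
constexpr SumFunc kTable[] = {&SumUnrolled, &SumScalar};
#endif
constexpr std::size_t kNumTargets = sizeof(kTable) / sizeof(kTable[0]);

// A real library would query the CPU here; this sketch always picks entry 0.
inline std::size_t ChosenIndex() { return 0; }

// Dispatching through the table costs one indirect call even when the table
// holds a single entry.
inline int Sum(const int* v, std::size_t n) {
  return kTable[ChosenIndex()](v, n);
}

}  // namespace demo
```

Because all of this is in a header, compiling the library itself with the macro has no effect on a consumer that includes the header without it.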
I see. Thanks for the info and the documentation.
I found another solution to this issue that allows building with dynamic dispatch enabled. See: https://github.com/google/highway/pull/1589
I tried both the latest stable release 1.0.4 (46e365d6770f5d7a4240d8ac9d8e928a520478ea) and master as of today (7233df1b4e29a04ecfd3a10a1da14c802f08c3fd).
In both cases I get this build error:
Compiler identification is
With the previous version of Yocto (4.1, aka Langdale, with GCC 12.2.0) it was working fine.
The Yocto recipe is here: https://github.com/Igalia/meta-webkit/blob/main/recipes-extended/highway