google / highway

Performance-portable, length-agnostic SIMD with runtime dispatch
Apache License 2.0

s390x/Z14: error: inlining failed in call to ‘always_inline’ #2241

Closed: malaterre closed this issue 3 weeks ago

malaterre commented 3 weeks ago

I started the first build of hwy on s390x/Z14. The build fails with:

```
FAILED: CMakeFiles/copy_test.dir/hwy/contrib/algo/copy_test.cc.o
/usr/bin/c++ -DHWY_SHARED_DEFINE -DTOOLCHAIN_MISS_ASM_HWCAP_H -I"/<<PKGBUILDDIR>>" -g -O2 -ffile-prefix-map=/<<PKGBUILDDIR>>=. -flto=auto -ffat-lto-objects -fstack-protector-strong -Wformat -Werror=format-security -DHWY_BROKEN_EMU128=0 -march=z15 -mzvector -Wdate-time -D_FORTIFY_SOURCE=2 -O3 -DNDEBUG -std=c++17 -fPIE -fvisibility=hidden -fvisibility-inlines-hidden -Wno-builtin-macro-redefined -D__DATE__=\"redacted\" -D__TIMESTAMP__=\"redacted\" -D__TIME__=\"redacted\" -fmerge-all-constants -Wall -Wextra -Wconversion -Wsign-conversion -Wvla -Wnon-virtual-dtor -Wcast-align -fmath-errno -fno-exceptions -Wno-psabi -Werror -DHWY_IS_TEST=1 -DGTEST_HAS_PTHREAD=1 -MD -MT CMakeFiles/copy_test.dir/hwy/contrib/algo/copy_test.cc.o -MF CMakeFiles/copy_test.dir/hwy/contrib/algo/copy_test.cc.o.d -o CMakeFiles/copy_test.dir/hwy/contrib/algo/copy_test.cc.o -c '/<<PKGBUILDDIR>>/hwy/contrib/algo/copy_test.cc'
In file included from /<<PKGBUILDDIR>>/hwy/aligned_allocator.h:32,
                 from /<<PKGBUILDDIR>>/hwy/contrib/algo/copy_test.cc:18:
/<<PKGBUILDDIR>>/hwy/base.h: In function ‘hwy::N_Z14::Load<hwy::N_Z14::Simd<unsigned short, 1ul, 0>, (void)0, unsigned short>(hwy::N_Z14::Simd<unsigned short, 1ul, 0>, unsigned short const*)decltype (Zero((hwy::N_Z14::Simd<unsigned short, 1ul, 0>)()))’:
/<<PKGBUILDDIR>>/hwy/base.h:336:14: error: inlining failed in call to ‘always_inline’ ‘hwy::CopyBytes<2ul, unsigned short, unsigned short>(unsigned short const*, unsigned short*)void’: target specific option mismatch
  336 | HWY_API void CopyBytes(const From* HWY_RESTRICT from, To* HWY_RESTRICT to) {
      |              ^~~~~
In file included from /<<PKGBUILDDIR>>/hwy/highway.h:586,
                 from /<<PKGBUILDDIR>>/hwy/contrib/algo/copy_test.cc:24,
                 from /<<PKGBUILDDIR>>/hwy/foreach_target.h:290,
                 from /<<PKGBUILDDIR>>/hwy/contrib/algo/copy_test.cc:23:
/<<PKGBUILDDIR>>/hwy/ops/ppc_vsx-inl.h:697:26: note: called from here
  697 | CopyBytes<d.MaxBytes()>(p, &bits);
      | ~~~~~^~~~
```


malaterre commented 3 weeks ago

For reference, the buildd [zandonai](https://buildd.debian.org/status/architecture.php?a=s390x&suite=sid&buildd=buildd-zandonai) reports that its CPU supports Z14:

```
[113/184] : && /usr/bin/c++ -g -O2 -ffile-prefix-map=/<<PKGBUILDDIR>>=. -flto=auto -ffat-lto-objects -fstack-protector-strong -Wformat -Werror=format-security -DHWY_BROKEN_EMU128=0 -Wdate-time -D_FORTIFY_SOURCE=2 -flto=auto -ffat-lto-objects -Wl,-z,relro -Wl,-z,now     -fPIE -pie CMakeFiles/hwy_list_targets.dir/hwy/tests/list_targets.cc.o -o hwy_list_targets  -Wl,-rpath,/<<PKGBUILDDIR>>/obj-s390x-linux-gnu  libhwy.so.1.2.0 && cd /<<PKGBUILDDIR>>/obj-s390x-linux-gnu && /<<PKGBUILDDIR>>/obj-s390x-linux-gnu/hwy_list_targets || ( exit 0 )
Config: emu128:0 scalar:0 static:0 all_attain:0 is_test:0
Compiled HWY_TARGETS:   EMU128
HWY_ATTAINABLE_TARGETS: EMU128
HWY_BASELINE_TARGETS:   EMU128
HWY_STATIC_TARGET:      EMU128
HWY_BROKEN_TARGETS:    
HWY_DISABLED_TARGETS:  
Current CPU supports:   Z15 Z14 EMU128 SCALAR
```

johnplatts commented 3 weeks ago


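The compiler error above is caused by the `-march=z15` option, which assumes that you are targeting a z15 or z16 mainframe. With that baseline, `always_inline` helpers in `hwy/base.h` such as `hwy::CopyBytes` are compiled for z15, and GCC refuses to inline them into the code of the Z14 target, which is compiled for the older z14 ISA; hence the "target specific option mismatch".

It is possible to work around this error by disabling the `HWY_Z14` target if you only need to support z15 or later.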

malaterre commented 3 weeks ago

@johnplatts I can build and run PPC9 & PPC8 hwy on ppc64el.

What compiler options should I use to build and run Z15 & Z14 hwy on s390x?

johnplatts commented 3 weeks ago

> @johnplatts I can build and run PPC9 & PPC8 hwy on ppc64el.
>
> What compiler options should I use to build and run Z15 & Z14 hwy on s390x?

To compile for Z14 or later, use the `-march=z14 -mzvector` compiler options.

malaterre commented 3 weeks ago

> To compile for Z14 or later, use the `-march=z14 -mzvector` compiler options.

I had to read that sentence twice. Anyway, that did the trick; I see the Z15 tests:

Kudos for the work, all tests are passing!

malaterre commented 3 weeks ago

@johnplatts

One note, though: could you confirm whether this is expected?

```
obj-*/examples/hwy_benchmark
Measurement failed: overhead 10 < 12
MeasureClosure failed.

F(x)->2*x^2, F(3) = 18.0
------------------------ Z15
       dot:   3456:  0.383 (+/- 0.001)
     delta:   3456:  0.775 (+/- 0.000)

F(x)->2*x^2, F(3) = 18.0
------------------------ Z14
       dot:   3456:  0.088 (+/- 0.001)
```

jan-wassenberg commented 3 weeks ago

No worries, `MeasureClosure` can spuriously 'fail'. It just indicates that various sources of noise were too large. For example, the thread may have migrated to a different core, which throws the timer off a bit. It's fine to ignore that.