ptitSeb / box64

Box64 - Linux Userspace x86_64 Emulator with a twist, targeted at ARM64 Linux devices
https://box86.org
MIT License

inlining failed in call to ‘always_inline’ ‘_mm_aesdeclast_si128’: target specific option mismatch #648

Open josch opened 1 year ago

josch commented 1 year ago

Hi,

I tried to build tests/test18.c with gcc on Debian unstable and got this:

/usr/lib/gcc/x86_64-linux-gnu/12/include/wmmintrin.h: In function 'main':
/usr/lib/gcc/x86_64-linux-gnu/12/include/wmmintrin.h:52:1: error: inlining failed in call to 'always_inline' '_mm_aesdeclast_si128': target specific option mismatch
   52 | _mm_aesdeclast_si128 (__m128i __X, __m128i __Y)
      | ^~~~~~~~~~~~~~~~~~~~
96bf73b3.c:33:37: note: called from here
   33 |             mm128i declast = { .m = _mm_aesdeclast_si128(x.m, y.m) };
      |                                     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/lib/gcc/x86_64-linux-gnu/12/include/wmmintrin.h:44:1: error: inlining failed in call to 'always_inline' '_mm_aesdec_si128': target specific option mismatch
   44 | _mm_aesdec_si128 (__m128i __X, __m128i __Y)
      | ^~~~~~~~~~~~~~~~
96bf73b3.c:32:33: note: called from here
   32 |             mm128i dec = { .m = _mm_aesdec_si128(x.m, y.m) };
      |                                 ^~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/lib/gcc/x86_64-linux-gnu/12/include/wmmintrin.h:69:1: error: inlining failed in call to 'always_inline' '_mm_aesenclast_si128': target specific option mismatch
   69 | _mm_aesenclast_si128 (__m128i __X, __m128i __Y)
      | ^~~~~~~~~~~~~~~~~~~~
96bf73b3.c:31:37: note: called from here
   31 |             mm128i enclast = { .m = _mm_aesenclast_si128(x.m, y.m) };
      |                                     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/lib/gcc/x86_64-linux-gnu/12/include/wmmintrin.h:61:1: error: inlining failed in call to 'always_inline' '_mm_aesenc_si128': target specific option mismatch
   61 | _mm_aesenc_si128 (__m128i __X, __m128i __Y)
      | ^~~~~~~~~~~~~~~~
96bf73b3.c:30:33: note: called from here
   30 |             mm128i enc = { .m = _mm_aesenc_si128(x.m, y.m) };
      |                                 ^~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/lib/gcc/x86_64-linux-gnu/12/include/wmmintrin.h:77:1: error: inlining failed in call to 'always_inline' '_mm_aesimc_si128': target specific option mismatch
   77 | _mm_aesimc_si128 (__m128i __X)
      | ^~~~~~~~~~~~~~~~
96bf73b3.c:57:29: note: called from here
   57 |         mm128i imc = { .m = _mm_aesimc_si128(x.m) };
      |                             ^~~~~~~~~~~~~~~~~~~~~

How can I produce the amd64 test binaries myself?

ptitSeb commented 1 year ago

Use: gcc -march=corei7 -O2 -g -maes -mpclmul test18.c -o test18

I guess I should put a comment line with that in the .c file.
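For context, the "target specific option mismatch" error appears because the AES intrinsics in wmmintrin.h are declared always_inline with the AES target enabled, so GCC refuses to inline them into a caller compiled without it. Passing -maes, as above, is the fix suggested in this thread. A minimal sketch of an alternative, assuming GCC or Clang on x86_64, is to enable the feature for a single function with the target attribute. The function names aes_enc_round and cpu_has_aes are hypothetical and are not part of the box64 tests.

```c
#include <immintrin.h>
#include <stdint.h>

/* Enable AES-NI (and SSE2) for this one function only, so the always_inline
   intrinsic can be inlined even when the file is built without -maes.
   This is a GCC/Clang extension, not standard C. */
__attribute__((target("aes,sse2")))
static void aes_enc_round(const uint8_t state[16], const uint8_t key[16],
                          uint8_t out[16])
{
    __m128i s = _mm_loadu_si128((const __m128i *)state);
    __m128i k = _mm_loadu_si128((const __m128i *)key);
    /* One AES encryption round: ShiftRows, SubBytes, MixColumns, XOR key. */
    _mm_storeu_si128((__m128i *)out, _mm_aesenc_si128(s, k));
}

/* Returns nonzero if the CPU executing this code supports AES-NI.
   Check this before calling aes_enc_round on unknown hardware. */
static int cpu_has_aes(void)
{
    return __builtin_cpu_supports("aes");
}
```

The attribute only tells the compiler that the instruction may be emitted; the CPU (or emulator) running the binary still has to support it, which is why the sketch pairs it with a runtime check.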

josch commented 1 year ago

Test 17 fails with the same error unless one adds -march=corei7.

Are you compiling all tests with the same options?

The background is that in Debian we are not allowed to ship pre-compiled binaries. Instead, everything has to be compiled from source to make sure that we do not accidentally ship things that cannot be regenerated from source. So to package box64 for Debian, I delete the test binaries and then cross-compile them from arm64 to amd64 during the build process.

Would you be interested in a patch that generates the test binaries as part of CMakeLists.txt if a flag like -DCROSS_REBUILD_TESTS=1 or something similar is given? This would of course need a cross compiler to be installed, which is easy in Debian: install the gcc-x86-64-linux-gnu package on arm64.

ptitSeb commented 1 year ago

The early tests were built with -march=core2 only. Later, as box evolved, I used corei7 and more advanced options.

I understand the need to cross-compile the tests, but they should not be distributed when installing box, or is there something I missed?

I would accept a patch to cross-build the test binaries, sure, that could be handy (although the tests still need the reference output made on an actual x86_64 to work).

josch commented 1 year ago

> I understand the need to cross-compile the tests, but they should not be distributed when installing box, or is there something I missed?

For the Debian package, I'm downloading the tarball that is generated by github when downloading git tags. Since you store the x86_64 binaries in git, that tarball will also include them. I'm removing those binaries from the source tarball. Does that answer your question?

> I would accept a patch to cross-build the test binaries, sure, that could be handy (although the tests still need the reference output made on an actual x86_64 to work).

The tests/ref*.txt files are not binary files but human readable text files containing data which is probably not even copyright-able. So distributing those files as part of the source package is okay for Debian.

ptitSeb commented 1 year ago

Yeah, that answers my question (but I still find it a bit strange). Don't forget to also remove the bash binary from the tests folder.

josch commented 1 year ago

The reasoning behind removing binary files from the source is that it ensures we do not accidentally distribute binaries that cannot be built from source. The Debian Free Software Guidelines require that everything we ship be buildable from source.

And yes, this can cause problems, especially with test cases, where it's quite common to run tests on arbitrary binary artifacts. Fortunately, in this case the problem can be solved easily by cross-compiling the test binaries.

josch commented 1 year ago

I think more things are amiss. I rebuilt all the amd64 test binaries natively on amd64 (so no cross-compiling) like this:

for f in tests/test*.c; do gcc -march=corei7 -O2 -g -maes -mpclmul $f -o tests/$(basename $f .c); done

After having regenerated all test binaries in that fashion, not all tests pass anymore, namely the following tests now fail:

Can you reproduce this? If not, I'll share the output and the binaries themselves for investigation.

In any case, it would be really nice to know how to regenerate these binaries. If even building natively doesn't succeed, then of course cross-compiling will not produce a working version either.

josch commented 1 year ago

Now that box64 is in Debian, it would be nice to get the tests to work. To do that, I first have to compile them, but when I do, the tests fail:

Test project /home/josch/git/box64
    Start  1: bootSyscall
1/3 Test  #1: bootSyscall ......................***Failed    0.30 sec
CMake Error at runTest.cmake:52 (message):
  Failed: The output of /home/josch/git/box64/box64 did not match
  /home/josch/git/box64/tests/ref01.txt

    Start 15: sse_asm
2/3 Test #15: sse_asm ..........................***Failed    0.23 sec
CMake Error at runTest.cmake:52 (message):
  Failed: The output of /home/josch/git/box64/box64 did not match
  /home/josch/git/box64/tests/ref16.txt

    Start 16: sse_intrinsics
3/3 Test #16: sse_intrinsics ...................***Failed    0.42 sec
CMake Error at runTest.cmake:52 (message):
  Failed: The output of /home/josch/git/box64/box64 did not match
  /home/josch/git/box64/tests/ref17.txt

0% tests passed, 3 tests failed out of 3

Total Test time (real) =   0.96 sec

The following tests FAILED:
      1 - bootSyscall (Failed)
     15 - sse_asm (Failed)
     16 - sse_intrinsics (Failed)
Errors while running CTest

bootSyscall outputs nothing in test01.out instead of the expected "Hello x86_64 World!"

There are many differences in the output of sse_asm (I can paste them as well if you like) and a few in the output of sse_intrinsics:

--- test17.out  2023-07-21 01:43:05.906646868 +0200
+++ /home/josch/git/box64/tests/ref17.txt   2023-04-10 07:33:40.135066603 +0200
@@ -489,8 +489,8 @@
 subsd(1 2 , 1 2 ) = 0 2 
 subsd(1 2 , 0 -2 ) = 1 2 
 subsd(1 2 , inf -inf ) = -inf 2 
-subsd(1 2 , 0x7ff8000000000000 -0 ) = 0xfff8000000000000 2 
-subsd(0 -2 , 0x7ff8000000000000 -0 ) = 0xfff8000000000000 -2 
+subsd(1 2 , 0x7ff8000000000000 -0 ) = 0x7ff8000000000000 2 
+subsd(0 -2 , 0x7ff8000000000000 -0 ) = 0x7ff8000000000000 -2 
 subsd(inf -inf , 0x7ff8000000000000 -0 ) = 0x7ff8000000000000 -inf 
 subsd(1 2 , 2 1 ) = -1 2 
 subsd(1 2 , -2 0 ) = 3 2
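
For readers comparing the two NaN results in the diff above: 0x7ff8000000000000 and 0xfff8000000000000 differ only in bit 63, the IEEE 754 sign bit, so both are quiet NaNs with the same payload. The following is a minimal sketch of helpers for inspecting such bit patterns in C. The function names are hypothetical and are not taken from the box64 tests, and the sketch does not by itself explain why the rebuilt binary produces a different sign.

```c
#include <stdint.h>
#include <string.h>

/* Copy the raw IEEE 754 bit pattern of a double into a 64-bit integer.
   memcpy avoids the aliasing problems of a pointer cast. */
static uint64_t double_bits(double d)
{
    uint64_t u;
    memcpy(&u, &d, sizeof u);
    return u;
}

/* Build a double from a raw 64-bit pattern, e.g. 0x7ff8000000000000. */
static double double_from_bits(uint64_t u)
{
    double d;
    memcpy(&d, &u, sizeof d);
    return d;
}

/* Returns 1 if the sign bit (bit 63) is set. Unlike a comparison such as
   d < 0, this also works for NaN values. */
static int sign_bit_set(double d)
{
    return (int)(double_bits(d) >> 63);
}
```

With these helpers, sign_bit_set(double_from_bits(0xfff8000000000000)) returns 1 and sign_bit_set(double_from_bits(0x7ff8000000000000)) returns 0, while both values compare unequal to themselves, as every NaN does.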