type conversion error in waf configure's FMV test

kazuki commented 7 years ago

Caused by the change in #373.

config.log

Checking for function multiversioning
==>
#include <immintrin.h>
__attribute__((target("default"))) void test() {}
__attribute__((target("sse2"))) void test() { __m128i x; _mm_xor_si128(x,x); }
__attribute__((target("avx2"))) void test() { __m256i x; _mm256_xor_si256(x,x); _mm256_srl_epi32(x,x); }
int main() { test(); }

<==
[1/3] Compiling build/.conf_check_aaa45041f7059160d1578dc02aea4d1e/test.cpp

['/usr/bin/g++', '-O2', '-Wall', '-g', '-pipe', '-fno-omit-frame-pointer', '-pthread', '-DJUBATUS_CORE_VERSION="1.0.4"', '-DJUBATUS_CORE_APPNAME="jubatus_core"', '-DJUBATUS_PLUGIN_DIR="/usr/local/lib/jubatus/plugin"', '-DNDEBUG=1', '-DJUBATUS_DISABLE_ASSERTIONS=1', '-DBUILD_DIR="/home/kazuki/projects/jubatus_core/build"', '-DJUBATUS_USE_EIGEN=1', '../test.cpp', '-c', '-o/home/kazuki/projects/jubatus_core/build/.conf_check_aaa45041f7059160d1578dc02aea4d1e/testbuild/test.cpp.1.o']
err: ../test.cpp: In function ‘void test()’:
../test.cpp:4:101: error: cannot convert ‘__m256i {aka __vector(4) long long int}’ to ‘__m128i {aka __vector(2) long long int}’ for argument ‘2’ to ‘__m256i _mm256_srl_epi32(__m256i, __m128i)’
 __attribute__((target("avx2"))) void test() { __m256i x; _mm256_xor_si256(x,x); _mm256_srl_epi32(x,x); }
                                                                                                     ^

from /home/kazuki/projects/jubatus_core: Test does not build: Traceback (most recent call last):
  File "/home/kazuki/projects/jubatus_core/.waf3-1.9.7-d27222240ebc8bcbca7fcd8f4ae914fb/waflib/Configure.py", line 324, in run_build
    bld.compile()
  File "/home/kazuki/projects/jubatus_core/.waf3-1.9.7-d27222240ebc8bcbca7fcd8f4ae914fb/waflib/Tools/errcheck.py", line 132, in check_compile
    ret=self.orig_compile()
  File "/home/kazuki/projects/jubatus_core/.waf3-1.9.7-d27222240ebc8bcbca7fcd8f4ae914fb/waflib/Build.py", line 181, in compile
    raise Errors.BuildError(self.producer.error)
waflib.Errors.BuildError: Build failed
 -> task in 'testprog' failed with exit status 1:
        {task 140104977625160: cxx test.cpp -> test.cpp.1.o}
['/usr/bin/g++', '-O2', '-Wall', '-g', '-pipe', '-fno-omit-frame-pointer', '-pthread', '-DJUBATUS_CORE_VERSION="1.0.4"', '-DJUBATUS_CORE_APPNAME="jubatus_core"', '-DJUBATUS_PLUGIN_DIR="/usr/local/lib/jubatus/plugin"', '-DNDEBUG=1', '-DJUBATUS_DISABLE_ASSERTIONS=1', '-DBUILD_DIR="/home/kazuki/projects/jubatus_core/build"', '-DJUBATUS_USE_EIGEN=1', '../test.cpp', '-c', '-o/home/kazuki/projects/jubatus_core/build/.conf_check_aaa45041f7059160d1578dc02aea4d1e/testbuild/test.cpp.1.o']

no
from /home/kazuki/projects/jubatus_core: The configuration failed

kmaehashi commented 7 years ago

Oh... thanks so much for pointing this out. I'll fix it.

kmaehashi commented 7 years ago

I tried to fix it:

$ cat test.c
#include <immintrin.h>
__attribute__((target("default"))) void test() {}
__attribute__((target("sse2"))) void test() { __m128i x; _mm_xor_si128(x,x); }
__attribute__((target("avx2"))) void test() { __m256i x; __m128i y; _mm256_xor_si256(x,x); _mm256_srl_epi32(x,y); }
int main() { test(); }

However,

$ g++ -O0 -S test.c

$ cat test.s

... snip ...

_Z4testv.avx2:
.LFB3448:
    .cfi_startproc
    leaq    8(%rsp), %r10
    .cfi_def_cfa 10, 0
    andq    $-32, %rsp
    pushq   -8(%r10)
    pushq   %rbp
    .cfi_escape 0x10,0x6,0x2,0x76,0
    movq    %rsp, %rbp
    pushq   %r10
    .cfi_escape 0xf,0x3,0x76,0x78,0x6
    subq    $80, %rsp
    vmovdqa -48(%rbp), %ymm0
    vmovdqa %ymm0, -112(%rbp)
    vmovdqa -48(%rbp), %ymm0
    vmovdqa %ymm0, -208(%rbp)
    vmovdqa -48(%rbp), %ymm0
    vmovdqa %ymm0, -144(%rbp)
    vmovdqa -64(%rbp), %xmm0
    vmovaps %xmm0, -160(%rbp)
    nop
    addq    $80, %rsp
    popq    %r10
    .cfi_def_cfa 10, 0
    popq    %rbp
    leaq    -8(%r10), %rsp
    .cfi_def_cfa 7, 8
    ret
    .cfi_endproc
.LFE3448:
    .size   _Z4testv.avx2, .-_Z4testv.avx2
    .globl  main
    .type   main, @function

... snip ...

Hmm, neither _mm256_xor_si256 nor _mm256_srl_epi32 produces an AVX2 instruction: the output contains only vmovdqa/vmovaps moves, with no vpxor or vpsrld...

kazuki commented 7 years ago

$ cat test.c
#include <immintrin.h>
__attribute__((target("default"))) void test() {}
__attribute__((target("sse2"))) void test() { __m128i x; x = _mm_xor_si128(x,x); }
__attribute__((target("avx2"))) void test() { __m256i x; __m128i y; x = _mm256_srl_epi32(x,y); }
int main() { test(); }
$ g++ --version
g++ (Gentoo 7.1.0-r1 p1.1) 7.1.0
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
$ g++ -O0 -S test.c
$ cat test.s
... snip ...
_Z4testv.sse2:
.LFB3672:
        .cfi_startproc
        pushq   %rbp
        .cfi_def_cfa_offset 16
        .cfi_offset 6, -16
        movq    %rsp, %rbp
        .cfi_def_cfa_register 6
        movdqa  -48(%rbp), %xmm0
        movaps  %xmm0, -32(%rbp)
        movdqa  -48(%rbp), %xmm0
        movaps  %xmm0, -16(%rbp)
        movdqa  -32(%rbp), %xmm1
        movdqa  -16(%rbp), %xmm0
        pxor    %xmm1, %xmm0
        movaps  %xmm0, -48(%rbp)
        nop
        popq    %rbp
        .cfi_def_cfa 7, 8
        ret
        .cfi_endproc
:
:
_Z4testv.avx2:
.LFB3673:
        .cfi_startproc
        leaq    8(%rsp), %r10
        .cfi_def_cfa 10, 0
        andq    $-32, %rsp
        pushq   -8(%r10)
        pushq   %rbp
        .cfi_escape 0x10,0x6,0x2,0x76,0
        movq    %rsp, %rbp
        pushq   %r10
        .cfi_escape 0xf,0x3,0x76,0x78,0x6
        vmovdqa -80(%rbp), %ymm0
        vmovdqa %ymm0, -48(%rbp)
        vmovdqa -112(%rbp), %xmm0
        vmovaps %xmm0, -96(%rbp)
        vmovdqa -96(%rbp), %xmm1
        vmovdqa -48(%rbp), %ymm0
        vpsrld  %xmm1, %ymm0, %ymm0
        vmovdqa %ymm0, -80(%rbp)

If the return value is ignored, gcc does not emit the SSE2/AVX2 instruction. Assigning the result, as in the snippet above, produces pxor and vpsrld.
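Note that the configure check is compiled with -O2 (see the g++ command line in config.log), and at that level a result assigned to an otherwise unused local may still be removed as dead code. A minimal sketch of a multiversioned function whose result is stored through a pointer, so the shift has to be kept, follows. It is not the code from any PR here; the names shift_right and shift_check are hypothetical, and it assumes GCC or Clang on x86 Linux with function multiversioning support.

```cpp
#include <immintrin.h>
#include <cstdint>

// Scalar fallback. Assumes count < 32, since a wider shift is undefined in C++.
__attribute__((target("default")))
void shift_right(const std::uint32_t* in, std::uint32_t count, std::uint32_t* out) {
    for (int i = 0; i < 8; ++i) out[i] = in[i] >> count;
}

// SSE2 version: two 128-bit halves, shift count passed as an __m128i.
__attribute__((target("sse2")))
void shift_right(const std::uint32_t* in, std::uint32_t count, std::uint32_t* out) {
    const __m128i n = _mm_cvtsi32_si128(static_cast<int>(count));
    for (int half = 0; half < 2; ++half) {
        __m128i x = _mm_loadu_si128(reinterpret_cast<const __m128i*>(in + 4 * half));
        x = _mm_srl_epi32(x, n);  // the result is stored below, so the shift cannot be dropped
        _mm_storeu_si128(reinterpret_cast<__m128i*>(out + 4 * half), x);
    }
}

// AVX2 version: one 256-bit vector; the count argument is still an __m128i.
__attribute__((target("avx2")))
void shift_right(const std::uint32_t* in, std::uint32_t count, std::uint32_t* out) {
    const __m128i n = _mm_cvtsi32_si128(static_cast<int>(count));
    __m256i x = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(in));
    x = _mm256_srl_epi32(x, n);
    _mm256_storeu_si256(reinterpret_cast<__m256i*>(out), x);
}

// Calls whichever version the dispatcher selects and compares with plain C++.
bool shift_check() {
    const std::uint32_t in[8] = {256, 512, 1024, 2048, 4096, 8192, 16384, 32768};
    std::uint32_t out[8] = {};
    shift_right(in, 8, out);
    for (int i = 0; i < 8; ++i) {
        if (out[i] != (in[i] >> 8)) return false;
    }
    return true;
}
```

A configure fragment could wrap this as `int main() { return shift_check() ? 0 : 1; }`. Because the count is a function parameter, the compiler cannot fold the shift into a constant either.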

kmaehashi commented 7 years ago

Thanks! Using the return value generates the AVX2 instruction on gcc-5.3.0 (#349) too:

vpsrld  %xmm1, %ymm0, %ymm0

However, the problem is that this test snippet still builds successfully on gcc-5.3.0, so it does not reproduce the original build failure.

I looked into the assembly code of the original error:

[183/213] Compiling jubatus/core/nearest_neighbor/lsh_function.cpp
{standard input}: Assembler messages:
{standard input}:22469: Error: suffix or operands invalid for `vpsrld'

Line 22469 is:

vpsrld  $8, %ymm0, %ymm0

So the assembler fails only when an immediate operand is given...?
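For reference, the test snippets earlier in this thread only use _mm256_srl_epi32, whose count is an __m128i register operand. The immediate encoding seen on line 22469 is what compilers typically emit for _mm256_srli_epi32 with a constant count. A minimal sketch that exercises both forms is below. It is not taken from #379; the function names are hypothetical, and it assumes GCC or Clang on x86 (it relies on __builtin_cpu_supports).

```cpp
#include <immintrin.h>
#include <cstdint>

// Register-count form: the count is a run-time value passed as an __m128i.
__attribute__((target("avx2")))
void srl_by_register(const std::uint32_t* in, std::uint32_t count, std::uint32_t* out) {
    __m256i x = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(in));
    x = _mm256_srl_epi32(x, _mm_cvtsi32_si128(static_cast<int>(count)));
    _mm256_storeu_si256(reinterpret_cast<__m256i*>(out), x);
}

// Immediate form: the count is a compile-time constant, which compilers
// typically encode as an immediate operand of vpsrld.
__attribute__((target("avx2")))
void srl_by_immediate(const std::uint32_t* in, std::uint32_t* out) {
    __m256i x = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(in));
    x = _mm256_srli_epi32(x, 8);
    _mm256_storeu_si256(reinterpret_cast<__m256i*>(out), x);
}

// Returns true when both forms agree, or when this CPU cannot run AVX2 code.
bool immediate_form_check() {
    if (!__builtin_cpu_supports("avx2")) return true;
    const std::uint32_t in[8] = {256, 512, 1024, 2048, 4096, 8192, 16384, 32768};
    std::uint32_t by_reg[8] = {};
    std::uint32_t by_imm[8] = {};
    srl_by_register(in, 8, by_reg);
    srl_by_immediate(in, by_imm);
    for (int i = 0; i < 8; ++i) {
        if (by_reg[i] != by_imm[i] || by_imm[i] != (in[i] >> 8)) return false;
    }
    return true;
}
```

For a configure check, compiling and assembling this file is the part that matters: if the assembler rejects the immediate form, the build of the test fails before anything is run.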

kmaehashi commented 7 years ago

We decided to fix this in the next release.

kmaehashi commented 7 years ago

Will be fixed in https://github.com/jubatus/jubatus_core/pull/379

kmaehashi commented 7 years ago

Fixed via #379

type conversion error in waf configure's FMV test #378