intel / ARM_NEON_2_x86_SSE

A platform-independent header that allows any C/C++ code containing ARM NEON intrinsic functions to be compiled for x86 target systems using SIMD intrinsic functions up to AVX2

Duplicated declaration (static follows non-static) #43

Closed: kalaluthien closed this issue 4 years ago

kalaluthien commented 4 years ago

The last commit (803a3d3c44b0ce81a1b5a312fa9d61879563dbb4) introduced a duplicated declaration error when compiling the TensorFlow Lite runtime. It occurs when USE_SSE4 is defined, which happens because __SSE4_2__ is defined:

$HOME/tensorflow/tensorflow/lite/tools/make/downloads/neon_2_sse/NEON_2_SSE.h:5342:33: error: static declaration of 'vcleq_u8' follows non-static declaration
    _NEON2SSESTORAGE uint8x16_t vcleq_u8(uint8x16_t a, uint8x16_t b); // VCGE.U8 q0, q0, q0
                                ^
$HOME/tensorflow/tensorflow/lite/tools/make/downloads/neon_2_sse/NEON_2_SSE.h:769:29: note: previous declaration is here
_NEON2SSE_GLOBAL uint8x16_t vcleq_u8(uint8x16_t a, uint8x16_t b); // VCGE.U8 q0, q0, q0
                            ^
$HOME/tensorflow/tensorflow/lite/tools/make/downloads/neon_2_sse/NEON_2_SSE.h:5343:33: error: static declaration of 'vcleq_u8' follows non-static declaration
    _NEON2SSE_INLINE uint8x16_t vcleq_u8(uint8x16_t a, uint8x16_t b) // VCGE.U8 q0, q0, q0
                                ^
$HOME/tensorflow/tensorflow/lite/tools/make/downloads/neon_2_sse/NEON_2_SSE.h:769:29: note: previous declaration is here
_NEON2SSE_GLOBAL uint8x16_t vcleq_u8(uint8x16_t a, uint8x16_t b); // VCGE.U8 q0, q0, q0
                            ^

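_NEON2SSESTORAGE expands to static, so the declarations at lines 5342 and 5343 are static declarations of vcleq_u8 that follow the earlier non-static (_NEON2SSE_GLOBAL) declaration at line 769, and the compiler rejects them.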
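
For anyone unfamiliar with this error, here is a minimal sketch of the same linkage conflict outside of the header. Every name in it (DEMO_GLOBAL, DEMO_STORAGE, DEMO_USE_SSE4, demo_cle) is a hypothetical stand-in, not the real definition from NEON_2_SSE.h, and the scalar comparison only takes the place of the real SIMD code. It assumes the non-static forward declaration is simply skipped whenever the static version is going to be defined.

```c
/* Hypothetical stand-ins for the macros in NEON_2_SSE.h; the real
   definitions in the header may differ. */
#define DEMO_GLOBAL  extern   /* non-static: external linkage */
#define DEMO_STORAGE static   /* static: internal linkage */

/* Stand-in for USE_SSE4. Remove this line to take the other path. */
#define DEMO_USE_SSE4 1

#if !defined(DEMO_USE_SSE4)
/* Non-static forward declaration. It is only safe when no static
   declaration of the same function follows in this translation unit. */
DEMO_GLOBAL int demo_cle(int a, int b);
#endif

#if defined(DEMO_USE_SSE4)
/* Static declaration and definition, in the same order as lines 5342
   and 5343 of the error output. If the forward declaration above were
   not guarded, the compiler would report "static declaration of
   'demo_cle' follows non-static declaration" here. */
DEMO_STORAGE int demo_cle(int a, int b);
DEMO_STORAGE int demo_cle(int a, int b)
{
    return a <= b;
}
#else
/* Non-static definition matching the non-static forward declaration. */
int demo_cle(int a, int b)
{
    return a <= b;
}
#endif
```

Either path compiles because every declaration of demo_cle in one translation unit has the same linkage. Deleting the #if/#endif pair around the DEMO_GLOBAL line while DEMO_USE_SSE4 is defined should bring back the same kind of error.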

Can you check this? Sorry for not giving simple reproducible code...

kalaluthien commented 4 years ago

Below is a workaround for anyone hitting the same issue:

NEON_2_SSE.h (starting at line 769):
#if !defined(USE_SSE4)
_NEON2SSE_GLOBAL uint8x16_t vcleq_u8(uint8x16_t a, uint8x16_t b); // VCGE.U8 q0, q0, q0
_NEON2SSE_GLOBAL uint16x8_t vcleq_u16(uint16x8_t a, uint16x8_t b); // VCGE.U16 q0, q0, q0
_NEON2SSE_GLOBAL uint32x4_t vcleq_u32(uint32x4_t a, uint32x4_t b); // VCGE.U32 q0, q0, q0
#endif

Zvictoria commented 4 years ago

Many thanks for reporting! I have reverted this commit. It will be fixed later on.

Bizonu commented 4 years ago

Sorry for introducing these errors; I forgot to test with SSE4 enabled. I've now created a new pull request (#44) that fixes this.

dtsmith2001 commented 4 years ago

If anyone needs the patch to build TensorFlow, apply:

https://patch-diff.githubusercontent.com/raw/intel/ARM_NEON_2_x86_SSE/pull/44.patch

Here is the patch as of about 8 PM EDT 5/18/2020:

arm_neon_2_x86_sse-use_sse4.patch.gz

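  1. Run tensorflow/lite/tools/make/download_dependencies.sh
  2. In the directory that contains NEON_2_SSE.h (tensorflow/lite/tools/make/downloads/neon_2_sse in the error output above), run: curl -L https://patch-diff.githubusercontent.com/raw/intel/ARM_NEON_2_x86_SSE/pull/44.patch | patch -p1
  3. Continue as usual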